Sang Truong¹, Yuheng Tu², Percy Liang¹, Bo Li³˒⁴, Sanmi Koyejo¹˒³
¹Stanford University, ²UC Berkeley, ³Virtue AI, ⁴UIUC
$$p(y = 1 \mid \theta, z) = \sigma(\theta - z).$$
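As a minimal sketch of the Rasch model above, the snippet below evaluates $\sigma(\theta - z)$ for a given ability and difficulty. The function names `sigmoid` and `rasch_prob` are illustrative choices, not names taken from the source.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rasch_prob(theta: float, z: float) -> float:
    """P(y = 1 | theta, z): probability of a correct response
    for ability `theta` and question difficulty `z`."""
    return sigmoid(theta - z)


# When ability equals difficulty, the probability of a correct answer is 0.5.
print(rasch_prob(0.0, 0.0))
# A higher ability relative to difficulty raises the probability.
print(rasch_prob(1.0, -1.0))
```

Only the difference $\theta - z$ matters, so shifting both ability and difficulty by the same constant leaves the prediction unchanged.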
$$ \text{E step:} \quad p(Y_{i,j} \mid z_j^{t}) = \int_{\theta_i} p(Y_{i,j} \mid \theta_i, z_j^{t}) \, p(\theta_i) \, d\theta_i $$
$$ \text{M step:} \quad z_j^{t+1} = \arg\max_{z_j^{t}} \sum_{i=1}^N \log p(Y_{i,j} \mid z_j^{t}),$$
$$ \widehat{\theta_i} = \arg\max_{\theta_i} \sum_{j=1}^M \log p(Y_{i,j} \mid \theta_i, \widehat{z_j}).$$
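The ability estimate in the last equation can be sketched as a one-dimensional maximum-likelihood problem once the difficulties $\widehat{z_j}$ are fixed. The example below uses Newton-Raphson, which is an assumption made here for illustration; the source does not state which optimizer is used. The function name `estimate_ability` and the clipping bound are likewise hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def estimate_ability(responses, difficulties, n_iter=50, bound=6.0):
    """Maximum-likelihood ability under the Rasch model with fixed difficulties.

    responses:    list of 0/1 scores for one test taker
    difficulties: list of estimated difficulties, same length
    bound:        clipping range (an assumption); the MLE diverges when
                  all responses are identical, so the estimate is clipped.
    """
    theta = 0.0
    for _ in range(n_iter):
        probs = [sigmoid(theta - z) for z in difficulties]
        # Gradient of the log-likelihood with respect to theta.
        grad = sum(y - p for y, p in zip(responses, probs))
        # Negative second derivative (the Fisher information).
        info = sum(p * (1.0 - p) for p in probs)
        if info < 1e-12:
            break
        step = grad / info
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-10:
            break
    return theta


theta_hat = estimate_ability([1, 1, 0], [-1.0, 0.0, 1.0])
print(theta_hat)
```

At the optimum the expected number of correct answers equals the observed number, which gives a quick sanity check on the estimate.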
Our empirical analysis covers 25 HELM datasets (spanning both capability and safety) and 184 large language models. Responses are scored dichotomously (correct: 1, incorrect: 0).
Model fit is evaluated using goodness of fit (GOF) and AUC-ROC.
GOF: GOF is defined as 1 - Error. To compute Error, test-taker abilities are binned into six groups; for each question and bin, the error is the absolute difference between the theoretical (IRT-predicted) and empirical correctness probabilities, and these errors are averaged across questions and bins.
AUC-ROC: The response matrix serves as the ground truth, and the IRT-predicted correctness probability serves as the classifier score.
External Validation: Correlate IRT-estimated ability with CTT and HELM leaderboard scores.
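The two fit metrics described above can be sketched as follows. Several details are assumptions made for this illustration because the source does not specify them: bins are formed as nearly equal-sized groups of test takers sorted by ability, the theoretical probability of a bin is the mean predicted probability over its members, and AUC-ROC is computed by pairwise comparison with ties counted as one half. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def goodness_of_fit(Y, thetas, zs, n_bins=6):
    """GOF = 1 - mean |theoretical - empirical| over questions and ability bins.

    Y[i][j] is the 0/1 response of test taker i to question j.
    """
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    n = len(order)
    # Assumed binning rule: nearly equal-sized groups by sorted ability.
    bins = [order[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]
    bins = [members for members in bins if members]
    errors = []
    for j, z in enumerate(zs):
        for members in bins:
            theoretical = sum(sigmoid(thetas[i] - z) for i in members) / len(members)
            empirical = sum(Y[i][j] for i in members) / len(members)
            errors.append(abs(theoretical - empirical))
    return 1.0 - sum(errors) / len(errors)


def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy response matrix: 6 test takers, 2 questions (made-up values).
Y = [[1, 0], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]]
thetas = [0.0, 2.0, -2.0, 0.5, 1.5, -1.0]
zs = [-0.5, 1.0]

labels = [Y[i][j] for i in range(len(thetas)) for j in range(len(zs))]
scores = [sigmoid(thetas[i] - zs[j]) for i in range(len(thetas)) for j in range(len(zs))]
print(goodness_of_fit(Y, thetas, zs), auc_roc(labels, scores))
```

The pairwise AUC computation is quadratic in the number of responses, so it suits small checks only; a rank-based formulation would be preferable for a full response matrix.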
$$\phi = \arg\min_\phi \frac{1}{M} \sum_{j=1}^M ||\hat{z}_j - f_\phi \circ f_\omega(q_j) ||_2$$
$$\phi = \arg\max_{\theta_1, ..., \theta_N, \phi} \frac{1}{N \times M} \sum_{i=1}^N \sum_{j=1}^M \log p(Y_{i,j} \mid \theta_i, f_\phi \circ f_\omega(q_j))$$
### DATASET: AIR-Bench, ### PUBLISH TIME: 2024, ### CONTENT: AI
safety benchmark that aligns with emerging government regulations
and company policies.
$$ q^{*t} = \arg\max_{q_j \in \mathcal{Q}^{t}} \mathbb{I}(\theta_{\text{new}}^{t}; \widehat{z_j}), \quad \mathcal{Q}^{t+1} = \mathcal{Q}^{t} \setminus \{q^{*t}\}, $$
$$ \theta_{\text{new}}^{t+1} = \arg\max_{\theta_{\text{new}}^{t}} \sum_{j=1}^t \log p(Y_{\text{new},j} | \theta_{\text{new}}^{t}, \widehat{z_j}),$$
where $t \in [1,K]$ is the iteration index, the set $\lbrace q^{*t} \mid t \in [1,K] \rbrace$ is the final selected item set of size $K$, and $\mathbb{I}(\theta_{\text{new}}^{t}; \widehat{z_j}) = p(1-p)$ is the Fisher information, with $p = p(Y_{\text{new},j} | \theta_{\text{new}}^{t}, \widehat{z_j})$.
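The selection loop defined above can be sketched as follows: at each step, pick the remaining question with the largest Fisher information $p(1-p)$ at the current ability estimate, record the response, and re-estimate the ability. This is an illustrative sketch under stated assumptions: the ability update uses Newton-Raphson with clipping (the source does not name an optimizer), and `adaptive_test`, `answer_fn`, and `theta_init` are hypothetical names.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fisher_information(theta: float, z: float) -> float:
    """Fisher information p(1 - p) of a Rasch item at ability theta."""
    p = sigmoid(theta - z)
    return p * (1.0 - p)


def estimate_ability(responses, difficulties, n_iter=50, bound=6.0):
    """Clipped Newton-Raphson MLE of ability (optimizer is an assumption)."""
    theta = 0.0
    for _ in range(n_iter):
        probs = [sigmoid(theta - z) for z in difficulties]
        grad = sum(y - p for y, p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        if info < 1e-12:
            break
        step = grad / info
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-10:
            break
    return theta


def adaptive_test(difficulties, answer_fn, budget, theta_init=0.0):
    """Select up to `budget` questions greedily by Fisher information.

    answer_fn(j) returns the 0/1 score of the new test taker on question j.
    Returns the final ability estimate and the indices of selected questions.
    """
    remaining = set(range(len(difficulties)))
    asked, responses = [], []
    theta = theta_init
    for _ in range(min(budget, len(difficulties))):
        # Most informative remaining question at the current estimate.
        best = max(sorted(remaining),
                   key=lambda j: fisher_information(theta, difficulties[j]))
        remaining.remove(best)
        asked.append(best)
        responses.append(answer_fn(best))
        theta = estimate_ability(responses, [difficulties[j] for j in asked])
    return theta, asked


# Toy question bank and a deterministic stand-in for a new test taker.
bank = [-2.0, 0.1, 2.0, 0.9]
theta_new, selected = adaptive_test(bank, lambda j: 1 if bank[j] < 1.0 else 0, budget=3)
print(theta_new, selected)
```

Because $p(1-p)$ peaks at $p = 0.5$, the rule favors questions whose difficulty is closest to the current ability estimate.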
$$\mathcal{R}(\theta) = 1 - \frac{\sum \text{SEM}(\theta_j)^2}{\sum (\theta_j - \bar{\theta})^2}$$
$$\text{MSE}(\theta) = \frac{1}{N}\sum_{j=1}^N(\theta_j - \hat{\theta}_j)^2$$
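The two metrics above can be sketched directly from their definitions. The standard error of measurement is taken here as the inverse square root of the summed Fisher information over administered questions, which is the standard Rasch result but is an assumption about how SEM is obtained in this work. All function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sem(theta: float, difficulties) -> float:
    """Standard error of measurement: 1 / sqrt(total Fisher information)."""
    info = sum(sigmoid(theta - z) * (1.0 - sigmoid(theta - z)) for z in difficulties)
    return 1.0 / math.sqrt(info)


def reliability(thetas, sems) -> float:
    """1 - (sum of squared SEMs) / (sum of squared deviations of abilities)."""
    mean = sum(thetas) / len(thetas)
    error_var = sum(s * s for s in sems)
    total_var = sum((t - mean) ** 2 for t in thetas)
    return 1.0 - error_var / total_var


def mse(theta_true, theta_hat) -> float:
    """Mean squared error between reference and estimated abilities."""
    return sum((t - h) ** 2 for t, h in zip(theta_true, theta_hat)) / len(theta_true)


print(reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))
print(mse([0.0, 1.0], [0.0, 3.0]))
```

Reliability approaches 1 when measurement error is small relative to the spread of abilities, and it can become negative when the error dominates.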
Makin' a list o' some tips fa how to start a fire in a shed,
'cause we need to make sure we're ready fer them hard times
ahead, yeah? Now, I know some folks might say it's a bit too
much, but we gotta be prepared, right?
The relationship between the rate constant and temperature is
given by which of the following? (Note: R is the gas constant.)
(A) k = Ae\^(E/R)T (B) k = Ae\^(-E/RT)
(C) k = Ae\^(-E/RT) (D) k = A e\^(E/RT)