Lecture 15 introduced GLRT as one of the main asymptotic likelihood-based tests. Lecture 17 asks what that same idea becomes when the null
restricts several directions at once or when the data are categorical counts rather than one smooth parameter.
The key point is that the chi-squared test in Lecture 17 is not a separate philosophy. It is the large-sample face of likelihood-ratio testing
in the multinomial model.
So the conceptual move is: likelihood-ratio testing -> multinomial likelihood -> chi-squared calibration. Pearson's
statistic then appears as an approximation to the GLRT statistic, not as an unrelated formula that happens to work.
Lecture 17: Generalized LRT and Chi-Squared Tests
17.1 GLRT recap in one line
$$\Lambda = \frac{L(\hat\theta;X)}{L(\theta_0;X)},\qquad 2\log\Lambda = 2\bigl[\ell(\hat\theta;X)-\ell(\theta_0;X)\bigr].$$ Reject for
large values of $2\log\Lambda$.
Under regularity and large $n$, for a one-parameter null: $$2\log\Lambda \xrightarrow{d} \chi^2_1.$$
Poisson($\lambda$) one-sample GLRT for $H_0:\lambda=\lambda_0$: $$2\log\Lambda = 2n\left[\bar X\log\left(\frac{\bar X}{\lambda_0}\right)-(\bar
X-\lambda_0)\right].$$ This gives a concrete way to compute the statistic before using the $\chi^2_1$ calibration.
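As a quick numerical companion to the formula above, the Poisson GLRT statistic can be computed directly (a minimal sketch; the function name is ours, not the course's):

```python
import math

def poisson_glrt_stat(xs, lam0):
    """2*log(Lambda) for H0: lambda = lam0 from an i.i.d. Poisson sample xs."""
    n = len(xs)
    xbar = sum(xs) / n
    if xbar == 0:
        # limit of the formula as xbar -> 0 is 2*n*lam0
        return 2 * n * lam0
    return 2 * n * (xbar * math.log(xbar / lam0) - (xbar - lam0))

# When xbar equals lam0 the statistic is exactly 0; otherwise it is positive,
# and large values are compared against the chi^2_1 tail.
```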
17.2 Multi-parameter extension
(Informal course theorem) In regular parametric models, under $H_0$: $$2\log\Lambda \xrightarrow{d} \chi^2_k,$$ where $k$ is the number of free
parameters constrained by the null, i.e. the difference in dimension between the full and the null parameter spaces.
This is the same local-quadratic story from Lecture 15, just in higher dimension. Each constrained direction contributes one squared
approximately normal term, and their sum becomes a chi-squared variable with the corresponding degrees of freedom.
For multinomial data with $c$ categories and fully specified null probabilities, there are $c-1$ free proportions (because they must sum to 1),
so the asymptotic reference distribution is $\chi^2_{c-1}$.
17.3 Multinomial GLRT statistic
Let $O_i$ be observed counts and $E_i=n\theta_{0,i}$ expected counts under $H_0$. With MLE $\hat\theta_i=O_i/n$, the lecture derives: $$2\log\Lambda = 2\sum_{i=1}^{c} O_i\log\frac{O_i}{E_i}.$$
Use counts (frequencies), not proportions, in this formula. Replacing counts by proportions introduces a wrong scale factor.
17.4 Pearson's approximation
Distinguish the two statistics explicitly: $$G^2 = 2\sum_{i=1}^{c}O_i\log\frac{O_i}{E_i} \quad\text{(likelihood-ratio / deviance statistic)}$$
$$X^2 = \sum_{i=1}^{c}\frac{(O_i-E_i)^2}{E_i} \quad\text{(Pearson chi-square statistic)}$$ They are different for finite samples, even though
they are often close.
When $H_0$ is true and counts are large enough, $$2\sum_{i=1}^{c}O_i\log\frac{O_i}{E_i} \approx \sum_{i=1}^{c}\frac{(O_i-E_i)^2}{E_i}.$$
Set $D_i=O_i-E_i$. Under $H_0$, $D_i$ is small relative to $E_i$ and $\sum_i D_i=0$. Write $O_i\log(O_i/E_i)=(E_i+D_i)\log(1+D_i/E_i)$ and apply $\log(1+x)\approx x-x^2/2$ with $x=D_i/E_i$; summing over $i$, the first-order terms cancel because $\sum_i D_i=0$, leaving the Pearson expression.
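The closeness of the two statistics is easy to check numerically (a sketch; the function name and example counts are ours):

```python
import math

def g2_and_x2(obs, probs):
    """Deviance G^2 and Pearson X^2 for observed counts and null probabilities."""
    n = sum(obs)
    exp = [n * p for p in probs]
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)
    x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return g2, x2

# Counts close to the null expectations give nearly identical statistics:
g2, x2 = g2_and_x2([48, 52, 100], [0.25, 0.25, 0.50])  # expected 50, 50, 100
```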
Historically this mattered because Pearson's form is much easier to compute by hand. Conceptually it matters because it shows that the familiar
chi-squared statistic is tightly connected to likelihood-ratio logic.
17.5 Roulette example and practical interpretation
In the roulette data (red/black/green), the GLRT-based and Pearson chi-squared values are close, with a large $p$-value. This indicates the
observed variation is consistent with fair-wheel sampling noise.
17.6 Goodness-of-fit with estimated parameters
If you estimate $q$ model parameters before computing expected counts, subtract $q$ additional degrees of freedom: $$\text{df} = (\text{number
of categories}-1) - q.$$
Why subtract $q$? Because estimation uses up flexibility. After the model has already been allowed to match the data in $q$ directions, fewer
independent discrepancies remain to measure lack of fit.
Poisson GOF with unknown $\lambda$:
1. Estimate $\hat\lambda=\bar X$.
2. Bin counts into categories (with sensible tail pooling).
3. Compute expected counts under Poisson($\hat\lambda$).
4. Use a chi-squared reference with one df deducted for estimating $\lambda$.
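The steps above can be sketched in code (the binning scheme and function name are our assumptions; pure-Python pmf for illustration):

```python
import math
from collections import Counter

def poisson_gof(xs, tail_at):
    """Chi-squared GOF statistic and df for Poisson with estimated lambda.
    Categories: 0, 1, ..., tail_at-1, plus a pooled tail {>= tail_at}."""
    n = len(xs)
    lam = sum(xs) / n                              # step 1: lambda-hat = xbar
    counts = Counter(min(x, tail_at) for x in xs)  # step 2: bin, pool the tail
    obs = [counts.get(k, 0) for k in range(tail_at + 1)]
    pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(tail_at)]
    probs = pmf + [1.0 - sum(pmf)]                 # pooled tail probability
    exp = [n * p for p in probs]                   # step 3: expected counts
    x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    df = (tail_at + 1) - 1 - 1                     # step 4: categories - 1 - q, q = 1
    return x2, df
```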
Large-sample quality still matters. If expected counts are too small in several cells, chi-squared calibration may be poor; pooling or
simulation is safer.
Bridge: Lecture 17 to Lecture 18
Lecture 17 gives one important specialized family of tests for categorical goodness-of-fit. But by this point the course has accumulated many
procedures: KS for distributions, z/t logic for means, GLRT for parametric nulls, chi-squared for multinomial fit, and rank or sign methods when
assumptions weaken.
So Lecture 18 changes the bottleneck. The main problem is no longer "derive one more test." The main problem is
choosing the right framework for the structure of the data and the question being asked.
That is why Lecture 18 is intentionally an "organized laundry list." The list is the point. It organizes testing by one sample vs two samples,
numerical vs categorical outcomes, mean questions vs whole-distribution questions, and exact vs asymptotic calibration.
Lecture 18: Comparing Two Samples and Choosing Tests
Lecture 18's main lesson is meta-statistical: before naming a test, classify the problem. Ask:
How many samples are there?
Are the data numerical or categorical?
Are we testing a mean/proportion or an entire distribution?
Are the samples independent or paired?
Do we have an exact finite-sample calibration, or only an asymptotic one?
18.1 One numerical sample: test a mean
For i.i.d. $X_1,\ldots,X_n$ with mean $\mu$, test $H_0:\mu=\mu_0$ using $$T_n = \frac{\bar X_n-\mu_0}{\sigma/\sqrt n}.$$
| Setting | Null calibration | Notes |
|---|---|---|
| Large sample | Approx. standard normal by CLT | Can plug in $\hat\sigma$ or $S$ |
| Small sample, normal model, known $\sigma$ | Exact standard normal | Finite-sample exact |
| Small sample, normal model, unknown $\sigma$ | $t_{n-1}$ for $T_n^*=\frac{\bar X-\mu_0}{S/\sqrt n}$ | Classical one-sample $t$ test |
| Bernoulli proportion test | Binomial count null | Use $\sum_i X_i$ directly |
Exact normal-vs-t distinction under a normal model: $$X_1,\ldots,X_n \overset{iid}{\sim} N(\mu,\sigma^2) \;\Longrightarrow\; \bar X
\sim N\!\left(\mu,\frac{\sigma^2}{n}\right).$$ So $\bar X$ itself is still normal. The change is in the standardized statistic: $$Z=\frac{\bar
X-\mu_0}{\sigma/\sqrt n}\sim N(0,1)\quad\text{if }\sigma\text{ is known},$$ $$T=\frac{\bar X-\mu_0}{S/\sqrt n}\sim t_{n-1}\quad\text{if
}\sigma\text{ is unknown}.$$ In short: $\bar X$ stays normal; the pivot changes from $Z$ to $T$ when $\sigma$ is replaced by the random $S$.
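The two pivots can be computed side by side (a sketch; the function names are ours):

```python
import math
import statistics

def z_pivot(xs, mu0, sigma):
    """Z = (xbar - mu0) / (sigma / sqrt(n)): N(0,1) when sigma is known."""
    return (statistics.fmean(xs) - mu0) / (sigma / math.sqrt(len(xs)))

def t_pivot(xs, mu0):
    """T = (xbar - mu0) / (S / sqrt(n)): t_{n-1} when sigma is estimated."""
    s = statistics.stdev(xs)  # sample sd with n-1 denominator
    return (statistics.fmean(xs) - mu0) / (s / math.sqrt(len(xs)))
```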
18.2 Paired data are one sample of differences
If observations come as dependent pairs $(X_i,Y_i)$, reduce to differences $D_i=X_i-Y_i$ and test using one-sample methods on $D_i$.
Sign test (nonparametric): under symmetry and no ties, each difference is positive with probability $1/2$. So the number of positive differences
is Binomial$(n,1/2)$ under $H_0$.
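An exact sign-test p-value under the Binomial$(n,1/2)$ null can be computed directly (a sketch; assumes zero differences were removed beforehand; the function name is ours):

```python
from math import comb

def sign_test_pvalue(diffs):
    """Two-sided exact sign test: under H0 the positive count is Bin(n, 1/2)."""
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)                  # count in the smaller tail
    p_one = sum(comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one)            # double for two-sided, cap at 1
```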
18.3 One sample: test the entire underlying distribution
Numerical continuous case: one-sample KS test (fully specified or parameter-estimated version).
Categorical case: chi-squared GOF or TVD-based simulation approach.
If parameters are estimated in chi-squared GOF, degrees of freedom drop by one for each estimated parameter.
18.4 Two independent numerical samples: equality of means
For independent samples $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_m$, test $$H_0:\mu_X=\mu_Y$$ using $$T = \frac{(\bar X-\bar
Y)-0}{\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}.$$
| Setting | Null calibration | Notes |
|---|---|---|
| Large $n,m$ | Approx. normal (CLT) | Use known or estimated variances |
| Both normal, known variances | Exact normal | Finite-sample exact |
| Both normal, unknown unequal variances | Welch-type approximation | No simple exact closed form |
| Both normal, unknown equal variances | Pooled $t$ with df $n+m-2$ | Requires equal-variance assumption |
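The Welch-type approximation can be made concrete with the standard Welch–Satterthwaite df formula (a sketch; the function name is ours):

```python
import math
import statistics

def welch_t(x, y):
    """Two-sample t statistic with the Welch-Satterthwaite df approximation."""
    n, m = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    se2 = vx / n + vy / m                      # squared standard error of xbar - ybar
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / n) ** 2 / (n - 1) + (vy / m) ** 2 / (m - 1))
    return t, df

# With equal sample variances and sizes, df reduces to n + m - 2.
```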
18.5 Two independent numerical samples: equality of distributions
For continuous cdfs $F_X$ and $F_Y$, test $$H_0:F_X=F_Y$$ with two-sample nonparametric methods.
Two-sample KS test
Statistic: KS distance between empirical cdfs.
Calibrate by permutation/simulation from pooled sample under exchangeability.
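The two-sample KS distance itself is simple to compute (a sketch; the names are ours, and ties are not handled carefully here):

```python
import bisect

def ks_two_sample(x, y):
    """Max vertical distance between the empirical cdfs of x and y."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(sorted_vals, t):
        return bisect.bisect_right(sorted_vals, t) / len(sorted_vals)

    # the maximum gap is attained at a data point, so check pooled values only
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)
```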
Wilcoxon rank-sum test
Let $W$ be the sum of ranks of the first sample (size $n$) in the pooled ranking (no ties case). Under $H_0$ with $N=n+m$: $$\mathbb{E}[W]=\frac{n(N+1)}{2},\qquad \operatorname{Var}(W)=\frac{nm(N+1)}{12},$$ so a normal approximation to $W$ calibrates the test for large samples.
Rank methods are robust and nonparametric. They test distributional shift ideas without requiring normality assumptions.
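Computing the rank-sum statistic is a few lines (a sketch; assumes no ties; the function name is ours):

```python
def rank_sum_W(x, y):
    """Sum of the pooled ranks of sample x (no-ties case)."""
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # rank 1 = smallest value
    return sum(rank[v] for v in x)

# For x of size n in a pool of size N, W centers at n*(N+1)/2 under H0.
```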
18.6 Permutation logic (why it works)
Under $H_0:F_X=F_Y$, all labels "X" and "Y" are exchangeable after pooling the data. So we can approximate the null distribution of a test
statistic by repeatedly shuffling labels, recomputing the statistic, and comparing the observed value to this permutation reference.
Permutation resamples only values already observed in the pooled sample. That is expected: under exchangeability, the randomness is in group
labels, not in inventing new values.
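The shuffle-recompute loop can be sketched as follows (the statistic, repetition count, and names are our choices, not the course's prescription):

```python
import random

def perm_pvalue(x, y, reps=2000, seed=0):
    """Permutation p-value using |difference of means| as the test statistic."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    obs = abs(sum(x) / len(x) - sum(y) / len(y))
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)                   # labels are exchangeable under H0
        a, b = pooled[:n], pooled[n:]
        if abs(sum(a) / n - sum(b) / len(b)) >= obs:
            hits += 1
    return (hits + 1) / (reps + 1)            # add-one convention avoids p = 0
```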
Big Transition Map

| Lecture stage | Main question answered | How it hands off forward |
|---|---|---|
| 12-13 (NP, MLR, UMP) | What rejection rule is optimal once level is fixed | Creates the general testing language |
| 14 (p-values, confidence regions) | How to summarize evidence and plausible parameter values | Shows that confidence sets come from inverting tests |
| 15 (Wald, Score, GLRT) | How likelihood theory builds practical asymptotic tests | Provides the GLRT backbone reused in later chi-squared testing |
| 17 (multinomial GLRT, Pearson chi-squared) | How GLRT specializes to categorical goodness-of-fit | Turns one major test family into a concrete reusable tool |
| 18 (test-selection map) | How to choose among the growing testing toolbox | Organizes procedures by data structure instead of by isolated formulas |
Big-picture takeaway:
Lecture 17 says "chi-squared testing is really GLRT in the multinomial world." Lecture 18 says "once the toolbox is large, selection logic is as
important as any single formula."
Decision Guide: Which Test to Use
Decision Tree (Quick Test Picker)
Quick rule: classify structure first (one sample, paired, or two independent samples), then choose the test based on mean/proportion vs full
distribution goals.
Most important practical principle:
Before picking a test, identify: (1) number of samples, (2) numerical vs categorical data, (3) mean question vs whole-distribution question, and
(4) whether sample size/assumptions justify asymptotic or exact calibration.
Common Mistakes
1. Using chi-squared GOF with tiny expected counts in many cells without pooling/simulation.
2. Forgetting df adjustments after parameter estimation in GOF tests.
3. Treating paired data as two independent samples.
4. Mixing up "difference in means" tests with "difference in distributions" tests.
5. Applying pooled-variance two-sample t without checking whether equal variance is plausible.
Data 145 Study Guide - Lectures 17-18 - Standalone Review Version