Data 145: Evidence and Uncertainty

Comprehensive Study Guide - Lectures 17 through 18 - Spring 2026
Instructors: Ani Adhikari, William Fithian

Table of Contents

  1. Bridge: Likelihood Tests to Lecture 17
  2. Lecture 17: Generalized LRT and Chi-Squared Tests
  3. Bridge: Lecture 17 to Lecture 18
  4. Lecture 18: Comparing Two Samples and Choosing Tests
  5. Big Transition Map
  6. Decision Guide: Which Test to Use
  7. Master Summary and Formula Sheet
  8. Common Mistakes

Bridge: Likelihood Tests to Lecture 17

Lecture 15 introduced GLRT as one of the main asymptotic likelihood-based tests. Lecture 17 asks what that same idea becomes when the null restricts several directions at once or when the data are categorical counts rather than one smooth parameter.

The key point is that the chi-squared test in Lecture 17 is not a separate philosophy. It is the large-sample face of likelihood-ratio testing in the multinomial model.

So the conceptual move is: likelihood-ratio testing -> multinomial likelihood -> chi-squared calibration. Pearson's statistic then appears as an approximation to the GLRT statistic, not as an unrelated formula that happens to work.


Lecture 17: Generalized LRT and Chi-Squared Tests

17.1 GLRT recap in one line

$$\Lambda = \frac{\mathrm{lik}(\hat\theta;X)}{\mathrm{lik}(\theta_0;X)},\qquad 2\log\Lambda = 2\bigl[\ell(\hat\theta;X)-\ell(\theta_0;X)\bigr].$$ Reject for large values of $2\log\Lambda$.
Under regularity and large $n$, for a one-parameter null: $$2\log\Lambda \xrightarrow{d} \chi^2_1.$$
Poisson($\lambda$) one-sample GLRT for $H_0:\lambda=\lambda_0$: $$2\log\Lambda = 2n\left[\bar X\log\left(\frac{\bar X}{\lambda_0}\right)-(\bar X-\lambda_0)\right].$$ This gives a concrete way to compute the statistic before using the $\chi^2_1$ calibration.
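The Poisson formula above can be computed directly. A minimal sketch with made-up data (the sample and $\lambda_0 = 3$ are illustrative, not from lecture):

```python
import math

def poisson_glrt(xs, lam0):
    """2 log Lambda for H0: lambda = lam0 with i.i.d. Poisson data.
    Implements 2n[ xbar*log(xbar/lam0) - (xbar - lam0) ]."""
    n = len(xs)
    xbar = sum(xs) / n
    return 2 * n * (xbar * math.log(xbar / lam0) - (xbar - lam0))

# Hypothetical sample: 40 observations with mean 3.5, testing lambda0 = 3.
xs = [3, 4] * 20                      # xbar = 3.5
stat = poisson_glrt(xs, 3.0)          # about 3.16
reject = stat > 3.841                 # chi-squared(1) 5% critical value
```

Here the statistic lands just below the 5% cutoff, so this (hypothetical) sample would not reject at level 0.05.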

17.2 Multi-parameter extension

(Informal course theorem) In regular parametric models, under $H_0$: $$2\log\Lambda \xrightarrow{d} \chi^2_k,$$ where $k$ is the number of free parameters the null constrains, i.e., the difference in dimension between the full and null parameter spaces.

This is the same local-quadratic story from Lecture 15, just in higher dimension. Each constrained direction contributes one squared approximately normal term, and their sum becomes a chi-squared variable with the corresponding degrees of freedom.

For multinomial data with $c$ categories and fully specified null probabilities, there are $c-1$ free proportions (because they must sum to 1), so the asymptotic reference distribution is $\chi^2_{c-1}$.

17.3 Multinomial GLRT statistic

Let $O_i$ be observed counts and $E_i=n\theta_{0,i}$ expected counts under $H_0$. With MLE $\hat\theta_i=O_i/n$, the lecture derives:

$$\Lambda = \prod_{i=1}^{c}\left(\frac{O_i}{E_i}\right)^{O_i}, \qquad 2\log\Lambda = 2\sum_{i=1}^{c}O_i\log\frac{O_i}{E_i}.$$
Use counts (frequencies), not proportions, in this formula. Replacing counts by proportions introduces a wrong scale factor.
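As a check on the formula (and the counts-not-proportions warning), here is a minimal implementation; the die-rolling counts are made up for illustration:

```python
import math

def multinomial_g2(observed, null_probs):
    """Likelihood-ratio statistic 2 * sum O_i log(O_i / E_i),
    computed from raw counts with E_i = n * theta0_i."""
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, null_probs):
        if o > 0:                     # a zero count contributes 0
            stat += 2 * o * math.log(o / (n * p))
    return stat

# Hypothetical fairness check: 60 die rolls, H0 uniform over 6 faces.
obs = [8, 9, 12, 11, 6, 14]
g2 = multinomial_g2(obs, [1/6] * 6)   # compare to chi-squared(5)
```

Passing proportions `[o/60 for o in obs]` instead of counts would shrink the statistic by a factor of 60, which is exactly the wrong-scale error warned about above.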

17.4 Pearson's approximation

Distinguish the two statistics explicitly: $$G^2 = 2\sum_{i=1}^{c}O_i\log\frac{O_i}{E_i} \quad\text{(likelihood-ratio / deviance statistic)}$$ $$X^2 = \sum_{i=1}^{c}\frac{(O_i-E_i)^2}{E_i} \quad\text{(Pearson chi-square statistic)}$$ They are different for finite samples, even though they are often close.
When $H_0$ is true and counts are large enough, $$2\sum_{i=1}^{c}O_i\log\frac{O_i}{E_i} \approx \sum_{i=1}^{c}\frac{(O_i-E_i)^2}{E_i}.$$
Set $D_i=O_i-E_i$. Under $H_0$, $D_i$ is small relative to $E_i$ and $\sum_iD_i=0$. Using $\log(1+x)\approx x-x^2/2$ yields the Pearson expression after cancellation of first-order terms.

Historically this mattered because Pearson's form is much easier to compute by hand. Conceptually it matters because it shows that the familiar chi-squared statistic is tightly connected to likelihood-ratio logic.

17.5 Roulette example and practical interpretation

In the roulette data (red/black/green), the GLRT-based and Pearson chi-squared values are close, with a large $p$-value. This indicates the observed variation is consistent with fair-wheel sampling noise.
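To see the closeness numerically, here is a sketch with hypothetical spin counts (not the lecture's data) on an American wheel, where red, black, and green have null probabilities 18/38, 18/38, and 2/38:

```python
import math

def g2_and_x2(observed, null_probs):
    """Return (G2, X2) for observed counts against a fully specified null."""
    n = sum(observed)
    g2 = x2 = 0.0
    for o, p in zip(observed, null_probs):
        e = n * p
        if o > 0:
            g2 += 2 * o * math.log(o / e)
        x2 += (o - e) ** 2 / e
    return g2, x2

# Hypothetical counts from 380 spins: red, black, green.
g2, x2 = g2_and_x2([190, 176, 14], [18/38, 18/38, 2/38])
```

Both statistics come out near 2.5 (about 2.65 and 2.44), well below the $\chi^2_2$ 5% critical value of about 5.99, so the large $p$-value and the closeness of the two statistics both show up as advertised.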

17.6 Goodness-of-fit with estimated parameters

If you estimate $q$ model parameters before computing expected counts, subtract $q$ additional degrees of freedom: $$\text{df} = (\text{number of categories}-1) - q.$$
Why subtract $q$? Because estimation uses up flexibility. After the model has already been allowed to match the data in $q$ directions, fewer independent discrepancies remain to measure lack of fit.
Poisson GOF with unknown $\lambda$:
  1. Estimate $\hat\lambda=\bar X$.
  2. Bin counts into categories (with sensible tail pooling).
  3. Compute expected counts under Poisson($\hat\lambda$).
  4. Use chi-squared reference with one df deducted for estimating $\lambda$.
Large-sample quality still matters. If expected counts are too small in several cells, chi-squared calibration may be poor; pooling or simulation is safer.
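The four steps can be sketched as follows. The binning at counts 0 through 3 with a pooled "4 or more" tail, and the sample itself, are illustrative assumptions, not the lecture's example:

```python
import math

def poisson_gof_stat(xs, kept=(0, 1, 2, 3)):
    """Pearson X^2 for a Poisson fit with lambda estimated by xbar.
    Cells: each count in `kept`, plus one pooled upper-tail cell.
    Reference distribution: chi-squared with df = (c - 1) - 1."""
    n = len(xs)
    lam = sum(xs) / n                                  # step 1: lambda-hat
    pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in kept]
    probs = pmf + [1 - sum(pmf)]                       # step 2: pooled tail cell
    observed = [sum(x == k for x in xs) for k in kept]
    observed.append(n - sum(observed))
    x2 = sum((o - n * p) ** 2 / (n * p)                # step 3: expected counts
             for o, p in zip(observed, probs))
    df = len(probs) - 1 - 1                            # step 4: deduct 1 df
    return x2, df

# Hypothetical sample of 100 counts with mean 1.3; 5 cells give df = 3.
xs = [0] * 30 + [1] * 30 + [2] * 25 + [3] * 10 + [4] * 5
x2, df = poisson_gof_stat(xs)
```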

Bridge: Lecture 17 to Lecture 18

Lecture 17 gives one important specialized family of tests for categorical goodness-of-fit. But by this point the course has accumulated many procedures: KS for distributions, z/t logic for means, GLRT for parametric nulls, chi-squared for multinomial fit, and rank or sign methods when assumptions weaken.

So Lecture 18 changes the bottleneck. The main problem is no longer "derive one more test." The main problem is choosing the right framework for the structure of the data and the question being asked.

That is why Lecture 18 is intentionally an "organized laundry list." The list is the point. It organizes testing by one sample vs two samples, numerical vs categorical outcomes, mean questions vs whole-distribution questions, and exact vs asymptotic calibration.


Lecture 18: Comparing Two Samples and Choosing Tests

Lecture 18's main lesson is meta-statistical: before naming a test, classify the problem. Ask how many samples you have, whether the data are numerical or categorical, whether the question targets a mean or the whole distribution, and whether exact or asymptotic calibration is appropriate.

18.1 One numerical sample: test a mean

For i.i.d. $X_1,\ldots,X_n$ with mean $\mu$, test $H_0:\mu=\mu_0$ using $$T_n = \frac{\bar X_n-\mu_0}{\sigma/\sqrt n}.$$
| Setting | Null calibration | Notes |
| --- | --- | --- |
| Large sample | Approx standard normal by CLT | Can plug in $\hat\sigma$ or $S$ |
| Small sample, normal model, known $\sigma$ | Exact standard normal | Finite-sample exact |
| Small sample, normal model, unknown $\sigma$ | $t_{n-1}$ for $T_n^*=\frac{\bar X-\mu_0}{S/\sqrt n}$ | Classical one-sample $t$ test |
| Bernoulli proportion test | Binomial count null | Use $\sum_i X_i$ directly |
Exact normal-vs-t distinction under a normal model: $$X_1,\ldots,X_n \overset{iid}{\sim} N(\mu,\sigma^2) \;\Longrightarrow\; \bar X \sim N\!\left(\mu,\frac{\sigma^2}{n}\right).$$ So $\bar X$ itself is still normal. The change is in the standardized statistic: $$Z=\frac{\bar X-\mu_0}{\sigma/\sqrt n}\sim N(0,1)\quad\text{if }\sigma\text{ is known},$$ $$T=\frac{\bar X-\mu_0}{S/\sqrt n}\sim t_{n-1}\quad\text{if }\sigma\text{ is unknown}.$$ In short: $\bar X$ stays normal; the pivot changes from $Z$ to $T$ when $\sigma$ is replaced by the random $S$.
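A small numeric sketch of the unknown-$\sigma$ pivot (the sample and $\mu_0 = 10$ are made up):

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """T = (xbar - mu0) / (S / sqrt(n)); compare to t_{n-1} when the
    data are modeled as i.i.d. normal with unknown sigma."""
    n = len(xs)
    s = statistics.stdev(xs)          # sample sd, divisor n - 1
    return (statistics.mean(xs) - mu0) / (s / math.sqrt(n))

# Hypothetical sample of 6 measurements, testing mu0 = 10.
t = one_sample_t([10.2, 9.8, 10.5, 10.1, 9.9, 10.6], 10.0)
```

With $n=6$ the reference is $t_5$, noticeably heavier-tailed than the standard normal, so the same numerical statistic is weaker evidence than a $z$-score of equal size.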

18.2 Paired data are one sample of differences

If observations come as dependent pairs $(X_i,Y_i)$, reduce to differences $D_i=X_i-Y_i$ and test using one-sample methods on $D_i$.

Sign test (nonparametric): under symmetry and no ties, each difference is positive with probability $1/2$. So the number of positive differences is Binomial$(n,1/2)$ under $H_0$.
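So the sign test reduces to an exact Binomial calculation; a sketch with hypothetical paired differences:

```python
from math import comb

def sign_test_pvalue(diffs):
    """Exact two-sided sign test: under H0 each nonzero difference is
    positive with probability 1/2, so #positives ~ Binomial(n, 1/2)."""
    nonzero = [d for d in diffs if d != 0]   # drop ties (no-ties case assumed)
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    tail = min(k, n - k)
    one_sided = sum(comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

# Hypothetical before/after differences for 10 matched pairs: 8 positive.
p = sign_test_pvalue([1.2, 0.5, -0.3, 2.1, 0.8, 1.5, -0.2, 0.9, 1.1, 0.4])
```

Here $P(\text{Binomial}(10,1/2) \le 2) = 56/1024$, so the two-sided $p$-value is $112/1024 \approx 0.109$.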

18.3 One sample: test the entire underlying distribution

For a numerical sample with a fully specified null cdf $F_0$, use the one-sample KS test. For a categorical sample, use chi-squared GOF; if parameters are estimated before computing expected counts, degrees of freedom drop by one for each estimated parameter.

18.4 Two independent numerical samples: equality of means

For independent samples $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_m$, test $$H_0:\mu_X=\mu_Y$$ using $$T = \frac{(\bar X-\bar Y)-0}{\sqrt{\sigma_X^2/n+\sigma_Y^2/m}}.$$
| Setting | Null calibration | Notes |
| --- | --- | --- |
| Large $n,m$ | Approx normal (CLT) | Use known or estimated variances |
| Both normal, known variances | Exact normal | Finite-sample exact |
| Both normal, unknown unequal variances | Welch-type approximation | No simple exact closed form |
| Both normal, unknown equal variances | Pooled $t$ with df $n+m-2$ | Requires equal-variance assumption |
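A sketch of the Welch default (the data are made up; the df formula is the standard Welch-Satterthwaite estimate, which the lecture summarizes only as a "Welch-type approximation"):

```python
import math
import statistics

def welch_t(xs, ys):
    """T = (xbar - ybar) / sqrt(Sx^2/n + Sy^2/m), with the
    Welch-Satterthwaite estimate of the reference t df."""
    n, m = len(xs), len(ys)
    vx, vy = statistics.variance(xs), statistics.variance(ys)  # divisor n-1
    se2 = vx / n + vy / m
    t = (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / n) ** 2 / (n - 1) + (vy / m) ** 2 / (m - 1))
    return t, df

# Hypothetical measurements from two independent groups.
t, df = welch_t([5.1, 4.9, 5.3, 5.0, 5.2], [4.6, 4.8, 4.5, 4.9])
```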

18.5 Two independent numerical samples: equality of distributions

For continuous cdfs $F_X$ and $F_Y$, test $$H_0:F_X=F_Y$$ with two-sample nonparametric methods.

Two-sample KS test: compare the two empirical cdfs directly via $D = \sup_t\,\lvert \hat F_X(t)-\hat F_Y(t)\rvert$, rejecting for large $D$.

Wilcoxon rank-sum test

Let $W$ be the sum of ranks of one sample in the pooled ranking (no ties case). Under $H_0$ with $N=n+m$:

$$E(W)=\frac{n(N+1)}{2}, \qquad \mathrm{Var}(W)=\frac{nm(N+1)}{12}.$$
Rank methods are robust and nonparametric. They test distributional shift ideas without requiring normality assumptions.
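The moments above give a quick normal-approximation $z$-score; a sketch with hypothetical tie-free samples:

```python
import math

def rank_sum_z(xs, ys):
    """Rank-sum W of the x-sample in the pooled ranking (no ties),
    standardized with E(W) = n(N+1)/2 and Var(W) = nm(N+1)/12."""
    n, m = len(xs), len(ys)
    N = n + m
    pooled = sorted(xs + ys)
    w = sum(pooled.index(x) + 1 for x in xs)   # ranks start at 1
    ew = n * (N + 1) / 2
    var = n * m * (N + 1) / 12
    return w, (w - ew) / math.sqrt(var)

# Hypothetical samples with n = 3, m = 4, so N = 7 and E(W) = 12.
w, z = rank_sum_z([1.2, 3.4, 5.6], [2.1, 4.3, 6.5, 7.8])
```

For samples this small the exact permutation distribution of $W$ is preferable; the normal approximation is shown only to exercise the moment formulas.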

18.6 Permutation logic (why it works)

Under $H_0:F_X=F_Y$, all labels "X" and "Y" are exchangeable after pooling the data. So we can approximate the null distribution of a test statistic by repeatedly shuffling labels, recomputing the statistic, and comparing the observed value to this permutation reference.

Permutation resamples only values already observed in the pooled sample. That is expected: under exchangeability, the randomness is in group labels, not in inventing new values.
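The shuffle-and-recompute recipe is a few lines; here using the absolute difference in means as the test statistic (one common choice, not the only one):

```python
import random
import statistics

def permutation_pvalue(xs, ys, reps=10_000, seed=0):
    """Approximate the permutation null of |xbar - ybar| by shuffling
    group labels, which is valid because labels are exchangeable
    under H0: F_X = F_Y."""
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    n = len(xs)
    observed = abs(statistics.mean(xs) - statistics.mean(ys))
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)                   # relabel by shuffling
        diff = abs(statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)            # add-one keeps p > 0
```

Note the function resamples only the pooled observed values, never inventing new ones, which is exactly the exchangeability point made above.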

Big Transition Map

| Lecture stage | Main question answered | How it hands off forward |
| --- | --- | --- |
| 12-13 (NP, MLR, UMP) | What rejection rule is optimal once level is fixed | Creates the general testing language |
| 14 (p-values, confidence regions) | How to summarize evidence and plausible parameter values | Shows that confidence sets come from inverting tests |
| 15 (Wald, Score, GLRT) | How likelihood theory builds practical asymptotic tests | Provides the GLRT backbone reused in later chi-squared testing |
| 17 (multinomial GLRT, Pearson chi-squared) | How GLRT specializes to categorical goodness-of-fit | Turns one major test family into a concrete reusable tool |
| 18 (test-selection map) | How to choose among the growing testing toolbox | Organizes procedures by data structure instead of by isolated formulas |
Big-picture takeaway:
Lecture 17 says "chi-squared testing is really GLRT in the multinomial world." Lecture 18 says "once the toolbox is large, selection logic is as important as any single formula."

Decision Guide: Which Test to Use

Decision Tree (Quick Test Picker)

Lecture 18 test selection tree: start by classifying sample structure, then branch to one-sample, paired, or two-sample procedures and select tests by target question.

- One sample (single group)
  - Numerical, mean or proportion target: one-sample z/t or Binomial
  - Numerical, full distribution fit: one-sample KS
  - Categorical fit: chi-squared GOF or TVD simulation
- Paired data (before/after or matched): first reduce to differences $D = X - Y$, then do a one-sample test
  - Mean of $D$: one-sample z/t on $D$
  - Distribution-free: sign test
- Two independent samples (separate groups): what is the null question?
  - Equal means: two-sample z/t (Welch default)
  - Equal distributions: two-sample KS, Wilcoxon, permutation

Quick rule: classify structure first (one sample, paired, or two independent samples), then choose the test based on mean/proportion vs full distribution goals.

| Question type | Data type / structure | Default tools |
| --- | --- | --- |
| Mean equals target? | One numerical sample | z/t tests; CLT large-sample; exact/approx finite-sample variants |
| Paired change? | One sample of paired differences | One-sample test on differences; sign test if nonparametric |
| Fits known cdf? | One numerical sample | One-sample KS |
| Fits categorical distribution? | One categorical sample | Chi-squared GOF; TVD with simulation for small samples |
| Means equal across groups? | Two independent numerical samples | Two-sample z/t logic (Welch or pooled settings) |
| Distributions equal across groups? | Two independent numerical samples | Two-sample KS, Wilcoxon rank-sum, permutation logic |

Master Summary and Formula Sheet

Lecture 17 core formulas

| Concept | Formula | Comment |
| --- | --- | --- |
| GLRT statistic | $2\log\Lambda = 2[\ell(\hat\theta)-\ell(\theta_0)]$ | Reject for large values |
| Multinomial GLRT | $2\sum_i O_i\log(O_i/E_i)$ | Counts, not proportions |
| Pearson approximation | $\sum_i (O_i-E_i)^2/E_i$ | Large-sample approximation to GLRT |
| Df with estimated params | $(c-1)-q$ | $q$ estimated parameters |

Lecture 18 core formulas

| Concept | Formula | Comment |
| --- | --- | --- |
| One-sample mean test | $T=(\bar X-\mu_0)/(\sigma/\sqrt n)$ | z/t variants by assumptions |
| Two-sample mean test | $T=((\bar X-\bar Y)-0)/\sqrt{\sigma_X^2/n+\sigma_Y^2/m}$ | CLT or normal-model variants |
| Pooled variance (equal-variance case) | $S_p^2=\frac{(n-1)S_X^2+(m-1)S_Y^2}{n+m-2}$ | Used in pooled two-sample t |
| Wilcoxon rank-sum moments | $E(W)=\frac{n(N+1)}2$, $\mathrm{Var}(W)=\frac{nm(N+1)}{12}$ | $N=n+m$ |
Most important practical principle:
Before picking a test, identify: (1) number of samples, (2) numerical vs categorical data, (3) mean question vs whole-distribution question, and (4) whether sample size/assumptions justify asymptotic or exact calibration.

Common Mistakes

1. Using chi-squared GOF with tiny expected counts in many cells without pooling/simulation.
2. Forgetting df adjustments after parameter estimation in GOF tests.
3. Treating paired data as two independent samples.
4. Mixing up "difference in means" tests with "difference in distributions" tests.
5. Applying pooled-variance two-sample t without checking whether equal variance is plausible.

Data 145 Study Guide - Lectures 17-18 - Standalone Review Version