Data 145 Study Guide - Lectures 23-24

Comprehensive Study Guide - Lectures 23 through 24 - Spring 2026
Instructors: Ani Adhikari, William Fithian

Recall Script: What These Lectures Are Really About

The most useful way to remember Lectures 23 and 24 is not as a list of separate tests. The better story is: normal data can be rotated into orthogonal coordinates; those coordinates split into nuisance, signal, and residual pieces; and the classical test distributions are just ways to compare the signal size to the noise size.

The whole lecture sequence is a machine: $$\text{normal vector}\longrightarrow\text{orthogonal projections}\longrightarrow \text{signal/noise ratio}\longrightarrow z,\chi^2,t,\text{ or }F.$$

How to read this page:
first learn what $\chi^2$, t, and F are as mathematical objects; then see why the canonical model produces exactly those objects; then learn the general model-space rotation $Z=Q^TY$; then use the geometric pictures as the memory hook; finally translate one-sample t, two-sample t, ANOVA, and regression into the same canonical language.

So the goal is not to memorize many formulas independently. The goal is to recognize the same pattern every time: find the tested direction, find the residual directions, decide whether $\sigma^2$ is known, and decide whether the signal is one-dimensional or multidimensional.

Review / Background: The Distribution Objects $\chi^2$, t, and F

1.1 The three objects before the model

Before talking about linear models, pin down the three probability objects. Each one answers a slightly different "how large is this signal?" question.

The memory version is: $$\chi^2=\text{squared Gaussian length},\qquad t=\frac{\text{signal}}{\text{estimated noise}},\qquad F=\frac{\text{signal sum of squares per signal df}}{\text{residual sum of squares per residual df}}.$$ This is exactly what linear-model tests will produce.

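If $T\sim t_d$, then $$T^2\sim F_{1,d}.$$ To see this, write $T=Z/\sqrt{V/d}$ with $Z\sim N(0,1)$ independent of $V\sim\chi^2_d$. Then $Z^2\sim \chi^2_1$, so $$T^2=\frac{Z^2/1}{V/d}$$ is a ratio of independent chi-squared variables, each divided by its degrees of freedom. A two-sided t-test is therefore the same rejection rule as the corresponding one-degree-of-freedom F-test.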

The unknown $\sigma$ disappears in t and F ratios. If $\tilde Z=\sigma Z$ and $\tilde V=\sigma^2V$, then $$\frac{\tilde Z}{\sqrt{\tilde V/d}}=\frac{\sigma Z}{\sqrt{\sigma^2V/d}}=\frac{Z}{\sqrt{V/d}}.$$ This cancellation is why t and F statistics are pivotal when the variance is unknown.
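A quick numerical check can make the cancellation concrete. The sketch below uses made-up numbers and two illustrative helper names (`t_ratio`, `f_ratio`) that are not from the lecture; it only confirms that rescaling every coordinate by the same $\sigma$ leaves the t ratio unchanged, and that $T^2$ equals the one-degree-of-freedom F ratio.

```python
import math

def t_ratio(z, residuals):
    """Signed coordinate z divided by the root-mean-square of the residual coordinates."""
    d = len(residuals)
    v = sum(r * r for r in residuals)       # residual sum of squares
    return z / math.sqrt(v / d)

def f_ratio(signal, residuals):
    """Mean squared signal coordinate divided by mean squared residual coordinate."""
    num = sum(s * s for s in signal) / len(signal)
    den = sum(r * r for r in residuals) / len(residuals)
    return num / den

# Hypothetical numbers, chosen only for illustration.
z, residuals = 1.5, [0.4, -1.1, 0.7]
sigma = 3.0                                  # any positive rescaling

t_plain = t_ratio(z, residuals)
t_scaled = t_ratio(sigma * z, [sigma * r for r in residuals])
f_one_df = f_ratio([z], residuals)

print(t_plain, t_scaled)         # the two values agree up to rounding
print(t_plain ** 2, f_one_df)    # T^2 matches the one-df F ratio
```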

1.2 What the degrees of freedom $d$ means

If $V\sim\chi^2_d$, then by definition $$V=U_1^2+\cdots+U_d^2$$ for $d$ independent standard normal variables. So $d$ is not a decorative adjustment: it is the number of independent Gaussian directions being squared.

Dimension of a Gaussian subspace = degrees of freedom of the chi-squared built from that subspace. Dividing by $d$ turns total squared variation into average squared variation per direction.
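To see "degrees of freedom = number of squared directions" numerically, here is a small simulation sketch. The seed, the choice $d=5$, and the number of repetitions are arbitrary assumptions; the only claim is that the average of the simulated sums should land near $d$, and near 1 after dividing by $d$.

```python
import random

def chi_squared_draw(d, rng):
    """One draw of U_1^2 + ... + U_d^2 with U_i iid standard normal."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

rng = random.Random(0)          # fixed seed so the run is repeatable
d, reps = 5, 20000
draws = [chi_squared_draw(d, rng) for _ in range(reps)]

mean_total = sum(draws) / reps          # should be close to d
mean_per_direction = mean_total / d     # should be close to 1
print(mean_total, mean_per_direction)
```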

1.3 The normal fact that makes everything work

If $Z\sim N_n(0,\sigma^2I_n)$ and $Q$ is orthogonal, meaning $Q^TQ=I_n$, then $$QZ\sim N_n(0,\sigma^2I_n).$$ The spherical normal distribution is unchanged by rotations and reflections.

Use the affine transformation formula for multivariate normals. If $Y=QZ$, then $$Y\sim N_n(Q0,\;Q(\sigma^2I_n)Q^T)=N_n(0,\sigma^2QQ^T)=N_n(0,\sigma^2I_n).$$ The covariance stays spherical because $Q$ is orthogonal.

If $V_1$ and $V_2$ are orthogonal subspaces of dimensions $d_1$ and $d_2$, and $Z\sim N_n(0,\sigma^2I_n)$, then $$\|P_{V_1}Z\|^2\sim \sigma^2\chi^2_{d_1},\qquad \|P_{V_2}Z\|^2\sim \sigma^2\chi^2_{d_2},$$ and these squared projection lengths are independent.

Choose an orthonormal basis adapted to the subspaces. After rotating into that basis, the coordinates are still independent $N(0,\sigma^2)$ variables. Projection lengths are just sums of squares of disjoint coordinate blocks.

Object	Mathematical definition	Conceptual meaning
$\chi^2_d$	If $U_1,\ldots,U_d\overset{iid}{\sim}N(0,1)$, then $\sum_{i=1}^dU_i^2\sim\chi^2_d$.	Squared length of a $d$-dimensional standardized Gaussian vector.
$t_d$	If $Z\sim N(0,1)$, $V\sim\chi^2_d$, and $Z\perp V$, then $Z/\sqrt{V/d}\sim t_d$.	One signed normal coordinate divided by an independent estimated noise scale.
$F_{d_1,d_2}$	If $V_1\sim\chi^2_{d_1}$, $V_2\sim\chi^2_{d_2}$, and $V_1\perp V_2$, then $(V_1/d_1)/(V_2/d_2)\sim F_{d_1,d_2}$.	Ratio of two average squared Gaussian lengths.

A chi-squared distribution does not appear just because a variable is centered at 0. It appears because we take a sum of squares of independent standardized Gaussian coordinates.

1.4 Translation dictionary

This dictionary is the bridge from probability objects to the canonical model. Every later example is just a different way of deciding which projection is nuisance, which projection is signal, and which projection is residual noise.

Geometric object	Statistical role	Distribution under the null
Projection onto tested direction	Signal	Normal coordinate if 1D; $\sigma^2\chi^2_{d_1}$ squared length if multidimensional
Projection onto residual directions	Noise / variance estimate	Squared length distributed as $\sigma^2\chi^2_{d_r}$
Projection onto nuisance directions	Unrestricted mean under both hypotheses	Accounted for, but not evidence for the target signal
Ratio of signal to residual scale	Test statistic when $\sigma^2$ unknown	t if 1D signal, F if multidimensional signal

Now let's go into the main model.
The distributions above are the ingredients. The canonical model is the recipe that tells us why those ingredients show up in hypothesis tests. It will label each coordinate as nuisance, signal, or residual, and then the right distribution will almost choose itself.

The Canonical Model: Nuisance, Signal, Residual

2.1 The organizing model

The canonical model is the clean coordinate system we wish every testing problem already came in. The observed vector is split into three orthogonal blocks: $$Z=\begin{bmatrix}Z_0\\Z_1\\Z_r\end{bmatrix} \sim N_n\!\left( \begin{bmatrix}\mu_0\\\mu_1\\0\end{bmatrix}, \sigma^2I_n \right),\qquad d_0+d_1+d_r=n.$$ We test $$H_0:\mu_1=0\qquad\text{vs}\qquad H_1:\mu_1\neq0.$$

This is the important "we can use this!" moment: the residual block is known to be pure noise. Its mean is 0 and it has the same noise variance $\sigma^2$ as the signal block. So if $\sigma^2$ is known, we scale by it directly; if $\sigma^2$ is unknown, we use the residual block to estimate the noise level.

The whole test is now a comparison: $$\text{How large is the signal block }Z_1\text{ compared with the noise scale?}$$ Everything else is just deciding whether the noise scale is known and whether the signal is a signed coordinate or a multidimensional length.

2.2 Why $Z_r$ estimates variance

Since the residual block has known mean 0, $$Z_r\sim N_{d_r}(0,\sigma^2I_{d_r}).$$ Written by coordinates, this means $$Z_{r,1},\ldots,Z_{r,d_r}\overset{iid}{\sim}N(0,\sigma^2).$$ Dividing each coordinate by $\sigma$ standardizes it: $$\frac{Z_{r,1}}{\sigma},\ldots,\frac{Z_{r,d_r}}{\sigma}\overset{iid}{\sim}N(0,1).$$

Now expand the squared length: $$\|Z_r\|^2=Z_{r,1}^2+\cdots+Z_{r,d_r}^2.$$ Dividing by $\sigma^2$ can be pushed inside the sum: $$\frac{\|Z_r\|^2}{\sigma^2} = \frac{Z_{r,1}^2+\cdots+Z_{r,d_r}^2}{\sigma^2} = \left(\frac{Z_{r,1}}{\sigma}\right)^2+\cdots+ \left(\frac{Z_{r,d_r}}{\sigma}\right)^2 = \sum_{j=1}^{d_r}\left(\frac{Z_{r,j}}{\sigma}\right)^2.$$ Since this is a sum of $d_r$ squared standard normals, $$\frac{\|Z_r\|^2}{\sigma^2}\sim\chi^2_{d_r}.$$

Taking expectation gives $E[\|Z_r\|^2]=d_r\sigma^2$, so the natural unbiased estimator of $\sigma^2$ is $$\hat\sigma^2=\frac{\|Z_r\|^2}{d_r}.$$ This is "average squared residual length per residual direction."

Nuisance and residual are not the same thing. $Z_0$ also contains randomness, but its mean is unknown, so its raw squared length includes unknown mean structure. $Z_r$ has mean 0, so its squared length is interpretable as pure noise.
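As a minimal sketch of this estimate, the function below (the name `sigma_hat_sq` and all numbers are illustrative assumptions) averages the squared residual coordinates. The second call shows why a block with a nonzero mean cannot be used the same way: its raw squared length is inflated by the mean.

```python
def sigma_hat_sq(z_r):
    """Average squared residual coordinate: ||Z_r||^2 / d_r."""
    if len(z_r) == 0:
        raise ValueError("need at least one residual coordinate")
    return sum(z * z for z in z_r) / len(z_r)

z_r = [1.0, -2.0, 2.0]              # made-up residual block with d_r = 3
estimate = sigma_hat_sq(z_r)        # (1 + 4 + 4) / 3
print(estimate)                     # 3.0

# The same noise shifted by an unknown mean of 10, as in a nuisance block.
shifted = [z + 10.0 for z in z_r]
naive = sigma_hat_sq(shifted)       # dominated by the mean, not the noise
print(naive)
```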

2.3 The two questions to ask before any test

Once the canonical blocks are understood, the rest of the test is determined by two questions.

Question	If yes	If no
Do we know $\sigma^2$?	Use the known $\sigma$ as the noise scale.	Use $Z_r$ to estimate noise with $\hat\sigma^2=\|Z_r\|^2/d_r$.
Is the signal one-dimensional?	Keep the signed coordinate $Z_1$ and use z or t.	Use the squared length $\|Z_1\|^2$ and use $\chi^2$ or F.

This is the full canonical intuition in one sentence: $Z_0$ is fit/accounted for but not used as evidence, $Z_1$ is the signal being tested, $Z_r$ supplies noise if needed, and the dimension of $Z_1$ decides whether we keep a signed coordinate or use a squared norm.

The subscript $r$ means residual block, not one specific coordinate. If $d_r=1$, then $Z_r$ has one coordinate and $\sqrt{\|Z_r\|^2/d_r}=|Z_r|$. If $d_r>1$, then $Z_r$ is a vector of several residual coordinates, and $\|Z_r\|^2/d_r$ is the average squared residual coordinate.

Signal dimension	$\sigma^2$ known	$\sigma^2$ unknown
$d_1=1$	$z=Z_1/\sigma\sim N(0,1)$	$t=Z_1/\sqrt{\|Z_r\|^2/d_r}\sim t_{d_r}$
$d_1>1$	$\|Z_1\|^2/\sigma^2=\sum_{k=1}^{d_1}(Z_{1,k}/\sigma)^2\sim\chi^2_{d_1}$	$F=\dfrac{\|Z_1\|^2/d_1}{\|Z_r\|^2/d_r}\sim F_{d_1,d_r}$

Read the table as an algebraic summary of the story. Known variance means the noise scale is fixed. Unknown variance means the residual block supplies the scale. A one-dimensional signal keeps its sign and gives z or t. A multidimensional signal has no single sign, so we use squared length and get $\chi^2$ or F. If $d_1=1$, the squared-length F version is just $T^2$, so the signed t statistic is usually more informative.

General Linear Models: Rotate Into the Canonical Model

3.1 From general coordinates to canonical coordinates

In a general linear model, observe $$Y\sim N_n(\theta,\sigma^2I_n),$$ where the mean vector lies in a model subspace $\mathcal H\subseteq\mathbb R^n$. Test nested subspaces $$H_0:\theta\in\mathcal H_0\qquad \text{vs.}\qquad H_1:\theta\in\mathcal H\setminus\mathcal H_0,$$ with $\mathcal H_0\subseteq\mathcal H$.

This is the big-picture version of the rotation. The full model space $\mathcal H$ contains all mean vectors allowed by the larger model. The null space $\mathcal H_0$ contains the mean vectors allowed if the null hypothesis is true. Points in $\mathcal H\setminus\mathcal H_0$ are alternatives; the orthogonal part $\mathcal H\cap\mathcal H_0^\perp$ is the signal subspace we test after accounting for nuisance directions.

Block	Name	Why it matters
$Z_0\in\mathbb R^{d_0}$	Nuisance	Its mean $\mu_0$ is unknown under both $H_0$ and $H_1$, so it is not evidence for or against the target hypothesis.
$Z_1\in\mathbb R^{d_1}$	Signal	Its mean is forced to be 0 under $H_0$ and allowed to move under $H_1$.
$Z_r\in\mathbb R^{d_r}$	Residual noise	Its mean is known to be 0 under both hypotheses, so its squared length estimates $\sigma^2$.

Model-space dictionary:
$\mathcal H$: full model space; $\mathcal H_0$: null/nuisance space; $\mathcal H\setminus\mathcal H_0$: alternative region; $\mathcal H\cap\mathcal H_0^\perp$: orthogonal tested signal space; $\mathcal H^\perp$: residual/noise space.

Dimensions determine the canonical blocks: $$d_0=\dim(\mathcal H_0),\qquad d_1=\dim(\mathcal H)-\dim(\mathcal H_0),\qquad d_r=n-\dim(\mathcal H).$$
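The dimension bookkeeping is easy to mechanize. The sketch below uses a hypothetical helper `block_dims` and made-up sample sizes; it simply applies the three formulas above to the familiar tests. For the one-sample test of $\mu=0$, the null space is taken to be $\{0\}$, so $\dim(\mathcal H_0)=0$.

```python
def block_dims(n, dim_full, dim_null):
    """Return (d0, d1, dr) for nested model spaces H0 inside H inside R^n."""
    if not 0 <= dim_null <= dim_full <= n:
        raise ValueError("need 0 <= dim(H0) <= dim(H) <= n")
    return dim_null, dim_full - dim_null, n - dim_full

print(block_dims(10, 1, 0))   # one-sample t, n = 10: (0, 1, 9)
print(block_dims(12, 2, 1))   # two-sample t, N = 12: (1, 1, 10)
print(block_dims(15, 3, 1))   # one-way ANOVA, G = 3, N = 15: (1, 2, 12)
```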

3.2 The rotation recipe

Choose orthonormal bases:

$Q_0$: basis for $\mathcal H_0$.
$Q_1$: basis for $\mathcal H\cap\mathcal H_0^\perp$.
$Q_r$: basis for $\mathcal H^\perp$.

Stack them: $$Q=[Q_0\mid Q_1\mid Q_r].$$ Then $Q$ is orthogonal and $Z=Q^TY$ is in canonical coordinates.

The rotated mean is $$E[Z]=E[Q^TY]=Q^T\theta = \begin{bmatrix} Q_0^T\theta\\ Q_1^T\theta\\ Q_r^T\theta \end{bmatrix} = \begin{bmatrix} \mu_0\\ \mu_1\\ 0 \end{bmatrix}.$$ The residual block is 0 in mean because every allowed model mean $\theta\in\mathcal H$ is perpendicular to $\mathcal H^\perp$. The tested block decides the hypothesis: $$H_0:\theta\in\mathcal H_0\Longleftrightarrow \mu_1=Q_1^T\theta=0,\qquad H_1:\theta\in\mathcal H\setminus\mathcal H_0\Longleftrightarrow \mu_1\neq0.$$

Because $Q$ is orthogonal, $$Z=Q^TY\sim N_n\!\left( \begin{bmatrix}\mu_0\\\mu_1\\0\end{bmatrix}, \sigma^2I_n \right).$$ That is exactly the canonical block model from Section 2: nuisance, signal, residual.

The choice of basis inside each subspace is not important. The test only depends on projection lengths such as $\|Q_1^TY\|^2$ and $\|Q_r^TY\|^2$, which are intrinsic geometric quantities.
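One way to build such a $Q$ by hand is Gram-Schmidt applied to spanning vectors listed in the order null space, rest of the model space, everything else. The sketch below is an illustration under assumptions: the toy layout (two groups of two observations), the data, and the helper names are all made up, and Gram-Schmidt is just one convenient choice of basis, not the lecture's prescribed construction.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize in order, skipping vectors already in the span so far."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

# Toy layout (an assumption for illustration): two groups of two observations.
ones = [1, 1, 1, 1]                 # spans H0
group_a = [1, 1, 0, 0]              # with group_b, spans H
group_b = [0, 0, 1, 1]
standard = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# Order matters: H0 first, then the rest of H, then everything else.
Q = gram_schmidt([ones, group_a, group_b] + standard)   # list of columns of Q

y = [1.0, 3.0, 6.0, 8.0]            # made-up data
z = [dot(q, y) for q in Q]          # Z = Q^T Y, one coordinate per column of Q
z0, z1, z_r = z[0], z[1], z[2:]

print(z0, z1, z_r)
print(dot(y, y), dot(z, z))         # squared length is preserved by the rotation
```

A different basis inside the residual space would change the individual entries of `z_r` but not the sum of their squares, which is the only thing the tests use.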

The Geometry: Ratios, Angles, and Rotations

Now move from the general rotation recipe to the picture. This section is about visualization: axes, ratios, angles, and rotations. In canonical coordinates, the signal coordinate is one axis and the residual coordinate is a perpendicular axis. In original data coordinates, the same idea may look tilted, so Section 3's $Q$ rotation turns it into the signal/residual split.

4.1 The $n=2$ canonical picture

Start with the picture, not the formula. Put the tested coordinate $Z_1$ on the horizontal axis and the residual coordinate $Z_2$ on the vertical axis. Under the null, the cloud is centered at the origin. Under the alternative, the center slides horizontally, because only the signal coordinate changes.

Observe $$Z\sim N_2\!\left(\begin{bmatrix}\mu_1\\0\end{bmatrix},\sigma^2I_2\right),$$ and test $$H_0:\mu_1=0 \qquad \text{vs} \qquad H_1:\mu_1\neq 0.$$ The coordinate $Z_1$ is the tested signal direction. The coordinate $Z_2$ is pure residual noise with mean 0 under both hypotheses.

An observed point is evidence against $H_0$ when it points too strongly in the $Z_1$ direction relative to the residual direction. In the unknown-variance case, that comparison is angular: $$\tan(\theta)=\frac{Z_2}{Z_1},\qquad \frac{Z_1}{Z_2}=\cot(\theta),\qquad \frac{Z_1^2}{Z_2^2}=\cot^2(\theta).$$ This tiny two-dimensional picture is the seed of the whole four-test table.

4.2 Angle reading: ratios become cotangents

The diagram below is the visual version of the algebra above. The observed point has two coordinates: its horizontal signal projection $Z_1$ and its vertical residual projection $Z_2$. Comparing $Z_1^2/Z_2^2$ is the same as comparing the squared cotangent of the point's angle from the signal axis.

With unknown variance, absolute scale is not reliable, but angle is. The statistic $|Z_1|/|Z_2|$ measures how closely the observed vector points in the signal direction. A vector nearly aligned with the $Z_1$-axis is surprising under the rotationally symmetric null.

In the canonical $Z_1,Z_2$ coordinates, the observed point determines an angle $\theta$ from the signal axis. Since $\tan(\theta)=Z_2/Z_1$, the ratio $Z_1/Z_2=\cot(\theta)=1/\tan(\theta)$. Squaring gives the $F_{1,1}$ form $Z_1^2/Z_2^2=\cot^2(\theta)$.

A useful way to remember the unknown-variance test: $$\frac{Z_1}{Z_2}=\cot(\theta),\qquad \frac{Z_1^2}{Z_2^2}=\cot^2(\theta).$$ The classical t statistic uses $|Z_2|$ as the denominator scale, so $T=Z_1/|Z_2|$ agrees with $\cot(\theta)$ up to the sign of $Z_2$, and the squared test is the same: $$T^2=\frac{Z_1^2}{Z_2^2}.$$ Large $|T|$ means the observed vector is more horizontal than expected under the rotationally symmetric null.
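The angle reading can be checked on a single made-up point. In this sketch the coordinates are arbitrary, with $Z_2>0$ so that the cotangent and the signed t statistic share a sign.

```python
import math

z1, z2 = 3.0, 1.0                    # made-up canonical coordinates
theta = math.atan2(z2, z1)           # angle measured from the signal (Z1) axis
cot_theta = 1.0 / math.tan(theta)

t_stat = z1 / abs(z2)                # signed t statistic with d_r = 1
f_stat = z1 ** 2 / z2 ** 2           # F_{1,1} form

print(cot_theta, t_stat)             # both are 3 up to rounding
print(cot_theta ** 2, f_stat)        # both are 9 up to rounding
```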

4.3 The $n=2$ one-sample t-test is a rotation

For $X_1,X_2\overset{iid}{\sim}N(\mu,\sigma^2)$, the mean vector is $\mu(1,1)^T$. The signal direction is the diagonal line $X_1=X_2$, not the original $X_1$ axis. A 45-degree rotation turns that diagonal into the canonical signal axis.

Nothing new is happening probabilistically. We are only changing coordinates. The observed point $X$ stays fixed in the plane, but the axes are rotated so that one new axis points along the mean direction and the other new axis points along pure residual variation.

Use $$Q=\frac{1}{\sqrt2}\begin{bmatrix}1 & -1\\1 & 1\end{bmatrix},\qquad Z=Q^TX.$$ Since $X\sim N_2(\mu\mathbf1,\sigma^2I_2)$, the affine transformation formula gives $$Z=Q^TX\sim N_2(Q^T\mu\mathbf1,\sigma^2Q^TQ).$$ Because $Q$ is orthogonal and $Q^T\mathbf1=(\sqrt2,0)^T$, this becomes $$Z\sim N_2\!\left(\begin{bmatrix}\sqrt2\,\mu\\0\end{bmatrix},\sigma^2I_2\right).$$ Reading off the coordinates, $$Z=\begin{bmatrix}(X_1+X_2)/\sqrt2\\(X_2-X_1)/\sqrt2\end{bmatrix}.$$

This is the $d_1=1$, $\sigma^2$ unknown row of the canonical table. In this $n=2$ example the residual block has only one coordinate, so $$Z_r=Z_2,\qquad d_r=1,$$ and the table predicts $$T=\frac{Z_1}{\sqrt{\|Z_r\|^2/d_r}}=\frac{Z_1}{\sqrt{Z_2^2}}=\frac{Z_1}{|Z_2|}.$$

Equivalently, the rotated axes are the orthonormal basis vectors $$q_1=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix},\qquad q_2=\frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}.$$ The new coordinates are projections: $$Z_1=q_1^TX,\qquad Z_2=q_2^TX.$$ So $Z_1$ is "how far $X$ points along the 45-degree signal line," and $Z_2$ is "how far $X$ points along the perpendicular residual line."

The factor $\sqrt2$ is just normalization. The raw signal direction $(1,1)$ has length $\sqrt2$, so the unit signal vector is $(1,1)/\sqrt2$. Projection coordinates are taken against unit vectors, which is why the signal projection is $Z_1=(X_1+X_2)/\sqrt2$ rather than $X_1+X_2$.

The rotated signal coordinate is $$Z_1=\frac{X_1+X_2}{\sqrt2}=\sqrt2\,\bar X,$$ and the residual coordinate is $$Z_2=\frac{X_2-X_1}{\sqrt2}.$$ So the one-sample t-test with $n=2$ is exactly the $n=2$ canonical model in disguised coordinates.

The transformation $Z=Q^TX$ rotates the coordinate system so the diagonal mean direction becomes the $Z_1$ signal axis. The observed point is the same point in the plane; only the axes change. In the new axes, $Z_1=(X_1+X_2)/\sqrt2$ is the signal projection and $Z_2=(X_2-X_1)/\sqrt2$ is the residual projection. This is the rotated version of the previous canonical picture.

In the picture, this is the relationship $$\left|\frac{\sqrt2\,\bar X}{S}\right|=|\cot(\theta)|,$$ where $S$ is the sample standard deviation and $\theta$ is the angle from the 45-degree signal line. The point $X$ is compared to the diagonal signal line in the original coordinates; after rotation, that same comparison becomes the ratio $Z_1/|Z_2|$ in canonical coordinates. The signed statistic keeps the sign of the signal projection $Z_1$.

So the mean subtraction did not disappear. With two points, the sample mean is the midpoint, and the two centered deviations are just opposite halves of the gap between the observations. That is why the residual scale can be written using $|X_1-X_2|$.
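The $n=2$ rotation is small enough to verify by direct arithmetic. The sketch below uses two made-up observations and a hypothetical helper `rotate_two`; it compares the canonical ratio $Z_1/|Z_2|$ with the classical $\sqrt2\,\bar X/S$.

```python
import math
import statistics

def rotate_two(x1, x2):
    """Z = Q^T X for the 45-degree rotation with columns (1,1)/sqrt2 and (-1,1)/sqrt2."""
    z1 = (x1 + x2) / math.sqrt(2)
    z2 = (x2 - x1) / math.sqrt(2)
    return z1, z2

x1, x2 = 3.0, 5.0                               # made-up observations
z1, z2 = rotate_two(x1, x2)

t_canonical = z1 / abs(z2)
xbar = statistics.mean([x1, x2])
s = statistics.stdev([x1, x2])                  # sample SD with n - 1 = 1
t_classical = math.sqrt(2) * xbar / s

print(t_canonical, t_classical)                 # both are 4 up to rounding
print(abs(z2), s, abs(x1 - x2) / math.sqrt(2))  # three ways to write the residual scale
```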

4.4 General $n$ one-sample t-test

Let $$X_1,\ldots,X_n\overset{iid}{\sim}N(\mu,\sigma^2),\qquad X\sim N_n(\mu\mathbf1,\sigma^2I_n).$$ The signal subspace is the line spanned by $\mathbf1=(1,\ldots,1)^T$.

This is the same move as the $n=2$ rotation, except the residual part is no longer one perpendicular line. The signal line still has dimension $d_1=1$, but the residual space $\mathbf1^\perp$ has dimension $d_r=n-1$.

Let $$q_1=\frac{\mathbf1}{\sqrt n}.$$ This is a unit vector because $\|\mathbf1\|=\sqrt n$. Extend $q_1$ to an orthonormal basis $$Q=[q_1\mid Q_r],$$ where the columns of $Q_r$ span the residual space $\mathbf1^\perp$. Define the rotated coordinates $$\begin{bmatrix}Z_1\\Z_r\end{bmatrix}=Q^TX,\qquad Z_1=q_1^TX=\sqrt n\,\bar X,\qquad Z_r=Q_r^TX.$$

This is the lecture's affine-transformation step. Since $X\sim N_n(\mu\mathbf1,\sigma^2I_n)$, $$Q^TX\sim N_n(Q^T\mu\mathbf1,\sigma^2Q^TQ).$$ The covariance simplifies because $Q$ is orthogonal: $$\sigma^2Q^TQ=\sigma^2I_n.$$ The mean simplifies because $$Q^T\mathbf1=\begin{bmatrix}q_1^T\mathbf1\\Q_r^T\mathbf1\end{bmatrix} =\begin{bmatrix}\sqrt n\\\mathbf{0}_{n-1}\end{bmatrix}.$$ The top entry is $\sqrt n$ because $q_1=\mathbf1/\sqrt n$, and the residual entries are 0 because the columns of $Q_r$ are perpendicular to $\mathbf1$. Therefore $$\begin{bmatrix}Z_1\\Z_r\end{bmatrix}=Q^TX\sim N_n\!\left(\begin{bmatrix}\sqrt n\,\mu\\\mathbf{0}_{n-1}\end{bmatrix},\sigma^2I_n\right).$$

Now use the rotated coordinates to recognize the usual sample variance. Orthogonal decomposition gives $$\|X\|^2=Z_1^2+\|Z_r\|^2.$$ Since $Z_1=\sqrt n\,\bar X$, $$\|Z_r\|^2=\sum_{i=1}^nX_i^2-n\bar X^2=\sum_{i=1}^n(X_i-\bar X)^2=(n-1)S^2.$$

The rotated-mean calculation explains why $Z_r$ is a pure-noise block with mean 0. The orthogonal-decomposition calculation explains why the squared length of that block is the familiar centered sum of squares. Together they turn the canonical denominator $\|Z_r\|^2$ into the classical denominator $(n-1)S^2$.

Under $H_0:\mu=0$, $$\frac{Z_1}{\sigma}=\frac{\sqrt n\,\bar X}{\sigma}\sim N(0,1),$$ and $$\frac{\|Z_r\|^2}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1},$$ independently. Therefore $$T=\frac{Z_1}{\sqrt{\|Z_r\|^2/(n-1)}}=\frac{\sqrt n\,\bar X}{S}\sim t_{n-1}.$$

The classical facts $\bar X\perp S^2$ and $(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$ are not isolated miracles. They come from independence of orthogonal Gaussian projections: $\bar X$ lives in the signal line, and $S^2$ lives in the residual hyperplane.
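For general $n$ the residual basis $Q_r$ never has to be written down, because only $\|Z_r\|^2=\|X\|^2-Z_1^2$ is needed. The sketch below checks this on a made-up sample of five numbers; the variable names are illustrative.

```python
import math
import statistics

x = [2.0, 4.0, 4.0, 6.0, 9.0]       # made-up sample
n = len(x)
xbar = statistics.mean(x)

z1 = math.sqrt(n) * xbar                            # projection onto 1/sqrt(n)
zr_sq = sum(xi * xi for xi in x) - z1 ** 2          # ||X||^2 - Z_1^2
centered_ss = sum((xi - xbar) ** 2 for xi in x)     # (n - 1) S^2

t_canonical = z1 / math.sqrt(zr_sq / (n - 1))
t_classical = math.sqrt(n) * xbar / statistics.stdev(x)

print(zr_sq, centered_ss)           # both are 28 up to rounding
print(t_canonical, t_classical)     # the same t statistic for H0: mu = 0
```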

Recall Checkpoint: Tests and Intervals

Section 2.3 already gave the four-test table. This short section is here as the recall checkpoint: when you are solving a problem, you should be able to rebuild the table from the two questions without memorizing it row by row.

Recall checkpoint:
known $\sigma^2$ + one signal coordinate gives z; known $\sigma^2$ + many signal coordinates gives $\chi^2$; unknown $\sigma^2$ + one signal coordinate gives t; unknown $\sigma^2$ + many signal coordinates gives F.

For the unknown-variance tests, we still need $d_r>0$. No residual degrees of freedom means no independent residual estimate of $\sigma^2$.
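The two-question logic can be written as one small function. This is a sketch, not a lecture-provided routine: the name `canonical_statistic`, its arguments, and the example coordinates are assumptions. It returns the name of the statistic together with its value, and refuses the unknown-variance case when there are no residual coordinates.

```python
import math

def canonical_statistic(z1, z_r=None, sigma=None):
    """Pick z, chi-squared, t, or F from the signal block and the noise information.

    z1    : list of signal coordinates (length d1)
    z_r   : list of residual coordinates (length d_r), used only if sigma is None
    sigma : known noise standard deviation, or None if unknown
    """
    d1 = len(z1)
    signal_sq = sum(z * z for z in z1)
    if sigma is not None:                       # known variance
        if d1 == 1:
            return "z", z1[0] / sigma
        return "chi2", signal_sq / sigma ** 2
    if not z_r:                                 # unknown variance needs d_r > 0
        raise ValueError("no residual degrees of freedom to estimate sigma^2")
    sigma_hat_sq = sum(z * z for z in z_r) / len(z_r)
    if d1 == 1:
        return "t", z1[0] / math.sqrt(sigma_hat_sq)
    return "F", (signal_sq / d1) / sigma_hat_sq

# Made-up coordinates, for illustration only.
print(canonical_statistic([2.0], sigma=1.0))                 # ('z', 2.0)
print(canonical_statistic([2.0, 1.0], sigma=1.0))            # ('chi2', 5.0)
print(canonical_statistic([2.0], z_r=[1.0, -1.0]))           # ('t', 2.0)
print(canonical_statistic([2.0, 1.0], z_r=[1.0, -1.0]))      # ('F', 2.5)
```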

5.1 Confidence intervals by inversion

Lecture 14's test-confidence interval duality returns here. For the common one-dimensional unknown-variance case, testing $H_0:\mu_1=\mu_1^0$ uses $$\frac{Z_1-\mu_1^0}{\hat\sigma}\sim t_{d_r}.$$ The non-rejected values form the interval $$\mu_1\in Z_1\pm \hat\sigma\,t_{d_r,1-\alpha/2}.$$

Known variance gives the corresponding normal interval: $$\mu_1\in Z_1\pm \sigma z_{1-\alpha/2}.$$ Unknown variance replaces $\sigma$ by $\hat\sigma$ and replaces the normal cutoff by the t cutoff.
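Here is a minimal sketch of both intervals in canonical coordinates. Everything numeric is made up, and `interval` is a hypothetical helper. The normal cutoff comes from Python's `statistics.NormalDist`; the t cutoff is typed in by hand from a t table, since the standard library has no t quantile function.

```python
import math
from statistics import NormalDist

def interval(z1, scale, cutoff):
    """Return (lower, upper) = z1 -/+ scale * cutoff."""
    half_width = scale * cutoff
    return z1 - half_width, z1 + half_width

alpha = 0.05
z_cut = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96

# Assumption: value read from a t table (4 residual df, 97.5th percentile).
t_cut_4df = 2.776

z1 = 3.0                                        # made-up signal coordinate
z_r = [1.0, -2.0, 0.5, 1.5]                     # made-up residual block, d_r = 4
sigma_hat = math.sqrt(sum(z * z for z in z_r) / len(z_r))

known = interval(z1, 1.0, z_cut)                # pretend sigma = 1 is known
unknown = interval(z1, sigma_hat, t_cut_4df)    # sigma estimated from z_r
print(known)
print(unknown)
```

These intervals are for the canonical mean $\mu_1$. In the one-sample problem $\mu_1=\sqrt n\,\mu$, so dividing both endpoints by $\sqrt n$ gives the familiar $\bar X\pm t_{n-1,1-\alpha/2}\,S/\sqrt n$.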

Applications: Identify the Blocks, Then Test

For each familiar test, resist the urge to start from the final statistic. Instead, identify the null subspace, the full model subspace, the tested signal directions, and the residual directions. The statistic then drops out from the canonical table.

6.1 Equal-variance two-sample t-test

Group 1 has $X_1,\ldots,X_m\overset{iid}{\sim}N(\mu,\sigma^2)$ and group 2 has $Y_1,\ldots,Y_n\overset{iid}{\sim}N(\nu,\sigma^2)$. Let $N=m+n$ and stack the observations into $$W=(X_1,\ldots,X_m,Y_1,\ldots,Y_n)^T.$$ Test $H_0:\mu=\nu$.

The full model space is $$\mathcal H=\Span(a,b),$$ where $a$ is 1 on group 1 and 0 on group 2, while $b$ is 0 on group 1 and 1 on group 2. The null space is $$\mathcal H_0=\Span(\mathbf1_N).$$ Therefore $$d_0=1,\qquad d_1=1,\qquad d_r=N-2.$$

Identify the blocks:
$\mathcal H_0$ is the nuisance direction where both groups share one mean, $\mathcal H\cap\mathcal H_0^\perp$ is the one-dimensional contrast direction, and $\mathcal H^\perp$ is within-group residual noise. Since $d_1=1$ and $\sigma^2$ is unknown, choose the t row of the canonical table.

Contrast direction

The signal direction is the contrast between group means. Define $$c=\left(\underbrace{\frac1m,\ldots,\frac1m}_{m},\underbrace{-\frac1n,\ldots,-\frac1n}_{n}\right)^T.$$ Then $$c^TW=\bar X-\bar Y,\qquad \|c\|^2=\frac1m+\frac1n.$$ The unit signal vector is $$q_1=\frac{c}{\sqrt{1/m+1/n}}.$$

Residual direction and pooled variance

The residual projection has squared length $$\|Z_r\|^2=\sum_{i=1}^m(X_i-\bar X)^2+\sum_{j=1}^n(Y_j-\bar Y)^2.$$ Hence $$\|Z_r\|^2=(m-1)S_X^2+(n-1)S_Y^2,$$ with $d_r=N-2$.

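The pooled variance estimate is the canonical $\hat\sigma^2$: $$S_p^2=\frac{\|Z_r\|^2}{d_r}=\frac{(m-1)S_X^2+(n-1)S_Y^2}{m+n-2}.$$ With $Z_1=q_1^TW$, the canonical $d_1=1$, unknown-variance test becomes the classical pooled two-sample t-test: $$T=\frac{Z_1}{S_p}=\frac{\bar X-\bar Y}{S_p\sqrt{1/m+1/n}}\sim t_{N-2}\quad\text{under }H_0.$$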

This is the equal-variance two-sample t-test. The pooled variance estimate is justified by the common variance assumption. If the group variances are not plausibly equal, the Welch test from earlier lectures is the safer default.
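The block identification can be turned into a short calculation. The sketch below uses made-up groups and a hypothetical helper `pooled_t`: it builds the contrast vector $c$, projects the stacked data onto the unit contrast direction, divides by the pooled standard deviation, and compares with the classical formula.

```python
import math
import statistics

def pooled_t(x, y):
    """Pooled two-sample t statistic built from the canonical blocks."""
    m, n = len(x), len(y)
    c = [1 / m] * m + [-1 / n] * n                  # contrast vector
    w = list(x) + list(y)                           # stacked data
    c_norm = math.sqrt(sum(ci * ci for ci in c))    # sqrt(1/m + 1/n)
    z1 = sum(ci * wi for ci, wi in zip(c, w)) / c_norm

    xbar, ybar = statistics.mean(x), statistics.mean(y)
    zr_sq = sum((xi - xbar) ** 2 for xi in x) + sum((yj - ybar) ** 2 for yj in y)
    s_p = math.sqrt(zr_sq / (m + n - 2))            # pooled standard deviation
    return z1 / s_p

x = [1.0, 3.0, 5.0]       # made-up group 1
y = [6.0, 8.0]            # made-up group 2
t_canonical = pooled_t(x, y)

# Classical formula for comparison.
s_p = math.sqrt((2 * statistics.variance(x) + 1 * statistics.variance(y)) / 3)
t_classical = (statistics.mean(x) - statistics.mean(y)) / (s_p * math.sqrt(1 / 3 + 1 / 2))

print(t_canonical, t_classical)    # both are -2.4 up to rounding
```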

6.2 Where one-way ANOVA fits

One-way ANOVA compares $G$ group means under a common normal variance assumption. If group $g$ has observations $$Y_{g,1},\ldots,Y_{g,n_g} \overset{iid}{\sim}N(\mu_g,\sigma^2),\qquad g=1,\ldots,G,$$ then the null and alternative are $$H_0:\mu_1=\mu_2=\cdots=\mu_G \qquad\text{vs}\qquad H_1:\text{not all }\mu_g\text{ are equal}.$$ Stack all observations into $$Y=(Y_{1,1},\ldots,Y_{1,n_1},Y_{2,1},\ldots, Y_{G,n_G})^T\sim N_N(\theta,\sigma^2I_N),\qquad N=\sum_{g=1}^G n_g.$$

The usual one-way ANOVA model is built on three assumptions: observations are independent within and across groups, the conditional distributions are approximately normal, and all groups share the same variance $\sigma^2$. A practical first check is to compare the group spreads with box plots or residual plots; if one group is much more variable than the others, the pooled-variance F-test can be misleading.

For the one-way ANOVA null, the mean vector can only lie on the grand-mean line: $$\mathcal H_0=\Span(\mathbf1_N),\qquad d_0=\dim(\mathcal H_0)=1.$$ The full model allows one constant level within each group. If $e_g$ is the group-$g$ indicator vector, for example $e_1=(1,\ldots,1,0,\ldots,0)^T$, then $$\mathcal H=\Span(e_1,\ldots,e_G),\qquad \dim(\mathcal H)=G.$$ Therefore $$H_0:\theta\in\mathcal H_0,\qquad H_1:\theta\in\mathcal H\setminus\mathcal H_0,$$ and $$d_1=\dim(\mathcal H)-\dim(\mathcal H_0)=G-1,\qquad d_r=N-G.$$

Identify the blocks:
the grand-mean line is nuisance, the $G-1$ independent group contrasts are signal, and the within-group deviations are residual noise. Since the signal is multidimensional and $\sigma^2$ is unknown, choose the F row.

A useful correction to keep in mind: ANOVA is absolutely doing projections. The projection step creates the observed squared lengths $\text{SSB}$ and $\text{SSW}$. The F distribution is the reference distribution that tells us whether the ratio of those lengths is unusually large under $H_0$.

Projection view from the discussion section

The null and full models are nested: $\mathcal H_0\subseteq\mathcal H$. Projecting $Y$ onto $\mathcal H_0$ fits one grand mean, while projecting $Y$ onto $\mathcal H$ fits one mean for each group:

$$P_{\mathcal H_0}Y=\bar Y_{\cdot\cdot}\mathbf1_N,$$ $$P_{\mathcal H}Y=(\underbrace{\bar Y_{1\cdot},\ldots,\bar Y_{1\cdot}}_{n_1}, \underbrace{\bar Y_{2\cdot},\ldots,\bar Y_{2\cdot}}_{n_2},\ldots, \underbrace{\bar Y_{G\cdot},\ldots,\bar Y_{G\cdot}}_{n_G})^T.$$

The key projection identity is $$\|Y-P_{\mathcal H_0}Y\|^2=\|Y-P_{\mathcal H}Y\|^2+\|P_{\mathcal H}Y-P_{\mathcal H_0}Y\|^2.$$ In words: $$\text{variation not explained by the null}=\text{variation not explained by the full model}+\text{variation explained by full but not null}.$$

The reason this is a clean sum of squares is orthogonality. The vector $Y-P_{\mathcal H}Y$ lives in $\mathcal H^\perp$ as residual noise, while $P_{\mathcal H}Y-P_{\mathcal H_0}Y$ lives in the tested group-contrast space $\mathcal H\cap\mathcal H_0^\perp$. These directions are perpendicular, so their squared lengths add.

Discussion view: what are the two RSS values doing?

The cleanest way to remember ANOVA is to ask what each model is allowed to explain. The null model is only allowed to fit one grand mean. The full model is allowed to fit one mean per group. So $\RSS_0-\RSS$ is not a mysterious extra formula; it is the amount of error saved when the model is allowed to move from one shared mean to separate group means.

In one-way ANOVA, the null residual sum of squares decomposes orthogonally into between-group signal and within-group residual noise.

The algebra behind total = between + within

$$Y_{gi}-\bar Y_{\cdot\cdot} = \underbrace{(\bar Y_{g\cdot}-\bar Y_{\cdot\cdot})}_{\text{between-group part}}+ \underbrace{(Y_{gi}-\bar Y_{g\cdot})}_{\text{within-group part}}.$$

Squaring and summing gives a cross-term, but it vanishes within each group: $$\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{g\cdot})=0.$$ Therefore $$\sum_{g=1}^G\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{\cdot\cdot})^2 = \sum_{g=1}^G n_g(\bar Y_{g\cdot}-\bar Y_{\cdot\cdot})^2+ \sum_{g=1}^G\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{g\cdot})^2.$$ In ANOVA language, $$\text{SST}=\text{SSB}+\text{SSW}.$$ In nested-model language, $$\RSS_0=(\RSS_0-\RSS)+\RSS.$$

The factor $n_g$ in the between-group term matters. A group mean far from the grand mean is stronger evidence when it comes from many observations, because the fitted group-mean vector has that same shift repeated $n_g$ times.

The null model fits one shared grand mean: $$\RSS_0=\sum_{g=1}^G\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{\cdot\cdot})^2.$$ This leftover error contains both the group-mean mismatch and the within-group noise. The full model fits a separate mean for each group: $$\RSS=\sum_{g=1}^G\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{g\cdot})^2.$$ This is only within-group noise. Therefore $$\RSS_0-\RSS=\sum_{g=1}^G n_g(\bar Y_{g\cdot}-\bar Y_{\cdot\cdot})^2,$$ which is the between-group signal.

This is exactly the canonical decomposition: $$\RSS_0=\|Z_1\|^2+\|Z_r\|^2,\qquad \RSS=\|Z_r\|^2,\qquad \RSS_0-\RSS=\|Z_1\|^2.$$ So ANOVA compares the between-group signal length to the within-group residual-noise length.

Quantity	Model being fit	Plain-language meaning	Canonical block
$\RSS_0$	Null model: one grand mean	Everything left unexplained if all groups are forced to have the same mean.	$\|Z_1\|^2+\|Z_r\|^2$
$\RSS$	Full model: one mean per group	Only the leftover within-group scatter after fitting each group mean.	$\|Z_r\|^2$
$\RSS_0-\RSS$	Improvement from null to full	The between-group signal: how much fit improves by allowing group means to differ.	$\|Z_1\|^2$

Plain-language F ratio:
$$F= \frac{\text{between-group variation per signal dimension}} {\text{within-group variation per residual dimension}}.$$ Under $H_0$, both pieces estimate the same noise variance $\sigma^2$, so the ratio should be near 1. A large value means the group means are farther apart than the within-group noise would suggest.

Under $H_0$, $$\frac{\text{SSB}}{\sigma^2}\sim\chi^2_{G-1},\qquad \frac{\text{SSW}}{\sigma^2}\sim\chi^2_{N-G},$$ and the two pieces are independent because they are squared lengths of orthogonal Gaussian projections. That is why, with mean squares $\text{MSB}=\text{SSB}/(G-1)$ and $\text{MSW}=\text{SSW}/(N-G)$, $$F=\frac{\text{MSB}}{\text{MSW}}\sim F_{G-1,N-G}.$$

If $\RSS_0$ is the total sum of squares around the grand mean and $\RSS$ is the within-group residual sum of squares, then $$F=\frac{(\RSS_0-\RSS)/(G-1)}{\RSS/(N-G)}=\frac{\text{SSB}/(G-1)}{\text{SSW}/(N-G)}\sim F_{G-1,N-G}\quad\text{under }H_0.$$ This is one-way ANOVA written in the same nested-subspace language as regression.

Practice bridge: why manual ANOVA matches software

A practical ANOVA calculation is just a computational version of the projection picture. A software routine such as SciPy's `stats.f_oneway` is not using a different idea: it computes the same between-group and within-group sums of squares, forms the same F ratio, and then reads a right-tail probability from the same $F_{G-1,N-G}$ reference distribution.

The manual route is: compute total variation around the grand mean, compute within-group variation around each group mean, take the difference as between-group variation, then form $$F_{\text{obs}}=\frac{\text{SSB}/(G-1)}{\text{SSW}/(N-G)}.$$ The software route and the manual route agree because they are computing this same statistic and using the same right tail: $$p=P(F_{G-1,N-G}\geq F_{\text{obs}}).$$
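The manual route can be sketched in a few lines of plain Python. The data and the function name `one_way_anova` are made up for illustration; the p-value step is left out because it needs an F distribution function that the standard library does not provide.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    big_n, big_g = len(all_values), len(groups)
    grand_mean = sum(all_values) / big_n

    sst = sum((v - grand_mean) ** 2 for v in all_values)                # RSS_0
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)    # RSS
    ssb = sst - ssw                                                     # RSS_0 - RSS
    f_obs = (ssb / (big_g - 1)) / (ssw / (big_n - big_g))
    return ssb, ssw, f_obs

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]   # made-up data
ssb, ssw, f_obs = one_way_anova(groups)
print(ssb, ssw, f_obs)      # 26, 6, 13 up to rounding
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` should report the same F statistic along with the right-tail p-value; that call is not executed in this sketch.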

Conceptually, the p-value is asking: if all groups truly had the same mean, how often would random within-group noise create a between-group projection this large compared with the within-group projection? A small p-value says this particular signal length is too large to comfortably explain as noise.

For example, a p-value around $0.009$ would reject the equal-means null at the 5% level. The important study takeaway is not the arithmetic itself, but the interpretation: the observed between-group projection is large relative to the within-group projection. ANOVA says there is evidence that not all group means are equal, but it does not identify which group comparisons are responsible.

The ANOVA F-test is an omnibus mean test. It can say that the vector of group-mean contrasts is unusually long, but it does not by itself say which group differs from which. It also tests means under the normal/common-variance model; it is not a full test that all empirical distributions are identical. When boxplots of the groups show strong skewness or unequal spreads, a rank-based method such as Kruskal-Wallis is often a more robust alternative to consider.

The two-sample equal-variance t-test is the $G=2$ special case of one-way ANOVA. There is only one group-contrast direction, so $d_1=G-1=1$ and the ANOVA F statistic equals the square of the pooled two-sample t statistic: $$F=T^2.$$
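
A quick numerical check of $F=T^2$, as a sketch on hypothetical data (the two samples below are invented for illustration):

```python
import math

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical sample 1 (size m)
y = [3.0, 5.0, 7.0]        # hypothetical sample 2 (size n)
m, n = len(x), len(y)
x_bar, y_bar = sum(x) / m, sum(y) / n

# Pooled two-sample t statistic.
ss_x = sum((v - x_bar) ** 2 for v in x)
ss_y = sum((v - y_bar) ** 2 for v in y)
s_p2 = (ss_x + ss_y) / (m + n - 2)          # pooled variance S_p^2
t_stat = (x_bar - y_bar) / math.sqrt(s_p2 * (1 / m + 1 / n))

# One-way ANOVA F statistic with G = 2 groups.
grand = (sum(x) + sum(y)) / (m + n)
ssb = m * (x_bar - grand) ** 2 + n * (y_bar - grand) ** 2
ssw = ss_x + ss_y
f_stat = (ssb / 1) / (ssw / (m + n - 2))

print(f"T^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The sign of $T$ records which group mean is larger; squaring discards that sign, which is why the F-test corresponds to the two-sided t-test.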

6.3 Linear regression as projection

In regression, $$Y_i=x_i^T\beta+\epsilon_i,\qquad \epsilon_i\overset{iid}{\sim}N(0,\sigma^2),$$ or in matrix form $$Y\sim N_n(X\beta,\sigma^2I_n).$$ Assume the design matrix $X$ has full column rank $d$.

The model space is $$\mathcal H=\Col(X),\qquad \dim(\mathcal H)=d.$$ The fitted values are the orthogonal projection of $Y$ onto this column space: $$\hat Y=X\hat\beta=P_{\mathcal H}Y,\qquad P_{\mathcal H}=X(X^TX)^{-1}X^T.$$

Regression is not a separate universe from ANOVA. The fitted values are the projection onto the model subspace, and the residuals are the projection onto the orthogonal complement. Tests ask whether adding selected directions to the model subspace produces a signal length that is large relative to residual noise.

Practice bridge: least squares is the canonical residual

A common practice exercise is to verify that the canonical residual length $\|Z_r\|^2$ is the same object as the ordinary least-squares RSS. This is the key bridge: least squares is not an extra procedure bolted onto the canonical model. It is exactly the projection of $Y$ onto the model space.

This is why subtracting the mean shows up in regression with an intercept. It is Gram-Schmidt in disguise: remove the nuisance direction $\mathbf1$ from $x$, then test the remaining predictor direction. The slope t-test is just a signed version of "how much does $Y$ point in that cleaned-up signal direction?"

The residual vector is $$r=Y-\hat Y=(I-P_{\mathcal H})Y=P_{\mathcal H^\perp}Y,$$ and $$\RSS=\|r\|^2=\|P_{\mathcal H^\perp}Y\|^2.$$ Since $\dim(\mathcal H^\perp)=n-d$, $$\frac{\RSS}{\sigma^2}\sim \chi^2_{n-d},\qquad \hat\sigma^2=\frac{\RSS}{n-d}.$$
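
As a concrete sketch of these formulas, the code below fits a line with an intercept to five illustrative points by centering $x$ (the Gram-Schmidt step), then checks that the residual vector is orthogonal to both model directions. The data values and variable names are assumptions made for this example.

```python
# Illustrative five-point data for a simple regression with intercept.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [4.0, -1.5, 0.0, 3.5, 4.0]
n = len(y)
d = 2  # model dimension: intercept direction plus one predictor direction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x_bar = sum(x) / n
y_bar = sum(y) / n

# Gram-Schmidt step: remove the constant (nuisance) direction from x.
x_centered = [xi - x_bar for xi in x]

slope = dot(x_centered, y) / dot(x_centered, x_centered)
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]        # projection onto Col(X)
residual = [yi - fi for yi, fi in zip(y, fitted)]    # projection onto Col(X)-perp

rss = dot(residual, residual)
sigma2_hat = rss / (n - d)

# The residual is orthogonal to both columns of the design matrix.
print(dot(residual, [1.0] * n), dot(residual, x))
print(f"slope={slope}, intercept={intercept}, RSS={rss}, sigma2_hat={sigma2_hat}")
```

Here the slope is $0.5$, the intercept is $2$, $\RSS=24$, and $\hat\sigma^2=24/3=8$; both orthogonality checks are zero up to rounding.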

6.4 Regression F-test for a subset of coefficients

Partition the design as $X=[X_0\mid X_1]$, where $X_0$ contains predictors kept under the null and $X_1$ contains predictors being tested. The null hypothesis is $$H_0:\beta_1=0,$$ so $$\mathcal H_0=\Col(X_0),\qquad \mathcal H=\Col(X).$$

ANOVA source	Sum of squares	Degrees of freedom	Mean square
Between groups / signal	$\text{SSB}=\RSS_0-\RSS=\sum_g n_g(\bar Y_{g\cdot}-\bar Y_{\cdot\cdot})^2$	$G-1$	$\text{MSB}=\text{SSB}/(G-1)$
Within groups / residual	$\text{SSW}=\RSS=\sum_g\sum_i(Y_{gi}-\bar Y_{g\cdot})^2$	$N-G$	$\text{MSW}=\text{SSW}/(N-G)$
Total around grand mean	$\text{SST}=\RSS_0=\sum_g\sum_i(Y_{gi}-\bar Y_{\cdot\cdot})^2$	$N-1$	Not used as the denominator for the F-test

Model feature	Canonical interpretation	Least-squares meaning
No intercept	The model space is the line generated by the predictor direction.	The fitted values are the closest point on that line; RSS is the leftover squared distance.
With intercept	The constant vector is a nuisance direction; the predictor signal is what remains after accounting for that constant direction.	The fitted values are the closest point in the intercept-plus-predictor plane; RSS is the squared distance left outside the plane.

Identify the blocks:
$\Col(X_0)$ is nuisance, the extra part of $\Col(X)$ not already explained by $\Col(X_0)$ is signal, and $\Col(X)^\perp$ is residual noise. Therefore $d_1$ is the number of added independent predictor directions and $d_r=n-d$.

Let $$d_0=\rank(X_0),\qquad d_1=d-d_0,\qquad d_r=n-d.$$ Let $\RSS_0$ be the residual sum of squares for the null model and $\RSS$ for the full model. Since $\mathcal H_0\subseteq\mathcal H$, $$\RSS_0\geq \RSS.$$

The improvement $\RSS_0-\RSS$ is the squared length of the tested signal projection: it is how much residual error drops when we add the tested predictors. The full-model RSS estimates the remaining noise.

Same canonical bridge: $$\RSS_0=\|Z_1\|^2+\|Z_r\|^2,\qquad \RSS=\|Z_r\|^2,\qquad \RSS_0-\RSS=\|Z_1\|^2.$$ The null model cannot explain the tested predictor directions, so that signal is counted as leftover error in $\RSS_0$. The full model can explain those directions, so only residual noise remains in $\RSS$.

Plain-language regression F ratio:
$$F= \frac{\text{variation explained by tested predictors per tested direction}} {\text{remaining residual variation per residual direction}}.$$ This is the same signal-over-noise comparison as ANOVA, just with predictor directions instead of group-mean directions.

The regression F-statistic is $$F=\frac{(\RSS_0-\RSS)/d_1}{\RSS/d_r}\sim F_{d_1,d_r}\quad\text{under }H_0.$$ Reject for large values. This asks whether the tested predictors improve fit more than would be expected from noise alone.
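
The nested-model computation can be sketched with a hand-rolled Gram-Schmidt, which makes the canonical coordinates explicit. This is a minimal illustration in plain Python, assuming a full-column-rank design; the five-point data and the helper names `orthonormal_basis` and `rss` are hypothetical. It tests the coefficient of $w$ with the intercept and $x$ kept under the null.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(columns):
    """Gram-Schmidt: orthonormal basis for the span of the given columns, in order."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:  # skip columns already in the span
            basis.append([vi / norm for vi in v])
    return basis

def rss(columns, y):
    """Squared length of y after projecting out the span of the columns."""
    r = list(y)
    for q in orthonormal_basis(columns):
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return dot(r, r)

# Illustrative data: intercept, predictor x (kept), predictor w (tested).
ones = [1.0] * 5
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [2.0, -1.0, -2.0, -1.0, 2.0]
y = [4.0, -1.5, 0.0, 3.5, 4.0]

X0 = [ones, x]          # null design
X = [ones, x, w]        # full design
n, d, d0 = len(y), len(X), len(X0)   # assumes full column rank
d1, d_r = d - d0, n - d

rss0 = rss(X0, y)
rss_full = rss(X, y)

# The last basis vector spans the tested signal direction, so this is Z_1.
z1 = dot(orthonormal_basis(X)[-1], y)

f_stat = ((rss0 - rss_full) / d1) / (rss_full / d_r)
print(f"RSS0={rss0:.3f} RSS={rss_full:.3f} Z1^2={z1 ** 2:.3f} F={f_stat:.3f}")
```

For this data $\RSS_0=24$, $\RSS=10$, and $\|Z_1\|^2=14$, so $F=14/(10/2)=2.8$ on $(1,2)$ degrees of freedom. The drop in RSS matches the squared signal coordinate exactly.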

The F-test is a whole-subspace test. When $d_1>1$, it does not choose one signed direction the way a t-test does; it squares and adds the projections over all tested directions. That is why ANOVA can detect "some group mean pattern is present" without saying which contrast caused it. The geometry gives the observed length, and the F distribution calibrates how surprising that length is under the null.

6.5 Individual coefficient t-test and confidence interval

For a single coefficient test $H_0:\beta_j=0$, the tested subspace has $d_1=1$. The canonical F-test is equivalent to a t-test: $$T=\frac{\hat\beta_j}{\widehat{\text{SE}}(\hat\beta_j)}\sim t_{n-d}\quad\text{under }H_0.$$

This is the one-dimensional version of the regression subset test. The signal direction is the part of predictor $j$ that remains after the nuisance predictors have been projected out, so the t statistic is a signed signal-over-noise ratio.

The estimated standard error is $$\widehat{\text{SE}}(\hat\beta_j)=\hat\sigma\sqrt{[(X^TX)^{-1}]_{jj}}.$$ The confidence interval is $$\hat\beta_j\pm t_{n-d,1-\alpha/2}\,\widehat{\text{SE}}(\hat\beta_j).$$
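
The sketch below computes a coefficient t statistic and interval without inverting $X^TX$. It relies on the standard identity $[(X^TX)^{-1}]_{jj}=1/\|X_j^\perp\|^2$, where $X_j^\perp$ is column $j$ with the other columns projected out. The data, the helper `residualize`, and the hard-coded table quantile $t_{2,0.975}\approx 4.303$ are assumptions made for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(target, others):
    """Part of `target` orthogonal to every column in `others` (Gram-Schmidt)."""
    basis = []
    for col in others:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    r = list(target)
    for q in basis:
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

# Illustrative data: test the coefficient of w given the intercept and x.
ones = [1.0] * 5
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [2.0, -1.0, -2.0, -1.0, 2.0]
y = [4.0, -1.5, 0.0, 3.5, 4.0]

w_perp = residualize(w, [ones, x])             # unique direction of w
beta_w = dot(w_perp, y) / dot(w_perp, w_perp)  # OLS coefficient of w

resid = residualize(y, [ones, x, w])           # full-model residual vector
d_r = len(y) - 3                               # n - d with d = 3 columns
sigma_hat = math.sqrt(dot(resid, resid) / d_r)

se_w = sigma_hat / math.sqrt(dot(w_perp, w_perp))
t_stat = beta_w / se_w

T_QUANTILE = 4.303  # t_{2, 0.975} from a standard t table (assumed value)
ci = (beta_w - T_QUANTILE * se_w, beta_w + T_QUANTILE * se_w)
print(f"beta_w={beta_w:.3f} SE={se_w:.3f} T={t_stat:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Here $T^2=2.8$, which is the F statistic for dropping $w$ from the same model, and the 95% interval contains 0, consistent with not rejecting $H_0:\beta_w=0$ at that level.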

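Geometrically, write $X_j^\perp$ for the part of predictor column $X_j$ that remains after removing its projection onto all other predictor columns. The coefficient t-statistic measures how much $Y$ points along $X_j^\perp$ relative to residual noise. Since $[(X^TX)^{-1}]_{jj}=1/\|X_j^\perp\|^2$, a predictor that is nearly collinear with the others has a short $X_j^\perp$ and therefore a large standard error.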

Recall Map

$i$	$x_i$	$w_i$	$y_i$
1	$-2$	$2$	$4$
2	$-1$	$-1$	$-1.5$
3	$0$	$-2$	$0$
4	$1$	$-1$	$3.5$
5	$2$	$2$	$4$

Checklist for any testing problem:
identify the nuisance subspace, identify the tested signal subspace, identify the residual subspace, decide whether $\sigma^2$ is known, and decide whether the tested signal is one-dimensional or multidimensional. Those five answers determine the row of the table.

Problem	Subspaces / dimensions	Statistic	Reference distribution
Known-variance 1D signal	$d_1=1$, $\sigma^2$ known	$Z_1/\sigma$	$N(0,1)$
Known-variance multidimensional signal	$d_1>1$, $\sigma^2$ known	$\|Z_1\|^2/\sigma^2$	$\chi^2_{d_1}$
Unknown-variance 1D signal	$d_1=1$, residual df $d_r$	$Z_1/\sqrt{\|Z_r\|^2/d_r}$	$t_{d_r}$
Unknown-variance multidimensional signal	$d_1>1$, residual df $d_r$	$(\|Z_1\|^2/d_1)/(\|Z_r\|^2/d_r)$	$F_{d_1,d_r}$
One-sample t-test	Signal $\Span(\mathbf1)$, residual $\mathbf1^\perp$	$\sqrt n\,\bar X/S$	$t_{n-1}$
Two-sample pooled t-test	$d_0=1$, $d_1=1$, $d_r=m+n-2$	$(\bar X-\bar Y)/(S_p\sqrt{1/m+1/n})$	$t_{m+n-2}$
Regression subset test	$d_1$ tested coefficients, $d_r=n-d$	$((\RSS_0-\RSS)/d_1)/(\RSS/d_r)$	$F_{d_1,d_r}$
One-way ANOVA	$d_0=1$, $d_1=G-1$, $d_r=N-G$	$((\RSS_0-\RSS)/(G-1))/(\RSS/(N-G))$	$F_{G-1,N-G}$
Regression single coefficient	$d_1=1$, $d_r=n-d$	$\hat\beta_j/\widehat{\text{SE}}(\hat\beta_j)$	$t_{n-d}$

Unifying sentence:
Every statistic above compares a tested projection to either a known variance scale or an independent residual variance estimate.
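
The first four rows of the table amount to a two-question decision rule, which can be written as a tiny function. This is a memory aid only; the function name and the string labels it returns are invented for this sketch.

```python
def reference_distribution(d1, sigma_known, d_r=None):
    """Pick the canonical reference distribution from the checklist answers.

    d1          -- dimension of the tested signal subspace
    sigma_known -- True if sigma^2 is known
    d_r         -- residual degrees of freedom (needed when sigma^2 is unknown)
    """
    if d1 < 1:
        raise ValueError("the tested signal subspace needs dimension d1 >= 1")
    if sigma_known:
        return "N(0,1)" if d1 == 1 else f"chi2_{d1}"
    if d_r is None or d_r < 1:
        raise ValueError("unknown sigma^2 needs residual degrees of freedom d_r >= 1")
    return f"t_{d_r}" if d1 == 1 else f"F_{d1},{d_r}"

# One-way ANOVA with G = 4 groups and N = 20 observations: d1 = 3, d_r = 16.
print(reference_distribution(d1=3, sigma_known=False, d_r=16))
# Two-sample pooled t-test with m = 6 and n = 5: d1 = 1, d_r = 9.
print(reference_distribution(d1=1, sigma_known=False, d_r=9))
```

The application rows (one-sample t, pooled t, ANOVA, regression) differ only in how $d_1$ and $d_r$ are computed from the subspaces.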

Exam readiness: what to cite vs. what to do

The lecture summary separates prerequisite facts from the new geometric skills. For exam practice, this distinction is useful: do not spend your energy re-proving old distribution facts, but do be ready to use them inside the canonical model.

A strong solution usually has this shape: identify the spaces, compute the dimensions, name the canonical blocks, then write the statistic. The algebra should serve that geometry, not replace it.

You can cite as given	You should be able to apply
Affine transformations of multivariate normals: if $Z\sim N(\mu,\Sigma)$ and $Y=AZ+b$, then $Y\sim N(A\mu+b,A\Sigma A^T)$.	Use $Z=Q^TY$ to rotate a model into nuisance, signal, and residual coordinates.
For jointly normal variables, zero covariance implies independence.	Explain why orthogonal Gaussian projections are independent.
The reference definitions of $\chi^2$, t, and F distributions.	Choose the correct row of the four-test table from $d_1$, $d_r$, and whether $\sigma^2$ is known.
The OLS formula $\hat\beta=(X^TX)^{-1}X^TY$.	Translate regression into subspaces: $\mathcal H_0$, $\mathcal H$, $\mathcal H\cap\mathcal H_0^\perp$, and $\mathcal H^\perp$.
Standard sample variance and RSS formulas.	Recognize $\|Z_r\|^2=\RSS$ and $\|Z_1\|^2=\RSS_0-\RSS$ in nested regression and ANOVA problems.

Formula Sheet

Distribution facts

Object	Formula	Use
Chi-squared	$\sum_{i=1}^dZ_i^2\sim\chi^2_d$	Squared length of a $d$-dimensional Gaussian noise vector
t	$Z/\sqrt{V/d}\sim t_d$	1D Gaussian signal over independent variance estimate
F	$(V_1/d_1)/(V_2/d_2)\sim F_{d_1,d_2}$	Signal sum of squares per signal df over residual sum of squares per residual df
t-F identity	$T^2\sim F_{1,d}$ if $T\sim t_d$	Links two-sided single-signal tests to F-tests

Canonical model formulas

Quantity	Formula	Comment
Residual variance estimate	$\hat\sigma^2=\|Z_r\|^2/d_r$	Requires residual df $d_r$
Known-variance 1D signal	$Z_1/\sigma$	$N(0,1)$ under $H_0$
Known-variance signal length	$\|Z_1\|^2/\sigma^2$	$\chi^2_{d_1}$ under $H_0$
1D t statistic	$T=Z_1/\sqrt{\|Z_r\|^2/d_r}$	$t_{d_r}$ under $H_0$
F statistic	$F=(\|Z_1\|^2/d_1)/(\|Z_r\|^2/d_r)$	$F_{d_1,d_r}$ under $H_0$
1D t interval	$Z_1\pm \hat\sigma\,t_{d_r,1-\alpha/2}$	By test inversion

Application formulas

Application	Formula	Degrees of freedom
One-sample t	$T=\sqrt n\,\bar X/S$	$n-1$
Two-sample pooled t	$T=(\bar X-\bar Y)/(S_p\sqrt{1/m+1/n})$	$m+n-2$
Pooled variance	$S_p^2=((m-1)S_X^2+(n-1)S_Y^2)/(m+n-2)$	$m+n-2$
Regression F	$F=((\RSS_0-\RSS)/d_1)/(\RSS/(n-d))$	$d_1,\ n-d$
One-way ANOVA F	$F=((\RSS_0-\RSS)/(G-1))/(\RSS/(N-G))$	$G-1,\ N-G$
Regression coefficient t	$T=\hat\beta_j/(\hat\sigma\sqrt{[(X^TX)^{-1}]_{jj}})$	$n-d$

Common Mistakes

1. Forgetting that nuisance directions are not residual directions.
Nuisance means unknown/free mean under both hypotheses. Residual means known mean 0 under both hypotheses, so it can estimate $\sigma^2$.

2. Using the pooled two-sample t-test without the common-variance assumption.
The pooled test is the canonical equal-variance normal model. If variances differ, the geometry no longer gives this exact t distribution.

3. Confusing $d_1$ and $d_r$ in F-tests.
$d_1$ is the number of tested directions. $d_r$ is the residual degrees of freedom used to estimate $\sigma^2$.

4. Thinking the rotation matrix itself is the point.
The basis is a coordinate choice. The test depends on subspaces and projection lengths, not on the particular orthonormal basis chosen inside a subspace.

5. Forgetting why $\RSS_0-\RSS$ appears in regression.
It is the improvement in fit from adding the tested predictors, which equals the tested signal sum of squares.

6. Treating individual regression coefficients as marginal effects without considering other predictors.
The coefficient t-test is about the unique direction in $X_j$ left after projecting out the other columns of $X$.

7. Thinking an orthogonal transformation leaves the mean unchanged.
It preserves the spherical covariance shape, but it rotates the mean vector too. That is exactly why the one-sample t-test can be put into canonical form.

8. Thinking the alternative must change the residual coordinate.
In the canonical model, the alternative shifts the signal coordinate. The residual coordinate keeps mean 0 and supplies the variance estimate.

9. Thinking the $d$ in $\chi^2_d$ or $V/d$ is arbitrary.
It is the dimension of the relevant Gaussian subspace, also called the degrees of freedom.

10. Confusing the alternative region with the orthogonal signal subspace.
$\mathcal H\setminus\mathcal H_0$ describes which means are allowed under the alternative. The coordinate block $Z_1$ comes from the orthogonal signal space $\mathcal H\cap\mathcal H_0^\perp$, after the nuisance part has been accounted for.
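
Mistake 7 is easiest to unlearn with a small calculation. In the one-sample problem the first canonical coordinate is $Z_1=\sqrt n\,\bar X$, whose mean is $\sqrt n\,\mu$ rather than $\mu$, and the residual block has squared length $(n-1)S^2$. The sketch below checks this on hypothetical data; the numbers and variable names are assumptions made for illustration.

```python
import math

data = [2.1, -0.4, 1.3, 0.8, 1.7]   # hypothetical sample
n = len(data)
x_bar = sum(data) / n
s2 = sum((v - x_bar) ** 2 for v in data) / (n - 1)   # sample variance S^2

# First canonical coordinate: project onto q1 = (1, ..., 1) / sqrt(n).
z1 = sum(v / math.sqrt(n) for v in data)

# Rotations preserve length, so the residual block holds whatever is left.
zr_sq = sum(v * v for v in data) - z1 ** 2

t_canonical = z1 / math.sqrt(zr_sq / (n - 1))
t_textbook = math.sqrt(n) * x_bar / math.sqrt(s2)

print(f"Z1={z1:.4f}  sqrt(n)*mean={math.sqrt(n) * x_bar:.4f}")
print(f"|Zr|^2={zr_sq:.4f}  (n-1)*S^2={(n - 1) * s2:.4f}")
print(f"T canonical={t_canonical:.4f}  T textbook={t_textbook:.4f}")
```

The two t statistics agree, which is the one-sample row of the recall table read in canonical coordinates.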

Data 145: Evidence and Uncertainty
