Recall Script: What These Lectures Are Really About
The most useful way to remember Lectures 23 and 24 is not as a list of separate tests. The better story is: normal data can be rotated into
orthogonal coordinates; those coordinates split into nuisance, signal, and residual pieces; and the classical test distributions are just ways
to compare the signal size to the noise size.
The whole lecture sequence is a machine: $$\text{normal vector}\longrightarrow\text{orthogonal projections}\longrightarrow \text{signal/noise
ratio}\longrightarrow z,\chi^2,t,\text{ or }F.$$
How to read this page:
first learn what $\chi^2$, t, and F are as mathematical objects; then see why the canonical model produces exactly those objects; then learn the
general model-space rotation $Z=Q^TY$; then use the geometric pictures as the memory hook; finally translate one-sample t, two-sample t, ANOVA,
and regression into the same canonical language.
So the goal is not to memorize many formulas independently. The goal is to recognize the same pattern every time: find the tested direction,
find the residual directions, decide whether $\sigma^2$ is known, and decide whether the signal is one-dimensional or multidimensional.
Review / Background: The Distribution Objects $\chi^2$, t, and F
1.1 The three objects before the model
Before talking about linear models, pin down the three probability objects. Each one answers a slightly different "how large is this signal?"
question.
| Object | Mathematical definition | Conceptual meaning |
| --- | --- | --- |
| $\chi^2_d$ | If $U_1,\ldots,U_d\overset{iid}{\sim}N(0,1)$, then $\sum_{i=1}^dU_i^2\sim\chi^2_d$. | Squared length of a $d$-dimensional standardized Gaussian vector. |
| $t_d$ | If $Z\sim N(0,1)$, $V\sim\chi^2_d$, and $Z\perp V$, then $Z/\sqrt{V/d}\sim t_d$. | One signed normal coordinate divided by an independent estimated noise scale. |
| $F_{d_1,d_2}$ | If $V_1\sim\chi^2_{d_1}$, $V_2\sim\chi^2_{d_2}$, and $V_1\perp V_2$, then $(V_1/d_1)/(V_2/d_2)\sim F_{d_1,d_2}$. | Ratio of two average squared Gaussian lengths. |
The memory version is: $$\chi^2=\text{squared Gaussian length},\qquad t=\frac{\text{signal}}{\text{estimated noise}},\qquad F=\frac{\text{signal
sum of squares per signal df}}{\text{residual sum of squares per residual df}}.$$ This is exactly what linear-model tests will produce.
If $T\sim t_d$, then $$T^2\sim F_{1,d}.$$ This is immediate because $Z^2\sim \chi^2_1$, so $$T^2=\frac{Z^2/1}{V/d}.$$ A two-sided t-test is
therefore the same rejection rule as the corresponding one-degree-of-freedom F-test.
The unknown $\sigma$ disappears in t and F ratios. If $\tilde Z=\sigma Z$ and $\tilde V=\sigma^2V$, then $$\frac{\tilde Z}{\sqrt{\tilde
V/d}}=\frac{\sigma Z}{\sqrt{\sigma^2V/d}}=\frac{Z}{\sqrt{V/d}}.$$ This cancellation is why t and F statistics are pivotal when the variance is
unknown.
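These distributional facts are easy to check numerically. The sketch below (assuming NumPy and SciPy are available; the values of $d$, $x$, $Z$, $V$, and $\sigma$ are arbitrary) verifies both facts: $T^2\sim F_{1,d}$ via matching CDFs, and the $\sigma$-cancellation in the t ratio.

```python
import numpy as np
from scipy import stats

d = 7          # residual degrees of freedom (arbitrary choice for the check)
x = 3.5        # an arbitrary positive cutoff

# P(T^2 <= x) for T ~ t_d equals P(-sqrt(x) <= T <= sqrt(x)) ...
p_t_squared = stats.t(d).cdf(np.sqrt(x)) - stats.t(d).cdf(-np.sqrt(x))
# ... and should match P(F <= x) for F ~ F_{1,d}.
p_f = stats.f(1, d).cdf(x)
print(p_t_squared, p_f)

# Sigma-cancellation: scaling Z by sigma and V by sigma^2 leaves T unchanged.
Z, V, sigma = 1.3, 5.2, 4.0
T_plain = Z / np.sqrt(V / d)
T_scaled = (sigma * Z) / np.sqrt(sigma**2 * V / d)
print(T_plain, T_scaled)
```

The CDF comparison is exact, not a Monte Carlo approximation, which is why the two probabilities agree to machine precision.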
1.2 What the degrees of freedom $d$ means
If $V\sim\chi^2_d$, then by definition $$V=U_1^2+\cdots+U_d^2$$ for $d$ independent standard normal variables. So $d$ is not a decorative
adjustment: it is the number of independent Gaussian directions being squared.
$$E[V]=d\qquad\Longrightarrow\qquad E[V/d]=1.$$
Dimension of a Gaussian subspace = degrees of freedom of the chi-squared built from that subspace. Dividing by $d$ turns total squared variation
into average squared variation per direction.
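A quick simulation (arbitrary $d$, fixed seed) confirms the recall facts $E[V]=d$ and $E[V/d]=1$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 6
# V = sum of d squared independent standard normals, many replications.
U = rng.standard_normal((100_000, d))
V = (U**2).sum(axis=1)

mean_V = V.mean()                # should be close to d
mean_V_over_d = (V / d).mean()   # should be close to 1
print(mean_V, mean_V_over_d)
```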
1.3 The normal fact that makes everything work
If $Z\sim N_n(0,\sigma^2I_n)$ and $Q$ is orthogonal, meaning $Q^TQ=I_n$, then $$QZ\sim N_n(0,\sigma^2I_n).$$ The spherical normal distribution
is unchanged by rotations and reflections.
Use the affine transformation formula for multivariate normals. If $Y=QZ$, then $$Y\sim
N_n(Q0,\;Q(\sigma^2I_n)Q^T)=N_n(0,\sigma^2QQ^T)=N_n(0,\sigma^2I_n).$$ The covariance stays spherical because $Q$ is orthogonal.
If $V_1$ and $V_2$ are orthogonal subspaces of dimensions $d_1$ and $d_2$, and $Z\sim N_n(0,\sigma^2I_n)$, then $$\|P_{V_1}Z\|^2\sim
\sigma^2\chi^2_{d_1},\qquad \|P_{V_2}Z\|^2\sim \sigma^2\chi^2_{d_2},$$ and these squared projection lengths are independent.
Choose an orthonormal basis adapted to the subspaces. After rotating into that basis, the coordinates are still independent $N(0,\sigma^2)$
variables. Projection lengths are just sums of squares of disjoint coordinate blocks.
A chi-squared distribution does not appear just because a variable is centered at 0. It appears because we take a sum of
squares of independent standardized Gaussian coordinates.
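The rotation-invariance fact can be checked directly: a matrix $Q$ from a QR decomposition is orthogonal, and conjugating a spherical covariance by it changes nothing. A minimal sketch (dimension, seed, and $\sigma^2$ arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# A random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Q^T Q = I: the rotation preserves lengths and angles.
print(np.allclose(Q.T @ Q, np.eye(n)))

# Covariance of QZ for Z ~ N(0, sigma^2 I) is Q (sigma^2 I) Q^T = sigma^2 I.
sigma2 = 2.5
cov_rotated = Q @ (sigma2 * np.eye(n)) @ Q.T
print(np.allclose(cov_rotated, sigma2 * np.eye(n)))
```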
1.4 Translation dictionary
This dictionary is the bridge from probability objects to the canonical model. Every later example is just a different way of deciding which
projection is nuisance, which projection is signal, and which projection is residual noise.
| Geometric object | Statistical role | Distribution under the null |
| --- | --- | --- |
| Projection onto tested direction | Signal | Normal if 1D, chi-squared length if multidimensional |
| Projection onto residual directions | Noise / variance estimate | Squared length $\sim\sigma^2\chi^2_{d_r}$ |
| Projection onto nuisance directions | Unrestricted mean under both hypotheses | Accounted for, but not evidence for the target signal |
| Ratio of signal to residual scale | Test statistic when $\sigma^2$ unknown | t if 1D signal, F if multidimensional signal |
Now let's go into the main model.
The distributions above are the ingredients. The canonical model is the recipe that tells us why those ingredients show up in hypothesis tests.
It will label each coordinate as nuisance, signal, or residual, and then the right distribution will almost choose itself.
The Canonical Model: Nuisance, Signal, Residual
2.1 The organizing model
The canonical model is the clean coordinate system we wish every testing problem already came in. The observed vector is split into three
orthogonal blocks: $$Z=\begin{bmatrix}Z_0\\Z_1\\Z_r\end{bmatrix} \sim N_n\!\left( \begin{bmatrix}\mu_0\\\mu_1\\0\end{bmatrix}, \sigma^2I_n
\right),\qquad d_0+d_1+d_r=n.$$ We test $$H_0:\mu_1=0\qquad\text{vs}\qquad H_1:\mu_1\neq0.$$
| Block | Name | Why it matters |
| --- | --- | --- |
| $Z_0\in\mathbb R^{d_0}$ | Nuisance | Its mean $\mu_0$ is unknown under both $H_0$ and $H_1$, so it is not evidence for or against the target hypothesis. |
| $Z_1\in\mathbb R^{d_1}$ | Signal | Its mean is forced to be 0 under $H_0$ and allowed to move under $H_1$. |
| $Z_r\in\mathbb R^{d_r}$ | Residual noise | Its mean is known to be 0 under both hypotheses, so its squared length estimates $\sigma^2$. |
This is the important "we can use this!" moment: the residual block is known to be pure noise. Its mean is 0 and it has the same noise variance
$\sigma^2$ as the signal block. So if $\sigma^2$ is known, we scale by it directly; if $\sigma^2$ is unknown, we use the residual block to
estimate the noise level.
The whole test is now a comparison: $$\text{How large is the signal block }Z_1\text{ compared with the noise scale?}$$ Everything else is just
deciding whether the noise scale is known and whether the signal is a signed coordinate or a multidimensional length.
2.2 Why $Z_r$ estimates variance
Since the residual block has known mean 0, $$Z_r\sim N_{d_r}(0,\sigma^2I_{d_r}).$$ Written by coordinates, this means
$$Z_{r,1},\ldots,Z_{r,d_r}\overset{iid}{\sim}N(0,\sigma^2).$$ Dividing each coordinate by $\sigma$ standardizes it:
$$\frac{Z_{r,1}}{\sigma},\ldots,\frac{Z_{r,d_r}}{\sigma}\overset{iid}{\sim}N(0,1).$$
Now expand the squared length: $$\|Z_r\|^2=Z_{r,1}^2+\cdots+Z_{r,d_r}^2.$$ Dividing by $\sigma^2$ can be pushed inside the sum:
$$\frac{\|Z_r\|^2}{\sigma^2} = \frac{Z_{r,1}^2+\cdots+Z_{r,d_r}^2}{\sigma^2} = \left(\frac{Z_{r,1}}{\sigma}\right)^2+\cdots+
\left(\frac{Z_{r,d_r}}{\sigma}\right)^2 = \sum_{j=1}^{d_r}\left(\frac{Z_{r,j}}{\sigma}\right)^2.$$ Since this is a sum of $d_r$ squared standard
normals, $$\frac{\|Z_r\|^2}{\sigma^2}\sim\chi^2_{d_r}.$$
Taking expectation gives $E[\|Z_r\|^2]=d_r\sigma^2$, so $$\hat\sigma^2=\frac{\|Z_r\|^2}{d_r}.$$ This is "average squared residual length per
residual direction."
Nuisance and residual are not the same thing. $Z_0$ also contains randomness, but its mean is unknown, so its raw squared length includes
unknown mean structure. $Z_r$ has mean 0, so its squared length is interpretable as pure noise.
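A small Monte Carlo sketch (arbitrary $\sigma^2$, $d_r$, and seed) confirms that $\hat\sigma^2=\|Z_r\|^2/d_r$ is unbiased for $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, d_r = 4.0, 8
reps = 200_000

# Residual block: d_r iid N(0, sigma^2) coordinates per replication.
Z_r = np.sqrt(sigma2) * rng.standard_normal((reps, d_r))
sq_len = (Z_r**2).sum(axis=1)        # ||Z_r||^2 per replication

# E||Z_r||^2 = d_r * sigma^2, so ||Z_r||^2 / d_r is unbiased for sigma^2.
sigma2_hat = sq_len / d_r
print(sigma2_hat.mean())
```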
2.3 The two questions to ask before any test
Once the canonical blocks are understood, the rest of the test is determined by two questions.
| Question | If yes | If no |
| --- | --- | --- |
| Do we know $\sigma^2$? | Use the known $\sigma$ as the noise scale. | Use $Z_r$ to estimate noise with $\hat\sigma^2=\Vert Z_r\Vert^2/d_r$. |
| Is the signal one-dimensional? | Keep the signed coordinate $Z_1$ and use z or t. | Use the squared length $\Vert Z_1\Vert^2$ and use $\chi^2$ or F. |
This is the full canonical intuition in one sentence: $Z_0$ is fit/accounted for but not used as evidence, $Z_1$ is the signal being tested,
$Z_r$ supplies noise if needed, and the dimension of $Z_1$ decides whether we keep a signed coordinate or use a squared norm.
Read the table as an algebraic summary of the story. Known variance means the noise scale is fixed. Unknown variance means the residual block
supplies the scale. A one-dimensional signal keeps its sign and gives z or t. A multidimensional signal has no single sign, so we use squared
length and get $\chi^2$ or F. If $d_1=1$, the squared-length F version is just $T^2$, so the signed t statistic is usually more informative.
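The two questions can be written as a four-line function. This is only a recall device, not part of the lecture material; the name `which_test` is made up here:

```python
def which_test(sigma_known: bool, d1: int) -> str:
    """Pick the canonical test from the two questions of Section 2.3.

    sigma_known: is sigma^2 known in advance?
    d1: dimension of the tested signal block Z_1.
    """
    if d1 == 1:
        return "z" if sigma_known else "t"
    return "chi-squared" if sigma_known else "F"

print(which_test(True, 1), which_test(True, 3),
      which_test(False, 1), which_test(False, 3))
```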
General Linear Models: Rotate Into the Canonical Model
3.1 From general coordinates to canonical coordinates
In a general linear model, observe $$Y\sim N_n(\theta,\sigma^2I_n),$$ where the mean vector lies in a model subspace $\mathcal H\subseteq\mathbb
R^n$. Test nested subspaces $$H_0:\theta\in\mathcal H_0\qquad \text{vs.}\qquad H_1:\theta\in\mathcal H\setminus\mathcal H_0,$$ with $\mathcal
H_0\subseteq\mathcal H$.
This is the big-picture version of the rotation. The full model space $\mathcal H$ contains all mean vectors allowed by the larger model. The
null space $\mathcal H_0$ contains the mean vectors allowed if the null hypothesis is true. Points in $\mathcal H\setminus\mathcal H_0$ are
alternatives; the orthogonal part $\mathcal H\cap\mathcal H_0^\perp$ is the signal subspace we test after accounting for nuisance directions.
Model-space dictionary:
$\mathcal H$: full model space; $\mathcal H_0$: null/nuisance space; $\mathcal H\setminus\mathcal H_0$: alternative region; $\mathcal
H\cap\mathcal H_0^\perp$: orthogonal tested signal space; $\mathcal H^\perp$: residual/noise space.
$Q_0$: orthonormal basis for $\mathcal H_0$.
$Q_1$: orthonormal basis for $\mathcal H\cap\mathcal H_0^\perp$.
$Q_r$: orthonormal basis for $\mathcal H^\perp$.
Stack them: $$Q=[Q_0\mid Q_1\mid Q_r].$$ Then $Q$ is orthogonal and $Z=Q^TY$ is in canonical coordinates.
The rotated mean is $$E[Z]=E[Q^TY]=Q^T\theta = \begin{bmatrix} Q_0^T\theta\\ Q_1^T\theta\\ Q_r^T\theta \end{bmatrix} = \begin{bmatrix} \mu_0\\
\mu_1\\ 0 \end{bmatrix}.$$ The residual block is 0 in mean because every allowed model mean $\theta\in\mathcal H$ is perpendicular to $\mathcal
H^\perp$. The tested block decides the hypothesis: $$H_0:\theta\in\mathcal H_0\Longleftrightarrow \mu_1=Q_1^T\theta=0,\qquad
H_1:\theta\in\mathcal H\setminus\mathcal H_0\Longleftrightarrow \mu_1\neq0.$$
Because $Q$ is orthogonal, $$Z=Q^TY\sim N_n\!\left( \begin{bmatrix}\mu_0\\\mu_1\\0\end{bmatrix}, \sigma^2I_n \right).$$ That is exactly the
canonical block model from Section 2: nuisance, signal, residual.
The choice of basis inside each subspace is not important. The test only depends on projection lengths such as $\|Q_1^TY\|^2$ and
$\|Q_r^TY\|^2$, which are intrinsic geometric quantities.
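As a concrete sketch of the rotation recipe, the code below builds $Q=[Q_0\mid Q_1\mid Q_r]$ for a small hypothetical two-group design ($N=4$) and checks that $Q$ is orthogonal and that the residual block has mean 0 for every mean in $\mathcal H$. SciPy's `null_space` is assumed available:

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical tiny two-group design: N = 4, two observations per group.
a = np.array([1.0, 1.0, 0.0, 0.0])      # group-1 indicator
b = np.array([0.0, 0.0, 1.0, 1.0])      # group-2 indicator
H = np.column_stack([a, b])             # full model space, dim(H) = 2
ones = np.ones(4)

# Q0: orthonormal basis for H0 = span(1) (the nuisance space).
Q0 = (ones / np.linalg.norm(ones)).reshape(-1, 1)
# Q1: the part of H orthogonal to H0, orthonormalized (the tested direction).
H_resid = H - Q0 @ (Q0.T @ H)
Q1 = np.linalg.qr(H_resid)[0][:, :1]    # dim(H) - dim(H0) = 1 column
# Qr: orthonormal basis for the residual space H-perp.
Qr = null_space(H.T)

Q = np.hstack([Q0, Q1, Qr])
print(np.allclose(Q.T @ Q, np.eye(4)))  # Q is orthogonal
print(np.allclose(Qr.T @ H, 0))         # E[Z_r] = Qr^T theta = 0 for theta in H
```

The second check is exactly the statement that every allowed model mean is perpendicular to $\mathcal H^\perp$, so the residual block is mean-zero under both hypotheses.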
The Geometry: Ratios, Angles, and Rotations
Now move from the general rotation recipe to the picture. This section is about visualization: axes, ratios, angles, and rotations. In canonical
coordinates, the signal coordinate is one axis and the residual coordinate is a perpendicular axis. In original data coordinates, the same idea
may look tilted, so Section 3's $Q$ rotation turns it into the signal/residual split.
4.1 The $n=2$ canonical picture
Start with the picture, not the formula. Put the tested coordinate $Z_1$ on the horizontal axis and the residual coordinate $Z_2$ on the
vertical axis. Under the null, the cloud is centered at the origin. Under the alternative, the center slides horizontally, because only the
signal coordinate changes.
Observe $$Z\sim N_2\!\left(\begin{bmatrix}\mu_1\\0\end{bmatrix},\sigma^2I_2\right),$$ and test $$H_0:\mu_1=0 \qquad \text{vs} \qquad
H_1:\mu_1\neq 0.$$ The coordinate $Z_1$ is the tested signal direction. The coordinate $Z_2$ is pure residual noise with mean 0 under both
hypotheses.
An observed point is evidence against $H_0$ when it points too strongly in the $Z_1$ direction relative to the residual direction. In the
unknown-variance case, that comparison is angular: $$\tan(\theta)=\frac{Z_2}{Z_1},\qquad \frac{Z_1}{Z_2}=\cot(\theta),\qquad
\frac{Z_1^2}{Z_2^2}=\cot^2(\theta).$$ This tiny two-dimensional picture is the seed of the whole four-test table.
4.2 Angle reading: ratios become cotangents
The diagram below is the visual version of the algebra above. The observed point has two coordinates: its horizontal signal projection $Z_1$ and
its vertical residual projection $Z_2$. Comparing $Z_1^2/Z_2^2$ is the same as comparing the squared cotangent of the point's angle from the
signal axis.
With unknown variance, absolute scale is not reliable, but angle is. The statistic $|Z_1|/|Z_2|$ measures how closely the observed vector points
in the signal direction. A vector nearly aligned with the $Z_1$-axis is surprising under the rotationally symmetric null.
In the canonical $Z_1,Z_2$ coordinates, the observed point determines an angle $\theta$ from the signal axis. Since $\tan(\theta)=Z_2/Z_1$, the
ratio $Z_1/Z_2=\cot(\theta)=1/\tan(\theta)$. Squaring gives the $F_{1,1}$ form $Z_1^2/Z_2^2=\cot^2(\theta)$.
A useful way to remember the unknown-variance test: $$\frac{Z_1}{Z_2}=\cot(\theta),\qquad \frac{Z_1^2}{Z_2^2}=\cot^2(\theta).$$ The classical t
statistic is $Z_1/|Z_2|$, with $|Z_2|$ supplying the denominator scale, and the squared test is the same: $$T^2=\frac{Z_1^2}{Z_2^2}.$$ Large $|T|$ means the
observed vector is more horizontal than expected under the rotationally symmetric null.
4.3 The $n=2$ one-sample t-test is a rotation
For $X_1,X_2\overset{iid}{\sim}N(\mu,\sigma^2)$, the mean vector is $\mu(1,1)^T$. The signal direction is the diagonal line $X_1=X_2$, not the
original $X_1$ axis. A 45-degree rotation turns that diagonal into the canonical signal axis.
Nothing new is happening probabilistically. We are only changing coordinates. The observed point $X$ stays fixed in the plane, but the axes are
rotated so that one new axis points along the mean direction and the other new axis points along pure residual variation.
Use $$Q=\frac{1}{\sqrt2}\begin{bmatrix}1 & -1\\1 & 1\end{bmatrix},\qquad
Z=Q^TX=\begin{bmatrix}(X_1+X_2)/\sqrt2\\(X_2-X_1)/\sqrt2\end{bmatrix}.$$ Then $$Z\sim
N_2\!\left(\begin{bmatrix}\sqrt2\,\mu\\0\end{bmatrix},\sigma^2I_2\right).$$
Equivalently, the rotated axes are the orthonormal basis vectors $$q_1=\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix},\qquad
q_2=\frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}.$$ The new coordinates are projections: $$Z_1=q_1^TX,\qquad Z_2=q_2^TX.$$ So $Z_1$ is "how
far $X$ points along the 45-degree signal line," and $Z_2$ is "how far $X$ points along the perpendicular residual line."
The rotated signal coordinate is $$Z_1=\frac{X_1+X_2}{\sqrt2}=\sqrt2\,\bar X,$$ and the residual coordinate is $$Z_2=\frac{X_2-X_1}{\sqrt2}.$$
So the one-sample t-test with $n=2$ is exactly the $n=2$ canonical model in disguised coordinates.
The transformation $Z=Q^TX$ rotates the coordinate system so the diagonal mean direction becomes the $Z_1$ signal axis. The observed point is
the same point in the plane; only the axes change. In the new axes, $Z_1=(X_1+X_2)/\sqrt2$ is the signal projection and $Z_2=(X_2-X_1)/\sqrt2$
is the residual projection. This is the rotated version of the previous canonical picture.
In picture form, this is the relationship $$\frac{\sqrt2\,\bar X}{S}=\cot\!\bigl(\angle(X,\text{45-degree signal line})\bigr).$$ The
point $X$ is compared to the diagonal signal line in the original coordinates; after rotation, that same comparison becomes the ratio
$Z_1/|Z_2|$ in canonical coordinates.
The classical statistic is $$T_{\text{classical}}=\frac{\sqrt2\,\bar X}{S}.$$ The sample standard deviation still comes from subtracting the
sample mean. For $n=2$, $$\bar X=\frac{X_1+X_2}{2},$$ so $$X_1-\bar X=\frac{X_1-X_2}{2},\qquad X_2-\bar X=\frac{X_2-X_1}{2}.$$ Therefore
$$S^2=(X_1-\bar X)^2+(X_2-\bar X)^2 =\frac{(X_1-X_2)^2}{4}+\frac{(X_2-X_1)^2}{4} =\frac{(X_1-X_2)^2}{2},$$ so $$S=\frac{|X_1-X_2|}{\sqrt2}.$$
Since $Z_1=(X_1+X_2)/\sqrt2$ and $|Z_2|=|X_2-X_1|/\sqrt2$, $$T_{\text{classical}}=\frac{(X_1+X_2)/\sqrt2}{|X_1-X_2|/\sqrt2}=\frac{Z_1}{|Z_2|}.$$
So the mean subtraction did not disappear. With two points, the sample mean is the midpoint, and the two centered deviations are just opposite
halves of the gap between the observations. That is why the residual scale can be written using $|X_1-X_2|$.
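The $n=2$ identity $T_{\text{classical}}=Z_1/|Z_2|$ can be verified on any pair of numbers; the two observations below are arbitrary:

```python
import numpy as np

# Two concrete observations (arbitrary values for the check).
X1, X2 = 3.1, 1.7

# Classical one-sample t statistic with n = 2.
xbar = (X1 + X2) / 2
S = np.sqrt((X1 - xbar)**2 + (X2 - xbar)**2)   # divisor n - 1 = 1
T_classical = np.sqrt(2) * xbar / S

# Rotated (canonical) coordinates.
Z1 = (X1 + X2) / np.sqrt(2)
Z2 = (X2 - X1) / np.sqrt(2)
T_rotated = Z1 / abs(Z2)

print(T_classical, T_rotated)
```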
4.4 General $n$ one-sample t-test
Let $$X_1,\ldots,X_n\overset{iid}{\sim}N(\mu,\sigma^2),\qquad X\sim N_n(\mu\mathbf1,\sigma^2I_n).$$ The signal subspace is the line spanned by
$\mathbf1=(1,\ldots,1)^T$.
Let $$q_1=\frac{\mathbf1}{\sqrt n},\qquad Z_1=q_1^TX=\sqrt n\,\bar X.$$ Extend $q_1$ to an orthonormal basis $Q=[q_1\mid Q_r]$ and define
$Z_r=Q_r^TX$. Then $$\begin{bmatrix}Z_1\\Z_r\end{bmatrix}\sim N_n\!\left(\begin{bmatrix}\sqrt n\,\mu\\0\end{bmatrix},\sigma^2I_n\right).$$
Why does the residual block have mean 0? Start from $E[X]=\mu\mathbf1$. Since every column of $Q_r$ is perpendicular to $\mathbf1$,
$$Q_r^T\mathbf1=0.$$ Therefore $$E[Z_r]=E[Q_r^TX]=Q_r^TE[X]=Q_r^T(\mu\mathbf1)=\mu Q_r^T\mathbf1=0.$$ The signal coordinate keeps the mean
because $q_1$ points exactly along $\mathbf1$: $$E[Z_1]=E[q_1^TX]=q_1^T(\mu\mathbf1)=\sqrt n\,\mu.$$ Finally, because $Q$ is orthogonal,
$$\operatorname{Var}(Q^TX)=Q^T(\sigma^2I_n)Q=\sigma^2I_n.$$ This is the proof of the canonical rotated form
$$\begin{bmatrix}Z_1\\Z_r\end{bmatrix}\sim N_n\!\left(\begin{bmatrix}\sqrt n\,\mu\\0\end{bmatrix},\sigma^2I_n\right).$$
Now use the rotated coordinates to recognize the usual sample variance. Orthogonal decomposition gives $$\|X\|^2=Z_1^2+\|Z_r\|^2.$$ Since
$Z_1=\sqrt n\,\bar X$, $$\|Z_r\|^2=\sum_{i=1}^nX_i^2-n\bar X^2=\sum_{i=1}^n(X_i-\bar X)^2=(n-1)S^2.$$
The first proof explains why $Z_r$ is a pure-noise block with mean 0. The second proof explains why the length of that block is the familiar
centered sum of squares. That is exactly where the result is used: it turns the canonical denominator $\|Z_r\|^2$ into the classical denominator
$(n-1)S^2$.
The number $n-1$ is the residual dimension: after fitting one mean direction, only $n-1$ independent noise directions remain.
Under $H_0:\mu=0$, $$\frac{Z_1}{\sigma}=\frac{\sqrt n\,\bar X}{\sigma}\sim N(0,1),$$ and
$$\frac{\|Z_r\|^2}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}\sim \chi^2_{n-1},$$ independently. Therefore
$$T=\frac{Z_1}{\sqrt{\|Z_r\|^2/(n-1)}}=\frac{\sqrt n\,\bar X}{S}\sim t_{n-1}.$$
The classical facts $\bar X\perp S^2$ and $(n-1)S^2/\sigma^2\sim\chi^2_{n-1}$ are not isolated miracles. They come from independence of
orthogonal Gaussian projections: $\bar X$ lives in the signal line, and $S^2$ lives in the residual hyperplane.
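The general-$n$ identities, $\|Z_r\|^2=(n-1)S^2$ and $T=\sqrt n\,\bar X/S$, can be checked against SciPy's one-sample t-test on arbitrary simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 10
X = rng.normal(loc=0.5, scale=2.0, size=n)   # arbitrary sample

xbar = X.mean()
S2 = X.var(ddof=1)                 # sample variance, divisor n - 1

# Canonical coordinates: Z1 along 1/sqrt(n); residual length by Pythagoras.
Z1 = np.sqrt(n) * xbar
Zr_sq = (X**2).sum() - Z1**2       # ||X||^2 = Z1^2 + ||Z_r||^2

print(np.isclose(Zr_sq, (n - 1) * S2))       # the (n-1) S^2 identity
T = Z1 / np.sqrt(Zr_sq / (n - 1))            # canonical t statistic
print(np.isclose(T, stats.ttest_1samp(X, 0.0).statistic))
```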
Recall Checkpoint: Tests and Intervals
Section 2.3 already gave the four-test table. This short section is here as the recall checkpoint: when you are solving a problem, you should be
able to rebuild the table from the two questions without memorizing it row by row.
Recall checkpoint:
known $\sigma^2$ + one signal coordinate gives z; known $\sigma^2$ + many signal coordinates gives $\chi^2$; unknown $\sigma^2$ + one signal
coordinate gives t; unknown $\sigma^2$ + many signal coordinates gives F.
For the unknown-variance tests, we still need $d_r>0$. No residual degrees of freedom means no independent residual estimate of $\sigma^2$.
5.1 Confidence intervals by inversion
Lecture 14's test-confidence interval duality returns here. For the common one-dimensional unknown-variance case, testing $H_0:\mu_1=\mu_1^0$
uses $$\frac{Z_1-\mu_1^0}{\hat\sigma}\sim t_{d_r}.$$ The non-rejected values form the interval $$\mu_1\in Z_1\pm
\hat\sigma\,t_{d_r,1-\alpha/2}.$$
Known variance gives the corresponding normal interval: $$\mu_1\in Z_1\pm \sigma z_{1-\alpha/2}.$$ Unknown variance replaces $\sigma$ by
$\hat\sigma$ and replaces the normal cutoff by the t cutoff.
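A small sketch of the inversion (all numbers hypothetical): the interval endpoints are exactly the boundary between non-rejected and rejected values of $\mu_1^0$.

```python
import numpy as np
from scipy import stats

# Canonical 1D unknown-variance setting; all numbers are hypothetical.
Z1, sigma_hat, d_r, alpha = 2.3, 1.1, 12, 0.05

t_crit = stats.t(d_r).ppf(1 - alpha / 2)          # t_{d_r, 1 - alpha/2}
lo, hi = Z1 - sigma_hat * t_crit, Z1 + sigma_hat * t_crit

def rejects(mu0):
    """Two-sided level-alpha t-test of H0: mu_1 = mu0."""
    return abs((Z1 - mu0) / sigma_hat) > t_crit

# Values just inside the interval are not rejected; just outside, rejected.
print(rejects(lo + 1e-9), rejects(hi + 1e-9))
```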
Applications: Identify the Blocks, Then Test
For each familiar test, resist the urge to start from the final statistic. Instead, identify the null subspace, the full model subspace, the
tested signal directions, and the residual directions. The statistic then drops out from the canonical table.
6.1 Equal-variance two-sample t-test
Group 1 has $X_1,\ldots,X_m\overset{iid}{\sim}N(\mu,\sigma^2)$ and group 2 has $Y_1,\ldots,Y_n\overset{iid}{\sim}N(\nu,\sigma^2)$. Let $N=m+n$
and stack the observations into $$W=(X_1,\ldots,X_m,Y_1,\ldots,Y_n)^T.$$ Test $H_0:\mu=\nu$.
The full model space is $$\mathcal H=\Span(a,b),$$ where $a$ is 1 on group 1 and 0 on group 2, while $b$ is 0 on group 1 and 1 on group 2. The
null space is $$\mathcal H_0=\Span(\mathbf1_N).$$ Therefore $$d_0=1,\qquad d_1=1,\qquad d_r=N-2.$$
Identify the blocks:
$\mathcal H_0$ is the nuisance direction where both groups share one mean, $\mathcal H\cap\mathcal H_0^\perp$ is the one-dimensional contrast
direction, and $\mathcal H^\perp$ is within-group residual noise. Since $d_1=1$ and $\sigma^2$ is unknown, choose the t row of the canonical
table.
Contrast direction
The signal direction is the contrast between group means. Define
$$c=\left(\underbrace{\frac1m,\ldots,\frac1m}_{m},\underbrace{-\frac1n,\ldots,-\frac1n}_{n}\right)^T.$$ Then $$c^TW=\bar X-\bar Y,\qquad
\|c\|^2=\frac1m+\frac1n.$$ The unit signal vector is $$q_1=\frac{c}{\sqrt{1/m+1/n}}.$$
The signal coordinate is $$Z_1=q_1^TW=\frac{\bar X-\bar Y}{\sqrt{1/m+1/n}}.$$ Under $H_0$, $Z_1\sim N(0,\sigma^2)$.
Residual direction and pooled variance
The residual projection has squared length $$\|Z_r\|^2=\sum_{i=1}^m(X_i-\bar X)^2+\sum_{j=1}^n(Y_j-\bar Y)^2.$$ Hence
$$\|Z_r\|^2=(m-1)S_X^2+(n-1)S_Y^2,$$ with $d_r=N-2$.
The pooled variance estimate is $$S_p^2=\frac{(m-1)S_X^2+(n-1)S_Y^2}{N-2}.$$
The canonical $d_1=1$, unknown-variance test becomes the classical pooled two-sample t-test: $$T=\frac{Z_1}{S_p}=\frac{\bar X-\bar
Y}{S_p\sqrt{1/m+1/n}}\sim t_{N-2}\quad\text{under }H_0.$$
The corresponding confidence interval for $\mu-\nu$ is $$(\bar X-\bar Y)\pm t_{N-2,1-\alpha/2}\,S_p\sqrt{\frac1m+\frac1n}.$$
This is the equal-variance two-sample t-test. The pooled variance estimate is justified by the common variance assumption. If the group
variances are not plausibly equal, the Welch test from earlier lectures is the safer default.
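To confirm the pooled statistic above is the classical one, compare the hand computation with SciPy's equal-variance two-sample t-test (simulated groups; sizes and seed arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
X = rng.normal(1.0, 2.0, size=8)    # group 1, m = 8
Y = rng.normal(1.0, 2.0, size=5)    # group 2, n = 5
m, n = len(X), len(Y)

# Pooled variance from the residual block, d_r = m + n - 2.
Sp2 = ((m - 1) * X.var(ddof=1) + (n - 1) * Y.var(ddof=1)) / (m + n - 2)
T = (X.mean() - Y.mean()) / np.sqrt(Sp2 * (1/m + 1/n))

# scipy's equal-variance two-sample t-test computes the same statistic.
T_scipy = stats.ttest_ind(X, Y, equal_var=True).statistic
print(np.isclose(T, T_scipy))
```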
6.2 Where one-way ANOVA fits
One-way ANOVA compares $k$ group means under a common normal variance assumption. If the group sizes are $n_1,\ldots,n_k$ and $N=\sum_{g=1}^k
n_g$, write the observations as $Y_{gi}$ for group $g$ and observation $i$. The full model space is spanned by the $k$ group indicator vectors.
For the one-way ANOVA null that all group means are equal: $$\mathcal H_0=\Span(\mathbf1_N),\qquad \dim(\mathcal H_0)=1,$$ while the full model
has dimension $k$. Therefore $$d_0=1,\qquad d_1=k-1,\qquad d_r=N-k.$$
Identify the blocks:
the grand-mean line is nuisance, the $k-1$ independent group contrasts are signal, and the within-group deviations are residual noise. Since the
signal is multidimensional and $\sigma^2$ is unknown, choose the F row.
The ANOVA F-test is the same canonical F-test. The tested signal is the between-group variation, and the residual noise is the within-group
variation.
The null model fits one shared grand mean: $$\RSS_0=\sum_{g=1}^k\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{\cdot\cdot})^2.$$ This leftover error contains
both the group-mean mismatch and the within-group noise. The full model fits a separate mean for each group:
$$\RSS=\sum_{g=1}^k\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_{g\cdot})^2.$$ This is only within-group noise. Therefore $$\RSS_0-\RSS=\sum_{g=1}^k n_g(\bar
Y_{g\cdot}-\bar Y_{\cdot\cdot})^2,$$ which is the between-group signal.
This is exactly the canonical decomposition: $$\RSS_0=\|Z_1\|^2+\|Z_r\|^2,\qquad \RSS=\|Z_r\|^2,\qquad \RSS_0-\RSS=\|Z_1\|^2.$$ So ANOVA
compares the between-group signal length to the within-group residual-noise length.
Plain-language F ratio:
$$F= \frac{\text{between-group variation per signal dimension}} {\text{within-group variation per residual dimension}}.$$ Under $H_0$, both
pieces estimate the same noise variance $\sigma^2$, so the ratio should be near 1. A large value means the group means are farther apart than
the within-group noise would suggest.
If $\RSS_0$ is the total sum of squares around the grand mean and $\RSS$ is the within-group residual sum of squares, then
$$F=\frac{(\RSS_0-\RSS)/(k-1)}{\RSS/(N-k)}\sim F_{k-1,N-k}\quad\text{under }H_0.$$ This is one-way ANOVA written in the same nested-subspace
language as regression.
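The $\RSS_0-\RSS$ route to the F statistic can be checked against SciPy's `f_oneway` on simulated groups (group sizes and seed arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
groups = [rng.normal(0.0, 1.0, size=ng) for ng in (6, 4, 7)]   # k = 3 groups
k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

RSS0 = sum(((g - grand)**2).sum() for g in groups)     # one shared grand mean
RSS = sum(((g - g.mean())**2).sum() for g in groups)   # one mean per group

F = ((RSS0 - RSS) / (k - 1)) / (RSS / (N - k))
F_scipy = stats.f_oneway(*groups).statistic
print(np.isclose(F, F_scipy))
```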
6.3 Linear regression as projection
In regression, $$Y_i=x_i^T\beta+\epsilon_i,\qquad \epsilon_i\overset{iid}{\sim}N(0,\sigma^2),$$ or in matrix form $$Y\sim
N_n(X\beta,\sigma^2I_n).$$ Assume the design matrix $X$ has full column rank $d$.
The model space is $$\mathcal H=\Col(X),\qquad \dim(\mathcal H)=d.$$ The fitted values are the orthogonal projection of $Y$ onto this column
space: $$\hat Y=X\hat\beta=P_{\mathcal H}Y,\qquad P_{\mathcal H}=X(X^TX)^{-1}X^T.$$
Regression is not a separate universe from ANOVA. The fitted values are the projection onto the model subspace, and the residuals are the
projection onto the orthogonal complement. Tests ask whether adding selected directions to the model subspace produces a signal length that is
large relative to residual noise.
The residual vector is $$r=Y-\hat Y=(I-P_{\mathcal H})Y=P_{\mathcal H^\perp}Y,$$ and $$\RSS=\|r\|^2=\|P_{\mathcal H^\perp}Y\|^2.$$ Since
$\dim(\mathcal H^\perp)=n-d$, $$\frac{\RSS}{\sigma^2}\sim \chi^2_{n-d},\qquad \hat\sigma^2=\frac{\RSS}{n-d}.$$
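The projection identities are mechanical to verify. The sketch below (random full-rank design, hypothetical coefficients) checks that the hat matrix is idempotent, fixes the model space, and produces residuals orthogonal to $\Col(X)$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 20, 3
X = rng.standard_normal((n, d))               # full-rank design (hypothetical)
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

# Hat matrix P = X (X^T X)^{-1} X^T projects onto Col(X).
P = X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(P @ P, P))                  # idempotent: P is a projection
print(np.allclose(P @ X, X))                  # P fixes the model space

r = Y - P @ Y                                 # projection onto H-perp
print(np.allclose(X.T @ r, 0))                # residual orthogonal to Col(X)
RSS = (r**2).sum()
sigma2_hat = RSS / (n - d)                    # residual variance estimate
print(RSS, sigma2_hat)
```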
6.4 Regression F-test for a subset of coefficients
Partition the design as $X=[X_0\mid X_1]$, where $X_0$ contains predictors kept under the null and $X_1$ contains predictors being tested. The
null hypothesis is $$H_0:\beta_1=0,$$ so $$\mathcal H_0=\Col(X_0),\qquad \mathcal H=\Col(X).$$
Identify the blocks:
$\Col(X_0)$ is nuisance, the extra part of $\Col(X)$ not already explained by $\Col(X_0)$ is signal, and $\Col(X)^\perp$ is residual noise.
Therefore $d_1$ is the number of added independent predictor directions and $d_r=n-d$.
Let $$d_0=\rank(X_0),\qquad d_1=d-d_0,\qquad d_r=n-d.$$ Let $\RSS_0$ be the residual sum of squares for the null model and $\RSS$ for the full
model. Since $\mathcal H_0\subseteq\mathcal H$, $$\RSS_0\geq \RSS.$$
The improvement $\RSS_0-\RSS$ is the squared length of the tested signal projection: it is how much residual error drops when we add the tested
predictors. The full-model RSS estimates the remaining noise.
Same canonical bridge: $$\RSS_0=\|Z_1\|^2+\|Z_r\|^2,\qquad \RSS=\|Z_r\|^2,\qquad \RSS_0-\RSS=\|Z_1\|^2.$$ The null model cannot explain the
tested predictor directions, so that signal is counted as leftover error in $\RSS_0$. The full model can explain those directions, so only
residual noise remains in $\RSS$.
Plain-language regression F ratio:
$$F= \frac{\text{variation explained by tested predictors per tested direction}} {\text{remaining residual variation per residual direction}}.$$
This is the same signal-over-noise comparison as ANOVA, just with predictor directions instead of group-mean directions.
The regression F-statistic is $$F=\frac{(\RSS_0-\RSS)/d_1}{\RSS/d_r}\sim F_{d_1,d_r}\quad\text{under }H_0.$$ Reject for large values. This asks
whether the tested predictors improve fit more than would be expected from noise alone.
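A minimal sketch of the subset F computation on a simulated design (predictors, coefficients, and seed all hypothetical; here $H_0$ is actually true, so $F$ should typically be unremarkable):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
X0 = np.column_stack([np.ones(n), rng.standard_normal(n)])   # kept predictors
X1 = rng.standard_normal((n, 2))                             # tested predictors
X = np.hstack([X0, X1])
Y = X0 @ np.array([1.0, 0.5]) + rng.standard_normal(n)       # beta_1 = 0 truth

def rss(design, y):
    """Residual sum of squares after projecting y onto Col(design)."""
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    r = y - design @ beta
    return (r**2).sum()

d0, d = X0.shape[1], X.shape[1]
d1, d_r = d - d0, n - d
RSS0, RSS = rss(X0, Y), rss(X, Y)

F = ((RSS0 - RSS) / d1) / (RSS / d_r)
print(RSS0 >= RSS, F)
```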
6.5 Individual coefficient t-test and confidence interval
For a single coefficient test $H_0:\beta_j=0$, the tested subspace has $d_1=1$. The canonical F-test is equivalent to a t-test:
$$T=\frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}\sim t_{n-d}\quad\text{under }H_0.$$
This is the one-dimensional version of the regression subset test. The signal direction is the part of predictor $j$ that remains after the
nuisance predictors have been projected out, so the t statistic is a signed signal-over-noise ratio.
The estimated standard error is $$\widehat{\mathrm{SE}}(\hat\beta_j)=\hat\sigma\sqrt{[(X^TX)^{-1}]_{jj}}.$$ The confidence interval is
$$\hat\beta_j\pm t_{n-d,1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat\beta_j).$$
Geometrically, the signal direction for $\beta_j$ is the part of predictor column $X_j$ that remains after removing its projection onto all
other predictor columns. The coefficient t-statistic measures how much $Y$ points in that unique direction relative to residual noise.
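The equivalence $T^2=F$ for a single tested coefficient ($d_1=1$) can be checked directly by fitting the full model and the drop-one-column null model (simulated design, hypothetical coefficients):

```python
import numpy as np

rng = np.random.default_rng(8)
n, d = 25, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, d - 1))])
Y = X @ np.array([2.0, 1.0, 0.0]) + rng.standard_normal(n)

# Full-model fit: beta_hat, residuals, and sigma_hat.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
r = Y - X @ beta_hat
sigma_hat = np.sqrt((r**2).sum() / (n - d))

j = 2                                          # test the last coefficient
se_j = sigma_hat * np.sqrt(XtX_inv[j, j])
T = beta_hat[j] / se_j

# Equivalent d1 = 1 F-test: drop column j, compare residual sums of squares.
X_null = np.delete(X, j, axis=1)
beta0 = np.linalg.lstsq(X_null, Y, rcond=None)[0]
RSS0 = ((Y - X_null @ beta0)**2).sum()
RSS = (r**2).sum()
F = (RSS0 - RSS) / (RSS / (n - d))
print(np.isclose(T**2, F))
```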
Recall Map
Checklist for any testing problem:
identify the nuisance subspace, identify the tested signal subspace, identify the residual subspace, decide whether $\sigma^2$ is known, and
decide whether the tested signal is one-dimensional or multidimensional. Those five answers determine the row of the table.
| Problem | Subspaces / dimensions | Statistic | Reference distribution |
| --- | --- | --- | --- |
| Known-variance 1D signal | $d_1=1$, $\sigma^2$ known | $Z_1/\sigma$ | $N(0,1)$ |
| Known-variance multidimensional signal | $d_1>1$, $\sigma^2$ known | $\Vert Z_1\Vert^2/\sigma^2$ | $\chi^2_{d_1}$ |
| Unknown-variance 1D signal | $d_1=1$, residual df $d_r$ | $Z_1/\sqrt{\Vert Z_r\Vert^2/d_r}$ | $t_{d_r}$ |
| Unknown-variance multidimensional signal | $d_1>1$, residual df $d_r$ | $(\Vert Z_1\Vert^2/d_1)/(\Vert Z_r\Vert^2/d_r)$ | $F_{d_1,d_r}$ |
| One-sample t-test | Signal $\Span(\mathbf1)$, residual $\mathbf1^\perp$ | $\sqrt n\,\bar X/S$ | $t_{n-1}$ |
| Two-sample pooled t-test | $d_0=1$, $d_1=1$, $d_r=m+n-2$ | $(\bar X-\bar Y)/(S_p\sqrt{1/m+1/n})$ | $t_{m+n-2}$ |
| Regression subset test | $d_1$ tested coefficients, $d_r=n-d$ | $((\RSS_0-\RSS)/d_1)/(\RSS/d_r)$ | $F_{d_1,d_r}$ |
| One-way ANOVA | $d_0=1$, $d_1=k-1$, $d_r=N-k$ | $((\RSS_0-\RSS)/(k-1))/(\RSS/(N-k))$ | $F_{k-1,N-k}$ |
| Regression single coefficient | $d_1=1$, $d_r=n-d$ | $\hat\beta_j/\widehat{\mathrm{SE}}(\hat\beta_j)$ | $t_{n-d}$ |
Unifying sentence:
Every statistic above compares a tested projection to either a known variance scale or an independent residual variance estimate.
Formula Sheet
Distribution facts
| Object | Formula | Use |
| --- | --- | --- |
| Chi-squared | $\sum_{i=1}^dZ_i^2\sim\chi^2_d$ | Squared length of a $d$-dimensional Gaussian noise vector |
| t | $Z/\sqrt{V/d}\sim t_d$ | 1D Gaussian signal over independent variance estimate |
| F | $(V_1/d_1)/(V_2/d_2)\sim F_{d_1,d_2}$ | Signal sum of squares per signal df over residual sum of squares per residual df |
Common Mistakes
1. Forgetting that nuisance directions are not residual directions.
Nuisance means unknown/free mean under both hypotheses. Residual means known mean 0 under both hypotheses, so it can estimate $\sigma^2$.
2. Using the pooled two-sample t-test without the common-variance assumption.
The pooled test is the canonical equal-variance normal model. If variances differ, the geometry no longer gives this exact t distribution.
3. Confusing $d_1$ and $d_r$ in F-tests.
$d_1$ is the number of tested directions. $d_r$ is the residual degrees of freedom used to estimate $\sigma^2$.
4. Thinking the rotation matrix itself is the point.
The basis is a coordinate choice. The test depends on subspaces and projection lengths, not on the particular orthonormal basis chosen inside a
subspace.
5. Forgetting why $\RSS_0-\RSS$ appears in regression.
It is the improvement in fit from adding the tested predictors, which equals the tested signal sum of squares.
6. Treating individual regression coefficients as marginal effects without considering other predictors.
The coefficient t-test is about the unique direction in $X_j$ left after projecting out the other columns of $X$.
7. Thinking an orthogonal transformation leaves the mean unchanged.
It preserves the spherical covariance shape, but it rotates the mean vector too. That is exactly why the one-sample t-test can be put into
canonical form.
8. Thinking the alternative must change the residual coordinate.
In the canonical model, the alternative shifts the signal coordinate. The residual coordinate keeps mean 0 and supplies the variance estimate.
9. Thinking the $d$ in $\chi^2_d$ or $V/d$ is arbitrary.
It is the dimension of the relevant Gaussian subspace, also called the degrees of freedom.
10. Confusing the alternative region with the orthogonal signal subspace.
$\mathcal H\setminus\mathcal H_0$ describes which means are allowed under the alternative. The coordinate block $Z_1$ comes from the orthogonal
signal space $\mathcal H\cap\mathcal H_0^\perp$, after the nuisance part has been accounted for.
Data 145 Study Guide - Lectures 23-24 - Standalone Review Version