Do you see the Matrix? Derivation of linear $\chi^2$ minimization

This blog post is primarily for my Ay117 students. However, if you've ever wondered where chi-squared minimization comes from, here's my derivation.
Figure 1: Either a scene from The Matrix or the hallway in your astronomy building.
Yesterday in class we reviewed the concept of "chi-squared minimization," starting with Bayes' Theorem:
$P(\{a\} | \{d\}) \propto P(\{d\} | \{a\}) \, P(\{a\})$
In other words, if we wish to assess the probability of a hypothesis expressed in terms of the parameters $\{a\}$, conditioned on our data $\{d\}$, we first calculate how likely we were to obtain our data under the hypothesis (the first factor on the right), and then multiply this "likelihood" by our prior notion that a given set of parameters is representative of the truth.

Supposing that our data are independent of one another and normally distributed, the likelihood term can be written
$P(\{d\} | \{a\}) = \prod_{i=0}^{N-1} \frac{1}{\sqrt{2\pi \sigma_i^2}} \exp{\left[-\frac{1}{2}\left( \frac{y_i - f(x_i)}{\sigma_i}\right)^2\right]}$
As for the priors, we'll make the fast and loose assumption that they are constant ($P(a_0) = P(a_1) = ... = {\rm const}$). It is computationally advantageous to compute the log-likelihood
$\ell = \ln{P(\{d\} | \{a\})} = C - \frac{1}{2} \sum_{i=0}^{N-1} \left[ \frac{y_i - f(x_i)}{\sigma_i}\right]^2 = C - \frac{1}{2} \chi^2$
Our goal is to find the parameters that maximize the likelihood. Since the logarithm is monotonic, this is equivalent to maximizing the log-likelihood, which is in turn equivalent to minimizing that $\chi^2$ thingy.
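In code, that quantity is a one-liner. Here's a minimal sketch assuming NumPy, with `y`, `model`, and `sigma` as placeholder names for the data, the model predictions, and the per-point uncertainties:

```python
import numpy as np

def chi_squared(y, model, sigma):
    """chi^2: the sum of squared, uncertainty-normalized residuals."""
    return np.sum(((y - model) / sigma) ** 2)
```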

For the specific problem of fitting a polynomial with $M$ coefficients (i.e., of degree $M-1$), $f(x_i) = \sum_{j=0}^{M-1} a_j x_i^j$, and minimizing $\chi^2$ results in a linear system of equations that can be solved for the best-fitting parameters.

In class, I got my notation all scrambled, and I neglected the measurement uncertainties $\sigma_i$. My bad! Here's what should have appeared on the board (worked out this morning over breakfast, so be sure to check my work!).

To be clear, the "weights" are $w_i = 1/\sigma_i^2$. Zooming in on the key part: the best-fit coefficients are the ones for which every partial derivative of $\chi^2$ vanishes.
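Explicitly (this is the standard weighted least-squares derivation; as promised, check my work), setting $\partial \chi^2 / \partial a_k = 0$ for each $k$ gives

$\frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=0}^{N-1} w_i \left[ y_i - \sum_{j=0}^{M-1} a_j x_i^j \right] x_i^k = 0$

which rearranges into the $M \times M$ linear system

$\sum_{j=0}^{M-1} \left[ \sum_{i=0}^{N-1} w_i x_i^{j+k} \right] a_j = \sum_{i=0}^{N-1} w_i y_i x_i^k, \qquad k = 0, \ldots, M-1$

or, in matrix form with the design matrix $A_{ij} = x_i^j$ and $W = {\rm diag}(w_i)$,

$(A^{\rm T} W A) \, \vec{a} = A^{\rm T} W \vec{y}$

and there's the Matrix from the title. Solve that system and out come the best-fitting coefficients $\{a\}$.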

The first problem of the next Class Activity will be to write a function that takes abscissa and ordinate values, and the associated uncertainties, and computes the best-fitting coefficients for a polynomial with an arbitrary number of coefficients $M$.
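For orientation, here's a minimal NumPy sketch of such a function (the name `polyfit_weighted` and its signature are placeholders of mine, not the official solution; you'll still want to write and test your own!):

```python
import numpy as np

def polyfit_weighted(x, y, sigma, M):
    """Coefficients a_0..a_{M-1} of f(x) = sum_j a_j * x**j that minimize
    chi^2, found by solving the normal equations (A^T W A) a = A^T W y."""
    x, y, sigma = map(np.asarray, (x, y, sigma))
    w = 1.0 / sigma**2                    # weights w_i = 1 / sigma_i^2
    A = np.vander(x, M, increasing=True)  # design matrix A_ij = x_i**j
    lhs = A.T @ (w[:, None] * A)          # A^T W A, an M x M matrix
    rhs = A.T @ (w * y)                   # A^T W y, a length-M vector
    return np.linalg.solve(lhs, rhs)

# Sanity check: a noiseless straight line should give back [1.0, 2.0].
x = np.linspace(0.0, 5.0, 10)
print(polyfit_weighted(x, 1.0 + 2.0 * x, np.ones_like(x), M=2))
```

Note the use of `np.linalg.solve` rather than explicitly inverting $A^{\rm T} W A$: solving the linear system directly is both faster and more numerically stable.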

An annual note to all the (NSF) haters

It's that time of year again: students have recently been notified about whether they received the prestigious NSF Graduate Research Fellowship. Known in the STEM community as "The NSF," the fellowship provides a student with three years of graduate school tuition and stipend, with the latter typically 5-10% above the standard institutional support for first- and second-year students. It's a sweet deal, and a real accelerant for getting a young student's research career humming along, because fellows don't need to restrict themselves to advisors who have funding: the students fund themselves!
This is also the time of year that many a white dude executes what I call the "academic soccer flop."

It typically sounds like this: "Congrats! Of course it's easier for you to win the NSF because you're, you know, the right demographic." Or worse: "She only won because she's Hispanic."…