Least squares is a form of approximation used to fit a model to experimental data and make predictions from it.


Given a set of points P = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), and the deviation of each point from the line d_i = y_i - (ax_i + b), minimize D = \sum_{i=1}^n (y_i - (ax_i + b))^2 by varying a and b.

Finding the minimum involves taking the partial derivatives and setting them to zero.

  • \frac{\partial D}{\partial a} = \sum_{i=1}^n 2 (y_i - (ax_i + b)) (-x_i)
  • \frac{\partial D}{\partial b} = \sum_{i=1}^n 2 (y_i - (ax_i + b)) (-1)

Dropping the constant factor of 2 (each derivative is being set to zero), expanding, and simplifying:

  • \sum_{i=1}^n (y_i - (ax_i + b)) (-x_i) = \sum (a x_i^2 + b x_i - x_i y_i) = 0
  • \sum_{i=1}^n (y_i - (ax_i + b)) (-1) = \sum (a x_i + b - y_i) = 0

This can be re-written as a 2x2 linear system:

  • a \sum x_i^2  + b \sum x_i = \sum x_i y_i
  • a \sum x_i + b n = \sum y_i
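The 2x2 system above can be solved directly, for example with Cramer's rule. A minimal sketch in plain Python (function name and test points are illustrative, not from the source):

```python
# Fit y = a*x + b by solving the 2x2 normal-equation system with Cramer's rule.
def fit_line(xs, ys):
    n = len(xs)
    sx = sum(xs)                              # sum of x_i
    sy = sum(ys)                              # sum of y_i
    sxx = sum(x * x for x in xs)              # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i * y_i
    # System:  a*sxx + b*sx = sxy
    #          a*sx  + b*n  = sy
    det = sxx * n - sx * sx
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Points lying exactly on y = 2x + 1 recover a = 2, b = 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```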

Note that some systems may be more complex, and may involve more than two parameters.
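With more parameters the same idea yields a larger linear system: build a design matrix with one column per parameter and solve the normal equations. A sketch (the quadratic model and data here are assumed for illustration):

```python
import numpy as np

# Assumed three-parameter model: fit y = a*x^2 + b*x + c.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 5.0, 10.0])  # exactly y = x^2 + 1

# Design matrix A: one column per parameter (x^2, x, 1).
A = np.column_stack([xs**2, xs, np.ones_like(xs)])

# Normal equations: (A^T A) p = A^T y, a 3x3 linear system.
p = np.linalg.solve(A.T @ A, A.T @ ys)
print(p)  # approximately [1, 0, 1]
```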

Recursive least-squares algorithm

The recursive least-squares formula is designed for real-time estimation: rather than recomputing a batch solution each time a new entry is added, it updates the estimate incrementally.[1]

\hat{\theta} = P b
P = \left[ \sum_{t=1}^N \phi(t) \phi^T(t) \right]^{-1} = (\Phi \Phi^T)^{-1}
P^{-1} = \sum_{t=1}^N \phi(t) \phi^T(t) = \Phi \Phi^T
b = \sum_{t=1}^N y(t) \phi(t)
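The recursive form updates \theta and P with each new sample instead of re-inverting the full sum. A minimal single-output sketch of the standard RLS update (variable names, the large initial P, and the test data are assumptions, not from the source):

```python
import numpy as np

def rls_update(theta, P, phi, y):
    """One recursive least-squares step for a new sample (phi, y)."""
    # Gain vector via the matrix inversion lemma: no full re-inversion needed.
    K = P @ phi / (1.0 + phi @ P @ phi)
    theta = theta + K * (y - phi @ theta)  # correct by the prediction error
    P = P - np.outer(K, phi @ P)           # update the inverse information matrix
    return theta, P

# Estimate y = 2*x + 1 online, with regressor phi(t) = [x, 1].
theta = np.zeros(2)
P = np.eye(2) * 1e6  # large initial P ~ uninformative prior (assumed)
for x, y in [(0, 1), (1, 3), (2, 5), (3, 7)]:
    theta, P = rls_update(theta, P, np.array([x, 1.0]), y)
print(theta)  # close to [2, 1]
```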


  1. Identification, Estimation, and Learning, Lecture 2, MIT OpenCourseWare