Canonical example

The variance of a set of numbers is defined in terms of the sum of the squared distances from each point to the mean. So it would seem that you first need to calculate the mean, then go back and compute the squared differences from the mean. And yet the variance can be computed in one pass through the data. You’ll find two equivalent equations in statistics books: the one described above and another based on the sum of the data points and the sum of the data points squared. While the second equation is theoretically correct, it is numerically unstable: it subtracts two large, nearly equal numbers, so rounding error can swamp the result. Code that directly implements it can return a negative value for a quantity that is theoretically non-negative. I’ve seen this happen with real data, causing a program to crash when taking the square root of the variance to get the standard deviation. However, there is an algorithm that computes variance in one pass that is accurate and numerically stable. This algorithm was developed by B. P. Welford in 1962. I discuss…
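As a minimal sketch of the contrast described above, the following Python compares a direct implementation of the sum-of-squares equation with a one-pass update in the style of Welford's method. The function names `naive_variance` and `welford_variance` are illustrative, not taken from the original post, and both are assumed to return the sample variance (dividing by n − 1); dividing by n instead would give the population variance.

```python
def naive_variance(data):
    """One-pass textbook formula built from sum(x) and sum(x*x).

    Theoretically correct, but it subtracts two large, nearly equal
    numbers, so the result can be badly wrong or even negative.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    if n < 2:
        raise ValueError("need at least two data points")
    # Assumption: sample variance, hence the n - 1 divisor.
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(data):
    """One-pass running update in the style of Welford (1962)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n         # update the running mean
        m2 += delta * (x - mean)  # old deviation times new deviation
    if n < 2:
        raise ValueError("need at least two data points")
    # Assumption: sample variance, hence the n - 1 divisor.
    return m2 / (n - 1)


if __name__ == "__main__":
    # Small spread on top of a large offset: the hard case for the
    # sum-of-squares formula. The true sample variance is 30.
    shifted = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
    print("naive:  ", naive_variance(shifted))
    print("welford:", welford_variance(shifted))
```

Because `m2` only ever has non-negative terms added to it (up to rounding), the running update does not rely on cancellation between two large sums, which is where the direct formula loses its accuracy.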