1045 days ago · Tech · 0 comments

A confidence interval is a good way to express the uncertainty in an estimate. This post is about how to calculate approximate confidence intervals in portable (mostly) standard SQL using bootstrapping. We’ll also see that BigQuery is surprisingly fast at running the required bootstrap calculations, which makes it easy to add a confidence interval to nearly any point estimate you calculate in BigQuery. The code for this article is open source.

Background: Confidence Intervals and the Bootstrap

Let’s start with some background on confidence intervals and the bootstrap, illustrated with a small example. If you already know all about these, feel free to skip to the queries.

Suppose we want to find the average mass of an (adult, domestic) cat [1], and we’ve started by selecting 10 cats at random and measuring their masses in kilograms:

Name      Mass (kg)
Apollo    3.2
Bean      2.4
Casper    6.9
Daisy     3.2
Ella      5.1
Finn      3.5
Ginger    5.9
Harley    3.3
Iago      5.5
Jasper    5.4

Mean      4.4
Std Dev   1.5

The sample mean for…
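As a rough illustration of the bootstrap idea introduced in the background above, here is a minimal Python sketch applied to the ten cat masses from the table. It is not the SQL from the post: the function name `bootstrap_mean_ci`, the choice of the percentile method, the 10,000 resamples, the 95% level, and the fixed seed are all assumptions made for this example.

```python
import random
import statistics

# Cat masses in kilograms, copied from the table above.
MASSES_KG = [3.2, 2.4, 6.9, 3.2, 5.1, 3.5, 5.9, 3.3, 5.5, 5.4]


def bootstrap_mean_ci(sample, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper for illustration; all parameters are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(sample)

    # Resample n values with replacement, take the mean, and repeat.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )

    # Read the interval endpoints off the sorted resampled means.
    alpha = 1.0 - confidence
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    low, high = bootstrap_mean_ci(MASSES_KG)
    print(f"sample mean: {statistics.fmean(MASSES_KG):.2f} kg")
    print(f"approx. 95% CI: [{low:.2f}, {high:.2f}] kg")
```

Each resample draws ten masses with replacement from the original ten, so the spread of the resampled means stands in for the sampling variability of the mean. With only ten cats the resulting interval is approximate at best.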
