How Accurate Is an IQ Score?
Every measurement has error, and IQ scores are no exception. The standard error of measurement (SEM) for a well-constructed adult IQ test such as the WAIS-IV is roughly 3 IQ points; for a 25-item online screener like this one, it is closer to 7. That means even a perfectly administered test produces a score that should be interpreted as a confidence interval, not a single number.
The 95% confidence interval is approximately ±2 SEM. So a measured IQ of 110 on a clinical test means 'somewhere between about 104 and 116 with 95% confidence', not 'exactly 110'. On a screener with SEM 7, the same measured score of 110 means 'somewhere between about 96 and 124', a range that runs from the middle of the average band up into the superior band. This is a real and important loss of precision compared to clinical testing.
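The arithmetic behind those two intervals can be sketched in a few lines of Python. The function name confidence_interval and the parameter z are illustrative choices, not part of any test's scoring software; z = 2.0 is the rounded multiplier used in the text (the more exact value for 95% is about 1.96).

```python
def confidence_interval(score, sem, z=2.0):
    """Return the (low, high) bounds of score +/- z * sem.

    z=2.0 is the rounded 95% multiplier used in the text.
    """
    margin = z * sem
    return score - margin, score + margin


# Same measured score, two different levels of precision.
clinical = confidence_interval(110, sem=3)  # (104.0, 116.0)
screener = confidence_interval(110, sem=7)  # (96.0, 124.0)
print("clinical test:", clinical)
print("online screener:", screener)
```

The width of the interval depends only on the SEM, so the screener's interval is more than twice as wide regardless of the score obtained.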
Measurement error has several sources. Test-taker state accounts for some of it: sleep, hunger, anxiety, distraction, motivation, time of day, and recent caffeine intake all influence performance. Item sampling accounts for more: any short test samples only a tiny fraction of the cognitive ability space, and which items happen to appear can shift the score by several points either way. Administrative variation accounts for the rest: clinical tests minimize this by standardizing the testing environment, but online tests have essentially no control over conditions.
Test-retest reliability (the correlation between scores on two administrations of the same test, separated by a short interval) is the empirical measure of how stable scores are. For the WAIS-IV, test-retest reliability is around 0.95 for full-scale IQ, dropping to around 0.85 for index scores and 0.75 to 0.85 for individual subtests, since scores built from fewer items are less stable. For a brief online screener, test-retest reliability is rarely reported but is likely in the 0.7 to 0.85 range based on comparable instruments.
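Reliability and SEM are two views of the same thing. Under classical test theory, SEM = SD × sqrt(1 − reliability). The sketch below applies that formula, assuming the conventional IQ standard deviation of 15; the function name is illustrative, and test manuals typically derive the SEM from internal-consistency reliability rather than test-retest correlations, so treat the results as rough estimates.

```python
import math


def sem_from_reliability(reliability, sd=15.0):
    """Classical test theory: SEM = SD * sqrt(1 - reliability).

    sd=15.0 assumes the conventional IQ scale (mean 100, SD 15).
    """
    return sd * math.sqrt(1.0 - reliability)


# Reliability values spanning the ranges quoted in the text.
for r in (0.95, 0.85, 0.70):
    print(f"reliability {r:.2f} -> SEM of about {sem_from_reliability(r):.1f} points")
```

A reliability of 0.95 gives an SEM of about 3.4 points, and reliabilities of 0.85 and 0.70 give about 5.8 and 8.2 points. That is broadly in line with the figures of roughly 3 for a clinical test and closer to 7 for a short screener.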
The practical consequence: do not over-interpret small differences in IQ scores. A 5-point difference between two test administrations or two test-takers is well within the noise of any IQ measurement. A 15-point difference is more meaningful but still requires interpretation in context (test format, test conditions, age, language, etc.). Treat IQ scores as approximate rankings within broad bands, not as precise individual measurements.
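To put a rough number on 'within the noise': when two scores each carry their own measurement error, the standard error of their difference is sqrt(SEM_a² + SEM_b²). The sketch below assumes the two errors are independent and uses the same rounded multiplier of 2 for 95% confidence; the function name is hypothetical.

```python
import math


def difference_threshold(sem_a, sem_b, z=2.0):
    """Smallest score difference that exceeds z standard errors of the difference.

    Assumes the measurement errors of the two scores are independent.
    """
    return z * math.sqrt(sem_a ** 2 + sem_b ** 2)


print(round(difference_threshold(3, 3), 1))  # two clinical scores: 8.5
print(round(difference_threshold(7, 7), 1))  # two screener scores: 19.8
```

Under these assumptions, a 5-point gap falls below the threshold even for two clinical scores, and a 15-point gap between two screener scores still does not clear it.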