Reliability
The consistency of a measurement instrument across repeated administrations or alternate forms, usually expressed as a correlation coefficient between 0 and 1. Test-retest reliability for major adult IQ batteries is typically 0.85 to 0.95. Reliability bounds the maximum possible validity of a measure. Under classical test theory, a test cannot correlate with any other variable more strongly than it correlates with its own true score, and that correlation equals the square root of its reliability.
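The two ideas in this definition can be made concrete with a minimal Python sketch, assuming classical test theory. Test-retest reliability is estimated as the Pearson correlation between scores from two administrations. The validity ceiling for a test X and a criterion Y is sqrt(r_xx * r_yy), which reduces to sqrt(r_xx) when the criterion is measured without error. The function names and the sample scores are hypothetical and for illustration only.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; used here as a test-retest reliability estimate."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both administrations")
    return sxy / math.sqrt(sxx * syy)

def max_validity(r_xx: float, r_yy: float = 1.0) -> float:
    """Classical-test-theory ceiling on the observed correlation between a
    test (reliability r_xx) and a criterion (reliability r_yy)."""
    for r in (r_xx, r_yy):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability must lie in [0, 1]")
    return math.sqrt(r_xx * r_yy)

# Hypothetical scores for six people tested twice, a few weeks apart.
first = [98, 112, 105, 121, 89, 130]
second = [101, 110, 107, 118, 92, 127]

r_tt = pearson_r(first, second)
print(f"test-retest reliability: {r_tt:.3f}")
print(f"validity ceiling (perfect criterion): {max_validity(r_tt):.3f}")
```

Note that the ceiling sqrt(r_xx) is larger than r_xx itself: a test with reliability 0.81 could in principle correlate up to 0.90 with an error-free criterion, because the relevant comparison is with the test's true score, not with a second noisy administration.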
Reliability is a central concept throughout the cognitive ability literature and across this site's articles, and understanding it is essential for interpreting any IQ score or cognitive subtest result. Standard psychometric textbooks (such as those by Anne Anastasi or Susan Embretson) treat it in much greater depth and document the empirical findings behind its prominence in the field.
For online IQ testing, the practical implication is that test-takers should be cautious about over-interpreting results from brief screeners. Reliability generally falls as a test is shortened, so most published precision claims for major IQ batteries do not transfer directly to short online instruments. The appropriate adjustments, such as wider confidence intervals and more conservative band assignments, are best made explicitly rather than ignored.
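A minimal Python sketch of how lower reliability widens a score band, assuming classical test theory and an IQ metric with SD 15. It uses three standard formulas: the standard error of measurement, SEM = SD * sqrt(1 - r_xx); a band of plus or minus z SEMs around the observed score; and the Spearman-Brown prophecy formula, r_new = k * r / (1 + (k - 1) * r), which assumes the items removed or added are parallel to the rest. The function names and the reliability of 0.90 for a hypothetical full-length battery are illustrative assumptions, not figures for any specific test.

```python
import math

def sem(reliability: float, sd: float = 15.0) -> float:
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, reliability: float,
                        sd: float = 15.0, z: float = 1.96) -> tuple[float, float]:
    """Band of +/- z SEMs centred on the observed score. This is one simple
    convention; some test manuals centre the band on an estimated true score."""
    half_width = z * sem(reliability, sd)
    return score - half_width, score + half_width

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by length_factor
    (0.25 keeps a quarter of the items), assuming parallel items."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k = length_factor
    return k * reliability / (1.0 + (k - 1.0) * reliability)

full_r = 0.90                            # hypothetical full-length battery
short_r = spearman_brown(full_r, 0.25)   # same items, cut to a quarter: ~0.69

for label, r in (("full battery", full_r), ("short screener", short_r)):
    lo, hi = confidence_interval(115, r)
    print(f"{label}: r = {r:.2f}, SEM = {sem(r):.1f}, 95% band = {lo:.1f} to {hi:.1f}")
```

Under these assumptions the SEM rises from about 4.7 to about 8.3 points, so the same observed score of 115 carries a band roughly 16 points wide on each side instead of about 9.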
For further reading on this term, consult the related entries in this glossary and the deep-dive articles linked in the Related Reading section. The American Psychological Association's task force report 'Intelligence: Knowns and Unknowns' (1995) and its follow-ups remain the most authoritative summary at an accessible technical level.