IQ Tests Are Inaccurate or Biased
Summary written by FARAgent (AI) on February 10, 2026 · Pending Verification
For a long time, many educated people held that IQ tests were a racket dressed up as science. In the 1920s, Walter Lippmann gave that view its most respectable form, arguing in The New Republic that the tests measured schoolroom tricks, speed with puzzles, and familiarity with middle-class culture, not any deep, general intelligence. That was not a foolish suspicion. Early tests did contain crude items, the testing movement was entangled with hereditarian claims and eugenic politics, and promoters sometimes spoke as if a single score could settle a human being's worth. A reasonable critic could look at that mix and conclude that "intelligence testing" was oversold, culturally loaded, and far less scientific than its champions claimed.
What went wrong was the leap from "these tests have limits and can be abused" to "IQ is a pseudoscientific hoax." Over the decades, better psychometrics, longitudinal studies, and behavior genetics kept finding the same awkward fact: IQ scores are imperfect, but they are fairly stable, substantially heritable, and strongly predictive of school performance, job training, occupational status, and many life outcomes. The tests were revised, normed, and checked against real-world results; the broad pattern held. Even phenomena often cited against IQ, such as the Flynn effect, did not erase the underlying finding that cognitive ability is measurable and consequential.
The old suspicion never quite died, because it served moral and political purposes. It reappeared whenever testing was tied to inequality, race, schooling, or merit, and in recent years it has been recycled in media claims that IQ is "fundamentally flawed" or merely a measure of privilege. But on the core question, the debate is largely over. Most experts now agree that the hoax theory was wrong: IQ tests are not the whole of human worth, and they can be biased in particular uses, but they do measure a real general cognitive trait with substantial predictive power.
- Walter Lippmann was the leading liberal pundit of his day and an advisor to President Wilson; in 1922 he wrote six articles in The New Republic denouncing IQ testing as unproven and dangerous. He argued that the testers were dogmatically asserting that intelligence was innate and hereditary, propping up fears of a permanent caste system in America. His essays framed the entire enterprise as a threat to democracy and upward mobility, and they shaped elite opinion for decades. The articles were widely read and cited by those uneasy about what the army tests seemed to reveal about the average American mind. [1][4][5]
- Lewis Terman was the Stanford psychologist who created the Stanford-Binet IQ test and defended it against Lippmann with a satirical response that highlighted its practical validity. He revised Binet's original tests using small samples of California children and adults, then declared there was nothing about an individual as important as his IQ. Terman stood by the mental-age concept (sketched just after this list) even when larger data sets contradicted his norms. His work lent institutional weight to the idea that a single score captured something real and hereditary about human worth. [1][4][5]
- Lothrop Stoddard was the writer who misread the army test data to claim the average mental age of Americans was fourteen and then used that figure to predict the downfall of civilization. He drew on Terman's small-sample norms while ignoring the warnings in the official army volume edited by Robert Yerkes. His books turned the misinterpreted numbers into a popular jeremiad about national decline. The claim spread through repeated citation in polite circles and helped fuel the broader suspicion that IQ testing was both pseudoscientific and politically sinister. [4][5]
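For readers who have not met the mental-age concept that Terman defended and Stoddard leaned on, the classical ratio formula is worth stating. It is standard background (William Stern's ratio IQ, used in early Stanford-Binet scoring) rather than anything drawn from the sources cited above, and the adult cap of about sixteen is the conventional early Stanford-Binet figure.

```latex
% Classical ratio IQ, shown as background; not taken from the cited sources.
\[
  \mathrm{IQ} \;=\; \frac{\text{mental age}}{\text{chronological age}} \times 100
\]
% For adults the denominator was conventionally capped (about 16 in the early
% Stanford-Binet), so any claim about an "average adult mental age of fourteen"
% depends entirely on which norm sets that adult standard; this is why the army
% data could contradict Terman's small-sample norms.
```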
The New Republic promoted the assumption by publishing Lippmann's six articles attacking the tests in 1922 along with Terman's defensive reply. The magazine gave the debate a prestigious platform and framed IQ testing as an overreach by psychologists who wanted to turn puzzle-solving into destiny. Its readership among policymakers and intellectuals carried the skepticism into the next generation. The exchange became a canonical reference for anyone arguing that the tests measured nothing fundamental. [1]
The US Army administered IQ tests to 1.7 million men during World War I for placement purposes, lending the new field of psychology an aura of official credibility. The massive sample produced results that contradicted the small norms Terman had published, yet the data were widely misread by writers eager for dramatic conclusions. The army's involvement turned a laboratory curiosity into a national policy tool and sparked decades of argument about whether mental tests revealed fixed limits or merely current schooling. [4]
Stanford University was home to Terman's revision of Binet's tests, which was normed on unrepresentative local samples and allowed to stand as the adult standard for years. The institution's prestige helped embed the mental-age concept in American education and testing. Decades later, the Palo Alto middle school named after Terman was stripped of his name amid campaigns that treated the entire psychometric tradition as tainted. [4][5]
The strongest case for the assumption rested on the obvious flaws in the earliest tests and the social consequences that seemed to follow from them. Walter Lippmann and others could point to cultural items that asked about a crying Dutch girl or the difference between a president and a king, which looked like clear markers of class and upbringing rather than native ability. Early promoters such as Francis Galton had tied mental testing to eugenics, and some testers spoke as if a single score settled a person's hereditary rank for life. These observations made it reasonable for thoughtful observers in the 1920s to worry that the tests were photographing the existing class structure and then declaring it immutable. The kernel of truth was that the first instruments were crude, the samples tiny, and the claims sometimes grandiose; a well-informed person at the time could fairly conclude that the enterprise was premature and politically loaded. [1][2][8]
Yet the assumption hardened into dogma long after better evidence arrived. Lippmann argued that the tests measured only puzzle-solving ability and had no connection to genuine life outcomes, a view that seemed plausible when the field was young. Students surveyed at Rutgers in later decades still repeated the same claims that the tests were biased, did not measure real intelligence, and existed mainly to justify group differences. The 2012 study by Adrian Owen and colleagues at the University of Western Ontario tested more than 100,000 people on twelve cognitive tasks and concluded that three distinct components, rather than a single IQ factor, best described the results, a conclusion that mainstream outlets treated as the final disproof. Each of these lines of argument carried surface credibility from real limitations in early instruments or from egalitarian intuitions, but each was later undermined by data showing that the tests predicted education, occupation, health, and even lifespan with remarkable consistency. [3][6][8]
The association with eugenics supplied another durable foundation. Critics could cite the immoral beliefs of some originators and the forced sterilizations that followed, then conclude that the entire body of results must be morally and scientifically tainted. This argument felt compelling because the historical record was ugly and because polite society had decided that any hereditary explanation was suspect. Subsequent research, however, showed that the predictive power of IQ scores stood independent of the moral failings of early proponents, much as the Nazi doctors' discovery about smoking and cancer survived their crimes. The assumption persisted not because the evidence supported it but because it served larger narratives about class, race, and equality. [8]
Liberal pundits spread the denial of the tests' validity through repeated media attacks that began with Lippmann's 1922 essays and never really stopped. Their arguments were echoed a century later as a popular conspiracy theory that dismissed psychometrics as rigged from the start. During the Great Awokening, social pressure turned the skepticism into orthodoxy; schools named after Terman were renamed despite the evidence that his tests worked. The pattern repeated in polite company, where mentioning IQ scores reliably produced stern rebukes labeling the speaker elitist or worse. [1]
University curricula played a quiet but powerful role by simply omitting the subject. The psychology department at Rutgers University offered no classes on IQ or human intelligence, so its students graduated carrying the same misconceptions their predecessors had listed in surveys. This institutional silence ensured that each new cohort of teachers, clinicians, and policymakers entered the world unaware of the predictive data that had accumulated for decades. The gap in training became self-perpetuating. [3]
Mainstream outlets kept the assumption alive with fresh scientific-looking claims. The Independent ran headlines declaring IQ tests fundamentally flawed after the Owen study, quoting Roger Highfield of the Science Museum in London as saying the idea was disproved once and for all. The Guardian published Angela Saini's attacks that framed any discussion of eugenics or heredity as pseudoscience. Search engines amplified UN reports and academic papers on AI racial bias, giving the old skepticism a new technological costume. Each channel lent the appearance of fresh, authoritative rebuttal to an idea that had already been tested and retested in the real world. [6][10][11]
The US Army instituted widespread intelligence testing during World War I and assigned nearly 1.7 million men to roles based on scores that were equated to mental ages. The policy gave the tests an official stamp and fed the public narrative that psychologists had uncovered fixed limits on human potential. Low scorers were funneled into certain duties, and the results were later cited as proof that the average American mind was alarmingly immature. The program lent credibility to the very enterprise it would later be used to discredit. [4]
Courts and schools built lasting rules around IQ cutoffs that ignored the Flynn effect. The Supreme Court's decision in Atkins v. Virginia in 2002 barred the death penalty for the intellectually disabled but left the customary IQ threshold of 70 exposed to scores inflated by aging, uncorrected norms. Special education eligibility continued to rely on scores that fluctuated with each new test version, producing inconsistent identifications and sometimes denying services. These policies treated the numbers as fixed when the data showed they were moving. [7]
School naming policies later enforced the assumption by erasing the Terman name from a Palo Alto middle school on the grounds that any connection to eugenics was disqualifying. The decision rejected alternative names when ethnic objections arose, turning a local honor into a public ritual of repudiation. Post-World War II laws banned eugenic sterilizations and defunded related research after the entire framework had been declared pseudoscience. Each policy rested on the premise that the tests measured nothing real or that their origins rendered them unusable. [1][11]
The refusal to use IQ in hiring produced less productive workforces across job types because no other tool matched its ability to forecast performance, training success, and leadership. Organizations that ignored the data paid for it in inefficiency and missed talent. The pattern repeated in education and social policy where the assumption blocked recognition of cognitive differences that mattered for outcomes. [8]
Failure to correct for the Flynn effect in capital cases led to improper death sentences and later conversions to life imprisonment. By 2008 more than eighty Atkins cases had been affected, with hundreds still active. Uncorrected scores also distorted special education eligibility, sometimes denying services when norms shifted by as much as 5.6 points. The human cost was measured in years of lost liberty or lost support. [7]
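To make the arithmetic behind these cases concrete, here is a minimal sketch, in Python, of the kind of norm-obsolescence adjustment at issue. The 0.3-points-per-year rate is the commonly cited rule of thumb (the meta-analysis discussed below estimates 2.31 points per decade), and the function name, cutoff, and example numbers are illustrative only, not a clinical or legal standard.

```python
# Illustrative sketch of a Flynn-effect (norm obsolescence) adjustment.
# Rate, function name, and example figures are for exposition only.

FLYNN_RATE_PER_YEAR = 0.3  # roughly 3 points per decade; the cited meta-analysis estimates 2.31

def flynn_adjusted_score(observed_iq: float, test_norm_year: int, year_administered: int,
                         rate_per_year: float = FLYNN_RATE_PER_YEAR) -> float:
    """Subtract the points a score is presumed to gain as the test's norms age."""
    years_obsolete = max(0, year_administered - test_norm_year)
    return observed_iq - rate_per_year * years_obsolete

# Example: a score of 72 on a test normed 15 years earlier drops to 67.5 after adjustment,
# crossing the customary threshold of 70 that Atkins-type determinations turn on.
print(flynn_adjusted_score(72, test_norm_year=1995, year_administered=2010))
```

A real determination would also weigh the standard error of measurement and clinical judgment; the sketch shows only the direction and rough size of the adjustment.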
The broader cultural harm was an endless cycle of ignorance in psychology training, research, and public debate. Canceling the Terman school name erased recognition of genuine contributions and led to absurd disputes over replacement names. The assumption distorted academic language, pushed researchers toward euphemisms, and diverted attention from the g-factor that kept reappearing in the data. Honest scientists saw their careers penalized for stating what the tests actually showed. [1][3][8]
The assumption began to crack when the army's sample of 1.7 million men contradicted Terman's tiny norms for adult mental age. Robert Yerkes had warned in the official volume that the Stanford figures were unreliable, yet the caution was ignored by writers eager for a civilizational crisis. Lippmann seized on the statistical error and called the mental-age claim nonsense, but the larger lesson was that the early instruments had been built on sand. Larger data sets kept arriving. [4][5]
Over four generations the tests demonstrated their ability to predict education, occupation, health, and lifespan, directly refuting Lippmann's claim that they had no connection to life success. A Scottish study of more than 13,000 children found that IQ at age eleven correlated 0.81 with exam scores at age sixteen, near the theoretical maximum. Meta-analyses showed validity coefficients between 0.2 and 0.5 for job performance, with stronger prediction in complex roles. The predictive record accumulated until the old objections looked antique. [8][12]
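As a quick way to read those coefficients, squaring a correlation gives the share of variance two measures have in common. The snippet below simply does that arithmetic on the figures quoted above; it is illustrative and adds no new data.

```python
# Squared correlations turn the reported coefficients into shared-variance figures
# (illustrative arithmetic on the numbers quoted above, not new results).
r_school = 0.81                    # IQ at age 11 vs. exam scores at age 16, as reported above
r_job_low, r_job_high = 0.2, 0.5   # meta-analytic validity range for job performance

print(f"Shared variance, schooling: {r_school ** 2:.0%}")                                  # about 66%
print(f"Shared variance, job performance: {r_job_low ** 2:.0%} to {r_job_high ** 2:.0%}")  # about 4% to 25%
```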
Fraud exposures and direct replications finished the job. PubPeer and a Stanford freshman journalist revealed manipulated images in high-profile labs, breaking the spell that peer-reviewed work from elite institutions was automatically trustworthy. The Flynn effect meta-analysis of 285 studies gave a precise estimate of 2.31 points per decade and showed the gains were robust, forcing clinicians to confront that norms grew obsolete. By the time Steven Pinker could dismiss a New York Times essay on the subject as already settled a decade earlier, the scientific debate had ended. The assumption survived only in pockets of media and academia that had stopped reading the data. [2][7][9]
- [1] Is I.Q. a Conspiracy Theory? (reputable_journalism)
- [2] What Is Elon Musk’s I.Q.? (reputable_journalism)
- [3] Do genes matter more under capitalism? (reputable_journalism)
- [4] Debunking Intelligence Experts: Walter Lippmann Speaks Out (reputable_journalism)
- [5] The Mental Age Of Americans (reputable_journalism)
- [6]
- [7] The Flynn Effect: A Meta-analysis (peer_reviewed)
- [8]
- [9] Science Has a Major Fraud Problem (reputable_journalism)
- [10] ai racially biased - Bing (reputable_journalism)
- [11]
- [12] Does IQ Really Predict Job Performance? (peer_reviewed)
- [13] Bias in Psychological Assessment (peer_reviewed)