Peer Review Filters for Quality
Summaries Written by FARAgent (AI) on March 03, 2026 · Pending Verification
For a long time, journals, funders, and universities treated peer review as the quality filter that made science science. The idea had an obvious appeal. Independent experts, reading the same manuscript, were supposed to spot weak methods, bad statistics, and overblown claims before publication. Editors called the process indispensable, and many reasonable people accepted that if several qualified reviewers looked at a paper, their judgments would converge on its scientific merit. That belief had a real kernel of truth: expert scrutiny can catch errors, and science does need criticism before findings harden into fact.
The trouble was that when researchers began measuring how much reviewers actually agreed, the numbers were poor. Studies from the 1980s and 1990s already found that reviewers often gave sharply different verdicts on the same manuscript or grant, and Bornmann's 2010 meta-analysis reported inter-rater reliability far below the sort of standards methodologists would demand for high-stakes individual decisions. Reviewers were not simply detecting objective quality; they were also reacting to novelty, school of thought, writing style, prestige cues, and their own priors. Editors had long known this in practice, which is why one reviewer could call a paper important and another could call it fatally flawed, but the public story still treated peer review as a dependable screen for methodological rigor.
The current debate has not ended, but growing evidence suggests the old confidence was too broad. An influential minority of researchers now argue that peer review is better understood as a noisy, biased sorting process than as a reliable instrument with strong reviewer-to-reviewer agreement. That does not mean review is useless, only that its authority was often overstated, especially when journals and policymakers spoke as if acceptance itself certified methodological quality. The live question now is not whether expert review should exist, but how much trust its verdicts deserve, and which other checks (replication, data transparency, post-publication criticism) should carry more of the load.
- Gregory Tassey served as the chief economist at the National Institute of Standards and Technology where he commissioned a major study on the economic costs of inadequate software testing infrastructure. He examined how long-standing assumptions about existing tools and methods had left developers and users exposed to repeated failures. His work documented billions in avoidable losses across sectors and pressed for better measurement standards. The report stood as an early warning that informal practices were more costly than anyone admitted. [2]
- Lutz Bornmann worked as a researcher at the Max Planck Society when he led a large-scale meta-analysis that pulled together decades of scattered findings on peer review reliability. He and his colleagues examined 48 studies covering nearly 20,000 manuscripts and produced the first quantitative synthesis showing mean inter-rater reliability far below acceptable thresholds. The paper became a reference point for those questioning the process. It confirmed what earlier narrative reviews had only suggested. [3]
- Robbie Fox edited The Lancet during the middle of the twentieth century and openly doubted whether peer review accomplished much at all. He joked that one could swap the piles of accepted and rejected papers or simply throw manuscripts down the stairs and achieve similar results. His skepticism remained largely ignored by the growing scientific establishment. The process continued to expand despite his warnings. [7]
- Stephen Lock served as editor of the BMJ and decided to test the value of peer review by handling some papers himself without sending them out. He found almost no difference in outcomes compared with the full review process. The exercise illustrated how little empirical support existed for the standard practice. His findings were published but did little to slow the spread of the system. [7]
- Granville J. Matheson conducted research at the Karolinska Institutet and developed methods to estimate reliability for new studies based on prior test-retest data. He warned that ignoring low reliability in healthy volunteer studies led to underpowered clinical research and needless exposure of participants to radiation. He created an R package to help researchers check feasibility before starting expensive projects. His approach offered a practical way to confront the problem; a minimal sketch of the underlying idea follows this list. [8]
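The logic behind that feasibility check is worth spelling out: low test-retest reliability attenuates the effect sizes a study can observe, which in turn inflates the sample size needed for adequate power. Matheson's tooling is an R package; the snippet below is only a minimal Python sketch of the same idea, combining Spearman's attenuation formula with a standard Fisher-z power approximation, and all names and numbers in it are illustrative assumptions rather than his implementation.

```python
# Sketch: how measurement unreliability inflates required sample size.
# Hypothetical numbers; not Matheson's R package, just the core idea.
import math
from scipy import stats

def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation under Spearman's attenuation:
    r_obs = r_true * sqrt(rel_x * rel_y)."""
    return true_r * math.sqrt(rel_x * rel_y)

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate N to detect correlation r (Fisher z-transform test)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

true_r = 0.5                      # hypothesised true effect
for rel in (1.0, 0.8, 0.5, 0.3):  # test-retest reliability of both measures
    r_obs = attenuated_r(true_r, rel, rel)
    print(f"reliability={rel:.1f}  observed r={r_obs:.2f}  "
          f"N needed={n_for_correlation(r_obs)}")
```

Under these toy numbers, dropping reliability from 1.0 to 0.3 shrinks the observable correlation from 0.5 to 0.15 and pushes the required sample from about 30 to roughly 350 participants, which is exactly the kind of hidden cost Matheson warned about.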
The National Institute of Standards and Technology shaped national conversations about software quality through its Program Office and commissioned a detailed economic analysis of testing shortfalls. The resulting report laid out how reliance on the waterfall model and commercial tools had created hidden costs across transportation, finance, and other sectors. It quantified losses from late-stage bug fixes, delayed market entry, and unreliable interoperability. The work highlighted how institutional assumptions had propped up inadequate infrastructure for years. [2]
Scientific journals across disciplines enforced peer review as the primary gatekeeper for publication and based decisions on rating scales that later analyses showed had low inter-rater reliability. An analysis of nearly 8,000 neuroscience submissions to PLOS ONE found an IRR of only 0.193 when reviewers focused strictly on methodological quality. The pattern repeated across fields despite the institutional weight placed on these judgments. Journals continued to treat the process as essential even as evidence accumulated. [1][3]
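For readers unfamiliar with the statistic, a figure like 0.193 is an intraclass correlation computed over paired reviewer ratings. The following is a minimal sketch of a one-way ICC on entirely hypothetical 1-to-5 ratings; the published analysis may have used a different estimator, and the toy data are chosen only to land in the same low range.

```python
# One-way intraclass correlation, ICC(1), on hypothetical reviewer ratings.
import numpy as np

# Rows = manuscripts, columns = two independent reviewers (1-5 scale).
ratings = np.array([
    [5, 2], [4, 5], [2, 4], [3, 1], [4, 2],
    [1, 3], [3, 5], [1, 1], [5, 3], [3, 4],
], dtype=float)

n, k = ratings.shape
grand = ratings.mean()
row_means = ratings.mean(axis=1)
ms_between = k * ((row_means - grand) ** 2).sum() / (n - 1)
ms_within = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1) = {icc1:.3f}")  # about 0.15 for these toy ratings
```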
The BMJ conducted its own experiments by inserting deliberate errors into manuscripts and sending them to reviewers. On average, reviewers spotted only about a quarter of the major mistakes, and nobody caught all of them. The journal also hosted international congresses that initially presented peer review as the gold standard while later publishing papers that documented its weaknesses. These efforts both reinforced and eventually questioned the assumption at scale. [5][7]
Stanford University maintained the public image of scientific integrity while one of its presidents oversaw a laboratory whose papers contained manipulated images and data that went undetected for years. The institution relied on the prestige of peer-reviewed output to uphold its reputation. When the problems surfaced through external scrutiny, the university's initial response underscored how authority had substituted for verification. [10]
For decades, experts insisted that peer review reliably filtered scientific manuscripts for methodological quality through objective agreement among independent reviewers. They pointed to its long use in journals and funding agencies as proof that expert judgment produced consistent and trustworthy decisions. The process seemed sensible because senior scientists evaluated work in their fields and journals could reject obviously flawed submissions. A thoughtful observer in the late twentieth century would have seen the system as a reasonable safeguard against error, especially after the postwar expansion of research made some form of gatekeeping necessary. The assumption carried a kernel of truth in that reviewers often agreed on obvious rejections, yet that limited consensus was taken as evidence of broader reliability. [3][4][6]
Early studies reported low inter-rater reliability but these were often dismissed as limited or poorly structured. A meta-analysis later synthesized 48 studies involving 19,443 manuscripts and found a mean intraclass correlation of 0.34 for continuous measures and a mean Cohen's kappa of 0.17. Larger samples and more explicit rating instructions were associated with even lower reliability scores. The quantitative results confirmed what scattered findings had hinted at for years. [3][13]
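To build intuition for that kappa figure: Cohen's kappa discounts the agreement two raters would reach by chance given their marginal accept/reject rates, so reviewers can agree on well over half of their calls and still score near 0.17. A minimal sketch, with counts that are purely hypothetical and chosen only to reproduce a value near the reported mean:

```python
# Toy 2x2 agreement table for two reviewers making accept/reject calls.
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_obs = (a + d) / n                    # raw agreement rate
    p_acc = ((a + b) / n) * ((a + c) / n)  # chance both say "accept"
    p_rej = ((c + d) / n) * ((b + d) / n)  # chance both say "reject"
    p_exp = p_acc + p_rej                  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Rows: reviewer 1 (accept, reject); columns: reviewer 2 (accept, reject).
print(cohens_kappa([[67, 33], [50, 50]]))  # 0.17 despite 58.5% raw agreement
```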
Reviewers were believed to reach agreement through shared expertise, yet confirmation bias consistently led them to favor papers that aligned with their own schools of thought. A meta-analysis of 51 experiments involving more than 18,000 participants showed an effect size of r = 0.245 for this tendency even in scientific judgments. The pattern suggested that personal and ideological preferences shaped evaluations more than objective criteria. [1]
The belief that blinding reviewers to author identity would improve objectivity seemed plausible based on early speculation and small studies. Randomised trials published in JAMA, however, found no significant improvement in review quality or detection of flaws. Similar hopes that reviewer seniority or publication record would predict better reviews fared little better: such characteristics explained only about 8 percent of the variance in outcomes. These findings chipped away at the idea that simple procedural tweaks could fix the core problem. [5]
Academic norms spread the assumption through journal practices that assigned better reviewers to promising papers and allowed professional networks to influence ratings. Editors selected reviewers from within familiar circles which reinforced existing schools of thought and reduced the chance of genuine disagreement being treated as legitimate. The system rewarded conformity and penalized outliers in ways that were hard to measure at the time. [1]
Peer review gained traction after World War II as research output exploded and journals needed a standard way to manage submissions. It became embedded in assessments for academic posts, grants, and promotions, with the Institute for Scientific Information tying journal impact factors to the process. Media outlets began treating peer-reviewed publication as a seal of credibility and placed such papers on front pages without further scrutiny. [6][7]
The assumption spread across disciplines from physics to psychology even though reliability varied sharply by field. In diffuse areas such as social psychology agreement was especially low yet the same procedures were applied uniformly. Funding agencies and conference organizers adopted the same model for grant decisions and paper selections which magnified its reach. [4][9]
Institutional trust and media reverence helped maintain the belief that peer-reviewed work from elite laboratories could be accepted at face value. When fraud later surfaced in high-profile cases it became clear that the system had relied heavily on the untested assumption of author honesty. [10]
Journals required peer review for all publication decisions and used numerical quality scales despite evidence that inter-rater reliability fell well below thresholds needed for high-stakes individual judgments. Analyses that controlled for factors such as h-index and coauthor networks still revealed systematic biases related to professional proximity. The result was inconsistent outcomes for similar submissions and distorted incentives across research careers. [1][3]
Funding agencies and universities built assessment systems around peer-reviewed publications treating them as the primary proof of quality for promotions and grant distribution. This created a circular reliance in which the low-reliability process determined who received resources and who advanced. The policies were enacted with the expectation that expert consensus would ensure fairness. [5]
Conference organizers enforced the same review model for selecting submissions, rating papers on multiple dimensions such as relevance and soundness. Acceptance decisions rested on these ratings even though multidimensional models showed only modest improvements over simpler approaches. The practice extended the assumption into new arenas without addressing its documented weaknesses. [9]
Inter-rater reliability at or below 0.34 fell short of the standards used for high-stakes individual decisions in other fields, such as special education placements. This unreliability distorted research agendas by allowing some work to advance while blocking other valid efforts on essentially random grounds. Funding and career trajectories were affected for thousands of researchers over decades. [1]
Poor software testing infrastructure led to direct economic losses estimated in the tens of billions of dollars. End users experienced downtime and had to perform rework while developers spent more time fixing bugs in later stages than they would have under better practices. Time to market increased and competitive advantages were lost. [2]
Peer review wasted substantial time and money, with the BMJ estimating costs of 100 to 1000 pounds per paper and many journals taking over a year to reach decisions. Academics spent hours on reviews that could have gone into their own research. Biases related to nationality, gender, specialty, and positive results further skewed who got published and who advanced. [5][7]
Fraud and errors slipped through the system because peer review was never designed to detect deliberate misconduct and relied on the assumption of author honesty. High-profile cases damaged the careers of honest scientists who felt pressure to match the output of fraudulent labs. Resources were wasted on follow-up studies that rested on flawed foundations. [10][12]
A manuscript-level analysis of data from PLOS ONE revealed that each degree of professional separation between reviewer and author decreased ratings by 0.107 points on average. This exposed network-based bias operating alongside, or instead of, quality judgments. Meta-regression models explained more than 86 percent of the variance in reliability scores, suggesting that no journal had achieved consistently reliable review. [1]
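As a sketch of what such a manuscript-level model looks like, the snippet below simulates ratings with the reported slope built in and recovers it by ordinary least squares. The data are synthetic, and the real analysis presumably included controls (such as h-index and coauthor networks) omitted here.

```python
# Synthetic manuscript-level regression: rating on reviewer-author
# network distance. All data simulated; only the slope matches the text.
import numpy as np

rng = np.random.default_rng(0)
n = 8000                                   # roughly the sample size cited
distance = rng.integers(1, 6, size=n).astype(float)  # degrees of separation
rating = 3.5 - 0.107 * distance + rng.normal(0.0, 0.8, size=n)

# Ordinary least squares fit of rating ~ intercept + distance.
X = np.column_stack([np.ones(n), distance])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
print(f"effect per degree of separation: {beta[1]:+.3f}")  # near -0.107
```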
The large meta-analysis of 70 coefficients from 48 studies quantified the low reliability once and for all and identified sample size and rating instructions as key determinants. Earlier narrative reviews had pointed in the same direction but lacked the statistical weight to shift opinion. The results made it harder to dismiss the problem as isolated or methodological. [3]
Randomised trials published in JAMA tested whether blinding or other procedural changes improved outcomes and found no significant effects. Analyses of reviewer characteristics such as age or statistical training showed only weak associations with review quality. These studies undermined the hope that minor reforms could salvage the assumption. [5]
The Sokal affair demonstrated how ideological alignment could override methodological scrutiny when a hoax paper was accepted by a journal without consulting experts in the relevant field. PubPeer and investigative journalism later exposed manipulated data in prominent laboratories that had passed peer review for years. Growing evidence suggests the assumption is flawed, though debate continues about whether the process can be salvaged or requires more fundamental change. [10][14]
- [1] Is Peer Review Neutral? (opinion)
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10] Science Has a Major Fraud Problem (reputable journalism)
- [11]
- [12] When Peer Review Fails: The Challenges of Detecting Fraudulent Science (reputable journalism)
- [13]
- [14] Sokal affair (reputable journalism)