Editor’s Note: This piece is part of an ongoing series of articles by Professor Bruce Gilley. To read the other articles in the series, click here.
Let’s start with the obvious. The plague of junk citations in modern academic research will not be curbed by digital or bureaucratic means. For every clever new software tool designed to detect anomalies—including self-citations, irrelevant citations, incorrect citations, sister-journal citations, or citations to scholars from politically correct groups—new digital workarounds will appear. The “citation delivery vehicle” mentioned in the last installment of this series is just one example.
Similarly, for every virtuous declaration of a passionate commitment to scientific integrity, journals will still rely on self-policing and editorial motivation. For example, the editors of Proceedings of the Royal Society A (Mathematical, Physical, and Engineering Sciences) promised in a 2020 editorial on citation malpractice to “revise advice to authors and to referees to remind them of the importance of adherence to high ethical standards.” But it is the absence of high ethical standards, or even a shared idea of what those entail, that got us into this mess. It is like reminding a serial killer of the importance of the sixth commandment.
To be sure, some simple rules can cut down on junk citations: each and every cited work needs to be discussed and explained if it is used as part of an argument; peer reviewers may not under any circumstances “suggest” additional citations; any self-citations must be absolutely essential; and a padding of citations to the work of scholars who are black or indigenous or women or who belong to the latest fad group is automatic grounds to dismiss the merits of a piece of research.
[Related: “How Junk Citations Have Discredited the Academy: Part 4”]
However, a deeper transformation is needed to solve the problem. In its papal bull on citation malpractice, Proceedings of the Royal Society A noted that “a large and growing factor in how the world makes decisions is the Dunning–Kruger effect,” in which scholars “seek to cherry-pick the science and dismiss science that is inconvenient to them.” Junk citations arose, they noted, because too many scholars are captured by the ideological premises of their work. This points both to the nature of the problem and to the obvious solution.
The Dunning–Kruger effect refers to a study published in 1999 by two (middle-aged, white, male, heterosexual, gender-conforming) scholars, David Dunning and Justin Kruger. In four laboratory surveys, groups of between 45 and 140 Cornell undergraduates were asked to evaluate their own abilities with respect to humor, logical reasoning, and English grammar. The famous conclusion of the study was that “incompetent individuals, compared with their more competent peers, will dramatically overestimate their ability and performance relative to objective criteria.”
Their study has been widely cited (incorrectly) to justify the tyranny of experts. But, if you read it, they note in the conclusion that “competence is not wholly dependent on knowledge or wisdom” within a given field. Indeed, “knowledge about the domain does not necessarily translate into competence” at all.
Obviously, an expert on Rembrandt cannot paint like Rembrandt. But there is a second disconnect between knowledge and competence: an expert on Rembrandt is not competent to decide whether her interpretation of his work is correct. Subject expertise does not confer scientific competence. Indeed, the more “expert” one becomes, the more wedded one is to one’s ideological frame. The only way to ensure the competence of the scientific endeavor is to have a multitude of competing scholars with deep disagreements on fundamental ideological or paradigmatic premises. Only then will scholars be forced to read and cite the work of others correctly, as well as challenge the citations used by others. An intellectual monoculture spawns bad science, and junk citations are merely one consequence.
[Related: “How Junk Citations Have Discredited the Academy: Part 3”]
I began this series by recalling a grant applicant who had lazily cited a research paper without telling the reader anything about the methods and specific findings of the paper, a classic junk citation. The reason for this junk citation was simple: the applicant shared the ideological premises of the paper’s author and had no incentive to actually read it, much less double-check the validity of the findings.
Similarly, in a recent presentation I made on a new paper making the case against pay equity policies, I noted that about 80% of research on the topic is carried out by female academics who already believe that gender pay differences are caused by sexism and that pay equity policies are a really great idea. Yet their evidence ranges from weak to nonexistent, and their papers are chock-full of citations to each other’s “well-known” findings. The near-total absence of critical debate on this central labor policy has inflated an epistemic bubble that could soar into the stratosphere.
That, in a nutshell, points the way to reform. Scientific progress depends on a contest of ideas, a contest of scholars with diametrically opposing views who have the incentive to scrutinize the work of others, including any citations they use to bolster their case. Absent a pluralism of viewpoints, or paradigms, science is impossible. Junk citations, from this perspective, are simply the rust on the ship, the encrustation of barnacles and seaweed that grows because the hull is not properly coated.
It’s time to drydock the Good Ship Academia and refit it with a thick coating of intellectual pluralism.
7 thoughts on “How Junk Citations Have Discredited the Academy: Part 5”
My own feeling is the solution to this problem is emerging “organically”. I genuinely no longer expect to find the academic literature in my discipline interesting — I’m pleasantly surprised when it is, but I no longer *expect* it to be. That doesn’t mean I’ve taken up quilting instead. There is a tremendous amount of fascinating, insightful social scientific analysis — including anthropological analysis — available to read on Substack and elsewhere. I just read that stuff and am currently working on a book ms. much more influenced by the recent para-academic literature than the recent academic literature.
I saw it on Twitter, so I can’t credit it properly, but someone made a nice point about power and authority sometimes aligning and sometimes being orthogonal. The kind of self-dealing citation bafflegabbery so well analyzed in this series currently commands a lot of institutional power. It is absolutely hemorrhaging authority. When it was just a smug steady trickle, I was quite pessimistic about the situation but the more firehosey it becomes the cheerier I feel about the likelihood that I’ll get to see what comes next, that it will emerge within my professional lifetime.
Kathleen, this is so interesting. Please write this comment up for MTC! Bruce
Thanks George and Dr. Ed. George, you hit on a very important point about hyper-specialization, which means people rely on citations more AND are in turn in no position to judge those citations. Dr. Ed, yes, your point that even non-junk citations may refer back to junk research is a newer problem that NAS is tackling with its work on replicability.
Any padding of citations by adding black, indigenous, or women authors, just because they are black, indigenous, or women, should be grounds for immediate rejection of the paper. Citations should add value to a paper’s content; padding is nothing more than virtue signaling.
Siloed science creates the same problems. When getting tenure requires that you establish yourself as THE expert on a topic, the result is to push individuals to specialize so tightly on a particular topic that no one else is focusing on, which leads to a lack of competing experts on the same topic – as well as discouraging true interdisciplinary work across paradigms. In part, I believe that far too many academics have real self-esteem issues, and specializing to the point that no one else can claim to have expertise over their work helps insulate them from criticism that goes right to the heart of their identity as a “smart” person.
I have a B.S. in chemistry with essentially a history minor and two interdisciplinary graduate degrees: a Master of Public Administration (management/political science) and a PhD in Health Services Research, Policy, and Administration (basically health economics, medical sociology, epidemiology, and a heavy dose of statistics). My academic research was broadly focused on how political institutions and health programs interact, and looked at different institutional structures and programs – and in a bi-directional manner. Intellectually, the interdisciplinary background meant I wasn’t trapped by the paradigm of a particular discipline, and I could explore problems from different perspectives. It was quite productive, and the intellectual value can be seen in the fact that my work is cited in literature across a broad range of fields – public health, political science, economics, management, engineering, etc. – and I collaborated with professionals in the sciences, sociology, engineering, nursing, and other fields. Unfortunately, the reaction of fellow faculty members and the department chair did not reflect that. I was told I “needed to focus” and do things “with more impact.” My research had me on federal and state advisory boards, consulting with DoD on the medical civic action aspects of civil affairs and counterinsurgency doctrine, testifying to government boards, preparing guidance for the state department of health, etc. Yet doing work that had application to the real world was REJECTED, and the example cited to me as an ideal was a colleague – a damned good researcher I personally admire a great deal – who focused on the neuropsychology of timing mechanisms for drawing circles and tapping fingers.
In particular, questioning assumptions in my field of public health by drawing lessons from public administration and institutional economics came under criticism, as if the field were somehow exempt from the same social forces as, say, economic development policy. Our program was housed in a kinesiology department, and pointing out to a class that the epidemiological evidence showed individuals in the BMI category labelled “overweight” had lower all-cause morbidity and mortality than those in the “normal” category challenged the raison d’être that many on the kinesiology side used to justify their field – made worse by the fact that one of our own tenured public health faculty members had published some of the early research showing it was true.
I once read a statement that “an expert can speak authoritatively on a specific topic, but an intellectual can do the same on many.” The pressures to overspecialize create a lot of experts, but not intellectuals. When you are discouraged from thinking broadly, you do not create an intellectual workspace, but one where narrow-mindedness is rewarded.
I’m also reminded of the problems that have been demonstrated in attempts to duplicate research — and the inability to get similar results. This was first demonstrated in the field of Psychology and concerns have since drifted into other fields as well — and it needs to be remembered that this was all peer-reviewed research that was published in reputable journals.
Hence I have to ask how many of the legitimate citations are actually meaningless because they reference back to research that cannot be duplicated. Smith and Jones may have published a peer-reviewed study finding that lit matches can be extinguished in a pail of gasoline, and the citation may be truly impressive, published in the “best” journal and the rest. But if I can’t duplicate their results — if I get a conflagration every time *I* drop a lit match into a pail of gasoline — what good is their citation?
My point is that the situation is far worse than the author may realize — if the citations refer back to flawed research, then the citations themselves are flawed as well. And then what?
Now, I used the example of a lit match in a pail of gasoline for a reason — if it’s more than 30 degrees below zero, as it sometimes is in Northern Maine, it is entirely possible to drop a lit match into a pail of gasoline without results. I’ve done it; the match simply goes out as if it were dropped into a pail of water — the vapor pressure is so low that there isn’t enough vapor to ignite. And the next few matches will go out as well, but they will boil off enough vapor in the process that another match will catch, and that will quickly produce enough heat for the entire pail to ignite. (NB: This isn’t quite as dangerous as it sounds because the metal container (and rising heat) restricts fresh air, and hence this is a very smoky fire controlled by its limited access to oxygen and the rapid cooling of the rising column of combustible gases, which cool into smoke.)
My point is that a lot of research is cited beyond what the actual findings were — yes, I did extinguish lit matches in a pail of gasoline, but gasoline which had been outdoors all night and it was -30 F that morning. The research technically was legitimate and technically could be duplicated — if someone went up to Aroostook County, Maine in January — but likely not if done anywhere else not quite so darn cold.
I’ve seen a lot of this kind of research in the education field — published research where I wanted to say “yeah, but…”, much as one would have said to my gasoline example. And this research is routinely cited and recited, and I think we have a problem far worse than mere junk citations….
Now as to a solution — I have no idea…
And THAT gets into the problem of peer review!
I just finished a peer review of a manuscript for a fairly significant health services research journal, which I recommended rejecting because, while the data was consistent with the hypothesized effect, the study was on the level of an undergrad research project in terms of sophistication – it completely failed to consider potential unmeasured confounding variables that could be responsible for both the adoption of the studied intervention AND the outcome variables measured. I recommended to the authors that it be considered for submission as a poster at a conference while they went back and rethought the study design.
I am a tough reviewer. I just looked at my review history and found that of the last 10 peer reviews I did for the journal, one was accepted, one requested a minor revision to clarify some statements, two requested major revisions amounting to almost completely redoing the statistical analysis, and the remaining six I recommended for outright rejection.
However, with increasing demands for publications for academic tenure AND a proliferation of journals requiring articles for content, that level of discrimination is eroding. If you add on many graduate degree programs that lack a component of acculturation towards intellectual rigor in analysis, you see a reason for the proliferation of non-reproducible publications.

I was fortunate that my doctoral training was in the Health Services Research program at Minnesota, where at the time people like Todd Rockwood, Bryan Dowd, Roger Feldman, Jon Christianson, Doug Wholey, Beth Virnig, and others emphasized the need for a rigorous and skeptical approach to research – although it did cost me a job opportunity as research director for the surgeon general of a southern state, who stated that my approach was “too rigorous” – I would argue that is not possible if you are trying to do real science and make real intellectual contributions. I went to the point of specifically noting “indeterminate” results in regression models in my dissertation work based on a test of the statistical *power* of the models.

On the other hand, when I taught at Minnesota Duluth, I had the Director of Graduate Studies in the Special Education Department send her students to me for statistics and research design courses because she had grown disgusted with the poor quality of research being produced in her field. When I moved to Purdue, I had comments from other faculty that some of my MPH students were producing better research work in their theses than a lot of PhD students in the department (we were housed in the kinesiology and PE department) had been producing. The big difference was the seven courses we had to take at Minnesota in statistics and research design, versus the single class many of the Purdue students took.
The combination of a decline in the intellectual rigor of research design and data analysis with journals hungry for content means that authors submit weaker work, reviewers are less socialized to demand rigorous work, and journals are more likely to accept weaker work – which leads to a reproducibility crisis. Although at this point it primarily affects clinical and social science research, it has spilled over into the physical sciences and engineering. When I arrived at Purdue in 2005, for example, a professor in the Nuclear Engineering department was embroiled in a scandal over non-reproducible results for a fusion study and allegations of research misconduct. I published a paper in Emerging Infectious Diseases in 2005 pointing out that the claimed findings in a study of outcomes of E. coli infections were unreliable because the authors had failed to address obvious problems of endogenous predictor variables in their regression design – and the authors completely missed the point in their response, which focused on biological issues irrelevant to the statistical analysis.
That isn’t to say that rigorous studies can’t produce different results. Different designs and samples can lead to different results, and the acceptance or rejection of a theory depends on the accumulation of a body of evidence more than on any single study. Often, different results from different studies create questions that enrich our understanding of phenomena. For example, asking why black patients did not respond as well to ACE inhibitors and beta blockers as the effect sizes in the original RCTs predicted led to the realization that there were genetic differences in the pharmacokinetics of the drugs – differences lost in the original clinical trials due to the well-known low levels of participation by black patients in such trials. THAT discovery led to reformulated dosages for the subpopulation and improved cardiac outcomes.
However, for that to occur, published research STILL needs to be rigorous enough that the readers can have confidence that, in that study, the results were real, and that differences between study findings are due to differences in study design. THAT won’t happen unless academic researchers adopt rigorous standards, socialize their graduate students in them, and during peer review adopt a skeptical and demanding attitude towards the research that they review.