Editor’s Note: This article is part of an ongoing symposium on white fragility and its related concepts.
In Part I of this series, the introduction in 1998 of the race Implicit Association Test (IAT) — developed originally by Professor Anthony G. Greenwald and his colleagues at the University of Washington — was described. Since then, the IAT has generated a number of follow-up studies and widespread commentary on its role in understanding persons’ social attitudes toward race — and whether the test has important practical applications for use in a wide variety of environments. This has culminated in the IAT’s extensive promotion through Harvard’s Project Implicit website.
In Part I, an argument was made that a systematic evaluation of the race IAT and its claims must be organized under four categories of important questions. In Part II, which follows, the first three of these categories are discussed.
Category #1: What Does the Race IAT Measure?
In the conclusion of their original article, Greenwald et al. were careful to avoid use of the word ‘unconscious’ in describing what the race IAT measures. Instead, they describe the test as ‘resist[ing] self-presentational forces that can mask personally or socially undesirable evaluative associations.’ Many race IAT descriptions use the more bland, safe, and neutral language of ‘preference for Whites’ (rather than ‘racism’) to reflect ‘politically incorrect’ results.
Ironically, activist supporters of the IAT — as well as its fiercest critics — are prone to use more inflammatory language to describe what the test presumably measures. For example, the race IAT is described as ‘measuring how implicitly biased people are’ and as revealing the ‘unconscious levels’ and/or ‘unconscious roots’ of racial prejudice. Even critics of the race IAT talk about it as if it is a ‘measure of racism.’
IAT supporters are often surprised about the extent to which persons who hold ‘egalitarian attitudes,’ or those who are members of ‘minority/victim’ groups, achieve IAT results that are radically different than expected. That is, some black subjects display stronger preferences for whites, and gay subjects may express stronger “gay = bad” associations. This is accompanied by a combination of shock, shame, and embarrassment by those who are genuinely surprised by these findings. In fact, before potential test-takers proceed further into the Project Implicit website, they are instructed to agree to a mini-disclaimer which states: ‘I am aware of the possibility of encountering interpretations of my IAT test performance with which I may not agree.’
Part of the popularity of IAT theory is that it seems to reinforce repeated assertions by anti-racism activists that racism/prejudice is an omnipresent force that lies buried within the subconscious thinking of whites. One writer defines implicit bias as occurring whenever someone consciously rejects stereotypes and supports anti-discrimination efforts, but also holds negative associations in his mind unconsciously. According to this narrative, therefore, it is the task of those with more highly developed sensitivities about racial matters to educate the masses about the racial prejudice that lies beyond their conscious awareness.
The notion that the race IAT functions as a ‘portal into the unconscious’ has been challenged on both philosophical and empirical grounds. According to Doyle (2018):
Its invisibility is what makes it impossible to rebut. When someone accuses you of something that you yourself cannot see, in the eyes of your accuser, you cannot refute it. If I accused you of wearing a shirt that was stained by invisible jam, how could you tell me it wasn’t so? Its invisibility makes it so that you cannot see it, and implicit in my accusation is the assertion that I, for whatever reason, can see it. So it goes with racism and white privilege. When someone would accuse me of either, I would deny it. My opponent would then note that racists with white privilege do not believe they have it, and that not believing you have white privilege is a fundamental part of having white privilege. Said privilege also blinds one to their own racism. The tautological reasoning is airtight.
It may indeed surprise many readers that the phrase ‘unconscious racism’ is not clearly understood — even by those who believe that they know what this phrase means. Fortunately, researchers Blanton and Jaccard offer alternative interpretations as to what the phrase ‘unconscious racism’ could possibly mean. One alternative interpretation observes that human beings often lack awareness of the effects that their own actions have on other people. Thus, a 13-year-old boy who asks about the age of a middle-aged woman may not possess the social awareness (that comes from maturity) which would inform him that such a question may be perceived as offensive to women of a certain age. This reflects the central idea behind microaggressions theory, which holds that whites’ seemingly innocuous statements may be interpreted as hurtful to another person belonging to a different racial/ethnic group. Here, the issue is not that the hapless white person is consciously unaware of his presumed racism. Instead, the issue is that both actors have different socialization histories and value systems. The white actor possesses a value system (shaped by previous experiences) in which certain statements are genuinely considered to be harmless. In contrast, the non-white actor may possess a different value system (again, shaped by previous experiences) in which the same statement is considered hurtful. This is a simple matter of one party being ignorant of the other party’s value system — and is not necessarily reflective of hidden racial animus that lies beyond a person’s conscious awareness.
Second, individuals may be consciously unaware of the multitude of factors that influence their perceptions/attitudes on racial matters. These factors may involve events that were recently experienced, or the accumulation of events that have been regularly experienced over time. To illustrate, a person with vocational interests in the culinary field who has been regularly exposed to working in high-end restaurants — dominated by men as head cooks — would hold different perceptions than a person whose background with skilled cooking exclusively involved women as role models. Here, a person’s perceptions about gender and culinary skill are not due to some deep-seated gender bias for or against women in general. Instead, these perceptions can be simply attributed to idiosyncratic experiential histories about which the person may not be consciously aware.
Supporters of the ‘unconscious racism’ thesis essentially argue that negative associations held by whites about non-whites, as revealed by the race IAT: (a) are not consciously accessible to whites in non-IAT experimental conditions, (b) operate in the real world outside of a person’s individual awareness, and (c) take the form of clear racial discrimination, behavior which can be clearly documented as having negative or harmful effects on non-white individuals. Blanton and Jaccard essentially argue that no social-psychological research (up to the date of their article’s publication) comes close to supporting any of these assertions.
While Blanton and Jaccard offer alternative explanations as to the meaning of ‘unconscious,’ other researchers challenge the central thesis that people are indeed unconscious of their racial perceptions. On this front, Adam Hahn and his colleagues have empirically challenged race IAT interpretations through their research. In a series of laboratory studies, they directly asked participants to predict their results on IAT measures featuring five different social groups (Blacks, Latinos, Asians, Celebrities, and Children). They found that participants were ‘surprisingly accurate’ in their conscious predictions, suggesting that the predictions reflected unique and conscious insight into their own implicit responses — beyond intuitions about how people in general may respond. They interpreted their findings as casting doubt on the belief that IAT results reflect unconscious attitudes.
Other researchers concede that the race IAT measures ‘something,’ but that ‘something’ is not racism. A provocative paper by Arkes and Tetlock has as part of its title: “Would Jesse Jackson ‘Fail’ the Implicit Association Test?” This is because Reverend Jackson has made public statements that display pro-white and anti-black sentiments, yet his life’s work arguably makes him the last person to be charged with racial prejudice against blacks.
In their paper, Arkes and Tetlock suggest three alternative explanations for what the race IAT could measure. These are: (1) the race IAT measures easy familiarity with cultural/racial stereotypes to which subjects have been exposed, but with which they do not agree. They cite as an example Greenwald et al.’s original study, which found that Korean and Japanese participants held negative views of the other group, but that these results were stronger for subjects who were more steeped in Asian culture (thereby being more aware of each group’s mutual stereotypes of one another); (2) since large multiethnic and multiracial societies display long-standing, empirically validated, and easily observable group differences in a variety of important social outcomes (e.g., average crime rates, sexual mores and behaviors, cognitive test scores, levels of educational attainment and levels of economic accomplishments), a subject’s simple awareness of these average differences when displayed on the race IAT (and absent personally hostile emotions or the personal propensity to apply rigid stereotypes) cannot be labeled as ‘prejudiced’ in the commonly accepted understanding of this term; and (3) race IAT results have been experimentally shown to differ as a function of the different contexts in which subjects perceive persons of different races and ethnicities. That is to say, when subjects are exposed to depictions of minorities in contexts that promote negative characteristics (as opposed to depictions of minorities in contexts that promote positive characteristics), reductions in reaction time (RT) responses that would support a ‘prejudice against minorities’ interpretation occurred. These findings suggest that race IAT results reflect susceptibility to situational/contextual variations rather than stable personal dispositions. For other alternative scientific interpretations as to what race IAT experiments measure, readers are encouraged to consult a review by Althea Nagai.
Category #2: Does the Race IAT Measure Something Reliably?
Any psychological measurement must demonstrate minimum standards of reliability if the construct measured is to be taken seriously in theory-building and theory-testing. One type of reliability is the rank-ordering stability of individuals when scores are assessed on multiple occasions over time. When a group of subjects is tested in RT research, persons whose response times rank first, second, third, and so on through last in speed should generally retain this rank ordering upon multiple administrations of the same instrument over time. A retesting that yields an identical rank ordering in subjects’ responses from the first test administration to the second will yield a stability coefficient of +1.0. When a second test administration yields a rank ordering that is a perfectly inverted mirror image of the ranking from the first (i.e., the fastest person at Time 1 is the slowest person at Time 2, the second fastest person at Time 1 is the second slowest person at Time 2, and so on), the stability coefficient will be -1.0. A coefficient of 0.0 means that there is no relationship in the rank ordering of subjects between the pair of test administrations under consideration. The closer a stability coefficient is to +1.0, the higher the stability of an instrument’s scores over time.
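For readers who want to see the arithmetic, the rank-ordering idea described above can be sketched in a few lines of Python. This is only an illustration: it uses Spearman's rank-order correlation as one common way to quantify rank stability (the studies discussed in this essay may have computed their coefficients differently), the response times are invented for the example, and the function names `ranks` and `rank_stability` are hypothetical.

```python
def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_stability(time1, time2):
    """Spearman rank-order correlation between two test administrations."""
    n = len(time1)
    r1, r2 = ranks(time1), ranks(time2)
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Invented response times (in milliseconds) for five subjects.
time1 = [410, 455, 520, 610, 700]
same_order = [400, 470, 530, 600, 720]  # rank ordering unchanged at retest
inverted = [700, 610, 520, 455, 410]    # fastest subject becomes slowest, etc.

print(rank_stability(time1, same_order))  # 1.0
print(rank_stability(time1, inverted))    # -1.0
```

Note that the raw times in `same_order` differ from those in `time1`; the coefficient is still +1.0 because only the ordering of subjects matters for this kind of stability.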
Psychological tests are generally considered to have good reliability for use in applied settings if studies yield test-retest stability coefficients of at least +0.7, and preferably over +0.8. Singal reviewed the reliability/precision estimates calculated from available race IAT studies and was immediately struck by how infrequently studies have been conducted on the reliability of these tests, given what is typically reported for other organizational, clinical, and educational assessment tools. From the few studies that were published, Singal found that reliability coefficients ranged from a high of +.65 to a low of +.39. A more recent 2014 study by Yoav Bar-Anan and Brian Nosek found that the race IAT yielded test-retest stability coefficients of +.44. Singal reports that when all IAT studies (including non-race related IAT studies) are aggregated, test-retest reliabilities averaged at around +.55 — which is objectively rated by psychometricians as poor.
Different writers have interesting ways of translating the meaning of the race IAT’s poor reliability into lay language that anyone can easily understand. One writer asserts that if one takes the race IAT today, and then takes it again tomorrow (or even in just a few hours), there’s a solid chance of getting a different result. One published critic of the race IAT is quoted as saying: “The [race] IAT isn’t even predicting the IAT two weeks later . . . How can a test predict behavior if it can’t even predict itself?” Another writer opines that since RT methodology measures responses at the precise level of milliseconds (thousandths of a second), as little as a tenth of a second average difference in RT scores from one administration to another could very well determine the difference between being labeled ‘racist’ versus ‘not racist,’ and that these interpretational errors multiply as an instrument’s scores become less reliable. One critic elaborates:
“When I first took the implicit association test a few years ago, I was happy with my results . . . According to this test, I was a person free of racism, even at the subconscious level. I took the IAT again a few days later. This time, I wasn’t so happy with my results . . . According to this, I was a little racist at the subconscious level – against black people. Then I took the test again later on. This time, my results genuinely surprised me: It found once again that I had a slight automatic preference – only now it was in favor of black people. I was racist, but against white people, according to the test . . . Was this test even worth taking seriously, or was it b******t? I felt like I had gotten no real answers about my bias from this test.”
Category #3: Is the Race IAT Effective in Predicting Noxious Behaviors in Laboratory Settings?
At its simplest level, test validity refers to the degree to which evidence supports the inferences that are made from test scores for their intended purposes. The material in response to Category #2 casts serious doubt on the race IAT’s ability to measure whatever it purports to measure in a reliable and stable manner. For the sake of argument, even if it can be assumed that the race IAT does measure something reliably, this is not sufficient for establishing the validity of scores for particular purposes. As one IAT critique notes, a person’s height can be measured reliably, but it is not a valid measure of happiness.
In the introduction to Banaji and Greenwald’s 2013 book Blindspot: Hidden Biases of Good People, the authors claim that the race IAT is now established as signaling discriminatory behavior. When providing evidence in support of such statements, researchers correlate race IAT results with any number of behavioral outcomes observed in laboratory settings. Laboratory settings are indispensable for experimental research, in that they permit investigators to carefully control the environmental conditions under which the research is carried out, as well as carefully select the desired characteristics of research subjects. Laboratory settings allow researchers to hold variables constant when comparing groups, experimentally manipulate other variables presumed to influence outcomes, and use precise specifications to operationalize how psychological constructs are to be observed and measured. When research results are interpreted under these conditions, researchers are on stronger footing for ruling out alternative explanations for results.
A complete listing of the behavioral outcomes observed in race IAT laboratory settings is well beyond the scope of this relatively brief essay. Nevertheless, select examples include: videotaping subjects being interviewed by white and black interviewers, observing participants’ facial expressions, length of speech, laughing at jokes, measured proximity to the interviewer, and/or documenting interviewer impressions of interviewee friendliness and perceived comfort. For some studies, researchers analyze how subjects make decisions after interviewing equally qualified white and black applicants in a simulated hiring situation. In others, researchers study physicians’ optimal treatment recommendations for black and white patients who present with the same symptoms. Some studies operationalize racial preferences through studying voting preferences for a black vs. white candidate in a presidential election.
Validity, in its most basic terms, is understood by looking at the correlation between scores on a predictor variable (in this case, race IAT scores) and a criterion variable (scores from behaviors measured in a laboratory setting). In statistical terms, researchers calculate a correlation coefficient, square it, then multiply this value by 100. The resulting value indicates the percentage of the variation in one variable that is explained by variation in the other variable. Thus, two variables that are correlated at .7 share 49 percent variance (.7² = .49; .49 x 100 = 49 percent). Variables correlated at 0.0 do not share variance, and variables perfectly correlated at +1.0 or -1.0 share all of their variance with each other.
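The square-and-multiply step described above is simple enough to sketch in Python. The function names here are hypothetical and chosen only for illustration. The second function runs the calculation in reverse, recovering the size (though not the sign) of the correlation implied by a given percentage of shared variance, such as the 5.5 percent figure from the 2009 meta-analysis discussed in this essay.

```python
import math


def shared_variance_percent(r):
    """Percentage of variance two variables share, given their correlation r."""
    return (r ** 2) * 100


def correlation_magnitude(percent_shared):
    """Absolute size of the correlation implied by a shared-variance percentage."""
    return math.sqrt(percent_shared / 100)


# Correlation values mentioned in the text above.
for r in (0.0, 0.7, 1.0, -1.0):
    print(f"r = {r:+.1f} -> {shared_variance_percent(r):.1f} percent shared variance")

# Working backwards from a reported percentage of explained variation.
print(f"{correlation_magnitude(5.5):.2f}")  # 0.23
```

Because squaring discards the sign, correlations of +1.0 and -1.0 both yield 100 percent shared variance, which is why the reverse calculation can report only the magnitude of the correlation.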
In a 2009 meta-analysis of 122 research papers involving 184 independent samples and 14,900 subjects, Greenwald and his colleagues found that studies involving race IAT scores accounted for only 5.5 percent of the variation in racially discriminatory behavior in laboratory settings. In Singal’s review of a 2013 meta-analysis of IAT criterion studies published by Oswald and his colleagues, he argued that Greenwald’s race IAT/lab behavior correlations were overestimated from the inclusion of studies that did not actually measure racially discriminatory behavior — as well as the inappropriate exclusion of studies in which subjects displayed better behaviors toward out-groups compared to in-groups. Singal also points to a 2015 commentary by Greenwald and his colleagues where the authors essentially concede (on p. 557) that the psychometric shortcomings of the race IATs “render them problematic to use to classify persons as likely to engage in discrimination”—an admission that undermines the fundamental reason why countless persons, university instructors, organizations, and companies feel that the race IAT is useful. However, Greenwald and his colleagues counter this criticism with the argument that small effect sizes can have large ‘societal effects.’ In response, Singal nevertheless concludes:
“The point is that key experts involved in IAT research no longer claim that the IAT can be used to predict individual behavior. In this sense, the IAT has simply failed to deliver on a promise it has been making since its inception – that it can reveal otherwise hidden propensities to commit acts of racial bias. There’s no evidence it can.”
In Part III of this series, race IAT evidence under the fourth category of questions is discussed, concluding with an overall evaluation of its usefulness in addressing contemporary problems.