
Claude Steele, Victim of Stereotype Threat?

Claude Steele, the social psychologist best known for developing the influential concept of “stereotype threat,” is in hot water. He is Executive Vice Chancellor and Provost of the University of California at Berkeley and holds appointments in the Psychology Department and the Graduate School of Education. He has come under fire for the way he handled a sexual harassment complaint against Sujit Choudhry, the dean of the law school (who, as a result of that complaint and the ensuing lawsuit, is now the ex-dean).

Law Students Unhappy

“The provost ordered a 10% pay cut in Choudhry’s $415,000 annual salary,” the Los Angeles Times reports, “required Choudhry to attend counseling and ordered him to apologize to the assistant, Tyann Sorrell, after Berkeley officials determined last July that the then-dean had violated the campus’ sexual harassment policy by repeatedly forcing unwanted kissing, hugging and touching her.”

Some think there’s more: the suspicion that Provost Steele might have handed down only a figurative slap on the wrist in return for a favor. According to documents from the dean’s harassment investigation, “Choudhry urged the faculty to approve Steele’s appointment to the law school in May,” the Los Angeles Times article reports, “at the same time the dean knew he was being investigated over sexual harassment allegations.”

At a March 10 faculty meeting Steele agreed to resign from the law school appointment and “to remove himself from the search process for an interim dean, after widespread criticism of his leadership — including a survey that found 75% of nearly 400 law students surveyed did not want him involved.”

So far Steele has not been found guilty of any wrongdoing, and University of California President Janet Napolitano and UC Berkeley Chancellor Nicholas Dirks have issued statements defending him. The allegations of a quid pro quo are “absolutely untrue,” Dirks said. Even in the absence of established wrongdoing, however, it seems safe to say that at the least Steele has not handled his vice chancellery and provost responsibilities adroitly.

Since Steele’s disappointing performance in handling a contentious harassment case can be compared to performing poorly on a test, perhaps it is appropriate to ask whether Steele himself might be a victim of his own discovery.


Here is Steele’s description of the nature and effect of “stereotype threat,” taken from his expert testimony in the Grutter affirmative action case, where he argued that standardized test scores do not accurately reflect the ability of black students.

My research, and that of my colleagues, has isolated a factor that can depress the standardized test performance of minority students — a factor we call stereotype threat. This refers to the experience of being in a situation where one recognizes that a negative stereotype about one’s group is applicable to oneself.  When this happens, one knows that one could be judged or treated in terms of that stereotype, or that one could inadvertently do something that would confirm it.

In situations where one cares very much about one’s performance or related outcomes — as in the case of serious students taking the SAT — this threat of being negatively stereotyped can be upsetting and distracting.  Our research confirms that when this threat occurs in the midst of taking a high stakes standardized test, it directly interferes with performance.

Steele is African-American, and he is certainly aware of the widespread stereotype that minorities, no matter how distinguished, who are appointed to prestigious, highly visible, high-stakes positions such as his are often chosen more as a demonstration of their institution’s devotion to “diversity” than because of their own merit. Did Steele’s knowledge of that stereotype interfere with his job performance? If not, does not the fact that he did not succumb to “stereotype threat” undermine or seriously qualify the theory?


“Stereotype threat” is no doubt one of the most vigorously explored topics in social psychology, and I take no position here on its scientific merits. In my essay on the widely noticed Reproducibility Project, “Almost Two-Thirds of Psychological Studies Are Wrong,” however, I did discuss two of Steele’s “stereotype threat” studies that could not be reproduced.

Whatever its general merits, however, I have never understood why that theory has been so widely relied on to justify abandoning or minimizing the influence of standardized tests. “Stereotype threat” means that even highly qualified blacks don’t do well on tests where blacks as a group underperform, and hence where there is a stereotype of black underperformance that will be applied to them. Thus it has always seemed to me that insofar as “stereotype threat” is a real problem, race-blind grading and admissions would be the most reasonable solution.

Claude Steele, however, opposes race-blind admissions, and recommends discounting standardized test results for blacks. His antidote to “stereotype threat,” he explained in a long article summarizing his theory, is to “tell students that you are using high standards” (this signals that they are in fact being evaluated by “standards rather than race”) “and that … they can meet those standards (this signals that you do not view them stereotypically).”

Telling universities to eliminate or minimize standardized test scores for blacks, thus giving them admissions preferences, however, sends exactly the opposite message, as I argued in “Claude Steele, ‘Stereotype Threat,’ And Racial Preference” back in 2003 criticizing his Grutter testimony. It says in no uncertain terms to minority students that they are not capable of meeting standards applied to others and they must be judged at least in part on the basis of their race to gain admission.

Threat Follows Its Targets

Nor is taking standardized tests the only venue where “stereotype threat” impairs minority behavior, Steele observed in his Grutter testimony. “Stereotype threat follows its targets onto campus, affecting behaviors of theirs that are as varied as participating in class, seeking help from faculty, contact with students in other groups, and so on.”

Does it affect only students? If not, could it have affected how the Berkeley provost dealt with the tests of his office? It would be ironic indeed if “stereotype threat,” Frankenstein-like, turned on its creator and undermined his recent job performance, and it would be equally interesting to see the explanation if it did not.

Almost Two-thirds of Psychological Studies Are Wrong

Einstein, as everyone knows, famously defined insanity as doing the same thing repeatedly and expecting different results. Science is the mirror image of insanity (which is not to say there are no mad scientists). It expects — indeed, requires — the same results when scientists do the same experiments or calculations over and over. Thus, according to an important and widely noticed study just published in Science, “Estimating The Reproducibility Of Psychological Science,” there is a real question whether much of the allegedly scientific research published in learned journals of psychology actually qualifies as science.

The Reproducibility Project, coordinated by University of Virginia psychology professor Brian Nosek, executive director of the Center for Open Science, involved a team of 270 psychologists from around the world who attempted to replicate the findings of 100 articles published in 2008 selected from three leading psychology journals: Psychological Science, Journal of Personality and Social Psychology, and Journal of Experimental Psychology: Learning, Memory, and Cognition.

A substantial majority of the studies studied, it turned out, were not reproducible, leading to “a clear conclusion” (as stated in the Science report): “A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes.”

Humorously Understated

“Weaker evidence for the original findings” is a polite, statistically precise but obfuscatory way of saying that the conclusions of those studies could not be confirmed. Reviewing these results, the New York Times declared in an article with an almost humorously understated title that “Many Psychology Findings Not As Strong As Claimed, Study Says.” The actual Times article was considerably more dramatic than its title suggests, noting for example that “Strictly on the basis of significance — a statistical measure of how likely it is that a result did not occur by chance — 35 of the studies held up, and 62 did not. (Three were excluded because their significance was not clear.) The overall ‘effect size,’ a measure of the strength of a finding, dropped by about half across all of the studies.”
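The Times figures can be checked with a few lines of arithmetic. What follows is only a minimal sketch in Python that uses the three counts in the quotation (35 held up, 62 did not, 3 excluded); the variable names are my own.

```python
# Counts as quoted from the New York Times article.
held_up = 35
did_not_hold_up = 62
excluded = 3  # significance was not clear

total = held_up + did_not_hold_up + excluded  # all studies in the project
judged = held_up + did_not_hold_up            # studies with a clear verdict

share_of_all = did_not_hold_up / total
share_of_judged = did_not_hold_up / judged

print(f"Did not hold up, share of all {total} studies: {share_of_all:.1%}")
print(f"Did not hold up, share of the {judged} judged: {share_of_judged:.1%}")
```

Counted either way, the share that did not hold up is a little under two-thirds, which is the figure I use as shorthand throughout.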

The fact that the Reproducibility Project found that the findings of nearly two-thirds of the studies its researchers examined could not be reproduced is proving to be a substantial embarrassment in the field of psychology, and those associated with the review project are making a great effort to soften the impact of their striking results. “The eye-opening results don’t necessarily mean that those original findings were incorrect or that the scientific process is flawed,” the Smithsonian Magazine insisted.

When one study finds an effect that a second study can’t replicate, there are several possible reasons, says co-author Cody Christopherson of Southern Oregon University. Study A’s result may be false, or Study B’s results may be false—or there may be some subtle differences in the way the two studies were conducted that impacted the results.

“This project is not evidence that anything is broken. Rather, it’s an example of science doing what science does,” says Christopherson. “It’s impossible to be wrong in a final sense in science. You have to be temporarily wrong, perhaps many times, before you are ever right.”

Well, sure, but how reassuring is it to be told that about two-thirds of the presumably peer-reviewed psychological research published in leading journals is wrong … but only “temporarily”?

The original studies were virtually (probably literally) all based on experiments that the reproducers tried to reproduce, and thus a substantial amount of both the originals and the reproducers’ studies was devoted to statistical analysis of significance, reliability, etc. That is no doubt as it should be, and to their credit the Reproducibility Project and the Center for Open Science have made all of their own research available online. Perhaps in the future another reproducibility project with even more resources and researchers will check the work of this one.

Odd Research Design

If there is such an effort in the future, I think it would be in order to consider a dimension that so far as I can tell was not attempted here — moving beyond an analysis of the statistical fit between research methodology and conclusions to a more qualitative consideration of the research design, significance, and even good sense. Several of the studies I looked at would have fallen short on those grounds even if their conclusions had been found to be statistically valid.

Consider, for example, K.R. Morrison and D.T. Miller, “Distinguishing between silent and vocal minorities,” Journal of Personality and Social Psychology 94 (2008): 871-882, whose results were confirmed, here, for the Reproducibility Project by Prof. Matt Motyl of the University of Illinois at Chicago. Morrison and Miller set out to test the entirely reasonable hypothesis that people will be more willing to express their opinions to an audience they think supportive than one they think would be critical. To test this hypothesis they

compared the proportions of bumper stickers [counted in the parking lots of 3 Target department stores] expressing liberal or conservative opinions in a county that voted for a more liberal candidate or a more conservative candidate in the 2004 US Presidential Election. Specifically, they hypothesized that liberals in the liberal county would be more likely to express their opinions than conservatives in the liberal county, and conservatives in the conservative county would be more likely to express their opinions than liberals in the conservative county.

Surprise! There were more Democratic bumper stickers in the Democratic county and more Republican bumper stickers in the Republican county. But do these findings really confirm the hypothesis? Can’t they be as readily explained by the fact that Democratic counties have more Democrats and Republican counties more Republicans? Or perhaps the political parties were more organized and had more to spend on bumper stickers in counties where they were strong. And do we know the demographic/political breakdown of Target shoppers? Thus the fact that these findings were replicated by this method hardly makes them more significant.
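The base-rate objection can be made concrete with a minimal sketch in Python. Every number in it is hypothetical, invented purely for illustration and not drawn from Morrison and Miller's data. It assumes that each partisan is equally likely to display a sticker no matter which county he or she lives in; the function name and parameters are likewise my own.

```python
def expected_stickers(cars, dem_share, sticker_rate):
    """Expected (Democratic, Republican) sticker counts when every
    partisan displays a sticker at the same rate, wherever they live."""
    dem = cars * dem_share * sticker_rate
    rep = cars * (1 - dem_share) * sticker_rate
    return dem, rep

# Hypothetical inputs: 1,000 cars per lot and a 10% sticker rate for
# everyone, so willingness to speak never varies with the audience.
liberal_county = expected_stickers(1000, dem_share=0.65, sticker_rate=0.10)
conservative_county = expected_stickers(1000, dem_share=0.35, sticker_rate=0.10)

print("Liberal county (Dem, Rep):", liberal_county)
print("Conservative county (Dem, Rep):", conservative_county)
```

Under these assumptions the majority party's stickers outnumber the minority's in each county, even though no one in the model is any more or less willing to speak up in front of a friendly audience.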

I also looked at two studies by Stanford’s Claude Steele and co-authors purporting to test his ubiquitous “stereotype threat” theory. In “The Space Between Us: Stereotype Threat and Distance in Interracial Contexts,” Journal of Personality and Social Psychology 94 (2008): 91-107, the authors “use stereotype threat theory as a model” to test a prediction that whites would physically distance themselves from blacks in a conversation where the whites feared being stereotyped as racists. In a sense the theory, assumed to have been established by Steele’s earlier work, was used to test itself. Elaborate scenarios were established, and the authors found to their relief and satisfaction that the target white males sat closer to the black confederates when the conversation was about “love and relationships” than when the subject was “racial profiling,” unless the latter were described as a “learning experience.”

The attempt to replicate this study “was unable to attain statistical significance.” It did confirm that when the subject was racial profiling whites sat farther from blacks but was unable to attribute that to any perceived “stereotype threat” fear of being regarded as racist because the distance was largely unaffected by the “learning experience” variable. “Perhaps the prominence of racial profiling in the media, such as Ferguson, Missouri, and New York, has made people, regardless of ethnicity, more apprehensive to discuss the topic and subsequently distance themselves more during conversation,” the replication author suggested. The replication, however, did not even attempt to evaluate the authors’ conclusion that the “social distance” they found confirmed their view that “one’s concern with appearing prejudiced might have the ironic and unintended consequence of causing racial harms,” that “there may be ‘racism without racists.’” Thus there is reason to doubt whether those conclusions would be warranted even if the replication had been able “to attain statistical significance.”

In another study, “Social Identity Contingencies: How Diversity Cues Signal Threat or Safety for African Americans in Mainstream Institutions,” Journal of Personality and Social Psychology 94 (2008): 615-630, Steele et al. claim to have demonstrated that “people at risk of devaluation based on group membership are attuned to cues that signal social identity contingencies — judgments, stereotypes, opportunities, restrictions, and treatments that are tied to one’s social identity.”

In English: blacks are attuned to cues that they might be devalued because they are black. One of the most prominent threatening cues identified by Steele and his co-authors was “colorblindness,” which can be seen as “a means to ignore or invalidate the challenges that come with stigmatized group identities. Interpreted in this way, a colorblind diversity philosophy is diagnostic of marginalization, and we expect this cue to activate threatening social identity contingencies.”

The analysis of this study “did not replicate the original finding that fairness cues create more trust for Black but not White participants in an environment with low-minority representation.” It did not, however, attempt to evaluate the accuracy or reasonableness of the “cue” that a company’s colorblind policy can be seen as a threat to marginalize its black employees. But even if that and the study’s other findings were confirmed, the original study would probably provide more convincing evidence of the pervasive political correctness in the Bay Area, where the participants were selected, than of the persuasiveness of Steele’s “stereotype threat” theory.

Methodological replication, in short, is important … but it is not all-important. Studies like these three, for example, would be unconvincing even if their findings were confirmed.