By David S. Prescott, LICSW
In 2014, then-US Attorney General Eric Holder made public his concerns about risk assessment in the criminal justice system. He said, among other things, that "These tools could have a disparate and adverse impact on the poor, on socially disadvantaged offenders, and on minorities ... they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society."
Considerable dialogue and debate followed. Many researchers examined their data and methods and found nothing inaccurate, although many pointed to systemic issues. That there are racial disparities in the criminal justice system is neither unknown nor controversial; study after study has found them. Adding to the confusion, many of the existing scales tend to perform well in research generally, even as many observers continue to question how representative the development and cross-validation samples are of any given client.
In 2016, Julia Angwin and her colleagues at ProPublica published a widely read article titled "Machine Bias." It took aim at risk assessment generally, and at an instrument called the COMPAS in particular. It generated enough further dialogue that Anthony Flores, Kristin Bechtel, and Christopher Lowenkamp published a response in Federal Probation taking issue with Angwin and colleagues' data analysis. The debate is interesting and informative, but beyond the scope of a single blog post.
Just the same, a 2020 review of the available risk measures by Gina Vincent and Jodi Viljoen turned up less evidence of bias than one might expect. They pointed out that not all risk assessments are created equal: they differ in their purpose, in the way risk levels are determined, in their construction, and in the types of items they include.
More research is urgently needed. Those familiar with a wide range of instruments know that they can differ from one another in substantial ways. Nonetheless, practitioners have an obligation to understand how implicit and explicit biases may enter their work. This is not always easy; no one wants things to go wrong on their watch, and professionals may come to rely on instincts that are less empirically informed than they believe. This can lead to a distrustful attitude and a readiness to seek revocation of probation for individuals they don't understand.
Even a cursory examination of the available risk measures invites obvious questions. Given the well-documented racial disparities in the criminal justice system, items related to criminal history (such as the number of prior convictions or prior sentencing dates) suggest ways that those disparities can find their way into the measures themselves, whatever the intentions of those who develop and use them. Some forms of sexual behavior have become increasingly commonplace while remaining stigmatized in court; might this play a role as well?
Other questions are more difficult to answer. Do scores on items related to relationship stability cause harm in cultures where pair-bonding may take a different form, or where community disenfranchisement makes these relationships more difficult to build and sustain? What about people in high-surveillance areas who are arrested before they can meet the two-year cohabitation requirements of many scales?
Finally, what do we know about protective factors? The role of extended family, for example, can be very different in communities of color. On the other hand, what opportunities do people have to mitigate risk in communities where they are more likely to be arrested? It has been a common experience for professionals to see that most youth placed in diversion programs are White.
Evaluators do not always have it easy. There are many considerations when it comes to risk assessment, even with tools that can appear straightforward.
Given that many biases operate beyond the awareness of the evaluator, is it possible that evaluators become more biased on those items that are harder to define? For example, a White evaluator may find evidence of escalating negative affect more worrying in Black or Hispanic clients than in clients from their own background. This could result from the evaluator's unfamiliarity with emotional expression in other cultures.
Similarly, is it possible that an evaluator examining the sexual history and functioning of an individual from another culture may become biased? While some items in the available measures are clearly defined, others, such as "sexual preoccupation," may offer easier entry points for bias. Let's face it: most professionals have little training in the cross-cultural elements of human sexuality. Are the differences between majority and minority cultures enough to lead an evaluator to assign a higher score out of unfamiliarity that shades into concern? An implicit tendency to score an item just a point higher "just in case"? Racial stereotypes about the sexuality of historically marginalized people are well known. In fact, they can be inescapable enough that one has to wonder what effects they have on evaluators despite their training and best intentions.
It’s worth emphasizing that nothing in this blog post is meant to impugn the multitude of evaluators who do the best they can with the training, knowledge, and resources that they have. This author’s concern is all the ways that bias might creep into a risk assessment, one item at a time, in virtually undetectable ways.
French science fiction author Bernard Werber observed, “Between what I think, what I want to say, what I believe I say, what I say, what you want to hear, what you believe you hear, what you hear, what you want to understand, what you think you understand, what you understand ... there are ten possibilities that we might have some problem communicating. But let's try anyway…”
It may be worth extending the same idea to the process (not the measures) of risk assessment:
What the scale’s instructions say
What I think they say
How I think an item applies to a client
How I think a client scores on an individual item
How I hope or want them to score on it (beyond my awareness)
How I score that item
What I want to say about that item in a report
What I think I say about that item in a report
What I want to say in the report
What I actually do report
What I want to recommend
What I think I recommend
What I recommend
What the reader sees
What the reader wants to see
What the reader thinks they see
How the reader interprets the report
What decision-makers want to do
What decision-makers think they should do based on the report
What decision-makers actually do based on the report
Of course, this potentially applies to each item as well as the total score and other elements of the report itself that may be implicitly biased or biasing.
I suspect many of us hope for broader dialogues on how we can all improve our processes.
I am grateful to the many people who read an earlier draft of this post and provided feedback. They include Tyffani Dent, Apryl Alexander, Laurie Rose Kepros, Seth Wescott, Amy Griffith, and Katie Gotch.