The FBI keeps track of homicide statistics, including the race of victims and offenders. However, a substantial number of homicides have unknown offenders. Herein I develop a method that combines CDC victimization data and FBI perpetrator data to produce more reliable homicide perpetration statistics. The analysis provides clues about the distribution of the unknown perpetrators.
Introduction
In 2019, the FBI recorded a total of 16,245 homicides nationally in its Uniform Crime Report (UCR). The offender's race was known in 11,493 cases and unknown in 4,752 cases. Of the cases with known offender race, 4,728 were committed by a white person, 6,425 by a black person, and 340 by persons of other races. That is, where the offender's race was known, 55.9% of offenders were black, 41.1% were white, and the remaining 3.0% were of other races. However, a question remains about the unknown perpetrators. Given no additional information, the most reasonable assumption is probably that unknown offenders follow the same distribution as known offenders. Nevertheless, it is uncertain how accurate this assumption is [1]. Is there any way to test how accurate the above numbers from the FBI are?
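The percentages above can be reproduced directly from the quoted counts. The following is only a small arithmetic sketch; the variable names are illustrative and not part of any FBI data format.

```python
# Counts quoted in the text for 2019 (FBI UCR, offender race).
known = {"white": 4728, "black": 6425, "other": 340}
unknown = 4752

total_known = sum(known.values())   # cases with known offender race
total = total_known + unknown       # all recorded homicides

# Share of each group among offenders whose race is known.
shares = {race: count / total_known for race, count in known.items()}

for race, share in shares.items():
    print(f"{race}: {share:.1%}")
```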
A separate source of evidence comes from the Centers for Disease Control and Prevention (CDC). The CDC records deaths and their causes, which (relevant to us) includes deaths due to homicide. A useful feature of such data is that it largely circumvents many potential biases or unknowns that might otherwise exist in police data. It doesn’t require that the crime be solved or that the offender be known. An obvious weakness is that the offender isn’t recorded or necessarily even known, only the cause of death. However, because the great majority of homicides are intra-racial, that is, the offender and victim share the same race, victim data gives us another set of estimates to fall back on. In 2019, the CDC recorded that 52.8% of homicide victims were black and 42.2% were white. These numbers are quite close to the known-offender figures in the UCR (55.9% and 41.1%, respectively), giving us some confidence that we’re unlikely to be massively off the mark. An important insight here is that, if all homicides were intra-racial, the victim distribution would be the same as the offender distribution (with respect to the proportions from each race). That is, if 42% of homicide victims were white, 42% of homicides would also have a white offender.
I build on this initial intuition and show that, if we know what proportion of homicides are intra-racial, we can apply a correction to the CDC victimization data to estimate more reliable perpetration figures. In doing so, we can hopefully address some of the uncertainty above. The purpose here is only to quantify differences in homicide perpetration as accurately as possible, not to understand the causes of differences in criminality. I will now describe how this method works.
The simple model
For ease of illustration, we will start with a simple model in which only offenders and victims who are black or white are considered. Note that Hispanic is considered an ethnic category in the census, not a racial one, and thus most Hispanics in America are included in the white category. Black and white people together account for approximately 85% of the population and 95% of homicides in the United States.
In our simplified model, we consider first the number of black and white people who were victims of homicide. Let's call them Vᴮ and Vᵂ, respectively. Next we consider intra-racial coefficients; that is, of the homicides committed by a group, the proportion of the victims that are of the same group. We call these rᴮ and rᵂ. For example, if 90% of homicides committed by white people have white victims, 0.9 is the value of rᵂ. Our goal is to estimate the number of homicides committed by each group, which we call Hᴮ and Hᵂ. The relationship between these variables can be expressed as:
Vᵂ = rᵂHᵂ + (1-rᴮ)Hᴮ
Vᴮ = rᴮHᴮ + (1-rᵂ)Hᵂ
The logic is quite simple. White homicide victims [Vᵂ] can be partitioned into two groups: (1) white intra-racial killings [rᵂ×Hᵂ], and (2) black-on-white killings, i.e., black-perpetrated homicides that are not intra-racial [(1-rᴮ)×Hᴮ]. The equivalent logic applies in the second equation.
With a bit of algebra, these equations can be solved for our quantities of interest, namely the number of homicides committed by each of the respective groups:
Hᵂ = [Vᴮ(rᴮ - 1) + rᴮVᵂ]/[rᵂ + rᴮ - 1]
Hᴮ = [Vᵂ(rᵂ - 1) + rᵂVᴮ]/[rᵂ + rᴮ - 1]
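For readers who want the intermediate steps, the algebra can be sketched as follows. Here V_W, V_B, H_W, H_B, r_W and r_B stand for Vᵂ, Vᴮ, Hᵂ, Hᴮ, rᵂ and rᴮ, and the derivation assumes r_W + r_B ≠ 1 so that the final division is defined.

```latex
\begin{align*}
V_W + V_B &= H_W + H_B
  && \text{(add the two equations)} \\
H_B &= V_W + V_B - H_W \\
V_W &= r_W H_W + (1 - r_B)(V_W + V_B - H_W)
  && \text{(substitute into the first equation)} \\
V_W - (1 - r_B)(V_W + V_B) &= (r_W + r_B - 1)\, H_W \\
H_W &= \frac{V_B (r_B - 1) + r_B V_W}{r_W + r_B - 1}
\end{align*}
```

The expression for H_B follows in the same way after swapping the roles of the two groups.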
Our simple model has now given us formulas for estimating the number of homicides committed by each group; we simply have to plug in the right values. The homicide victim counts (Vᵂ and Vᴮ) are provided directly by the CDC. Expressed as proportions of all homicide victims, these were Vᵂ = 42.2% and Vᴮ = 52.8% in 2019. The intra-racial coefficients are drawn from FBI data, and this is the only use the model makes of the FBI data. Using the FBI known offender crosstabs, we find that in 2019 white offenders killed 4,762 white people and 667 black people. This means that rᵂ for 2019 is 4,762/(4,762+667) = 0.88. From the same source we find that rᴮ = 0.81 in 2019.
We now have everything we need, and our simple model can estimate offense numbers. To illustrate, let us calculate the estimated proportions of homicides committed by each group in 2019:
Hᵂ = [0.528×(0.81 - 1) + 0.81×0.422]/[0.88 + 0.81 - 1] = 35%
Hᴮ = [0.422×(0.88 - 1) + 0.88×0.528]/[0.88 + 0.81 - 1] = 60%
Thus, our simple model estimates that, in 2019, 35% of homicides were committed by white people and 60% by black people. The remaining 5% corresponds to the victims of other races, whom the simple model does not cover.
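The same calculation can be written as a short function. This is a minimal sketch of the two-group formulas, using the rounded 2019 inputs quoted in the text; the function and variable names are illustrative.

```python
def solve_two_group(v_w, v_b, r_w, r_b):
    """Return (h_w, h_b): estimated homicides committed by each group.

    v_w, v_b: white and black victim counts (or shares of all victims).
    r_w, r_b: intra-racial coefficients for white and black offenders.
    """
    denom = r_w + r_b - 1
    if denom == 0:
        raise ValueError("r_w + r_b must not equal 1")
    h_w = (v_b * (r_b - 1) + r_b * v_w) / denom
    h_b = (v_w * (r_w - 1) + r_w * v_b) / denom
    return h_w, h_b


# r_w from the FBI crosstab counts quoted in the text, rounded as in the text.
r_w = round(4762 / (4762 + 667), 2)
r_b = 0.81

h_w, h_b = solve_two_group(0.422, 0.528, r_w, r_b)
print(f"white: {h_w:.0%}, black: {h_b:.0%}")
```

Because the inputs are shares of all victims, the two outputs sum to the 95% of homicides covered by the two groups.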
Completing the model
It is now time to incorporate the remaining people and complete the model. In principle, one could add a variable for each remaining racial group. However, considering that only 5% of the total homicides remain, there are arguably too few in each of the remaining groups for it to be useful to analyze them separately [2]. For the purposes of our model, the only important thing is that the remaining people (those who are neither black nor white) are included. Whether that is done in a single 'other' group or with multiple groups is immaterial. Thus, the full model will contain three groups: white, black and others.
The logic of the full model is much the same as that of the simple model. It merely needs three equations and three unknowns instead of two. And, while the logic remains simple, the algebra gets a bit more tedious and messy. For that reason I will spare you the mathematical details. For the mathematically inclined, I can say that it becomes a linear system with 3 equations and 3 unknowns, and you need a coefficient for every ordered pair of groups (e.g., an r coefficient that connects white-on-black homicides, rᵂᴮ, one that connects white-on-others, rᵂᴼ, and so on, for a total of 9). The system of equations is displayed below for those who are specifically interested in it, but understanding the logic of the simple model should otherwise be sufficient. We are now ready to show the results of the full model.
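One way to write such a system, consistent with the description above, is sketched here. It assumes that r^{XY} denotes the share of homicides committed by group X whose victims belong to group Y (so r^{WB} is the white-on-black coefficient), with O standing for the 'other' group. The exact notation is an assumption made for this sketch.

```latex
\begin{align*}
V^{W} &= r^{WW} H^{W} + r^{BW} H^{B} + r^{OW} H^{O} \\
V^{B} &= r^{WB} H^{W} + r^{BB} H^{B} + r^{OB} H^{O} \\
V^{O} &= r^{WO} H^{W} + r^{BO} H^{B} + r^{OO} H^{O}
\end{align*}
\qquad \text{with } r^{XW} + r^{XB} + r^{XO} = 1 \text{ for each offender group } X.
```

Dropping the 'other' group and writing r^{WW} = rᵂ, r^{BB} = rᴮ recovers the two equations of the simple model.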
Applying the model to historical data
What has been described above can be used for any year for which both CDC and FBI data are available. Using CDC WONDER and the FBI known offender crosstabs, we have data available for 1980–2020.
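In code, applying the model to a given year amounts to solving one small linear system. The following is a rough sketch in plain Python, assuming the year's crosstab has already been converted into coefficients r[i][j] (the share of group i's homicides whose victims are in group j). The function names are hypothetical, and because the three-group coefficients are not listed in this post, the usage example reuses the 2019 two-group values.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_offenders(victims, r):
    """Estimate homicides committed by each group.

    victims[j]: victims in group j (counts or shares).
    r[i][j]: share of homicides committed by group i with victims in group j.
    """
    n = len(victims)
    # V_j = sum_i r[i][j] * H_i, so the coefficient matrix is r transposed.
    a = [[r[i][j] for i in range(n)] for j in range(n)]
    return solve_linear(a, victims)


# 2019 two-group inputs from the text: groups are ordered (white, black).
r_2019 = [[0.88, 0.12],
          [0.19, 0.81]]
h_2019 = estimate_offenders([0.422, 0.528], r_2019)
print([f"{h:.0%}" for h in h_2019])
```

The same function accepts a 3×3 coefficient matrix once an 'other' group is added, and it could be called once per year in a loop over the available data.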
Let us first see how per capita homicide victimization rates compare with the model’s estimates of per capita homicide offending rates.
The first thing to notice is that the model’s estimates of per capita homicide offending rates (colored black) are very similar to the observed homicide victimization rates (red) for each group. This is to be expected. Any deviation between the offending and victimization rates must be due to inter-racial homicides, and only a small minority of homicides are inter-racial. A more subtle observation is that the offending rate for black people is higher than their victimization rate (full lines). The opposite is true for white people (dashed lines). This is because there are significantly more black-on-white homicides than vice versa, as illustrated below:
This leads us to a relevant realization: victimization rates are lower-bound estimates of homicide offending rates for black people, but upper-bound estimates for white people.
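In the two-group model this follows in one step from the second equation of the simple model. The sketch below uses the 2019 shares computed earlier as a check; it is expressed in shares of all homicides rather than per capita rates, but dividing both sides by the same population does not change the sign of the gap.

```latex
\begin{align*}
H_B - V_B &= H_B - \bigl[\, r_B H_B + (1 - r_W) H_W \,\bigr] \\
          &= \underbrace{(1 - r_B)\, H_B}_{\text{black-on-white}}
           - \underbrace{(1 - r_W)\, H_W}_{\text{white-on-black}} \\
          &= 0.19 \times 0.60 - 0.12 \times 0.35
           = 0.114 - 0.042 = 0.072 \quad \text{(2019)}
\end{align*}
```

The result matches the difference between the estimated black offender share (60%) and the black victim share (52.8%). The gap for white people is the same quantity with the opposite sign.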
From the previously established rates, we can also calculate how many times more likely a black person is than a white person to be a victim of homicide, as well as to perpetrate homicide (as estimated by the model).
Across the 1980-2020 period, a black person is roughly 6 times more likely than a white person to be a victim of homicide, and roughly 8 times more likely than a white person to commit homicide, with a recent upward spike in 2020.
Instead of rates, we can also look at what proportions of homicides involve the respective groups. Here we can also compare the offense statistics suggested by the model with those for the known offenders from the FBI. I illustrate these below for white and black people, respectively.
In the period 1984–1995 the FBI and model proportions coincide nearly perfectly. From 1996 to 2009 a small disparity between the two emerged, such that the FBI figures overestimated the white proportion of offenders and underestimated the black proportion. This difference between the two estimates grew steadily in the period 2010–2020.
This data alone cannot tell us why a disparity between the two estimates has emerged. However, it seems plausibly related to the growing racial disparity in clearance rates. Murders are much less likely to be solved in black neighborhoods, and this trend has widened in recent years. For this reason, the remaining murders with unknown offenders are expected to be disproportionately black when compared to the known offenders. This would lead the FBI known offender numbers to undercount the true number of black offenders.
Discussion
The results of this analysis show that, particularly in recent years, FBI known offender statistics have underestimated the true proportion of homicides being committed by black people (and overestimated the proportion committed by white people). There is no accusation here of any intent to deceive. Rather, known homicide offenders are simply not perfectly representative of homicide offenders in general, and unknown offenders happen to be disproportionately black, resulting in them being undercounted in the FBI figures. Why exactly this has increasingly been occurring in recent years is not clear; however, I have speculated that it coincides with the growing gap in clearance rates.
A critic might suggest that it is not the FBI numbers but this model’s estimates that are wrong. While I don’t think it is practically possible for the model’s estimates to be so inaccurate as to render the above conclusions qualitatively wrong, I can see a plausible way in which the model’s estimates of black perpetration are slightly too high. The proportion of black-perpetrated homicides that are intra-racial, rᴮ, was estimated from the FBI’s known-offender numbers. If, as suggested above, homicides involving black people are disproportionately common among the cases with unknown offenders, there may be more black-on-black murders than the known cases indicate. That is, the rᴮ value in 2020 might, for example, be 0.85 instead of 0.81. From the formulas it can be seen that the higher the rᴮ value, the lower the estimated number of homicides committed by black people. A critic might therefore argue that the true rᴮ is actually much higher, maybe 0.90 instead of 0.81!
A more careful consideration shows that the possible values of rᴮ are rather constrained. The FBI tells us that there were 1,877 black-on-white homicides in 2020. There are surely more, since some will be missing among the cases with unknown offenders or victims, but there are at least 1,877. If the “true” rᴮ were 0.90, then those 1,877 would represent at most 10% of homicides committed by black people, meaning that black offenders would have committed at least 1,877/0.1 = 18,770 homicides that year. The CDC reported just 24,352 homicide deaths in 2020, so those 18,770 black-perpetrated homicides would represent 77% (!) of all homicides. This shows that rᴮ is constrained to a rather narrow range. If you set it too high, the known inter-racial homicides imply a high perpetration proportion. If you set it too low, the model itself implies a high perpetration proportion.
Indeed, it is possible to search for the value of rᴮ that minimizes the estimated proportion of offenders who are black under these two constraints. This allows us to set a lower bound on the true value. For 2020, I find that this “optimal” rᴮ is approximately 0.87, which implies that 59% is a lower bound on the proportion of homicides committed by black people. This lower bound is higher than the 56.5% of known offenders reported by the FBI. In other words, the FBI figure for 2020 really is too low, as we had concluded.
When I repeat this lower-bound analysis for the other years in the period 1980–2020, I find some interesting results. Throughout most of the period, the lower-bound estimates are consistently and moderately lower than the figures given by the FBI. This is what you would expect if the FBI numbers are accurate: the lower bound should sit below the true value. However, from around 2012 to 2020, the lower-bound estimates have either been very close to or have even exceeded the FBI figures.
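The first of the two constraints, the one implied by the known inter-racial homicides, is simple enough to sketch in a few lines. The example below covers only that constraint, using the 2020 figures quoted above; it does not reproduce the full search, which would also need the 2020 model inputs that are not listed in this post. The function name is illustrative.

```python
def min_black_share(known_black_on_white, r_b, total_homicides):
    """Minimum share of all homicides committed by black offenders.

    If at least `known_black_on_white` black-on-white homicides occurred and
    a fraction `r_b` of black-perpetrated homicides is intra-racial, those
    cases are at most (1 - r_b) of all black-perpetrated homicides.
    """
    if not 0 <= r_b < 1:
        raise ValueError("r_b must be in [0, 1)")
    min_offenses = known_black_on_white / (1 - r_b)
    return min_offenses / total_homicides


# 2020 figures from the text: 1,877 known cases, 24,352 CDC homicide deaths.
for r_b in (0.81, 0.87, 0.90):
    share = min_black_share(1877, r_b, 24352)
    print(f"r_b = {r_b:.2f}: at least {share:.0%} of homicides")
```

The implied minimum rises quickly as rᴮ increases, which is why very high values of rᴮ are hard to reconcile with the CDC total.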
This verifies our previous suggestion that it is especially in the last decade that the FBI known offender figures have substantially and disproportionately undercounted black homicide offenders.
It might be worth making a small comment on the distinction between Hispanic and non-Hispanic white people. The presented method could be extended to model both independently if the right data were available. But the current data in the FBI known offender crosstabs do not allow for disaggregating by both Hispanic origin and race. For that reason, I have not attempted to disentangle Hispanic and non-Hispanic white people in this post. However, the CDC victimization data does allow disaggregating them, and looking at this can give us a rough idea of what would’ve happened if we could disaggregate them as offenders. In the period 2018–2020, the per capita rate of homicide victimization was roughly twice as high for Hispanic whites as for non-Hispanic whites (5.8 vs 2.8 per 100,000). A bit more than a third of all white homicide victims were Hispanic (9,272 Hispanic white victims and 16,588 non-Hispanic white victims). For that reason we might also expect that roughly a third or more of white offenders would be Hispanic. In the same period, the black homicide victimization rate was 24.9 per 100,000, that is, 4.3 times higher than the Hispanic white victimization rate and 8.9 times higher than the non-Hispanic white victimization rate.
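The ratios in this paragraph follow directly from the quoted CDC figures. A small arithmetic sketch, with illustrative variable names:

```python
# 2018-2020 homicide victimization rates per 100,000, as quoted in the text.
rate_black = 24.9
rate_hispanic_white = 5.8
rate_non_hispanic_white = 2.8

# White homicide victim counts for the same period, as quoted in the text.
hispanic_white_victims = 9272
non_hispanic_white_victims = 16588

hispanic_share = hispanic_white_victims / (
    hispanic_white_victims + non_hispanic_white_victims
)
ratio_vs_hispanic = rate_black / rate_hispanic_white
ratio_vs_non_hispanic = rate_black / rate_non_hispanic_white

print(f"Hispanic share of white victims: {hispanic_share:.1%}")
print(f"Black vs Hispanic white rate: {ratio_vs_hispanic:.1f}x")
print(f"Black vs non-Hispanic white rate: {ratio_vs_non_hispanic:.1f}x")
```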
In conclusion, crime statistics are always subject to unknowns and have the potential for being biased. I have developed a new method to evaluate the accuracy of certain FBI homicide figures. It is occasionally suggested that the FBI figures overestimate the black homicide rate relative to the white homicide rate. However, the results of this analysis demonstrate that the opposite has increasingly been true in the last decade. I have speculated that this is related to the widening disparity in clearance rates. Addressing that may improve the accuracy of the FBI statistics going forward.
[1] The available numbers allow us to set some bounds on the possible values, by considering what would happen if none or all of the unknown cases involved a particular race. Given that there are 4,752 homicides with unknown offenders, we can bound the proportion of offenders who are white between 29% [(4,728+0)/16,245] and 58% [(4,728+4,752)/16,245]. Similarly, the proportion of offenders who are black must be between 40% [(6,425+0)/16,245] and 69% [(6,425+4,752)/16,245]. These extremes are obviously unrealistic, and the real figures in 2019 are expected to be solidly within those boundaries, not near them.
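These bounds can be checked with a few lines of arithmetic. A minimal sketch using the 2019 counts quoted above; the helper name is illustrative.

```python
def share_bounds(known_count, unknown_count, total_count):
    """Bounds on a group's offender share.

    Lower bound: none of the unknown offenders belong to the group.
    Upper bound: all of the unknown offenders belong to the group.
    """
    low = known_count / total_count
    high = (known_count + unknown_count) / total_count
    return low, high


total = 16245    # all homicides recorded by the FBI in 2019
unknown = 4752   # homicides with unknown offender race

white_low, white_high = share_bounds(4728, unknown, total)
black_low, black_high = share_bounds(6425, unknown, total)

print(f"white: {white_low:.0%} to {white_high:.0%}")
print(f"black: {black_low:.0%} to {black_high:.0%}")
```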
[2] For the same reason, the UCR also doesn’t report inter-racial homicide counts between all racial groups in its tables. They are displayed as “White”, “Black or African American” and “Other race.” See, for example, Expanded Homicide Table 6.
I only skimmed your article, and I'll admit my math isn't strong enough to follow it. But I have looked at simpler (2020) data. Here are some take-aways: Blacks commit roughly 57% of known homicides. But wait: about 30% are done by unknown perps. Depending upon what assumptions one makes, the estimate of the actual Black homicide rate will change as follows:
The overall Black rate is as low as ~ 50% IF one assigns a strict portion of the unknowns at the same rate as the knowns.
But wait, there are some common-sense observations that mere math doesn't address. That the majority of victims killed by "unknown" are Black. Since we know that race of killer and victim are almost always same, from this we can infer that the majority of unknown perps are likely Black.
Another inference: Blacks commit violent crime at multiples of other races. In large metros with large or majority Black populace, Blacks commit very nearly all of the violent crime including murder.
There is also a political issue: Many jurisdictions do not report their crime data to the FBI. There are many reasons for this, but a prime suspect would be hiding unfavorable data to promote a "progressive" or "woke" point of view. Thus, we must assume the FBI data are incomplete.
From the above, I can safely assume that most (probably > 75%) of the "Unknown" are committed by Blacks, which leads to the unhappy conjecture that Blacks commit a lot more than 57% of killings. At least we know they don't commit more than about 87% of them. Using the 75% of unknowns, we'd get 79.5%, a bit better, but still nothing to be proud of.
I admit that above are based on only a few statistics. I may be wrong. Feel free to do your own research if you like. Or perhaps there is a simple explanation, like White Supremacists are killing young black males and driving into the inner city to dump their bodies, and somehow they are eluding capture.
Thanks.
One thing I can add is that Hispanic homicide victimizations dropped sharply after the financial crash of 2008, driving down the white total and the white % of offenders in your model. I'm not sure why: perhaps the most marginal Latinos returned to their home countries? Unfortunately, the Hispanic rate got quite a bit worse in 2021-early 2022.