Data about mass violence can seem to offer insights into patterns: is violence getting better, or worse, over time? Is violence directed more against men or women? But in human rights data collection, we (usually) don’t know what we don’t know – and worse, what we don’t know is likely to be systematically different from what we do know.
This talk will explore the assumption that nearly every project using data must make: that the data are representative of reality in the world. We will examine how, contrary to this standard assumption, statistical patterns in raw data about violence tend to be quite different from patterns in the world. Statistical patterns in data tend to reflect how the data were collected rather than changes in the real-world phenomena the data purport to represent.
Using analyses of killings in Iraq in 2005-2010, homicides committed by police in the US in 2005-2011, and genocide in Guatemala in 1982-1983, this talk will discuss the use of capture-recapture methods to estimate total patterns of violence, where the estimates correct for heterogeneous underreporting. We’ll discuss how biases in raw data can, in some cases, be mitigated through estimation. The examples will be grounded in their use in public debates and in expert testimony in criminal trials for genocide and war crimes. The talk will conclude with an introduction to the methods and analysis used by the transitional justice mechanisms in Colombia: the Special Jurisdiction for Peace and the Truth Commission. I’ll describe the data used by the Truth Commission for their quantitative conclusions. We will publish the data in April 2023, along with tools (in an R package) so the data can be analyzed correctly.
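
For readers new to capture-recapture, the core idea can be sketched with the simplest two-list case: the overlap between two independently collected lists of victims indicates how many victims neither list recorded. The sketch below uses Chapman's bias-corrected form of the two-list estimator. It is an illustration only, not the method used in the analyses described above. The function name and all counts are hypothetical, and the two-list estimator assumes independent lists with equal reporting probabilities, which is exactly the assumption that heterogeneous underreporting violates.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Two-list capture-recapture estimate of the total population size.

    n1, n2: number of victims recorded on each list
    m:      number of victims recorded on both lists (the overlap)

    Uses Chapman's bias-corrected form of the Lincoln-Petersen estimator.
    Assumes the lists are independent and every victim is equally likely
    to be reported; real analyses must relax these assumptions.
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("overlap must be between 0 and the smaller list size")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, invented purely for illustration.
list_a = 600   # victims documented by source A
list_b = 400   # victims documented by source B
both = 150     # victims documented by both sources

observed = list_a + list_b - both              # unique documented victims
estimated_total = chapman_estimate(list_a, list_b, both)
undocumented = estimated_total - observed      # estimated victims on neither list

print(f"documented: {observed}")
print(f"estimated total: {estimated_total:.0f}")
print(f"estimated undocumented: {undocumented:.0f}")
```

With these invented counts, the estimated total is well above the number of documented victims, which is why patterns in raw counts alone can mislead.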