Chapter 1 Executive Summary

How should we interpret the endless stream of COVID-19 counts produced by health departments and organizations around the world? For better or worse, we primarily experience large, complicated events through counts: How many are dead? How many are sick? How many tests were performed? These are universal questions, immediately accessible to both the producers of information, like doctors and scientists, and the consumers of information, like policy makers and citizens. However, anyone who regularly works with the answers to those questions, the actual data, knows that every one of those simple numbers in a cell needs a big asterisk pointing to a long footnote explaining all of the problems in developing and using that statistic. This book is that footnote for COVID-19 counts. It is intended as a practical guide to using COVID-19 data, covering the regularities and subtle gotchas you can expect to find.

This guide is built around a new resource called the Global Census of Covid-19 Counts (GC3). It is a single normalized and georeferenced aggregation of all of the other publicly available aggregations of COVID-19 count data. It currently aggregates 31 databases, which in turn scrape and aggregate over ten thousand sources such as public statements, news reports, and individual health department websites. Only by mosaicing all of these different sources together (978,650 observations and growing) are we finally able to provide full temporal coverage of the entire COVID-19 pandemic and full spatial coverage of every country in the world and, in most places, states/provinces as well. We are now able to track counts of confirmed cases and deaths in 4,923 locations, and the number of tests performed in 1,373 locations.

This book is a deep dive into the problems and opportunities you can expect to find in these data. It is organized from simpler issues of data acquisition and aggregation to more complicated questions of bias and latent true measurement.

1.1 Key Takeaways (TLDR)

1.2 Key Caveats

First and foremost, this is a methodological text as opposed to an epidemiological text. I have neither training nor professional experience in epidemiology, medicine, or any of the germane natural sciences. I will repeatedly defer, and ask the reader to defer, to the expert consensus on any directly substantive empirical issue within those areas. What I am an expert in, if anything, is dirty cross-national time series data collected by complicated political institutions. I have learned the pitfalls of working with this kind of data, first as a political scientist working on datasets of sub-national and international violent events, and now as a computational social scientist working more broadly on very large knowledge graphs of billions of facts spanning politics, economics, and demography. This is a practical guide to data selection, cleaning, aggregation, and inference on panel event data. The application just so happens to involve events that are illness and mortality.

Second, the advice and conclusions here are intended as, and should be interpreted as, arguments for inferential conservatism. I am largely arguing that it is difficult to know what is really going on, given the difficulty of measurement and the number of competing plausible alternatives. Any confidence intervals presented here should be read as an upper bound on certainty.

Third, the uncertainties documented here should not be misconstrued as ignorance or indifference. There is an anti-intellectual movement in the U.S. and elsewhere attempting to provide political alternatives to scientific expertise during the COVID-19 pandemic. Methodological critiques are, unfortunately, often appropriated by such movements in an attempt to show that experts are wrong or that expertise itself does not exist. Let me unequivocally pre-empt those ideas. There is no COVID-19 “good news” to be found here. The national lockdowns were not an overreaction, and re-opening too quickly will likely lead to a costly resurgence. The difficulty in measuring and modeling this pandemic is a reason for MORE caution, NOT less. And for all of the problems with COVID-19 research right now, there are literally tens of thousands of dedicated and earnest professionals producing high-quality work that is getting us closer to ground truth each and every day. The same cannot be said for the ideologues standing in opposition, whose work has been almost uniformly intellectually dishonest, incurious, and sophistic. This text is a testament to how science is hard, not unnecessary, and is a call for more scientific rigor, not less.