Shall we count the living or the dead?
An insight ignored for more than 60 years
Today, our new preprint “Shall we count the living or the dead” (coauthored with Matt Fox, Rhian Daniel, Asbjørn Hróbjartsson and Ellie Murray) has been posted to arXiv. In this manuscript, we discuss an idea which has been independently rediscovered several times across several different academic fields, yet is still not routinely implemented in applied statistical practice.
This post originally contained an animated video to explain the idea. The video contained a segment which stated that a popular textbook had “dismissed Sheps’ ideas”. While the relevant paragraph from the textbook has certainly been used by reviewers to dismiss our work on Sheps’ ideas, this may not have been the intention of the textbook authors. We have temporarily removed the video from this newsletter and delisted it from YouTube, until a corrected version can be made available.
Ten years ago, when I was a doctoral student in Epidemiology at the Harvard School of Public Health, I had what I then believed was a truly original and important idea about how to choose the scale for measuring the effects of a medication (for example, in randomized trials).
It quickly became apparent that my idea had been "scooped" more than half a century earlier by Mindel Cherniack Sheps (1913-1973). In her 1958 paper "Shall we count the living or the dead", Sheps relied on the same intuition to reach the same conclusion: when an intervention reduces the risk of an outcome, the effect should be summarized using the standard risk ratio (which “counts the dead”, i.e. considers the relative probability of the outcome event), whereas when the intervention increases risk, the effect should instead be summarized using the survival ratio (which “counts the living”, i.e. considers the relative probability of the complement of the outcome event).
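To make the two measures concrete, here is a minimal Python sketch. The function names and all numbers (risks of 2% and 10% in a made-up trial, 30% in a made-up target group) are illustrative assumptions, not taken from Sheps' paper or from our preprint.

```python
def risk_ratio(p0, p1):
    """'Counts the dead': relative probability of the outcome event."""
    return p1 / p0


def survival_ratio(p0, p1):
    """'Counts the living': relative probability of NOT having the event."""
    return (1.0 - p1) / (1.0 - p0)


def sheps_measure(p0, p1):
    """Pick the summary Sheps recommends.

    p0: risk without treatment, p1: risk with treatment (assumes 0 < p0 < 1).
    """
    if p1 <= p0:
        return "risk ratio", risk_ratio(p0, p1)
    return "survival ratio", survival_ratio(p0, p1)


# Hypothetical trial in which the treatment raises risk from 2% to 10%.
p0, p1 = 0.02, 0.10
name, value = sheps_measure(p0, p1)
print(name, round(value, 3))

# Carrying the standard risk ratio (about 5) over to a hypothetical group with
# a 30% baseline risk gives an impossible "probability" above 1 ...
naive = 0.30 * risk_ratio(p0, p1)
# ... whereas carrying over a survival ratio below 1 always stays between 0 and 1.
sheps = 1.0 - (1.0 - 0.30) * survival_ratio(p0, p1)
print(round(naive, 3), round(sheps, 3))
```

The last two lines are only meant to hint at why the choice of scale matters when an effect is transported to a group with a different baseline risk; they are not a substitute for the formal argument in the manuscript.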
Sheps’ ideas have never been implemented in practice, and most medical statisticians and epidemiologists are unaware of her work. I therefore concluded that I had two important tasks ahead of me:
Formalize Sheps’ intuition in terms of a fully specified counterfactual causal model (these models did not exist in 1958)
Convince methodologists and applied statisticians that this wasn't just a cute idea, but the correct solution to a well-recognized problem with real implications for medical decision making.
I spent the next several years working on this. This research program was not fundable, so I recruited personal friends as collaborators and coauthors; these superstars of Epidemiology, Medicine and Statistics donated their time and efforts towards sharpening the idea, identifying where the argument was unclear and how it could be improved, coming up with examples and improving the structure and flow of the manuscripts. Despite a lot of resistance from reviewers, we were able to publish several papers, yet we are still seeing little progress in terms of getting statisticians and applied medical researchers to actually implement Sheps' solution.
This latest manuscript adds the following:
We discuss the relevant considerations for choice of effect measure, and clarify why we consider stability to be paramount. We show why the typical decision-theoretic argument for using the risk difference fails.
The causal model that supports Sheps’ conclusions is presented in terms of Sufficient-Component Cause Models (“Causal Pie Models”), a close variation of Mackie’s INUS framework for causation. This allows us to demonstrate that the argument for stability of the survival ratio is actually an improved version of an argument that is used in many epidemiology courses in favor of the risk difference.
Instead of focusing exclusively on the situation where the effect is exactly homogeneous between groups, we clarify the advantages of defining effect heterogeneity in terms of deviations from Sheps' preferred variant of the relative risk, rather than deviations from the other variant of the relative risk.
We provide evolutionary reasons for stability of Sheps’ preferred variant of the relative risk. The conclusion depends on a biological asymmetry between levels of the exposure variable, which will generally only be viable when there was a “default state” in evolutionary history (for example: not being treated with Penicillin was the default state for our ancestors, whereas there was no default state for some other exposure variables, including gender).
In the appendix, we provide an impossibility proof for the odds ratio. Specifically, we show that if scientists choose the conditioning set (set of effect modifiers) by reasoning about the distribution of covariates which determine whether an individual will respond to treatment, conditional stability of the odds ratio will only be obtained if the conditioning set is large enough to imply stability of all other measures of effect.
While we were working on this paper, we gradually became aware that variations of the same basic idea have been rediscovered several times across different academic fields:
Patricia Cheng, a psychology professor at UCLA, published the “Power-PC” model in 1997. This model relies on objects called “causal generative and preventive power”. Philosopher Clark Glymour from CMU has referred to these causal powers as "a brilliant piece of mathematical metaphysics", and a substantial literature has developed in those fields in support of this approach to extrapolation of causal effects. The Power-PC model is very closely related to our independently developed causal models, and its recommendations are identical to Sheps’.
Andre Bouckaert and Michel Mouchart, statisticians from Universite Catholique de Louvain, developed the “Sure outcomes of random events” model in 2001; this model contains mathematical objects that are identical to the objects used in our justification for Sheps' conclusion.
Mark van der Laan, Alan Hubbard and Nicholas Jewell, biostatisticians at UC Berkeley, proposed a measure of effect called “the causal switch relative risk”, which automatically selects the variant of the risk ratio recommended by Sheps.
Rose Baker, an Emeritus Professor of Applied Statistics at Salford Business School, and Dan Jackson, a biostatistician at AstraZeneca, developed a new measure of effect for meta-analysis of binary outcomes which they called the “generalized relative risk reduction (GRRR)”. GRRR can be understood as a convenient representation of the causal switch relative risk, and their justification for the effect measure hints at the same understanding of the underlying causal mechanisms.
Les Irwig and Paul Glasziou, leading thinkers in Evidence-Based Medicine, suggested in BMJ in 1995 that effects of interventions should be summarized using “relative benefits and absolute harms”. While this is technically a different proposal, it will lead to identical predictions if the outcome event is rare.
Sainyam Galhotra, Romila Pradhan and Babak Salimi are computer scientists working on algorithmic fairness, who recently posted a pre-print on arXiv which argues that objects called “sufficiency scores” and “necessity scores” can improve on state-of-the-art approaches to algorithmic fairness and interpretability. These objects are mathematically very close variations of objects used in models that justify Sheps’ conclusions.
In response to some of my earlier work, Carlos Cinelli and Judea Pearl, computer scientists at UCLA, developed a variation of causal Directed Acyclic Graphs which uses identical assumptions, and leads to identical conclusions, as the causal models that support Sheps' conclusion.
Almost all these scientists were working without knowledge of the others. This convergence of ideas from very different academic traditions hints that there is something about the underlying concepts which is appealing and intuitive to scientists who spend time thinking seriously about models for binary outcomes. We believe this idea constitutes an attractor in idea space, and that it will continue to be “rediscovered” until it is either explicitly refuted, or routinely implemented in statistical practice.
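Two of the proposals listed above lend themselves to a small numerical illustration. The sketch below encodes a switch-style measure as a single signed number, and compares it with a constant risk difference (“absolute harms”) for a rare outcome. The signed encoding, the function names and all numbers are assumptions chosen for illustration; they are not necessarily the parameterization used by van der Laan, Hubbard and Jewell, or the procedure described by Irwig and Glasziou.

```python
def switch_relative_risk(p0, p1):
    """Signed summary in [-1, 1] (an assumed convention for this sketch).

    Negative values are built from the risk ratio (treatment reduces risk),
    positive values from the survival ratio (treatment increases risk).
    Assumes 0 < p0 < 1.
    """
    if p1 < p0:
        return p1 / p0 - 1.0
    if p1 > p0:
        return 1.0 - (1.0 - p1) / (1.0 - p0)
    return 0.0


def apply_switch(p0, theta):
    """Predicted risk under treatment for a group with baseline risk p0."""
    if theta < 0:
        return p0 * (1.0 + theta)              # scale the risk
    return 1.0 - (1.0 - p0) * (1.0 - theta)    # scale the survival probability


def apply_risk_difference(p0, rd):
    """'Absolute harms': add a constant risk difference to the baseline."""
    return p0 + rd


# Hypothetical rare harm: risk rises from 0.1% to 0.3% in the study.
theta = switch_relative_risk(0.001, 0.003)
rd = 0.003 - 0.001

# Target group with a 0.5% baseline risk: the two predictions nearly coincide.
rare_switch = apply_switch(0.005, theta)
rare_rd = apply_risk_difference(0.005, rd)
print(rare_switch, rare_rd)

# Hypothetical common outcome: risk rises from 20% to 40%, target baseline 90%.
common_switch = apply_switch(0.9, switch_relative_risk(0.2, 0.4))
common_rd = apply_risk_difference(0.9, 0.4 - 0.2)
print(common_switch, common_rd)  # the risk-difference prediction exceeds 1
```

In the rare-outcome case the two approaches give almost the same prediction, which is the sense in which the proposals agree; with a common outcome they come apart, and only the switch-style prediction is guaranteed to remain a valid probability.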
Today, in addition to the arXiv preprint, I am also announcing plans for a documentary about all the researchers who have rediscovered variations of Sheps’ insight. When travel becomes possible again, a small film crew will follow me across North America and Europe while I have conversations with some of the scientists discussed in this blog post, as well as some of the detractors, and possibly some people who knew Sheps or who have written about other aspects of her life.
I am personally committing significant financial resources to this project, and will also accept crowdsourced backing via GoFundMe and Dogecoin (DPZNaH8zCiVAm7irRvAL9gX9ij2uuw3RGk). All contributions (no matter how small) will be acknowledged in the credits; all contributions above $15 will receive a digital copy of the final version of the documentary.
This newsletter will be used for weekly updates on my progress in convincing medical statisticians that not only was Sheps right, but her ideas are fundamentally important for clinical decision making. Each newsletter will include questions and comments from readers, and my responses to those comments. These comments can be submitted either via email at anders@huitfeldt, or anonymously at admonymous.co
Anders: Naturally, I think your reference to Modern Epidemiology 3rd ed. as mentioning "Sheps only briefly to dismiss her ideas about effect measures" is offensively misleading or at best incomplete. Throughout the book we mentioned many things only briefly because there was so much to cover and our space was severely limited by the publisher; instead we were trying to serve readers by working in at least brief citations to what (up to 2007) had been commonly overlooked ideas.
As for your paper, here's some details I noted immediately at the start of its appendix (of course because they concern the dismissive citation to ME3):
1) Second sentence of the Appendix: The citation of the quote [38] should be specifically to Ch. 2 of ME3 (Rothman Greenland Poole Lash) and better still would be [38, p. 9]. This is not only to help readers locate the quote and put it in context but also because that chapter was perhaps the only contentious one (mildly so at that but tortured by editing back and forth among 4 authors not quite in agreement about every detail). In particular I agree that the quoted sentence should have been more clearly confined to outcomes that involve terminal events corresponding to absorbing states (like death, organ removal, etc.) vs. continuation at risk for the event, which are quite asymmetric logically and physically.
2) Sheps and the 1989 paper I sent you (Khoury MJ, Flanders WD, Greenland S, Adams MJ. On the measurement of susceptibility in epidemiologic studies. Am J Epidemiol 1989;129:183–190) are cited on Ch. 4 p. 65 of ME3 where it explains how Sheps' relative difference equals the proportion susceptible P(C) under a nonidentified independence assumption which fails (for example) under biologic models leading to the accelerated-failure-time (AFT) survival model used in g-estimation. Which raises the question: How does that substance translate into the framework of your new paper and what it is arguing?