I think it was Yuval Noah Harari, author of Sapiens: A Brief History of Humankind and Homo Deus: A Brief History of Tomorrow, who proclaimed during an interview for his new book, 21 Lessons for the 21st Century, that it isn't really Big Data we need. It is clarity. In my mind, this speaks to how insights are gleaned from the right data--not just sheer volume.
The loudest voices are often the most blustery and ill-informed. Think of all the big buzzwords that arrive shiny and new--we work them into our presentations, write them into our narratives, and search for them in the fire-hose of media headlines.
I am not a fan of generalities, but here is one now. I ALWAYS query survey writers when they include vague terminology in their draft instruments. We can't measure what isn't defined. And on the off chance we attempt to measure it anyway--the analysis is doomed. I am the same way when clients are sitting around a room pontificating on "value", "innovation", or the "patient" as the latest blockbuster in healthcare.
We need to do a better job of separating the signal from the noise.
Here is the regression model describing signal and noise. If you look at the right side of the equation,
"If I am sick and given a choice of treatments, the central question to me is which treatment has the best chance to cure me, not some randomly selected ‘representative’ person."--Xiao-Li Meng, Department of Statistics, Harvard University, Cambridge, MA
How do we decide which signals may or may not be of potential interest? I know, I know--it is easier to claim to be on the side of the patient and far less interesting to look under the hood. But when I see aggregated data telling me one thing, outcomes telling me another, and a large percentage of stakeholders simply looking in the other direction--I become suspicious.
Think about the data from immuno-oncology clinical trials. We compare outcomes between groups without considering a potentially undiscovered confounding, or third, variable. When such a variable reverses the apparent association, there is actually a term for it--Simpson's paradox.
For this paradox to occur, two conditions must be present: (a) an ignored or overlooked confounding variable that has a strong effect on the outcome variable; and (b) a disproportionate distribution of the confounding variable among the groups being compared (Hintzman, 1980; Hsu, 1989). The effect size of the confounding variable has to be strong enough to reverse the zero-order association between the independent and dependent variables (Cornfield et al., 1959), and the imbalance in the groups on the confounding variable has to be large (Hsu, 1989).
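To make the two conditions concrete, here is a minimal sketch in Python. Every number in it is invented purely for illustration and comes from no trial; the arm names, the "low_risk" and "high_risk" strata, and the helper functions are my own hypothetical choices. Baseline risk plays the confounder: it strongly affects response (condition a), and it is unevenly spread across the arms, with 80 of 100 treated patients high risk versus 20 of 100 controls (condition b).

```python
from fractions import Fraction

# Hypothetical (responders, patients) counts, invented for illustration only.
COUNTS = {
    "treatment": {"low_risk": (18, 20), "high_risk": (48, 80)},
    "control": {"low_risk": (64, 80), "high_risk": (10, 20)},
}


def stratum_rates(counts):
    """Response rate for each arm within each stratum of the confounder."""
    return {
        arm: {stratum: Fraction(r, n) for stratum, (r, n) in strata.items()}
        for arm, strata in counts.items()
    }


def pooled_rates(counts):
    """Response rate for each arm after collapsing over the confounder."""
    return {
        arm: Fraction(
            sum(r for r, _ in strata.values()),
            sum(n for _, n in strata.values()),
        )
        for arm, strata in counts.items()
    }


def shows_reversal(counts, a="treatment", b="control"):
    """True if arm a beats arm b in every stratum yet loses when pooled."""
    by_stratum = stratum_rates(counts)
    pooled = pooled_rates(counts)
    a_wins_every_stratum = all(
        by_stratum[a][s] > by_stratum[b][s] for s in by_stratum[a]
    )
    return a_wins_every_stratum and pooled[a] < pooled[b]


if __name__ == "__main__":
    for arm, strata in stratum_rates(COUNTS).items():
        for stratum, rate in strata.items():
            print(f"{arm:9s} {stratum:9s} {float(rate):.0%}")
    for arm, rate in pooled_rates(COUNTS).items():
        print(f"{arm:9s} pooled    {float(rate):.0%}")
    print("Reversal present:", shows_reversal(COUNTS))
```

In this made-up table the treatment arm does better within each risk stratum, yet looks worse once the strata are pooled, simply because it carries most of the high-risk patients. Balance the strata across the arms and the reversal disappears.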
The complexity of innate and adaptive immunity is vast. In this era of personalized medicine we need to appreciate the infinite scope of "what we don't know that we don't know".
Heralding phase II results as ready for prime time conflicts with the need to avoid aggregation bias. Aggregation bias, also referred to as ecological bias, arises when crude or only partially adjusted associations fail to reflect the true effect of an exposure or treatment. This typically occurs because other risk variables are distributed differently between the control and treatment arms.
How do we decide how to interpret complex clinical findings? I suggest improving our data literacy. And reading everything by Xiao-Li Meng.
The full quote from Macbeth is a perennial favorite among many,
Out, out, brief candle!
Life’s but a walking shadow, a poor player
That struts and frets his hour upon the stage
And then is heard no more. It is a tale
Told by an idiot, full of sound and fury,