data&donuts

CANCER: THE BRAND

Raw data: both an oxymoron and a bad idea

12/27/2018

 
Are you still shocked by how many data projects are understaffed, poorly defined, or lacking a cohesive strategy or purpose for the data?

Because I work with data, and more specifically because my role is to curate data, I spend a lot of time listening. I listen to patients. Not the little dots or circles described enthusiastically from the podium at the JP Morgan Healthcare Conference as "expanding market share," but to independent advocates, thoughtful executives, and physicians and healthcare providers at the point of care.

I have also made an observation: the closer the "expert" is to the patient, the more trustworthy they seem to be. In health policy and health economics, where I also work, the patient is way off in the distance. An ideal, rather than someone with an actual seat at the table. Business models swirl around each node in US health systems, but we should not lose sight of the real objective: improving patient outcomes and prevention strategies upstream from disease.

Why are we lamenting the closure of hospitals and the consolidation of health system networks? Isn't that what success looks like if more and more people are healthy?

2018 is the first year in recent memory in which I stepped away from the office for the entire holiday season. Okay, maybe there were meetings here and there, but mostly with clients who have become friends over the years. Projects have grown beyond the data itself and into the architecture driving governance strategies and frameworks.
“Data”, runs a common refrain, “is the new oil.” Like the sticky black stuff that comes out of the ground, all those 1s and 0s are of little use until they are processed into something more valuable. That something is you.

Seven of the world’s ten most valuable companies by market capitalisation are technology firms. Excluding Apple, which makes money by selling pricey gadgets, and Microsoft, which charges businesses for its software and services, all are built on a foundation of tying data to human beings. Google and Facebook want to find out as much as it is possible to know about their users’ interests, activities, friends and family. Amazon has a detailed history of consumer behaviour. Tencent and Alibaba are the digital wallets for hundreds of millions of Chinese; both know enough about consumers to provide widely used credit scores--
--How to think about data in 2019, The Economist
We see data everywhere and hold discussions of objectivity and rigor in our attempts to identify, clean, collect, analyze, and visualize. But what if we are identifying sources of information that support our prevailing beliefs and ideology? More than once, clients have asked for the outcome without ever identifying the question they seek to answer.

Are we doing the same thing? Even raw data has been selected and processed. As Nick Barrowman observes, "data is always the product of cognitive, cultural, and institutional processes that determine what to collect and how to collect it."
“Raw data” is both an oxymoron and a bad idea.—Geoffrey C. Bowker, Memory Practices in the Sciences
The selection of datasets and data models, and even the data you leave out because of budgetary limits, should all be disclosed transparently.

My goal for 2019 is to encourage more collaborative partners to participate in the selection of entities, relationships, and attributes when building models. Knowing how we build the models expands the team's understanding of how questions should be articulated and considered.
Paul Edwards defines "data friction" as "friction consisting of worries, questions, and contests that assert or affirm what should count as data, or which data are good and which less reliable, or how big data sets need to be."
I think about this a lot when analyzing oncology data. It is hard to compare multiple trials without noticing the broad selection and interpretation of surrogate endpoints, homogenized patient populations hardly representative of actual real-world clinical environments, and the aggregation of data from disparate sets as representative of a single patient's "best" treatment algorithm.

I recommend listening to Vinay Prasad, MD, MPH, either by following him on Twitter or through his recently launched podcast, Plenary Session. You may not be a data scientist, but you do need to be able to read graphs and charts. You need to know when something stinks. Especially in oncology, where we seem to suffer from a bit of dissonance.

Why are we not able to accept failure in oncology clinical trials? Randomized controlled trials are conducted to minimize confounding. Not only the measured confounders but also those we aren't aware of. The thinking is that randomization will distribute variables evenly across prognostic groups and minimize any impact. Vinay reminds us of this when he notes how p-hacking, and negative studies exhaustively mined for anything positive, may actually reintroduce the confounding we had already mitigated by randomizing.
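As a minimal sketch of that point, here is a short Python simulation; the trial sizes and the number of post-hoc looks are arbitrary assumptions, not drawn from any real study. It shows that when a truly null trial is mined across many post-hoc endpoints or subgroups, chance alone will usually surface at least one nominally "significant" result.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 1000        # simulated null trials: the treatment truly does nothing
n_per_arm = 200        # patients per arm (assumed, illustrative only)
n_posthoc_tests = 20   # endpoints/subgroups mined after the fact (assumed)

trials_with_false_positive = 0
for _ in range(n_trials):
    significant = False
    for _ in range(n_posthoc_tests):
        control = rng.normal(0.0, 1.0, n_per_arm)
        treated = rng.normal(0.0, 1.0, n_per_arm)  # identical distribution: no real effect
        _, p = stats.ttest_ind(control, treated)
        if p < 0.05:
            significant = True
    if significant:
        trials_with_false_positive += 1

# With 20 looks at pure noise, roughly 1 - 0.95**20 (about 64%) of "negative"
# trials yield at least one nominally significant post-hoc finding.
print(f"{trials_with_false_positive / n_trials:.0%} of null trials produced a 'positive' result")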

Vinay Prasad has also perfected the slow jam of tweet threads. One of the final threads of 2018 arrived with a ribbon on Christmas Eve. This article was among the 15 most interesting things he read, listened to, or saw in 2018.

The full paper, A systematic review of trial-level meta-analyses measuring the strength of association between surrogate end-points and overall survival in oncology, is available on Google Drive. The graph below depicts the evolution of trial-level surrogates as evidence over the last ~18 years.
[Figure: the evolution of trial-level surrogate endpoints used as evidence over roughly the last 18 years]
When reading clinical trial results or attending conference proceedings, note the trend away from comparable and meaningful measures of proof, and think about the data you need to see. What is meaningful in the clinical environment, and what is merely a signal, barely detectable but audaciously hyped?
The singular datum is not the particular in relation to any universal (the elected individual in representative democracy, for example) and the plural data is not universal, not generalizable from the singular; it is an aggregation. The power within aggregation is relational, based on potential connections: network, not hierarchy.

To be sure, data also depend upon hierarchy. Part of what distinguishes data from the more general category, information, is their discreetness. Each datum is individual, separate and separable, while still alike in kind to others in its set. It follows that the imagination of data is in some measure always an act of classification, of lumping and splitting, nesting and ranking, though the underlying principles at work can be hard to recover.
--"Raw Data" Is an Oxymoron
Bonny is a data enthusiast applying curated analysis and visualization to persistent tensions between health policy, economics, and clinical research in oncology.
Follow @datamongerbonny
