data&donuts
  • Data & Donuts (thinky thoughts)
  • COLLABORATor
  • Data talks, people mumble
  • Cancer: The Brand
  • Time to make the donuts...
  • donuts (quick nibbles)
  • Tools for writers and soon-to-be writers
  • datamonger.health
  • The "How" of Data Fluency

hello data
I visualize data buried in non-proprietary healthcare databases
https://unsplash.com/@winstonchen

(data)modeling--how much crap can you take?

10/15/2018

 
A lot of modeling is how much crap you can take--Lauren Hutton
My first gig working in pharma was a lesson in acronyms. I had a 12-month contract at GSK, and I learned an enormous amount. Everyone used verbal shorthand, and I jotted the scattered letters down to look up later.

I think it softened my skin. I have no problem asking the questions. The biggest surprise is how few of us challenge the answers.

Although I have experience as a bench scientist, as an academic writer at a large academic medical center, and in a lot of agency work, the added layer of pharmaceutical experience was the real hat trick.
​
Starting my data journey, I often heard statisticians discussing data models as if they were a thing I could pull from a shelf. They sound vetted and official, don't they?

A data model is quite simply an "accumulated set of discovered relationships" within a dataset. Make no mistake, the models are tested, but if sufficient "target variables" or "class labels" are not identified and considered, the insights you can glean from the analyses will be limited.
Approached without care, data mining can reproduce existing patterns of discrimination, inherit the prejudice of prior decision makers, or simply reflect the widespread biases that persist in society.

It can even have the perverse result of exacerbating existing inequalities by suggesting that historically disadvantaged groups actually deserve less favorable treatment.
--Big Data's Disparate Impact, Barocas & Selbst
When you work in health policy and access databases filtered by race, economics, and a wide variety of demographic target variables, it is critical to consider how class labels divide the "possible values of the target variable into mutually exclusive categories".

A 360-degree perspective is imperative while data mining. Specifying and parsing target variables so that they do not systematically disadvantage protected classes leads to actionable insights. It is a necessarily subjective process, but it can be improved with data literacy and applied to non-binary classifications.
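To make "mutually exclusive categories" concrete, here is a minimal sketch in Python. It assumes a hypothetical age variable and made-up cut points; the label names, ranges, and helper functions are illustrations only and do not come from any real healthcare database. The check simply flags values that fall into no label (a gap) or into more than one (an overlap).

```python
# Hypothetical class labels for an "age" target variable.
# Each label covers a half-open range [low, high), so no age can land in two labels.
AGE_LABELS = [
    ("0-17", 0, 18),
    ("18-44", 18, 45),
    ("45-64", 45, 65),
    ("65+", 65, 200),
]


def labels_for(value, labels=AGE_LABELS):
    """Return the name of every label whose range contains this value."""
    return [name for name, low, high in labels if low <= value < high]


def check_partition(values, labels=AGE_LABELS):
    """Return (gaps, overlaps): values matching no label, and values matching several."""
    gaps = [v for v in values if len(labels_for(v, labels)) == 0]
    overlaps = [v for v in values if len(labels_for(v, labels)) > 1]
    return gaps, overlaps


ages = [3, 17, 18, 44, 45, 64, 65, 90]
gaps, overlaps = check_partition(ages)
print("gaps:", gaps, "overlaps:", overlaps)
```

Run against a sloppier set of labels (say, one range ending at 30 and the next starting at 25), the same check would report the ages caught in both. The subjective part, deciding where the cut points belong, is still up to the person doing the parsing.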
Here is an important graphic from "The Senate: Affirmative Action for White People," written by David Leonhardt and published in the NY Times. Civics lesson rewind: residents of Washington, D.C., and Puerto Rico do not have voting representation in the Senate.
Right now, about four million American citizens have almost no congressional voting power, not even the diluted power of Californians or Texans. Of these four million people — these citizens denied representative democracy — more than 90 percent are black or Hispanic.
​

They are, of course, the residents of Washington, D.C., and Puerto Rico. Almost half of Washington’s residents are black, and nearly all of Puerto Rico’s are Hispanic.
Picture
“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” – Dan Ariely

Walking down New Hampshire Avenue on my way back to my hotel, I stumbled upon the Democracy Tree. Its dedication is nestled between the roots of this majestic tree. Here is what it says:

This Tree is dedicated to the more than half million veterans, taxpayers, and citizens of the District of Columbia who, despite fighting in foreign wars, paying their full measure of taxes and continue to have no voting representation in the Congress of the United States of America.

​"Taxation without Representation is Tyranny."
Foundry Democracy Project, Foundry United Methodist Church

The results are pretty outrageous. The Senate gives the average black American only 75 percent as much representation as the average white American. The average Asian-American has 72 percent as much representation as a white person. And the average Hispanic American? Only 55 percent as much. That’s right — the structure of the United States Senate treats a Hispanic citizen as only about half as important as a white citizen.
My point in all of this is, surprisingly, not political. The illustration shows that our Big Data culture, for all its technological advances, is forgetting the nuance of machine learning, AI, and algorithms: the insights are only as good as the data we feed into them.

​"Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers."

The full quote below is an important example of how stewards of data (and aren't we all?) have a unique responsibility to be thorough and informed as we explore and mine data. Garbage in, garbage out, but worse yet: discrimination in, discrimination out.
Advocates of algorithmic techniques like data mining argue that these techniques eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with.
Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers. In other cases, data may simply reflect the widespread biases that persist in society at large. In still others, data mining can discover surprisingly useful regularities that are really just preexisting patterns of exclusion and inequality. Unthinking reliance on data mining can deny historically disadvantaged and vulnerable groups full participation in society.
Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.--Big Data's Disparate Impact Barocas & Selbst
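
Here is a toy illustration of "discrimination in, discrimination out", written in Python. Everything in it is an assumption for the sake of the example: the records are synthetic, the groups "A" and "B" are made up, and the "model" is a deliberately crude stand-in that memorizes past approval rates. It is not an algorithm from the Barocas & Selbst paper.

```python
from collections import defaultdict

# Synthetic, made-up history: (group, qualified, approved).
# Every applicant here is equally qualified, but past decision makers
# approved group "A" far more often than group "B".
history = (
    [("A", True, True)] * 80 + [("A", True, False)] * 20
    + [("B", True, True)] * 40 + [("B", True, False)] * 60
)


def fit_approval_rates(records):
    """'Train' by memorizing the past approval rate for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, _qualified, approved in records:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


def predict(rates, group, threshold=0.5):
    """Approve when the learned rate for the applicant's group clears the threshold."""
    return rates[group] >= threshold


rates = fit_approval_rates(history)
print(rates)
print("approve A:", predict(rates, "A"), "approve B:", predict(rates, "B"))
```

Nobody wrote a rule that says "reject group B." The rule emerged from the history the model was fed, which is why this kind of bias is so hard to spot and so hard to explain after the fact.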
