“A lot of modeling is how much crap you can take.” – Lauren Hutton
My first gig in pharma was a lesson in acronyms. I had a 12-month contract at GSK, and I learned a tremendous amount. Everyone used verbal shorthand, and I jotted the scattered letters down to look up later.
I think it thickened my skin. I have no problem asking the questions. The biggest surprise is how few of us challenge the answers.
Although I have experience as a bench scientist and as an academic writer at a large academic medical center, along with a lot of agency work, the added layer of pharmaceutical experience was the real hat trick.
Starting my data journey, I often heard statisticians discussing data models as if they were something I could pull off a shelf. They sound vetted and official, don't they?
A data model is, quite simply, an "accumulated set of discovered relationships" within a dataset. Make no mistake, the models are tested, but if sufficient "target variables" or "class labels" are not identified and considered, you can imagine how limited the insights gleaned from the analyses will be.
Approached without care, data mining can reproduce existing patterns of discrimination, inherit the prejudice of prior decision makers, or simply reflect the widespread biases that persist in society.
When working in health policy and accessing databases filtered by race, economic status, and a wide variety of other demographic variables, it is critical to think carefully about how to divide all of the "possible values of the target variable into mutually exclusive categories."
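To make "mutually exclusive categories" concrete, here is a minimal Python sketch. It assumes a hypothetical continuous target variable (annual household income) and hypothetical cut points; the names CUTS, LABELS, categorize, and count_by_category are invented for illustration. Half-open intervals ensure every value lands in exactly one category, with no gaps and no overlaps.

```python
from bisect import bisect_right

# Hypothetical cut points for a hypothetical income target variable.
# Each category covers a half-open interval [lower, upper), so the
# categories are mutually exclusive and, taken together, exhaustive.
CUTS = [25_000, 50_000, 100_000]
LABELS = ["under 25k", "25k to <50k", "50k to <100k", "100k and over"]


def categorize(income: float) -> str:
    """Return the single category label an income value belongs to."""
    # bisect_right counts how many cut points are <= income,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUTS, income)]


def count_by_category(values: list[float]) -> dict[str, int]:
    """Count values per category; each value is counted exactly once."""
    counts = {label: 0 for label in LABELS}
    for v in values:
        counts[categorize(v)] += 1
    return counts


incomes = [18_000, 25_000, 49_999, 50_000, 120_000]
print(count_by_category(incomes))
```

The choice that matters most is the boundary rule. A value sitting exactly on a cut point (25,000 here) must fall into one category only, and where those edges sit is itself a human decision that shapes every downstream result.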
A 360-degree perspective is imperative when data mining. Target variables that are carefully specified and parsed, so that they do not systematically disadvantage protected classes, lead to actionable insights. Defining them is a necessarily subjective process, but it can be improved with data literacy and applied to non-binary classifications.
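One common way to check whether a classification systematically disadvantages a group is to compare each group's rate of favorable outcomes with that of a reference group. The Python sketch below assumes hypothetical outcome records and group names. The 0.8 threshold is the "four-fifths" heuristic from US federal employee-selection guidelines, used here only as an illustrative screening flag, not a legal test or a verdict.

```python
from collections import defaultdict

# Hypothetical records: (group, received_favorable_outcome).
records = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]


def favorable_rates(rows):
    """Share of favorable outcomes within each group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in rows:
        totals[group] += 1
        favorable[group] += int(outcome)
    return {g: favorable[g] / totals[g] for g in totals}


def impact_ratios(rows, reference):
    """Each group's favorable rate divided by the reference group's rate.

    Assumes the reference group has at least one favorable outcome.
    """
    rates = favorable_rates(rows)
    return {g: rates[g] / rates[reference] for g in rates}


# Illustrative threshold: flag groups whose ratio falls below 0.8.
THRESHOLD = 0.8
ratios = impact_ratios(records, reference="group_a")
flagged = [g for g, r in ratios.items() if r < THRESHOLD]
print(ratios, flagged)
```

A flag like this only says where to look. Whether the gap comes from the target variable's definition, the sampling, or something else still takes the human judgment described above.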
Here is an important graphic from "The Senate: Affirmative Action for White People," written by David Leonhardt and published in the NY Times. Civics lesson rewind: residents of Washington, D.C., and Puerto Rico do not have voting representation in the Senate.
Right now, about four million American citizens have almost no congressional voting power, not even the diluted power of Californians or Texans. Of these four million people — these citizens denied representative democracy — more than 90 percent are black or Hispanic.
“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” – Dan Ariely
Walking down New Hampshire Avenue on my way back to my hotel, I stumbled upon the Democracy Tree. A plaque is nestled between the roots of this majestic tree. Here is what it says:
This Tree is dedicated to the more than half million veterans, taxpayers, and citizens of the District of Columbia who, despite fighting in foreign wars, paying their full measure of taxes and continue to have no voting representation in the Congress of the United States of America.
"Taxation without Representation is Tyranny."
Foundry Democracy Project, Foundry United Methodist Church
The results are pretty outrageous. The Senate gives the average black American only 75 percent as much representation as the average white American. The average Asian-American has 72 percent as much representation as a white person. And the average Hispanic American? Only 55 percent as much. That’s right — the structure of the United States Senate treats a Hispanic citizen as only about half as important as a white citizen.
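The percentages above are ratios of average per-person Senate representation. The article's exact method is not spelled out here, so the following Python sketch rests on one plausible assumption: each resident's weight is the state's two senators divided by the state's population, and a group's representation is the average of that weight across the group's members. The two states and all population figures are invented toy numbers, not census data.

```python
# Hypothetical toy data: state -> (total population, {group: members}).
# These numbers are invented for illustration; they are not census figures.
STATES = {
    "Small": (1_000_000, {"white": 800_000, "hispanic": 200_000}),
    "Large": (9_000_000, {"white": 4_500_000, "hispanic": 4_500_000}),
}
SENATORS_PER_STATE = 2


def senate_weight(population: int) -> float:
    """Senators per resident: every state gets two, whatever its size."""
    return SENATORS_PER_STATE / population


def average_representation(group: str) -> float:
    """Member-weighted mean Senate weight for one group."""
    weighted_sum = 0.0
    members_total = 0
    for population, groups in STATES.values():
        members = groups.get(group, 0)
        weighted_sum += members * senate_weight(population)
        members_total += members
    return weighted_sum / members_total


def relative_to(group: str, reference: str = "white") -> float:
    """A group's average representation as a fraction of the reference group's."""
    return average_representation(group) / average_representation(reference)


print(f"{relative_to('hispanic'):.0%}")
```

In this toy setup the gap comes entirely from where each group lives: a larger share of the hypothetical Hispanic population sits in the populous state, where each senator is spread across more people. Residents of places with no senators, such as D.C. or Puerto Rico, would enter this calculation with a weight of zero, pulling their group's average down further.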
My point in all of this is, surprisingly, not political. The illustration is meant to show that, amid the technological advances of our Big Data culture, we are forgetting the nuance of machine learning, AI, and algorithms: the insights are only as good as the data we feed into them.
"Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers."
The full quote below is an important example of how stewards of data (and aren't we all?) have a unique responsibility to be thorough and informed as we explore and mine data. Garbage in, garbage out, but worse yet: discrimination in, discrimination out.
Advocates of algorithmic techniques like data mining argue that these techniques eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with.
In a world of "evidence-based" medicine, I am a bigger fan of practice-based evidence.
Remember the quote by Upton Sinclair...
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”