A lot of modeling is how much crap you can take.--Lauren Hutton
I think it softened my skin. I have no problem asking the questions. The biggest surprise is how few of us challenge the answers.
Although I have experience as a bench scientist, as an academic writer at a large academic medical center, and in a lot of agency work, the added layer of pharmaceutical experience was the real hat trick.
A data model is quite simply an "accumulated set of discovered relationships" within a dataset. Make no mistake: the models are tested. But if sufficient "target variables" or "class labels" are not identified and considered, you can imagine how limited the insights gleaned from the analyses will be.
Approached without care, data mining can reproduce existing patterns of discrimination, inherit the prejudice of prior decision makers, or simply reflect the widespread biases that persist in society.
It can even have the perverse result of exacerbating existing inequalities by suggesting that historically disadvantaged groups actually deserve less favorable treatment.--Barocas & Selbst, Big Data's Disparate Impact
A 360-degree perspective is imperative when data mining. Specifying and parsing target variables so that they do not systematically disadvantage protected classes is what leads to actionable insights. It is a necessarily subjective process, but it can be improved with data literacy and extended to non-binary classifications.
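One way to make that kind of check concrete is to compare how often each group receives the favorable outcome once a target variable has been chosen. This is a minimal sketch under stated assumptions: the group labels and outcomes are made-up toy data, and the 0.8 cutoff borrows the "four-fifths rule" screen from US employment guidelines as an illustrative threshold, not as a legal test. It handles any number of groups, not just two.

```python
from collections import Counter

# Hypothetical toy data: 1 = favorable outcome (e.g. approved), 0 = unfavorable.
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 10
outcomes = [1, 1, 1, 0, 1] + [0, 0, 1, 0, 0] + [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

# Assumed screening threshold, borrowed from the "four-fifths rule".
CUTOFF = 0.8


def selection_rates(outcomes, groups):
    """Fraction of favorable outcomes within each group."""
    totals = Counter(groups)
    favorable = Counter(g for g, y in zip(groups, outcomes) if y == 1)
    return {g: favorable[g] / totals[g] for g in totals}


def impact_ratios(outcomes, groups):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes, groups)
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}


ratios = impact_ratios(outcomes, groups)
flagged = [g for g, r in ratios.items() if r < CUTOFF]

for g, r in ratios.items():
    print(g, round(r, 2))
print("Below cutoff:", flagged)
```

A ratio under the cutoff is a prompt to look harder at the target variable and the inputs feeding it, not proof of discrimination on its own.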
Right now, about four million American citizens have almost no congressional voting power, not even the diluted power of Californians or Texans. Of these four million people — these citizens denied representative democracy — more than 90 percent are black or Hispanic.
They are, of course, the residents of Washington, D.C., and Puerto Rico. Almost half of Washington’s residents are black, and nearly all of Puerto Rico’s are Hispanic.
“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” – Dan Ariely
This Tree is dedicated to the more than half million veterans, taxpayers, and citizens of the District of Columbia who, despite fighting in foreign wars, paying their full measure of taxes and continue to have no voting representation in the Congress of the United States of America.
"Taxation without Representation is Tyranny."
Foundry Democracy Project, Foundry United Methodist Church
The results are pretty outrageous. The Senate gives the average black American only 75 percent as much representation as the average white American. The average Asian-American has 72 percent as much representation as a white person. And the average Hispanic American? Only 55 percent as much. That’s right — the structure of the United States Senate treats a Hispanic citizen as only about half as important as a white citizen.
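Figures like these come from weighting each person's share of their state's senators and then averaging over everyone in a group. The arithmetic below is one plausible way to compute such a ratio. It is my assumption, not necessarily the method behind the numbers quoted above, and the states and populations are invented purely for illustration. A territory with no voting senators, like D.C. or Puerto Rico, pulls its residents' average toward zero.

```python
# Hypothetical toy populations (NOT real census figures): state -> {group: people}
STATES = {
    "Small": {"white": 900, "hispanic": 100},
    "Large": {"white": 6000, "hispanic": 4000},
    # A territory with no voting senators, standing in for D.C. or Puerto Rico
    "Territory": {"white": 0, "hispanic": 500},
}
SENATORS = {"Small": 2, "Large": 2, "Territory": 0}


def avg_representation(group):
    """Average number of senators per person experienced by members of `group`."""
    weighted = 0.0
    members = 0
    for state, pops in STATES.items():
        state_pop = sum(pops.values())
        per_person = SENATORS[state] / state_pop  # each resident's share of the state's senators
        weighted += pops[group] * per_person
        members += pops[group]
    return weighted / members


ratio = avg_representation("hispanic") / avg_representation("white")
print(f"{ratio:.0%}")  # 50% with these made-up numbers
```

The small state's residents each hold ten times the senatorial share of the large state's residents, so whichever group is concentrated in large states or territories ends up with less representation per person.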
"Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers."
The full quote below illustrates why stewards of data (and aren't we all?) have a unique responsibility to be thorough and informed as we explore and mine data. Garbage in, garbage out, but worse yet: discrimination in, discrimination out.
Advocates of algorithmic techniques like data mining argue that these techniques eliminate human biases from the decision-making process. But an algorithm is only as good as the data it works with.
Data is frequently imperfect in ways that allow these algorithms to inherit the prejudices of prior decision makers. In other cases, data may simply reflect the widespread biases that persist in society at large. In still others, data mining can discover surprisingly useful regularities that are really just preexisting patterns of exclusion and inequality. Unthinking reliance on data mining can deny historically disadvantaged and vulnerable groups full participation in society.
Worse still, because the resulting discrimination is almost always an unintentional emergent property of the algorithm’s use rather than a conscious choice by its programmers, it can be unusually hard to identify the source of the problem or to explain it to a court.--Barocas & Selbst, Big Data's Disparate Impact
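"Discrimination in, discrimination out" can happen even when the protected attribute is never given to the model, because another variable stands in for it. The sketch below is a hypothetical toy, not Barocas & Selbst's method or any real system: the zip codes, group labels, and historical decisions are invented, and the "model" is just the historical approval rate per zip code. Because residence lines up with group membership in this data, the model faithfully reproduces the prior decision makers' bias.

```python
from collections import defaultdict

# Hypothetical historical decisions: (zip_code, group, approved).
# Past decision makers approved group "X" far less often than group "Y",
# and each group lives almost entirely in one zip code.
history = (
    [("10001", "X", 0)] * 8 + [("10001", "X", 1)] * 2
    + [("20002", "Y", 1)] * 8 + [("20002", "Y", 0)] * 2
)


def train(records):
    """Learn the historical approval rate per zip code; group is never used."""
    counts = defaultdict(lambda: [0, 0])  # zip -> [approved, total]
    for zip_code, _group, approved in records:
        counts[zip_code][0] += approved
        counts[zip_code][1] += 1
    return {z: approved / total for z, (approved, total) in counts.items()}


def predict(model, zip_code, threshold=0.5):
    """Approve (1) if the zip code's historical approval rate meets the threshold."""
    return int(model.get(zip_code, 0.0) >= threshold)


model = train(history)

# New applicants: the model sees only their zip codes.
applicants = [("10001", "X")] * 5 + [("20002", "Y")] * 5


def approval_rate(group):
    preds = [predict(model, z) for z, g in applicants if g == group]
    return sum(preds) / len(preds)


rate_x = approval_rate("X")
rate_y = approval_rate("Y")
print("Group X approval rate:", rate_x)
print("Group Y approval rate:", rate_y)
```

Nobody programmed the gap between the two groups; it emerges from the training data, which is exactly why it can be hard to trace back to a source.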