The Texas sharpshooter fallacy is known by many names but at the highest level it is a form of confirmation bias. We tend to find or gravitate toward information that confirms our own preconceptions, leading to statistical errors. A Texan shot a round of bullets at the side of a barn. When he finished, he painted a target around a random cluster and declared himself a sharpshooter.
This is a popular fallacy encouraged by ever present big data colliding with multiple sets of research interests. The fallacy is characterized by defining a hypothesis only after data have already been gathered and analyzed. Many research results are guilty of focusing only on data supporting our preconceived ideas and ignoring any differences.
The fallacy reminds me of the drug pipelines for Alzheimer's Disease. The current market has a greater than 99.6% failure rate but the research paradigm hasn't shifted-- even in the shadow of a failed prevailing single-minded focus--amyloid plaques.
Read the article below highlighting the importance of recognizing limitations of large data sets--and even larger egos.
Implications of the Principle of Question Propagation for Comparative-Effectiveness and “Data Mining” Research
It is likely that the amount of observational research will increase significantly, especially studies involving data mining of large administrative databases and electronic medical records. However, epistemological arguments suggest that data mining efforts cannot provide definitive answers to the questions asked by the comparative-effectiveness research (CER) program. Rather, CER should be considered hypothesis-generating research aiming to inform future prospective studies that will invariably require new (and better) data collection.
There are large gaps in the longitudinal profile of patient and population level data. Providers lack access to claims data and insurers salivate at the opportunities dormant in patient-level EMR databases. Characteristically the business need has been addressed by data mining companies looking for associations and patterns in a variety of industry specific variables. The evolving US healthcare landscape has advocated for comparative-effectiveness research as a foundational approach for discovery--utilizing our abundance of existing data.
The flawed premise however is that data collected and stored in relational databases for one business need is immediately appropriate to answer the modern quieries and connected datasets from EMR, administrative databases, patient outcomes, and even genomic or proteomic data sets.
The best data mining research can hope to accomplish is to provide hypothesis generating
Thoughtful discussions about content development and outcomes analytics that apply the principles and frameworks of health policy and economics to persistent and perplexing health and health care problems...
Browse the archive...
Thank you for making a donution!
In a world of "evidence-based" medicine I am a bigger fan of practice-based evidence.
Remember the quote by Upton Sinclair...
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”
Sign up for our newsletter!