This is a popular fallacy encouraged by ever-present big data colliding with multiple sets of research interests. The fallacy is characterized by defining a hypothesis only after the data have already been gathered and analyzed. Many research results are guilty of focusing only on the data that support our preconceived ideas and ignoring the data that do not.
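To make the fallacy concrete, here is a minimal Python sketch under stated assumptions: every value is pure random noise, so no real relationship exists. The names `pearson` and `simulate`, and the parameters `n_patients` and `n_variables`, are illustrative and do not come from the article. The sketch "mines" an existing data set for the variable most correlated with an outcome, then checks that same variable against freshly collected data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_patients=50, n_variables=200, seed=0):
    """Return (mined_correlation, replication_correlation) on pure noise."""
    rng = random.Random(seed)

    def collect():
        # Assumption: outcome and candidate variables are independent noise,
        # so any correlation found is a chance artifact.
        outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
        variables = [
            [rng.gauss(0, 1) for _ in range(n_patients)]
            for _ in range(n_variables)
        ]
        return outcome, variables

    # Step 1: data already gathered; the "hypothesis" is chosen afterwards
    # by picking whichever variable happens to correlate most strongly.
    outcome, variables = collect()
    correlations = [pearson(v, outcome) for v in variables]
    best = max(range(n_variables), key=lambda i: abs(correlations[i]))

    # Step 2: prospective test; the hypothesis is fixed before new data arrive.
    new_outcome, new_variables = collect()
    replication = pearson(new_variables[best], new_outcome)
    return correlations[best], replication


if __name__ == "__main__":
    for seed in range(5):
        mined, replicated = simulate(seed=seed)
        print(f"seed={seed} mined r={mined:+.2f} replication r={replicated:+.2f}")
```

Across seeds, the mined correlation is typically sizeable while the replication correlation tends to sit near zero, which is the sense in which results found this way can suggest a hypothesis but cannot test it.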
Read the article below highlighting the importance of recognizing limitations of large data sets--and even larger egos.
Implications of the Principle of Question Propagation for Comparative-Effectiveness and “Data Mining” Research
It is likely that the amount of observational research will increase significantly, especially studies involving data mining of large administrative databases and electronic medical records. However, epistemological arguments suggest that data mining efforts cannot provide definitive answers to the questions asked by the comparative-effectiveness research (CER) program. Rather, CER should be considered hypothesis-generating research aiming to inform future prospective studies that will invariably require new (and better) data collection.
The flawed premise, however, is that data collected and stored in relational databases for one business need are immediately appropriate for answering modern queries across connected datasets from EMRs, administrative databases, patient outcomes, and even genomic or proteomic data sets.
The best data mining research can hope to accomplish is to provide hypothesis-generating results, which will then need to be subjected to further scrutiny using the hypothesis-testing paradigm. As the nation embarks on a multibillion-dollar investment to develop new registries, data warehouses, and other standardized collections of data in an electronic format, it is important to heed some long-known principles in the philosophy