It isn't always easy to identify bad data. The best way is to have multiple data sources to test your hypothesis against. Create a data library. Research white papers or published articles that list the wide variety of data sources analyzed for stories or clinical studies. Build your own database.
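One way to picture testing against multiple sources is a quick cross-check: collect the same figure from each source and flag any that strays far from the rest. The sketch below is only an illustration. The function name, the source names, the numbers, and the 10% tolerance are all hypothetical, and comparing against the median is just one simple choice among many.

```python
from statistics import median

def flag_disagreeing_sources(estimates, tolerance=0.10):
    """Return the names of sources whose value differs from the
    median of all sources by more than `tolerance` (relative)."""
    center = median(estimates.values())
    if center == 0:
        # A relative difference is undefined around zero,
        # so fall back to an absolute comparison.
        return sorted(n for n, v in estimates.items() if abs(v) > tolerance)
    return sorted(
        n for n, v in estimates.items()
        if abs(v - center) / abs(center) > tolerance
    )

# Hypothetical values for the same quantity reported by three sources.
estimates = {"source_a": 102.0, "source_b": 98.5, "source_c": 141.0}
suspects = flag_disagreeing_sources(estimates)
print(suspects)
```

A flagged source is not necessarily the bad one. It is simply the one worth a closer look before you trust any of the numbers.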
Proper data governance and analytics are impossible for a single data scientist engaged as a consultant. What is possible is scalable team development. Let me explain. There is a hierarchical or linear backdrop to any data project. Questions of privacy, access, responsibility, scope, etc. are decided long before the launch of a successful project. The data professional appreciates the time required for data preparation and cleaning prior to analyses--many times we are alone in that regard.
The Fish That Makes Other Fish Smarter might not seem to be about data--but give it a read anyway.
By removing bloodsucking parasites, the cleaner wrasse improves the intellectual abilities of its clients.
The Fish That Makes Other Fish Smarter--Ed Yong, The Atlantic
The wrasse are remarkably savvy about how they perform their services. Redouan Bshary, from the University of Neuchâtel, has shown that they sometimes cheat their clients by taking illicit bites of the protective mucus covering their skin.
I didn't want to leave you with the impression that all collaborations yield the best results. The wrasse mentioned above seems a bit unscrupulous. Those of us working on teams recognize the "mucus biter." We also recognize the tendency to prioritize quality for more "valuable" clients. I would suggest avoiding this behavior, not only for the obvious reasons but also because data advocacy is important no matter who swims by...
Without the cleaners, the damselfish might also not have enough energy to fully fuel their demanding brains. They’re targeted by parasitic, bloodsucking crustaceans, which makes them “anemic, sluggish, and weak,” Binning says. When cleaners remove these parasites, the distressed damsels can divert their energies toward other matters—like thinking...
Seek out the damselfish. In fact, that has been my business model. Clients are often surprised at how affordable data consultancy can be--nonproprietary datasets are often comparable with high-priced schemes for privatized data. Think about it. Data is a historical record. It has already happened. What we do next requires skills in regression, classification, and independent domain knowledge.
Follow along--be a data wrasse.