You might not need statistics. Hold on. Before you start doing cartwheels down the hall--this only applies if there is no uncertainty in your data.
Perhaps you have data collected from a continuing medical education activity. You are only interested in how the attendees navigated through your course and what your data demonstrates about this group and this group only.
But what if you want to extrapolate these findings to the broader population? Now we have introduced uncertainty. It is impossible to reach every single member of your target audience roaming around the planet, so we need the mathematics for talking in probabilities--statistics--"inferring proportions in a whole from those in a representative sample."
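That definition can be made concrete with a small sketch. The numbers below are made up for illustration (they are not from any real activity), and the Wald interval is just one common way to put an uncertainty band around a sample proportion:

```python
# Hypothetical illustration of "inferring proportions in a whole from those
# in a representative sample": a 95% confidence interval for the share of
# attendees who reported a practice change. The counts are invented.
import math

successes, n = 132, 400          # e.g., 132 of 400 sampled attendees
p_hat = successes / n            # proportion observed in the sample
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"{p_hat:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The interval is the honest part: the sample says 33%, and the uncertainty band says the population figure could plausibly sit anywhere from roughly 28% to 38%.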
How would primary care physicians, for example, who did not attend your educational activity answer the questions you asked throughout the engagement? Better yet, which attributes of your activity are members of the educational audience accessing at the point of care? Was it your activity that nudged a participant toward evidence-based behavior? Are you certain it wasn't another program or resource, a colleague conversation, an alternative type of media, or some other fractional or incremental activity?
My focus is primarily on ideas: first, looking at wide and shallow datasets and mining them for hypotheses. These are only for inspiration; the rigor comes downstream of the initial queries. Collaboratively, a question is identified, and the team finds the best tools for the job.
Survey design is often needed to gather data.
Does your team include someone with the skillset to carefully perform the mathematical approximations appropriate for continuous, categorical, and sliding-scale data? How do you determine the number of questions, correct for the number of questions, construct the survey to minimize response bias, and guard against satisficing?
Who evaluates the original purpose of the survey? Is a survey even the best tool? What is worth measuring?
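One of those questions--correcting for the number of questions--can be sketched briefly. When every survey item gets its own significance test, the chance of a false positive grows with the item count; a multiplicity adjustment such as Holm-Bonferroni is one standard remedy. The p-values below are invented for illustration:

```python
# Hypothetical sketch of "correcting for the number of questions":
# a Holm-Bonferroni adjustment applied to made-up p-values, one per item.
def holm_adjust(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses still "in play", then
        # enforce monotonicity so adjusted p-values never decrease.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

raw = [0.001, 0.013, 0.040, 0.210]   # one raw p-value per survey question
adj = holm_adjust(raw)
print(adj)
```

Note what happens to the third item: a raw p of 0.040 looks "significant" alone, but after adjusting for four questions it no longer clears the usual 0.05 bar.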
In my case, the toolbox includes R language, Python, SAS, Qualtrics, and Tableau.
The biggest problem in collaboration usually rests around a Type III or IV error.
Type I error (incorrectly rejecting the null hypothesis)
Type II error (not rejecting the null when you should)
Type III error (you correctly reject the null hypothesis, but for the wrong reason)--Statistics How To
Type IV error is a specific type of Type III error (correctly reject the null hypothesis, but make a mistake interpreting the results)
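The first two error types are easy to see in a quick Monte Carlo sketch. This is an illustrative simulation I am adding (not from the article), using a two-sample test with a normal approximation to the p-value:

```python
# Hypothetical simulation: estimate Type I and Type II error rates for a
# two-sample comparison by repeatedly drawing data and testing at alpha=0.05.
import math
import random
import statistics

def two_sample_p(a, b):
    # Welch's t statistic with a normal approximation for the p-value
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(42)
ALPHA, N, TRIALS = 0.05, 50, 2000

# Type I error: both groups come from the SAME population, yet the test
# sometimes rejects the null anyway.
false_rejections = sum(
    two_sample_p([random.gauss(0, 1) for _ in range(N)],
                 [random.gauss(0, 1) for _ in range(N)]) < ALPHA
    for _ in range(TRIALS))
type1_rate = false_rejections / TRIALS

# Type II error: a real difference of 0.3 SD exists, yet the test misses it.
misses = sum(
    two_sample_p([random.gauss(0, 1) for _ in range(N)],
                 [random.gauss(0.3, 1) for _ in range(N)]) >= ALPHA
    for _ in range(TRIALS))
type2_rate = misses / TRIALS

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

No simulation can show you a Type III or IV error, though--those live in the reasoning and interpretation around the test, which is exactly why they are the ones that surface in collaboration.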
A few examples:
Data clients typically invest in or inherit large datasets. Perhaps they purchased an extract of claims data, CMS data, or hospital system data (I am being intentionally vague to avoid revealing client identities).
Here is the caution.
Many business teams are guilty of mining their data for inspiration. If it stopped there, we would be okay. The problem arises when the attempt is made to apply statistical "rigor" to the same data set.
Cassie Kozyrkov, Chief Decision Scientist at Google Cloud, does a great job clarifying why this is problematic. Think of sitting down in the morning with your cup of tea, looking forward to a nice piece of sourdough toast. The toaster pops up and holy cow, your toast appears to have the image of Elvis carefully outlined on the lightly browned surface.
If you want to investigate the miracle--or better yet--the exploits of your toaster, would you put in a fresh slice of bread or put the Elvis toast back in the toaster?
Inspiration is cheap, rigor is expensive--Cassie Kozyrkov
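The "fresh slice of bread" has a direct analogue in practice: split the data up front, mine one half freely for hypotheses, and confirm on the half the mining never touched. A minimal sketch, with a stand-in list where real survey or claims records would go:

```python
# Hypothetical sketch: an exploration/confirmation split so that hypotheses
# mined from one half of the data are only tested on the untouched half.
import random

random.seed(7)
records = list(range(1000))   # stand-in for survey or claims records
random.shuffle(records)       # shuffle before splitting to avoid ordering bias

split = len(records) // 2
exploration, confirmation = records[:split], records[split:]

# Mine the exploration set freely for candidate hypotheses (inspiration),
# then pre-register one test and run it ONCE on the confirmation set (rigor).
overlap = set(exploration) & set(confirmation)
print(f"exploration={len(exploration)}, confirmation={len(confirmation)}, "
      f"overlap={len(overlap)}")
```

The split costs you half your sample, which is the point of the quote: rigor is expensive, and the Elvis toast never goes back in the toaster.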
In a world of "evidence-based" medicine, I am a bigger fan of practice-based evidence.
Remember the quote by Upton Sinclair...
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”