data&donuts
  • Data & Donuts (thinky thoughts)
  • COLLABORATor
  • Data talks, people mumble
  • Cancer: The Brand
  • Time to make the donuts...
  • donuts (quick nibbles)
  • Tools for writers and soon-to-be writers
  • datamonger.health
  • The "How" of Data Fluency

hello data
I visualize data buried in non-proprietary healthcare databases
https://unsplash.com/@winstonchen

Fresh data insights: may contain nuts

2/17/2017

 
My name is Bonny and I take potentially perverse delight in reading crappy survey instruments. And like the doorknobs in The Sixth Sense, the mistakes are more obvious when you go back for a second glance.

I am not judging without merit and I never criticize without offering to help. You may have noticed a series of posts about survey design with easy peasy suggestions you can adopt immediately--or ignore. Your call. 
The survey instrument is one of the most popular and accessible tools for gathering data. The quality of that data depends on which questions you ask, how they are written, fielded, and analyzed, and on your answer bank and format.

Just the other day a professional society fielded a list of learning objectives claiming that attendees would be able to "appreciate and understand" a list of general PRO measures.

I asked how they would measure both of these behaviors simultaneously or even separately. How does one measure "appreciation" or "understanding"? Is that really the desired outcome?

Whether you use a survey instrument or other metrics to gather information, you need to be precise. Go ahead and open the next survey you are sent. There are likely multiple lessons for you to discover.



The article below is from Nature Genetics (2004). It remains (at least for me) the seminal voice on what we are measuring when we collect "race" as part of demographic data for analytics and population insights. Full disclosure: my thesis was on population genetics, so if I get too technical please send me a message for clarification.

We differ at the nucleotide level at somewhere between 1 in 1,000 and 1 in 1,500 bases. When you consider that we differ from chimpanzees at only about 1 in 100, this brings some scope and perspective. Looking at the graphic below, it seems that we do sort into neat little buckets by race, until we introduce a population of South Indians, who fall somewhere between the 3 groups measured below (yellow dots in the structure analysis graphic).

A neighbor-joining (NJ) tree is one of several tools for constructing phylogenies, or evolutionary histories. Maximum likelihood, maximum parsimony, and Bayesian inference can not only measure the amount of change between sequences but also identify the specific differences. For a whole-genome phylogenetic tree, though, you can make a distance matrix based on how many genes differ between your species and use that to construct an NJ tree.
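For the curious, here is a minimal Python sketch of that last idea: count the differences between profiles, put the counts in a distance matrix, and run neighbor joining on it. The function names, the `node0`-style labels for internal nodes, and the toy presence/absence profiles are all hypothetical and made up for illustration. None of this is code from the article, and for real work a tested phylogenetics package is the better choice.

```python
from itertools import combinations


def count_differences(a, b):
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(profiles):
    """Build a symmetric nested-dict distance matrix from {name: profile}."""
    dist = {name: {} for name in profiles}
    for a, b in combinations(profiles, 2):
        dist[a][b] = dist[b][a] = count_differences(profiles[a], profiles[b])
    return dist


def neighbor_joining(dist):
    """Return an unrooted NJ tree as (node, node, branch_length) edges."""
    # Work on a copy so the caller's matrix is left untouched.
    d = {a: dict(row) for a, row in dist.items()}
    edges = []
    next_id = 0
    while len(d) > 2:
        n = len(d)
        total = {a: sum(d[a].values()) for a in d}
        # Pick the pair that minimizes the NJ criterion Q.
        i, j = min(
            combinations(d, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        # Branch lengths from i and j to the new internal node.
        len_i = d[i][j] / 2 + (total[i] - total[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        new = f"node{next_id}"  # hypothetical label for an internal node
        next_id += 1
        edges += [(i, new, len_i), (j, new, len_j)]
        # Distances from the new node to every remaining node.
        new_row = {
            k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in d if k not in (i, j)
        }
        del d[i], d[j]
        for row in d.values():
            del row[i], row[j]
        for k, value in new_row.items():
            d[k][new] = value
        d[new] = new_row
    # Two nodes remain; a single edge connects them.
    a, b = d
    edges.append((a, b, d[a][b]))
    return edges


# Made-up presence/absence profiles (1 = gene present), purely illustrative.
profiles = {
    "species_A": "1101001110",
    "species_B": "1101001011",
    "species_C": "0111100110",
    "species_D": "0011100100",
}
for left, right, length in neighbor_joining(distance_matrix(profiles)):
    print(f"{left} -- {right}: {length:.2f}")
```

Each printed line is one branch of the tree with its length. A long branch means a lot of difference accumulated along it, which is how branch lengths are read in the figures further down.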

"Genetic variation, classification and 'race'"--Jorde and Wooding

The longest branches in this tree separate individuals within the same continental populations (most variation occurs within populations). Branch length refers to how much difference occurs along a branch.
The longest internal branch separates African from non-African individuals.
Picture
Once the Indian individuals are added to the visualization, it becomes clear that they overlap considerably with both the Europeans and the East Asians.
Picture
Now here is where it gets really interesting--and particularly relevant. The authors created another neighbor-joining tree based on polymorphisms in the angiotensinogen gene (AGT), which encodes a component of the renin-angiotensin blood pressure pathway. The figure below shows 246 sequence variants. The variant 235T is associated with a higher risk of hypertension.
The variant has a frequency as high as 90% in some African populations and as low as 30% in European populations.

...In many cases individuals from different continents are more similar to one another, with regard to this gene, than are individuals from the same continent.

Patterns such as these are seen in many genes that are thought to underlie susceptibility to common diseases. Allelic variation tends to be shared widely among populations, so race will often be an inaccurate predictor of response to drugs or other medical treatments.

It would be far preferable to test directly the responsible alleles in affected individuals.
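To see why, take the two frequencies quoted above, 90% and 30%, and ask what fraction of each population carries at least one copy of 235T. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection). That is a simplifying assumption added here for illustration, not something the article states, and the function name is hypothetical.

```python
def carrier_fraction(allele_freq):
    """Fraction of people with at least one copy of the allele.

    Assumes Hardy-Weinberg proportions: the chance of carrying zero
    copies is (1 - allele_freq) squared.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_freq) ** 2


# Frequencies quoted in the post; the labels are shorthand only.
for label, freq in [("high-frequency population", 0.90),
                    ("low-frequency population", 0.30)]:
    print(f"{label}: {carrier_fraction(freq):.0%} carry at least one copy")
```

Under that assumption about 99% of the first group and about 51% of the second carry the variant. Guessing "non-carrier" from membership in the low-frequency group would be wrong roughly half the time, which is the argument for testing the allele directly.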
Picture
I don't know why more of us don't ask the difficult questions. It seems crazy and short-sighted not to analyze data with deeper granularity. Why are we still stratifying data based on race? Clinical trials hoping to target patient-level therapies need to ask relevant questions based on evolutionary insights and disease-related variation at the patient level--not the skin level.

I see "data" people...won't you join us?
