The article below is from Nature Genetics 2004. It remains (at least for me) the seminal voice on what we are measuring when we collect "race" as part of demographic data for analytics and population insights. Full disclosure: my thesis was on population genetics, so if I get too technical, please send me a message for clarification.
We differ from one another at the nucleotide level at somewhere between 1 in 1,000 and 1 in 1,500 positions. When you consider that we differ from chimpanzees at only about 1 in 100, this brings some scope and perspective. Looking at the graphic below, it seems that we do sort into neat little buckets by race, until we introduce a population of South Indians, who occupy a geographic position somewhere between the three populations measured below (yellow dots in the structure analysis graphic).
A neighbor-joining (NJ) tree is one of several tools for constructing phylogenies, or evolutionary histories. Character-based methods such as maximum likelihood, maximum parsimony and Bayesian inference not only measure the amount of change between sequences but also identify the specific differences. Neighbor joining is distance-based: to build a whole-genome phylogenetic tree, you can make a distance matrix based on how many genes differ between your species and use that matrix to construct an NJ tree.
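To make the distance-matrix idea concrete, here is a minimal sketch of neighbor joining in Python. It is my own toy illustration, not the authors' pipeline. It assumes short, already-aligned sequences and uses the simple proportion of differing sites as the distance. The four sequences and the names `p_distance`, `distance_matrix` and `neighbor_joining` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Pairwise distances, keyed by the unordered pair of names."""
    return {frozenset((i, j)): p_distance(seqs[i], seqs[j])
            for i, j in combinations(seqs, 2)}


def neighbor_joining(seqs):
    """Return the joins as (node_i, node_j, branch_i, branch_j) tuples."""
    dist = distance_matrix(seqs)
    nodes = list(seqs)

    def d(i, j):
        return dist[frozenset((i, j))]

    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to every other node.
        total = {i: sum(d(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(*p) - total[p[0]] - total[p[1]])
        # Branch lengths from i and j to their new parent node.
        li = 0.5 * d(i, j) + (total[i] - total[j]) / (2 * (n - 2))
        lj = d(i, j) - li
        parent = f"({i},{j})"
        # Distances from the new parent to each remaining node.
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((parent, k))] = 0.5 * (d(i, k) + d(j, k) - d(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [parent]
        joins.append((i, j, li, lj))
    # The last two nodes are connected by a single branch.
    i, j = nodes
    joins.append((i, j, d(i, j), 0.0))
    return joins


# Hypothetical toy data: four aligned 10-site sequences.
seqs = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "CCCCCAAAAA",
    "D": "CCCCCAAAGA",
}
joins = neighbor_joining(seqs)
for step in joins:
    print(step)
```

Each tuple records which two nodes were joined and the branch length from each to their new parent, so a long branch means a lot of difference along it. With real data you would want a vetted phylogenetics package and a proper substitution model rather than this raw proportion of differences.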
Genetic variation, classification and 'race' (Jorde and Wooding)
The longest branches in this tree separate individuals within the same continental populations (most variation occurs within populations). Branch length refers to how much difference occurs along a branch.
Once the Indian individuals are visualized, it becomes clear that they overlap considerably with both the Europeans and the East Asians.
Now here is where it gets really interesting, and particularly relevant. The authors created another neighbor-joining tree based on polymorphisms of the angiotensinogen gene, which encodes a component of the renin-angiotensin blood pressure pathway. The figure below shows 246 sequence variants. The variant 235T is associated with a higher risk of hypertension.
I don't know why more of us don't ask the difficult questions. It seems crazy and short-sighted not to analyze data with deeper granularity. Why are we still stratifying data based on race? Clinical trials hoping to target patient-level therapies need to ask relevant questions based on evolutionary insights and disease-related variation at the patient level, not the skin level.
I see "data" people...won't you join us?
In a world of "evidence-based" medicine, I am a bigger fan of practice-based evidence.
Remember the quote by Upton Sinclair...
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”