How do you measure race? As a data scientist, I am shocked at how often politicians, economists, journalists, and academics accept an aggregation of ticked boxes as a demographic variable. What do you anticipate doing with the data? Are you looking for biology or social constructs?
You might think the issues have been settled, but have they? When we call someone African American, are we really saying they represent the entire continent of Africa? How about a specific country? Are we assuming the cultures are homogeneous across geographic boundaries?
If you don't have a reason for collecting a variable, don't. That isn't to say that race isn't relevant, because it is, but not simply as a categorical variable. It contributes content, granularity, and relevance to the human experience and to how the edges contribute to a connected network.
Listen to Taiye Selasi's brilliant talk about identity, "Don't ask where I'm from, ask where I'm a local." What variables would you collect to capture geography or ancestry? What about the influence of a well-defined local identity, which may drive behavior and professional choice?
How could I come from a nation? How can a human being come from a concept? It's a question that had been bothering me for going on two decades.
My task as a data analyst/scientist is to tell you the story and insights from collected information. The inferences gleaned from identity variables should be accurate and tailored to the questions and hypotheses you have generated. If I tell you I am from the United States, you more than likely learn more about what I am not than about what I am.
The deep geographical relationships that contribute to my cultural identity involve at most three states, not the entire country. I was born in New Jersey and departed for California in my 20s. I have now lived in North Carolina longer than anywhere else in my life. Even if we set aside how this sort of information defines my identity more precisely than just saying I am American, wouldn't it at least affect which messages might influence my behavior? Where we are "local" is the metadata often overlooked in discussions of race.
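One way to picture "local" as metadata is to store a residence history instead of a single nationality box. The sketch below is only an illustration: the `Residence` record, the `locality_shares` helper, the placeholder place names, and the year counts are all hypothetical and describe no real respondent.

```python
from dataclasses import dataclass


@dataclass
class Residence:
    """One place a respondent has lived (hypothetical record layout)."""
    place: str
    years: float


def locality_shares(residences):
    """Return the fraction of total recorded years spent in each place."""
    total = sum(r.years for r in residences)
    if total == 0:
        return {}
    shares = {}
    for r in residences:
        # Accumulate so repeated stays in the same place are combined.
        shares[r.place] = shares.get(r.place, 0.0) + r.years / total
    return shares


# Illustrative respondent with made-up durations.
history = [
    Residence("State A", 22),
    Residence("State B", 8),
    Residence("State C", 25),
]
shares = locality_shares(history)
primary = max(shares, key=shares.get)  # place with the largest share
```

A single "American" category would collapse all three rows into one value; keeping the shares preserves which places could plausibly shape the respondent's behavior.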
Measure what is measurable, and make measurable what is not so.
You should integrate a wide variety of demographic questions into survey development whenever the goal is to make inferences about a population. Because my work involves describing the heterogeneity of patient populations, or of healthcare provider behavior when presented with evolving clinical research, I rely on discrete choice models.
"Discrete choice models statistically relate the choice made by each person to the attributes of the person and the attributes of the alternatives available to the person."--Wikipedia
Perhaps an overly simplistic definition, but I think it is understandable in the context of integrating auxiliary variables into advanced survey methodology. How are you measuring what might be missing?
Does a multiple-choice question format reflect the appropriate scientific level of choice architecture for your hypothesis? I am going to go out on a limb here and say, no--not really.
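To make the quoted definition concrete, here is a minimal sketch of one common discrete choice model, the multinomial logit, in which each alternative gets a utility built from its own attributes and from the person's attributes, and choice probabilities follow from a softmax over those utilities. The function name, attribute names, and weights are assumptions made up for illustration; in practice the weights would be estimated from survey data, not set by hand.

```python
import math


def choice_probabilities(person, alternatives, weights):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k).

    weights maps an attribute name to a coefficient shared by all
    alternatives, and an (alternative, person_attribute) pair to an
    alternative-specific coefficient. Missing weights count as zero.
    """
    utilities = {}
    for name, attrs in alternatives.items():
        # Contribution of the alternative's own attributes.
        v = sum(weights.get(k, 0.0) * x for k, x in attrs.items())
        # Contribution of the person's attributes for this alternative.
        v += sum(weights.get((name, k), 0.0) * x for k, x in person.items())
        utilities[name] = v
    # Subtract the maximum utility before exponentiating for stability.
    m = max(utilities.values())
    exps = {n: math.exp(v - m) for n, v in utilities.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


# Hypothetical respondent, alternatives, and coefficients.
person = {"years_local": 10.0}
alternatives = {
    "option_a": {"cost": 2.0, "benefit": 3.0},
    "option_b": {"cost": 1.0, "benefit": 1.0},
}
weights = {"cost": -0.5, "benefit": 1.0, ("option_a", "years_local"): 0.05}
probs = choice_probabilities(person, alternatives, weights)
```

The point of the sketch is that a person-level variable such as `years_local` enters the model alongside the attributes of the choices themselves, which is where richer identity variables can earn their place in a survey.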
Finally, what we're talking about is human experience, this notoriously and gloriously disorderly affair. In creative writing, locality bespeaks humanity. The more we know about where a story is set, the more local color and texture, the more human the characters start to feel, the more relatable, not less. The myth of national identity and the vocabulary of coming from confuses us into placing ourselves into mutually exclusive categories.--Taiye Selasi, writer and photographer