Have you heard of the Harper's Index? Often described as statistical poetry, the monthly index highlights the numbers behind the stories. A bare scaffolding is exposed and we are the richer for engaging.
The July 2018 index below begins with a thematic discussion of self-storage units and by creative association weaves vertically down the page to culminate in NYC sewage traveling by train. Incrementally this makes sense if you follow the thread.
This wonderfully orchestrated index got me thinking. What if this format--loosely reimagined--could serve as a guide to data exploration? I guide many companies and professionals along the path of data literacy. Typically, I create a few visualizations and we spend the afternoon unpacking what's under the hood. I am not a big fan of a traditionally didactic approach. I view data as dynamic and evolving. I know, I know--the headlines herald the arrival of artificial intelligence and machine learning as the panacea for insights and innovation.
But most of us are small data folks. Somehow the data explosion left many voiceless in a world where the esoteric terminology arrived without a Rosetta Stone. Even if we have a serviceable amount of data literacy, where do we find the data to address our questions?
I rely on the American Community Survey and Census data when researching social determinants of health. I have a strong interest in race data, both because of my own curiosity about identity and for professional reasons. How can strong statements about race be made in the absence of wide datasets that capture actual genetic variants or social correlates?
So I started thinking of my own story. My plan was to find a data stream not unlike the Harper's Index and share the data sourcing to hopefully encourage others to create a data catalogue of useful information. I decided to begin with the census report applicable to my birth year. The Census data is where you can find population data relevant to discussions of social correlates of health in addition to your own lingering questions about the world we live in.
Think of the data journey that could be anchored by the statistics on the "Negro Population" reported in the 1970 Census. As a bi-racial woman, even I was startled to see such historic language in what seems like a not-too-long-ago government document. Where would my index begin?
Percent distribution of negroes reported residing in the Northeast in 1969--19%
The year in which interracial marriages were deemed legal by the U.S. Supreme Court--1967
Percentage of all marriages which were racially mixed in the US (1967-1970) [specifically NJ, where I was born]--0.6%
*Data from a 1975 project funded by the National Institutes of Health, #1-RO1-HD-05137
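For anyone who wants to start a data catalogue of their own, here is a minimal sketch in Python of how these three entries might be stored alongside their sourcing. The names `IndexEntry`, `ENTRIES`, `format_entry`, and `needs_source` are hypothetical, the source notes are paraphrased from this post rather than formal citations, and attaching the NIH footnote to the marriage percentage is my assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """One line of a personal index: a description, its value, and where it came from."""
    description: str
    value: str
    source: Optional[str] = None  # free-text provenance note; None if not yet recorded


# Values copied from the index above. Source notes are paraphrased from the post
# (assumption: the NIH footnote applies to the marriage percentage).
ENTRIES = [
    IndexEntry(
        "Percent distribution of negroes reported residing in the Northeast in 1969",
        "19%",
        "1970 Census",
    ),
    IndexEntry(
        "The year in which interracial marriages were deemed legal by the U.S. Supreme Court",
        "1967",
    ),
    IndexEntry(
        "Percentage of all marriages which were racially mixed in the US (1967-1970)",
        "0.6%",
        "NIH-funded project #1-RO1-HD-05137",
    ),
]


def format_entry(entry: IndexEntry) -> str:
    """Render an entry in the index style: description, two hyphens, value."""
    return f"{entry.description}--{entry.value}"


def needs_source(entries: List[IndexEntry]) -> List[IndexEntry]:
    """Return the entries whose provenance has not been recorded yet."""
    return [e for e in entries if e.source is None]


if __name__ == "__main__":
    for entry in ENTRIES:
        print(format_entry(entry))
    print("Still to be sourced:", [e.value for e in needs_source(ENTRIES)])
```

Keeping the source next to each number is the point of the exercise: `needs_source` makes it easy to spot which entries still need a citation before the index is shared.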
Census by Decades
Additional sources of population totals by race, 1790-1990: Historical Census Statistics on Population Totals by Race, 1790 to 1990, and by Hispanic Origin, 1970 to 1990, For Large Cities and Other Urban Places in the United States
Are you interested in sourcing data to understand population demographics and communities, and in how publicly available data sources can be accessed to fill in the gaps?
Follow along here or on Twitter @datamongerbonny