I don’t. Next question? I am only partially joking. The most common output format for non-proprietary large datasets (at least in healthcare) seems to be CSV. Occasionally I can grab a SAS file, but I think spreadsheets are here to stay. A CSV file has all of the formatting and formulas stripped out, so although CSV files are still cumbersome, they work.

This data is from the 2020 Household Pulse Survey, the Census Bureau's COVID household survey. You can readily see that a spreadsheet view tells you very little about the shape of this data. A few lines of Python code can report the shape of the data and the variables it includes, although unless you are already familiar with the data you will also need to download the data dictionary. This particular survey file contains 82 columns and 132,961 entries, or rows.
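For readers who want to try this, here is a minimal sketch of what "a few lines of Python" might look like, using only the standard library. It is not the exact code I ran; the function name `summarize_csv` and the file name in the commented usage lines are placeholders, and it assumes the first line of the file holds the column names.

```python
import csv

def summarize_csv(path):
    """Return (row count, column count, column names) for a CSV file.

    Assumes the first line is a header row of variable names.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first line: the variable names
        # Count the remaining non-blank lines as data rows.
        n_rows = sum(1 for row in reader if row)
    return n_rows, len(header), header

# Hypothetical usage; replace the file name with the CSV you downloaded:
# rows, cols, names = summarize_csv("household_pulse.csv")
# print(f"{rows:,} rows x {cols} columns")
# print(names[:10])  # peek at the first ten variable names
```

The column names on their own are usually cryptic, which is where the data dictionary comes in. If you already have pandas installed, `pandas.read_csv(path).shape` gives the same row and column counts in one line.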
You can also explore the data on the Census website using their interactive tool. I usually start there and formulate data questions as I go. Reach out with any questions. The newly launched newsletter will include links to deeper-dive tutorials, or a focused narrative for less tech-oriented subscribers. You can subscribe here. Because I am moving my existing subscribers from the old format to the new one, anyone who subscribes to the new format before the end of September will continue to have access for free.