In the last two posts I relied on df.info() to explore a few datasets. Often you just want to hop in and hop out, but there are occasions when you need a bigger lens to look at big data. This is where Pandas Profiling is superb. Read the article below for more details; I am pulling the most relevant steps into this post for your review and direct application to healthcare datasets.
10 Simple hacks to speed up your Data Analysis in Python
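To make the contrast concrete, here is a minimal sketch of the quick look versus the bigger lens. The tiny DataFrame, its column names, the helper profile_to_html, and the output filename are all made up for illustration; none of it comes from the CHSI data. I am also assuming the profiling package is installed under either its older name (pandas_profiling) or its newer one (ydata_profiling), so adjust the import to whatever you have.

```python
import io

import pandas as pd

# Hypothetical stand-in for a county-level health table; the columns and
# values are invented for illustration and are not taken from CHSI.
df = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "population": [12000, 450000, 87000, 23000],
    "pct_uninsured": [14.2, 9.8, None, 17.5],
})

# Quick look: df.info() lists each column's dtype and non-null count.
# Writing to a buffer lets us keep the text as well as print it.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)


def profile_to_html(frame, path):
    """Write a profiling report to `path`; return False if no package is found.

    Hypothetical helper: it tries the newer package name first, then the
    older one, and only imports when called.
    """
    try:
        from ydata_profiling import ProfileReport
    except ImportError:
        try:
            from pandas_profiling import ProfileReport
        except ImportError:
            return False
    report = ProfileReport(frame, title="Health data profile")
    report.to_file(path)
    return True
```

Calling profile_to_html(df, "health_profile.html") should produce a standalone HTML report you can open in a browser, assuming one of the two packages is installed. On a wide dataset the report can take a while to build, so it is worth trying on a subset of columns first.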
I do a lot of work evaluating population health data. This sample is from Community Health Status Indicators (CHSI). It is a bit dated but a good practice dataset for experimenting and learning a few new data skills. I share an extensive checklist of curated data sources with workshop attendees and clients, but I am happy to share an edited sample with anyone; reach out to me on either Twitter or LinkedIn.
In the chart below, the x-axis indexes the data points, while the y-axis shows the value of every feature for that particular data point. Hit play to see the interactivity.
A curated dataset would be much less crowded, but you can select variables for comparison using the tools in the upper-right menu.
We now offer on-demand webinars in addition to onsite workshops tailored to your data conversations.