What a busy week. I am taking a refresher R course, Introduction to R for Journalists: How to Find Great Stories in Data. I also hop in and out of the Data Science Specialization created by Johns Hopkins University over at Coursera. Admittedly the R Programming course is quite intensive, but if you make it out the other side you will be in great shape to tackle large datasets in relatively little time. I highly recommend the 10-course specialization but admit it is hard to stay focused while juggling client work and travel.
But here is why it is important. If you are safely ensconced in Excel spreadsheets and tables--good luck. That data endeavor is manual and labor intensive. You need to rebuild the ship every time you decide to set sail. Not the best use of your time. Data projects rely on open source data and on a seemingly infinite amount of non-proprietary data floating around the web. Data brokers have created streamlined solutions by cleaning the data for you and, in many cases, combining datasets to provide longitudinal analyses.
With a little elbow grease, though, you can write a little code to update or tidy your data on the fly. As someone who used to do this manually to create data visualizations for clients--the juice is worth the squeeze. I code in R and Python and the living is easier...
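To make "a little code" concrete, here is a minimal sketch in Python using only the standard library. The column names and sample values are made up for illustration, and the `tidy_rows` helper is hypothetical, not something from the courses mentioned above. It normalizes the headers, trims stray whitespace, and drops blank rows, which are the kinds of chores I used to do by hand in a spreadsheet.

```python
import csv
import io


def tidy_rows(raw_text):
    """Clean CSV text: snake_case headers, trimmed cells, blank rows dropped."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    # "Hours Billed " becomes "hours_billed"
    clean_header = [h.strip().lower().replace(" ", "_") for h in header]

    rows = []
    for row in reader:
        cells = [c.strip() for c in row]
        if not any(cells):
            # Skip rows that are empty or contain only whitespace
            continue
        rows.append(dict(zip(clean_header, cells)))
    return rows


# Hypothetical messy export, standing in for a real client file
SAMPLE = "Client , Hours Billed\n Acme , 12\n\n , \nGlobex, 7\n"

for record in tidy_rows(SAMPLE):
    print(record)
```

Once a function like this exists, next month's file gets the same treatment with one call instead of another afternoon of manual cleanup.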
This is also the first week in several where work won out over running on the trails and training for an ultra-event. Sometimes only able to secure a few hours in the pre-dawn, I use the time to re-enter work life with the scale tipped back toward work. I don't know how you stay on your professional toes, but I would be sunk without podcasts whispering insights and ideas into my ears.
Perhaps not a typical vacation read, but The Open Revolution: Rewriting the Rules of the Information Age introduced me to the work and writings of Rufus Pollock, an economist and the founder of Open Knowledge International.
I like the powerful analogy of baking a cake. If the ingredients are locally available or even in your pantry--well done you. But what if you had to drive to the farm and wait for the flour to be milled, the eggs to be produced, etc.? Well, a data or insight "cake" has similar challenges. Sources of friction--legal issues, data quality, and data logistics--are all financial as well as temporal barriers to efficiency.
The Frictionless Data Field Guide is for those of us working with data of all stripes. I found it a worthy companion to my work in R as a guide to workflows, data collections, data sourcing, validation, getting data out in the world, and improving data publishing. For the sake of transparency: I made a mess of my desktop and Dropbox files as I manipulated huge amounts of data into repositories and visualizations. After each project I dutifully collected the raw data and packaged the workbooks for clients, but I needed to devote a nontrivial amount of time to creating order from the flotsam. The workflow has made my data sources accessibly organized and ready for the next query.
In the video below, Rufus explains how the containerization of shipping reduced costs and increased efficiency by more than 1000%. Think of data packages as containers for data--once we have standardized data "containers" we have tools to validate, store, search, import, and export data.
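Here is a rough sketch of what such a "container" can look like, again in plain Python. The descriptor follows the general shape of a Frictionless Data Package (a name, plus resources that each carry a path and a schema of typed fields), but the dataset, file name, and the `validate_row` helper are my own illustrative assumptions. This is a hand-rolled check, not the official Frictionless tooling.

```python
import json

# Hypothetical descriptor, shaped like a datapackage.json file
descriptor = {
    "name": "client-hours",
    "resources": [
        {
            "name": "hours",
            "path": "hours.csv",
            "schema": {
                "fields": [
                    {"name": "client", "type": "string"},
                    {"name": "hours_billed", "type": "integer"},
                ]
            },
        }
    ],
}

# Only a few field types are handled in this sketch
CASTERS = {"string": str, "integer": int, "number": float}


def validate_row(row, schema):
    """Return a list of problems found in one row; an empty list means it passed."""
    errors = []
    for field in schema["fields"]:
        name, kind = field["name"], field["type"]
        caster = CASTERS.get(kind)
        if caster is None:
            errors.append(f"{name}: unsupported type {kind}")
        elif name not in row:
            errors.append(f"missing field: {name}")
        else:
            try:
                caster(row[name])
            except ValueError:
                errors.append(f"{name}: {row[name]!r} is not a valid {kind}")
    return errors


schema = descriptor["resources"][0]["schema"]
print(validate_row({"client": "Acme", "hours_billed": "12"}, schema))
print(validate_row({"client": "Acme", "hours_billed": "twelve"}, schema))

# The descriptor is plain JSON, so it can travel alongside the CSV it describes
print(json.dumps(descriptor, indent=2))
```

The point of the design is that the description of the data ships with the data, so any tool that understands the container can check it without asking me what the columns mean.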
I have to admit, it took some time to snap out of my holiday routine of long runs in Montauk and leisurely attention to projects needing a little "kicking down the road". But as we all know--there is much work to be done. I am a new member of the National Press Club and ideas for stories are buzzing around my head constantly. My newly refreshed skills in the R programming language help to swat away the time factor. Yes, you need to focus and do the work up front--but once you create your datasets and data frames--you are frictionless...
In a world of "evidence-based" medicine I am a bigger fan of practice-based evidence.
Remember the quote by Upton Sinclair...
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”