You may recognize Hadoop as the open-source, Java-based programming framework for big data. I met Hadoop a few years ago when I was invited to a day-long workshop by a big data vendor. There was a plate of stale cookies and horrible watery coffee. And an uninspiring PowerPoint presentation. I try to learn a bit about distributed computing environments because, on occasion, I do work with large datasets.
But here is the thing: large doesn't necessarily scale to big, or qualify as Big Data with a capital "B". I learned quite quickly how often healthcare industry data misleads and shifts the focus toward appearance. If Big Data is the new thing and you want some, loosen up those purse strings. But what if you could access the tools and begin analyzing on your own?
Understanding the options and data tools beyond Excel makes analysis more accessible and leaves more room for creativity. Now is a good time, don't you think?
"The numbers themselves – unless purposefully falsified – cannot lie, but they can be used to misrepresent the public statements and ranking systems we take seriously. Statistical data do not allow for lies so much as semantic manipulation: numbers drive the misuse of words. When you are told a fact, you must question how the terms within the fact are defined, and how the data have been generated. When you read a statistic, of any kind, be sure to ask how – and more importantly, why – the statistic was generated, whom it benefits, and whether it can be trusted." -- Jonathan Goodman, Aeon magazine
I am pretty excited to share a few new tools and datasets with you over the next few weeks and months. I use Python, SQL, MATLAB, and Tableau, and I will be attending the Joint Statistical Meetings in Baltimore next week.
I will be sharing insights from the conference and how they influence the data projects I am currently developing. Reach out on Twitter or LinkedIn with any questions or comments.
And the answer to your question, "Does Hadoop make your data look big?" I would think twice about going out in public like that...