There are certain tasks that I have been doing a certain way forever and ever. I did not realize how complex I was making some of these workflows until I was asked to teach a class over on Teachable. Faced with creating a recipe, I noticed that perhaps--in some cases--the juice was no longer worth the squeeze. There were too many steps to explain to a heterogeneous audience. It is one thing to slog through content if you know everyone has been baselined and we are starting from the same spot.
It is quite another thing to know that some will be yawning while others are likely to be gnashing their teeth in frustration. I won’t pretend to know the right balance but here is what I know.
The best approach is to have mini conversations. These are general, to be sure, but they allow exposure to the vastness of a complicated subject. I realize that it is your filter for information that matters--not mine. I imagine you will learn the way I did: collecting little bits of information here and there and retaining the glittery bits to “feather your nest,” as it were.
I had this epiphany while working with Census data. Like many of you, I have been working on a workflow ahead of the 2020 Census. When you aren't teaching, you tend to fly through certain steps; you only realize your errors once you are reviewing the webinar or recording. Over on Teachable the learning curve was steep but not insurmountable.
One day I might wear makeup or add style but for now it is all about content.
This week, we will build a map. Stay tuned...
There is a commercial that says something like, “Savoring the moments that were always there.” It reads like a silver lining to self-isolation during a global pandemic. The trouble with many of us--we have been working remotely for a long time. Now the secret is out. Depending on your sensibilities this has been perhaps an “aha!” moment or a glimpse into a reality that isn’t for you.
I lean toward the savoring side. In fact, one of the reasons I consider myself “unemployable” is I would never consider traveling to an office. Unless it is down a set of stairs, through the foyer and dining room and into a quiet small office. Actual traditional employment was off the table before Covid-19. Now it is quite banned from the table and I would argue not even allowed in the house.
The technology world often laments slow adoption and implementation but carefully avoids the responsibility that people have shirked by still going about business like a pack of Luddites. If you don’t believe me, go apply for a job. Take a rich history of successful collaborations and outcomes and cram it into the equivalent of a chiseled stone plaque. The portals for submitting CVs are outdated, inefficient, and ask you to replicate information or experience already outlined elsewhere...an especially fun task if you work as a data scientist or analyst.
Job descriptions for technical professionals are often compared to the search for a mystical unicorn. I am not sure who is responsible for writing the job description fodder, but I want whatever they are imbibing that stimulates the delusion. Many listings require more years of experience with a platform or software than the platform or software has existed.
I receive dozens of messages from recruiters and HR “professionals” offering me wonderful opportunities specific to my skills. Except they have no idea what my skills might be. It would take them 5 seconds to find out that I run my own consultancy in data analytics--not likely I am going to chuck it all for a 9 to 5. But they persist. Think “I’m Mad as Hell” from Network.
On the other hand, if I were a recruiter or similar professional, why not use LinkedIn like the resource it could be? Read what folks are posting, look for the diamond in the rough and start identifying prospective employees with harpoons instead of wide nets.
The scene from Network reminded me of David Mamet because I confused it with Glengarry Glen Ross, which he actually did write. Part of savoring what has always been was finally watching MasterClass. I bought my husband a class a few years ago and managed to parlay that into a yearly subscription at a reduced rate.
It is, moreover, evident from what has been said, that it is not the function of the poet to relate what has happened, but what may happen- what is possible according to the law of probability or necessity. The poet and the historian differ not by writing in verse or in prose. The work of Herodotus might be put into verse, and it would still be a species of history, with meter no less than without it. The true difference is that one relates what has happened, the other what may happen. Poetry, therefore, is a more philosophical and a higher thing than history: for poetry tends to express the universal, history the particular.--Aristotle
My decision to no longer write manuscripts for publication weighed heavily on my mind for several months. This audio of David Mamet was like a nice tidy bow. I recently broke my vow to remove myself from the dubious role of medical writer. During the Covid-19 pandemic, public speaking engagements dwindled or died on the vine, and I said yes to work that should have been a hard no. I haven’t been a full-time medical writer in over a decade. Historically, a writer would pull together resources and summarize the existing data. Next, we helped develop the research question, but not so specifically as to leave opposing data out of the conversation. We then developed an annotated outline, which is where the actual collaboration kicked off in high gear. Typically these conversations were so informed and nuanced that they were the meat on the bones of a strong outline. The authors' voices were the point of the manuscript--my role was simply to create a unified voice and narrative for submission.
Well, that was then, this is now. Now the ringleader is the client of the client. Companies spring up for hire to write whatever it is you envision--long before a patient has enrolled in a clinical trial--and often way upstream from FDA approval. The goal is for them to please their client, not inform at the point of care to improve cost, quality of life, or outcomes. Profit rules, not people. I get it. I am and was well-paid to do this. Ridiculously compensated for an even more ridiculous task. And we all pretend we are doing good.
We are not.
Show up, shut up, do your job. That is all they want. It's secretarial. It's marketing.
My normal routine has been disrupted. I am betting you can relate. What remains though is vital and can illuminate the foundational elements that can moor us and keep us sane--or at least keep the crazy bits to a minimum.
Most days a group of National Press Club members gather for 30 minutes on Zoom. The Journalism Institute hosts our discussions and presents a quick writing prompt to focus them. The prompts are often timely or pulled from the community at large, and we share thoughts, laughs, outside resources, and a nontrivial amount of camaraderie and support.
One of my contributions was a glimpse into how I am managing to conduct workshops, speak at virtual events, and keep a bold working life in the face of grounded flights, cancelled venues, and unmitigated chaos. I thought I would share some of the tools and resources that have been monumental in this shift. Most, if not all, internet resources have a free option for exploring before purchasing. And the majority of these suggestions are reasonably priced. I think about it like wearing Prada boots with a pair of Target jeans. Make the necessary investments when and if you are able.
HyperDrive hub for managing additional ports
I purchased portable USB LED video lights on adjustable tripod stands. You don't need to get them all at once, but a pack of two was less than $60. You might have seen the ring (circular) variety, but I think they would get crushed during travel unless you are hypervigilant. You will be surprised at how much better you look with proper lighting.
You can see the setup in the corner of my office. It has the yellow filter (it comes with a white one also), and I have it adjusted so it is right behind the webcam tripod when I am standing at my desk. Turn on your camera and then play with the lighting, which is fully adjustable, to see where you look less like a zombie and more like your best self. I also have another one to my right that either sits on the desk or on the floor with the tripod fully extended.
On top of my laptop you can see the webcam. It quickly clips onto the laptop or slides into the top of the tripod immediately behind it in the photo.
The last few things are subscription based tools I discovered along the way. A few I am trying out to see if they are useful enough to warrant a yearly subscription. Simply agree to a monthly trial for now--your mileage may vary.
For example, Noun Project offers icons and other useful graphics for building data visualizations, online courses, reports, and any customizable deliverable where you want to add a little polish.
You are going to thank me for this one. I have hours and hours of conference sessions recorded--some where I am speaking but others on interesting topics in unique venues or with notable experts. I meant to transcribe them or repurpose those Zoom talks I have given, but who has time for that? Although this would be the perfect task for an assistant, I don't have one at the moment.
Try Descript. I primarily use it for simultaneously editing video and audio, but it would also suit creating a podcast or editing interviews.
My experience with Descript introduced me to Loom. Loom is an asynchronous tool that allows me to teach data analytics or survey design, for example, by screencasting. A simple workflow might look like this: I record sessions on topics, edit them in Descript, and upload them to my Teachable course. Boom.
In all honesty, I have limited experience with Canva, but I am exploring integrating it into my workflow. I am starting to gravitate away from outside platforms and hoping to rely on my blog to share, message, and link to opportunities for engagement. I don't think I would need both Noun Project and Canva, but I am exploring.
This has been a quick review of my foundation or roots in this time of upheaval and uncertainty. I hope buried here is an insight or suggestion to make your road a little smoother.
There are a lot of ways to support the blog if you found something monumental or time-saving. Share with a few friends, respond with a few of your own favorite tools, donate, or connect over on Twitter.
I have been working on a course on Teachable that isn't ready for primetime just yet but newsletter subscribers and sustaining donors will get links to courses for free. I will keep you in the loop.
Not sure if this escaped your interest, but we have had our own modern-day water pump in theories about the widespread COVID-19 in New York. Subway turnstiles are playing the part of the water pump, and the theory makes for a fascinating read if nothing else. The following is a working paper that reads like a conversation--one of the reasons I enjoyed reading and interpreting the data. Full disclosure: this National Bureau of Economic Research paper is not peer-reviewed and is circulated for discussion and comment only.
THE SUBWAYS SEEDED THE MASSIVE CORONAVIRUS EPIDEMIC IN NEW YORK CITY
New York City’s multitentacled subway system was a major disseminator – if not the principal transmission vehicle – of coronavirus infection during the initial takeoff of the massive epidemic that became evident throughout the city during March 2020. The near shutoff of subway ridership in Manhattan – down by over 90 percent at the end of March – correlates strongly with the substantial increase in the doubling time of new cases in this borough. Maps of subway station turnstile entries, superimposed upon zip code-level maps of reported coronavirus incidence, are strongly consistent with subway-facilitated disease propagation. Local train lines appear to have a higher propensity to transmit infection than express lines. Reciprocal seeding of infection appears to be the best explanation for the emergence of a single hotspot in Midtown West in Manhattan. Bus hubs may have served as secondary transmission routes out to the periphery of the city.
For each station, the idea is first to compute the time trends in turnstile entries and coronavirus incidence, and then to assess whether there is a relation between the two trends across different subway stations (Fredriksson and Oliveira 2019). Unfortunately, there is a serious problem with this extraordinarily popular method of doing policy analysis (Bertrand, Duflo, and Mullainathan 2004). In particular, there is likely to be significant serial correlation in the outcomes among adjacent subway stations situated along the same line.
Following the realization that the individual subway station may not be the appropriate unit of analysis, the discussion reveals the utility of considering entire subway lines. I will summarize the static model of epidemic propagation discussed in more detail in the paper, but basically susceptible individuals are classified as S and infectious individuals as I; what drives new infections is the contact between the two.
Incidence of new infection depends on the frequency of contact between S and I and on the probability that any one contact transmits infection.
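To make "frequency of contact times probability of transmission" concrete, here is a minimal sketch assuming a generic mass-action form. It is not the paper's exact static model, and the function name and every number in it are hypothetical. Each susceptible person makes some number of contacts, a share of those contacts is with infectious people, and each such contact transmits with a fixed probability.

```python
def expected_new_infections(susceptible, infectious, contacts_per_day, p_transmission):
    """Expected new infections per day under a simple mass-action assumption.

    Hypothetical illustration: the share of anyone's contacts that are with
    infectious people is taken to be infectious / population.
    """
    population = susceptible + infectious
    infectious_share = infectious / population
    return susceptible * contacts_per_day * infectious_share * p_transmission


# Hypothetical car: 990 susceptible riders (S), 10 infectious riders (I).
baseline = expected_new_infections(990, 10, contacts_per_day=10, p_transmission=0.05)
# Crowding the same riders into fewer cars doubles their contacts.
crowded = expected_new_infections(990, 10, contacts_per_day=20, p_transmission=0.05)
print(baseline, crowded)
```

In this toy form, expected new infections scale directly with contacts, so doubling rider density doubles them. That is the intuition behind the points below about crowded cars and reduced service.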
The Goscé model offers a number of insights that are immediately applicable to the data from the New York City Flushing subway line. The first is that the rate of disease transmission is related to the number of trips and average number of stations per trip along the entire subway line, and not just to the number of entries at any one subway station. Second, passengers entering the subway line even at a remote, less populous station are slowing down the system, thus increasing the transit time that the S’s stay in contact with the I’s. Third, those uninfected S-passengers who cram shoulder-to-shoulder into a particular subway are increasing train-car density and thus raising the average number of other S-passengers infected by an I-passenger who happens to be standing in the middle of the train. Fourth, local trains – like the Flushing local – are more likely to seed epidemic infections than express lines. Finally, an entire subway line, rather than the individual stations or subway cars, is the appropriate unit of analysis.
An important consideration is that reducing train service likely accelerated the spread of the virus, as commuters found themselves crammed into fewer cars for longer periods of time.
One distinguishing factor between the present study and prior work is that seasonal influenza has generally had a reproductive number R in the range of 1.2–1.4, while pandemic influenza has had an R in the range of 1.4–1.8, with the high end representing the 1918 pandemic (Biggerstaff et al. 2014). By contrast, we have estimated the R in New York City during the initial surge of infections in early March to be on the order of 3.4 (Harris 2020). An overall assessment of these research efforts may lead some scientific reviewers to conclude that cause-and-effect remains difficult to prove. Still, we doubt whether any public health practitioner would be reluctant to take action on the basis of the facts we now know.
Harris, J. E. 2020. “The Coronavirus Epidemic Curve Is Already Flattening in New York City.” National Bureau of Economic Research Working Paper No. 26917, April 3, 2020. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3563985
If you want data--we have data. The Johns Hopkins Interactive Resource provides the data that fuels many of the graphics exploding across the internet. I have shared thoughts in panel discussions and here, Data after coronavirus...what survives?
In a reality that may never return, I present topics in data literacy across a wide variety of industries, mostly focused on community- or population-level data. When you are live with full access to attendees, it is synergistic to be able to offer clarifications, deep dives into topics that arise, or even question how data is sourced, prepared, analyzed, and communicated. More importantly--we can challenge how the data question was formulated. Often, this requires modifying questions to better fit the data available.
In publications with limited space for backstory and education regarding the terms, models, or algorithms used, we can unintentionally mislead. In fact, my classes often begin with viewing art that stands in for graphics, selected to reveal biases we may not even be aware of...
In the current environment, where we are isolated, bombarded with information, and perhaps fearful, conditions are ripe for a perfect storm of misinterpretation. I have noticed battles on Twitter between statisticians, epidemiologists, data scientists, and even economists. There is lots of grumbling about statistical models, weak assumptions, and who has the right to pontificate or offer expertise.
Let me tell you my perspective, for what it may or may not be worth. I believe that statistics, epidemiological principles, data science, and economics are all tools and information we need to understand as data professionals. Learning to read visualizations, and to create them, requires a deep understanding of where many different disciplines meet.
But here is also the thing. I use many tools, like python for example, without the understanding of a developer. Perhaps this is easier than falling short in other skills because it is hard to reach a wrong conclusion if you can't even get the code to run! I don't feel inferior, and neither should you, when using statistical tools or complex analytic algorithms. If they are to be applied to complex problems we should be able to gain a workable amount of fluency to know what we know and hope for collaboration and conversation when we are wandering off in the wrong direction.
I would like to roll this discussion back to a few foundational elements of the math and assumptions that underlie much of the confusion, and to where I think we need the experts to clarify and engage in a narrative that elucidates rather than isolates.
The notion that we can manage without models and that sufficient quantities of data—big data—can take the place of models is a seductive one.--What is the Purpose of Statistical Modeling
We can gather data at the scale visualized in the COVID-19 Dashboard, but do the numbers really speak for themselves? It is vital to recognize that there are different types of models. Only one kind, the "data-driven, empirical, or interpolatory" model, can't be wrong in this sense, because it simply summarizes the underlying data. Empirical models can, however, serve no purpose and have low value.
On the one hand we have theory-driven, theoretical, mechanistic, or iconic models, and on the other hand we have data-driven, empirical, or interpolatory models. Theory-driven models encapsulate some kind of understanding (theory, hypothesis, conjecture) about the mechanism underlying the data, such as Newton’s Laws of motion in mechanics, or prospect theory in psychology. In contrast, data-driven models merely seek to summarize or describe the data.--What is the purpose of statistical modeling?
David Hand, Professor of Mathematics and Senior Research Investigator at Imperial College London and author of What is the Purpose of Statistical Modeling, published in the Harvard Data Science Review, cautions that theory-driven models can indeed be wrong or misleading. Consider how the scope of COVID-19, visualized through confirmed cases, death tallies, and hospitalization rates, may not represent the reality those numbers are intended to capture.
For example, if you are not familiar with data visualization or statistics and simply view the COVID-19 projections published by the Institute for Health Metrics and Evaluation, you may not realize that the light purple shaded band represents the uncertainty around the estimates. I rely on these graphics to describe resource allocation projections, but if all you do is glance at them, missing this vital piece of information can change the game.
The Financial Times has been my favorite resource. They are offering free access to COVID-19 stories (thankfully). I read my allotment of complimentary stories but the $60/month fee for full access is a little steep. I like the readability and well-annotated graphics.
The more you read and look at data visualizations the more there is to learn. Pre-attentive attributes guide our attention but aren't reliable for determining what information might indeed be missing.
We should consider the Breiman definition of information, "to extract information about how nature is associating the response variables to the input variables."
You might detect the illusion of prediction and information when we are missing many of the input variables needed to describe relative frequencies of disease (COVID-19 testing across the population regardless of symptoms), modes of transmission, and estimates of the actual number of cases. These limitations will indeed impact our ability to plan interventions and allocate health resources.
For prediction, data-driven models are ideal–indeed in some sense optimal. Given the model form (e.g. a linear relationship between variables) or a criterion to be optimized (e.g. a sum of squared errors), they can give the best fitting model of this form, and if the criterion is related to predictive accuracy, the result is necessarily good within the model family. In contrast, theory-driven models are required for understanding, although of course they can also be used for prediction.--David Hand, What is the purpose of statistical modeling
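Here is a minimal sketch of what Hand means by "given the model form... or a criterion to be optimized". The model form is a straight line, the criterion is the sum of squared errors, and ordinary least squares returns the best line of that form. The function names and data points are hypothetical, and the closed-form fit is the standard textbook formula, not anything specific to Hand's article.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """The criterion being minimized."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical, slightly noisy observations.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)
nudged = sum_squared_errors(xs, ys, slope + 0.1, intercept)
```

The fit is "optimal" only within its family and only for the data it saw. It says nothing about why y rises with x, and extending the line beyond the observed range assumes the world stays stationary. That is the fragility Hand warns about.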
I was recently discussing limitations and potential harms of using readily available statistics reported with the rapidly accumulating data. The Tableau Dashboard, although well-intentioned, might tempt many to generate graphics of limited value. In times like these, I would definitely continue to explore and learn about the data through these free platforms, but I would caution the data family to leave the outcomes and insights to professionals. Here is a summary of insights gleaned from a recent article in The Guardian, Coronavirus statistics: what can we trust and what should we ignore?
I would be cautious of data reporting a daily count of confirmed cases or new deaths.
We are not testing the entire population, and eligibility for testing varies widely. Compare counties where you must be admitted to a hospital or show profound symptoms to be tested, counties where exposure to a positive case is enough, and places testing the whole population.
If the sickest of all of us are being tested--would you be surprised to see increasing death rates? What about deaths not attributed to confirmed cases but likely due to COVID-19? Are we testing the dead? What if the death occurs before the test results have been returned? How are co-morbidities being attributed on death certificates?
What about false negatives? We have an expanded pool of professionals applying swabs to nasal passages and throats--how are individuals previously tested as negative but now positive counted? How will home-testing impact the sensitivity and specificity of testing?
What methods are used to smooth the data so we can capture trends?
Logarithmic scales allow comparisons between populations. Many have opinions on this, but I think a log scale reflects exponential viral growth. Yes, you might miss the overall magnitude of the problem without the S-curve, but when R-naught is driving the spread of disease, I think a log scale, as long as it is clearly labeled, is helpful and relevant.
Data models can be useful but the media rarely provides the limitations of the chosen models or highlights the uncertainty.
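The testing questions above (false negatives, sensitivity, specificity) can be made concrete with a small worked sketch. All values here are hypothetical illustrations, not measured characteristics of any COVID-19 test. The function splits a population into true and false positives and negatives, then reports what share of positive and negative results are correct.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected test outcomes for a hypothetical test and population."""
    infected = population * prevalence
    uninfected = population - infected
    true_pos = infected * sensitivity      # infected people the test catches
    false_neg = infected - true_pos        # infected people the test misses
    true_neg = uninfected * specificity    # uninfected people correctly cleared
    false_pos = uninfected - true_neg      # uninfected people flagged anyway
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "ppv": true_pos / (true_pos + false_pos),  # share of positives that are real
        "npv": true_neg / (true_neg + false_neg),  # share of negatives that are real
    }


# Hypothetical: 10,000 people, 5% infected, 80% sensitivity, 98% specificity.
clinic = screening_outcomes(10_000, prevalence=0.05, sensitivity=0.80, specificity=0.98)
# Same population with a hypothetical less sensitive (say, home) test.
home = screening_outcomes(10_000, prevalence=0.05, sensitivity=0.60, specificity=0.98)
```

In this made-up scenario, about 100 infected people test negative with the first test and about 200 with the second. None of them ever show up as "confirmed" cases, so a daily count of confirmed cases quietly inherits the test's limitations.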
The science behind antibody testing is beyond this discussion but I suggest you listen to this quick tutorial by Peter Attia MD. His podcast is one of only a few resources I read or listen to regularly about COVID-19.
Most of the big, attention-grabbing illustrations of data science in action are data-driven. But if theory-driven models can be wrong, data-driven models can be fragile. By definition they are based on relationships observed within the data which are currently available, and if those data have been chosen by some unrepresentative process, or if they were collected from a non-stationary world, then their predictions or actions based on the models may go awry.--David Hand, What is the purpose of statistical modeling?
Apparently I am an extroverted introvert. When in social situations I am calm, friendly, and have been accused of being mildly entertaining. I thrive both literally and figuratively on public speaking and being at the podium. The problem is--I prefer to stay home or with small groups of friends.
I tell you this to give you an idea of how I have been tempering the changes of the last month or so. Delays in projects, rescheduling of talks, and serious doubts about the status of conference appearances for the remainder of the year notwithstanding--I remain pretty good. I have long been a creature of habit and, more importantly, a remote worker. My last W-2 gigs (many years ago) were also jobs where I worked from a remote office and traveled to client locations or to the office on a quasi-quarterly basis.
Many of us have been watching the pandemic and either relying on data visualizations or recreating our own from raw data. The problem is--there are many missteps and fumbles around what the data is actually capable of contributing to the narrative.
If you are an epidemiologist or have studied epidemiology for public health, many of the miscommunications are quite obvious. I think we could all use a better foundation in data literacy and fluency, and what better place to start than with a map from the Johns Hopkins Coronavirus Resource Center? The data is available for download on GitHub, where you will find instructions and guidance. I recommend you read the resources providing information on the terms used to describe the pandemic and important guidelines regarding epidemiology.
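Once you have downloaded a time-series file, a common first step is to turn cumulative counts into daily new cases. A trailing 7-day average then smooths out day-of-week reporting bumps. The sketch below is a minimal example under stated assumptions. It assumes a wide CSV layout (Province/State, Country/Region, Lat, Long, then one column per date, holding cumulative counts), which is my understanding of the repository's global time-series files. Check the headers of the file you actually download. The sample rows, country names, and numbers are hypothetical.

```python
import csv
import io


def daily_new_cases(csv_text, country):
    """Sum a country's cumulative counts across provinces, then difference them.

    Assumed layout: Province/State, Country/Region, Lat, Long, then one
    cumulative-count column per date.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]
    totals = [0] * len(dates)
    for row in reader:
        if row and row[1] == country:
            for i, value in enumerate(row[4:]):
                totals[i] += int(value)
    # The first date has no prior day, so its value is just the cumulative count.
    # Cumulative series can be revised, so later differences may occasionally be negative.
    new_cases = [totals[0]] + [totals[i] - totals[i - 1] for i in range(1, len(totals))]
    return dates, new_cases


def rolling_mean(values, window=7):
    """Trailing moving average; None until a full window of days is available."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


# Hypothetical sample in the assumed layout (not real data).
SAMPLE = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20
Province A1,Country A,0,0,1,2,4,7,11,16,22,29
Province A2,Country A,0,0,0,1,1,2,3,5,8,12
,Country B,0,0,5,5,5,5,5,5,5,5
"""

dates, new = daily_new_cases(SAMPLE, "Country A")
smoothed = rolling_mean(new)
```

A trailing window only reports a value once a full week is available, and it lags the raw series by a few days. Keep that in mind when you compare a smoothed curve against the date a policy took effect.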
When I view the map, the red sort of creates an ominous and deadly vibe. Yes, people are dying, but perhaps we need context to understand; fear mongering will only get us so far. I barely noticed the green font depicting the number of people recovered. If the red dots indicate confirmed cases, the reality is likely much worse. Confirmed only means they were validated with a test--a test with its own biases and limitations. And we know that in the US, at least, testing is limited, as is getting tests to many of the people in our communities.
Context is king when working with large complex datasets.
There are important considerations that need to accompany any visualization, but COVID-19 data has a time horizon that is critical to clarify. For example, when were national measures like shelter in place or self-isolation (shown here by star symbols) enacted? What happens if we are only measuring confirmed cases in areas where tests are known to be largely unavailable or limited?
I personally prefer the selection of a logarithmic scale on the y-axis to better convey exponential viral growth. There is a lot to discuss in this graphic. Did you notice that the US does not have a marker indicating a national message to shelter in place? If we observe countries that have issued national orders--how long before the bend in the curve is evident?
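Here is why a log-scaled y-axis makes the bend easier to see. Under constant exponential growth, log(count) rises by the same amount every day, so the curve plots as a straight line. The slope of that line maps directly to a doubling time. This is a minimal sketch on hypothetical, noise-free series. It assumes constant growth between the first and last observations, which real data rarely honors.

```python
import math


def log10_steps(counts):
    """Day-over-day change in log10(count); equal steps mean constant exponential growth."""
    return [math.log10(b) - math.log10(a) for a, b in zip(counts, counts[1:])]


def doubling_time(counts, days_between=1.0):
    """Doubling time in days, assuming constant exponential growth across the series."""
    elapsed = (len(counts) - 1) * days_between
    growth_rate = math.log(counts[-1] / counts[0]) / elapsed  # continuous daily rate
    return math.log(2) / growth_rate


# Hypothetical series: one doubling every 3 days, one slowed to every 6 days.
fast = [100 * 2 ** (day / 3) for day in range(10)]
slow = [100 * 2 ** (day / 6) for day in range(10)]
```

On a log axis, the "bend in the curve" is the slope flattening from the first kind of series toward the second, that is, the doubling time getting longer. That is the same quantity the subway paper tracks when it reports an increase in the doubling time of new cases.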
Here are a few resources to help you make better visualizations...
There is a weird facet of my personality that applauds irony in all its iterations. I was asked to speak at a local community event for "innovators and entrepreneurs" highlighting the United Nations Sustainable Development Goals--specifically equality. My suggestion to introduce the utility of census data and how to access, clean, and analyze it for free was welcomed.
Unfortunately, in the absence of effective marketing, census data is not exactly a standing-room-only draw. As it turned out, everyone must have been standing somewhere else. The attendees mill about drinking free beer and nibbling on heavy hors d'oeuvres, and once the second round of talks begins, they are typically engaged in other conversations. Not to worry, I persist.
Back to the census data. To explore broad questions beyond GDP, the overworked metric of gross domestic product, it is vital to dig deeper. Better yet, the insights are free once you can tackle the steep learning curve. What an opportunity to meet your potential clients, patients, or customers in the communities where they live. Find the barriers to improved outcomes by identifying structural determinants, and work for policies to ameliorate wide disparities in not only income but opportunity.
If we believe that we, as Americans, are bound together by a common concern for each other, then an urgent national priority is upon us. We must begin to end the disgrace of this other America.
If you said I was stubborn, you wouldn't be a liar. I refused to acknowledge the signs of an oncoming cold. Figuring I could run it off, I did an easy 10-mile run, hoping my oxygenated lungs would expunge the irritants and I would be back to being shiny and new. Let's just say all didn't go as planned, and I spent the afternoon sipping tea spiked with a bit of whiskey and watching a few documentaries.
Signs of Humanity was a brilliant surprise. Willie Baronet is an artist and professor in Texas. Well, after watching his documentary I can say with confidence--he is also a filmmaker. His story illustrates the humanity and compassion evident in his interviews of over 100 homeless people. Offering to purchase their signs, he collects them and creates art installations to bring awareness and conversation to the front line of our debates on community and policy change.
I have always known that poverty isn't simply one thing. It is a cascade of small and large tragedies that can leave us hopeless, bereft, and completely alone.
Watch Signs of Humanity. If you have a Prime account it is free.
As an analyst, I can only measure what I bring to the discussion. If you write about poverty, social determinants of health, or other variables using only an easy numeric tally or comparator, you are leaving data on the table. The tensions we hold can help inform and elevate discussions.
There is no "other" in discussions of poverty. In an economy where we must keep our fingers crossed that we don't lose our jobs (and the benefits they provide), become ill, or need to reduce work load to care for ailing parents--there is no floor. You can drop right down to the bottom at the blink of an eye.
Over the years we have been lucky. My husband and I had our parents during the fragile years of building our own little family. There were so many random challenges that didn't seem to care if we were highly educated and well compensated. His boss shot himself in the heart, and we were left without a steady income that had seemed Teflon over the prior 17-year period. I once worked in Pharma, and as companies were sold, merged, and scrapped, I began to appreciate the fragility of long-term security.
I work in healthcare for the human side of medicine--not the profit motives winding through our fragile US health system. I want us to ask better questions and to do a better job at questioning answers. We need to pay attention, become data literate, and share our stories.
Follow along with me as we explore census data, government data, and other resources to help add a human dimension to a much needed narrative...