Addicted to Public Health

A couple of years ago I became obsessed with the opioid epidemic. I spend a non-trivial amount of my time thinking about it, and if I am ever scrambling for a topic in a social situation, it ends up being pretty much the only thing I can think of. (Because I am So Smooth.) As a public health devotee, I find that the epidemic hits all of my passions. To name just a few: Issue exacerbated by outdated stigma? Check. Multiple demographics impacted? Big time. Heartbreaking narratives? You will never stop crying. Socioeconomic confounders? And how.

Recently, alarm over the epidemic has reached such a fever pitch that various agencies have started hosting opioid-crisis-focused datathons/codathons/hackathons. (Three different new words for essentially the same concept does seem a little excessive, I agree.)

Being a data scientist is pretty much the coolest thing, and the world seems to have caught on to this fact. This means that there is more competition to do the really interesting work, but it also means that there is a critical mass of data scientists who are interested in sciencing together for a marathon period (24 hours straight at least) on particular topics in our personal time. I love attending these events. They are a great opportunity to build less-exercised skills through a sudden flood of experience hours. So, you can just imagine how I feel about getting access to new opioid epidemic data as part of a hackathon. To spell it out: Teamwork + Opioid Epidemic + Data + Hackathon = G.O.A.T.

I have participated in 2 Opioid Hackathons in the past 6 months, and some of my coworkers and I are planning one of our own, sponsored by our company. One of my kick-ass data science colleagues, Catherine Ordun, submitted our results from these events in an abstract to the International Society for Disease Surveillance (ISDS) Conference in Orlando this year, and we are presenting tomorrow! I’ll be back to post more about the topic (unless I take another 3-year hiatus, obvi).



What is a SNP?

I recently came across this video which I made during Software Carpentry Instructor Training last year… and it’s not terrible, so I thought I’d share it!

In other news, the map application, which I mentioned in my last post, is awesome! Total success. I can’t share it here, because the data isn’t yet public, but, trust me, it’s really really great. If you want to try it with some of your own data, you can check out the project GitHub site here: shiny-court-grapher.

The main difficulty of the project, figuring out how to incorporate maps and how to use geocoded data, is mostly behind us. ggplot is wondrous!

It’s 10 o’clock — Do you know where your columns are?

I’ve made a few updates to the post about storing column locations, based on some input from a colleague.

If Only Life Had A Command-Line

"All things change, and we change with them." “All things change, and we change with them.”

Anyone who works in data analysis knows that any assumptions that you make about the formatting of the data that you receive are bound to be wrong. (Read: Assume the data came from a caveman, just to be safe.)

Handy Line

At a minimum, even if everything else is perfect (unlikely), the column names are probably not in the same order in every data set. So, rather than looking up the column number every time, I use the following line to store the number of the column of interest — in this case the “Chr” (chromosome) column — for later use throughout the script. It’s pretty basic, but super useful:

Here’s what’s happening:

  1. sed "s/\r//g" $DATAFILE – strip out any weird Windows carriage returns
  2. head -n1 – look only at the first (header) line
  3. sed 's/\t/\n/g' – replace all tab characters with…
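
For anyone who wants to try the idea, here is a minimal sketch of how the steps listed above could fit together. It is not the original one-liner, which isn't shown in this excerpt. The function name `column_index` is hypothetical, and everything after the tab-splitting step (the `grep` and `cut` calls) is my assumption about one reasonable way to finish the job. I also swapped in `tr` for the two `sed` substitutions, since `\t` and `\n` inside `sed` expressions are not handled the same way by every `sed`.

```shell
# Hypothetical helper (not from the original post): print the 1-based
# position of a named column in a tab-delimited file's header line.
column_index() {
    datafile=$1   # path to the tab-delimited file
    colname=$2    # exact header name to look for, e.g. "Chr"

    tr -d '\r' < "$datafile" |         # strip Windows carriage returns
        head -n1 |                     # look only at the header line
        tr '\t' '\n' |                 # put one column name on each line
        grep -n -x -F -- "$colname" |  # assumed step: exact match, prefixed "N:"
        head -n1 |                     # assumed step: keep the first match only
        cut -d: -f1                    # assumed step: keep just the number N
}
```

With that in place, something like `CHR_COL=$(column_index "$DATAFILE" "Chr")` would store the column number for later use in the script. If the column isn't found, the result is an empty string, so it is worth checking for that before relying on it.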
