What is a SNP?

I recently came across this video which I made during Software Carpentry Instructor Training last year… and it’s not terrible, so I thought I’d share it!

In other news, the map application, which I mentioned in my last post, is awesome! Total success. I can’t share it here, because the data isn’t yet public, but, trust me, it’s really, really great. If you want to try it with some of your own data, you can check out the project GitHub site here: shiny-court-grapher.

The main difficulty of the project, figuring out how to incorporate maps and how to use geocoded data, is mostly behind us. ggplot is wondrous!


It’s 10 o’clock — Do you know where your columns are?

I’ve made a few updates to the post about storing column locations, based on some input from a colleague.

If Only Life Had A Command-Line

“All things change, and we change with them.”

Anyone who works in data analysis knows that any assumptions that you make about the formatting of the data that you receive are bound to be wrong. (Read: Assume the data came from a caveman, just to be safe.)

Handy Line

At a minimum, even if everything else is perfect (unlikely), the column names are probably not in the same order in every data set. So, rather than looking up the column number every time, I use the following line to store the number of the column of interest — in this case the “Chr” (chromosome) column — for later use throughout the script. It’s pretty basic, but super useful:

Here’s what’s happening:

  1. sed "s/\r//g" $DATAFILE – strip out any weird Windows carriage returns
  2. head -n1 – look only at the first (header) line
  3. sed 's/\t/\n/g' – replace all tab characters with…
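
The excerpt cuts off before the end of the pipeline, so here is a minimal sketch of how the three steps above could be chained to get a column number. The sample file, the CHR_COL variable name, and the final grep and cut stages are assumptions for illustration, not necessarily what the original post used. It also assumes GNU sed, which understands \r, \t and \n.

```shell
# Hypothetical sample file: tab-separated, Windows line endings, "Chr" in the third column
DATAFILE=$(mktemp)
printf 'SNP\tPos\tChr\tAllele\r\nrs123\t1042\t7\tA\r\n' > "$DATAFILE"

# Stages 1-3 follow the list above (GNU sed assumed).
# The grep and cut stages are an assumed completion:
#   grep -nx prints "N:Chr" for the line that is exactly "Chr",
#   cut keeps only N, the column number.
CHR_COL=$(sed "s/\r//g" "$DATAFILE" \
  | head -n1 \
  | sed 's/\t/\n/g' \
  | grep -nx "Chr" \
  | cut -d: -f1)

echo "$CHR_COL"
```

Once stored, the number can be reused anywhere in the script, for example cut -f"$CHR_COL" "$DATAFILE" to pull out just that column. Matching the whole line with -x keeps a column such as "Chr_start" from being picked up by mistake.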
