BG Lab 1
### PSYCH 3102 Behavioral Genetics
### Lab 1 -- downloading and exploring a genome
### Author: Scott Vrieze
######################################
### STEP 1, get a terminal working ###
######################################
###
### If you have a PC running Windows, install Cygwin
### https://www.cygwin.com/
### By default, Cygwin installs in C:/cygwin/ (or C:/cygwin64/ for the
### 64-bit installer)
###
### On a Mac or Linux computer, open up a terminal
###
#################################################
### STEP 2, make a directory in which to work ###
#################################################
### On a Windows PC, open up Cygwin.
###
### On a Mac or Linux computer, open a terminal.
###
### A window with text will open up. This box is BY FAR the most
### powerful thing on your computer. The trick is learning how to use
### it.
###
### Let's practice!
### Let's see what's in your current directory by listing its contents
ls
### Let's create a new directory called "bg", where we can work.
mkdir bg
### Check that you created that directory by running ls again
ls
### Now, move into the bg directory
cd bg
### You should be in the bg directory. Run ls to see what's in here
ls
### The result should come up blank, because there's nothing in this
### directory yet! Let's find something interesting to put in here.
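###
### If you ever lose track of where you are, "pwd" (print working
### directory) shows your current location, and "cd .." moves up one
### level. The sketch below practices both in a throwaway directory.
### The names "start", "scratch" and "bg_practice" are made up for this
### example and are not part of the lab.

```shell
start=$(pwd)            # remember where you are right now
scratch=$(mktemp -d)    # make a fresh, empty throwaway directory
cd "$scratch"
mkdir bg_practice       # hypothetical directory name
cd bg_practice
pwd                     # the path now ends in /bg_practice
cd ..                   # ".." means "one level up"
pwd                     # back in the throwaway directory
cd "$start"             # return to where you began
```

### After the last command you are back where you started, so the
### rest of the lab is unaffected.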
############################################
### STEP 3, download the practice genome ###
############################################
###
### The dataset is in our Google Drive folder, at the following
### direct link:
### https://drive.google.com/file/d/0B608ps4vtHUaWFNOWXJqZ0tDMXc/
###
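### Download the file and then move it to, on a Windows computer:
### C:/cygwin/home/<username>/bg/
### (or C:/cygwin64/home/<username>/bg/ if Cygwin is in C:/cygwin64/)
### On a Mac:
### /Users/<username>/bg/
### On Linux:
### /home/<username>/bg/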
### Let's check and see if you got it right. Open a terminal and run
cd bg
### Then list the contents of the directory
ls
### You should see something like the following output:
### $ ls
### hu916767_20170324191934.txt
###
### If that's what you saw, congratulations, you put the file in the
### right place!
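###
### If you prefer to move the file from the terminal, "mv" can do it.
### This is only a sketch: it assumes your browser saved the file in a
### Downloads folder inside your home directory, which may not be true
### on your computer (under Cygwin, the Windows Downloads folder is
### usually /cygdrive/c/Users/<username>/Downloads instead). The
### variable names "src" and "dest" are made up for this example.

```shell
# Assumed download location; change this if your browser saves elsewhere
src="$HOME/Downloads/hu916767_20170324191934.txt"
dest="$HOME/bg"

mkdir -p "$dest"        # -p: no error if the directory already exists
if [ -f "$src" ]; then
    mv "$src" "$dest/"  # move the file into the bg directory
    echo "Moved the file into $dest"
else
    echo "No file found at $src"
fi
ls "$dest"              # list what is in bg now
```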
################################################
### STEP 4, look at the contents of the file ###
################################################
###
### In your terminal, go to the bg folder, then type
less hu916767_20170324191934.txt
### That should open the file in your terminal. You can scroll up or
### down using the arrow keys. To scroll faster you can press the
### space bar. To close the "less" session, press "q".
### What if we just want to look at the first few lines?
head hu916767_20170324191934.txt
### The last few lines?
tail hu916767_20170324191934.txt
### How many variants are there? Try "wc -l". This will give you the
### number of lines in the file, which is approximately the number of
### variants (any header lines at the top of the file are counted too).
wc -l hu916767_20170324191934.txt
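###
### "head" and "tail" print 10 lines by default; "-n" changes that. To
### count only data lines, you can filter out header lines before
### counting. The sketch below uses a tiny made-up file (not real
### genotype data) and assumes header lines start with "#". Look at the
### output of "head" on the real file before relying on that.

```shell
toy=$(mktemp)   # temporary file to hold the made-up data
printf '# made-up header line\n'                  >  "$toy"
printf '# rsid\tchromosome\tposition\tgenotype\n' >> "$toy"
printf 'rsFAKE1\t1\t101\tGG\n'                    >> "$toy"
printf 'rsFAKE2\t1\t202\tAG\n'                    >> "$toy"
printf 'rsFAKE3\t2\t303\tCT\n'                    >> "$toy"
printf 'rsFAKE4\t2\t404\tDD\n'                    >> "$toy"

head -n 2 "$toy"               # first two lines only
tail -n 1 "$toy"               # last line only
total=$(wc -l < "$toy")        # every line, headers included
data=$(grep -vc '^#' "$toy")   # -v: lines NOT starting with "#", -c: count them
echo "total lines: $total, data lines: $data"
rm "$toy"                      # clean up the temporary file
```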
### That's a lot of variants. How can I extract a certain variant,
### without scrolling through the whole file?
grep 'rs9430244' hu916767_20170324191934.txt
### Try another one
grep 'rs8176719' hu916767_20170324191934.txt
### Huh, what does DD mean? I thought nucleotides could be A, C, T, or
### G. Also -- Google that variant. What phenotype does it affect? What
### phenotype does this person have?
### We can also grab both variants at once. The -E option lets grep
### treat "|" as "or".
grep -E 'rs8176719|rs9430244' hu916767_20170324191934.txt
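###
### One thing to watch: grep matches the text anywhere in a line, so a
### short ID also matches any longer ID that contains it. The "-w"
### option asks for whole-word matches only. The sketch below shows the
### difference on a tiny made-up file; the IDs are invented for this
### example and are not real variants.

```shell
toy=$(mktemp)   # temporary file to hold the made-up data
printf 'rsFAKE1\t1\t101\tGG\n'  >  "$toy"
printf 'rsFAKE10\t1\t202\tAG\n' >> "$toy"
printf 'rsFAKE2\t2\t303\tDD\n'  >> "$toy"

grep 'rsFAKE1' "$toy"                  # matches rsFAKE1 and rsFAKE10
grep -w 'rsFAKE1' "$toy"               # matches rsFAKE1 only
loose=$(grep -c 'rsFAKE1' "$toy")      # -c counts matching lines
strict=$(grep -cw 'rsFAKE1' "$toy")
echo "without -w: $loose, with -w: $strict"
rm "$toy"                              # clean up the temporary file
```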
######################################
### STEP 5, join commands together ###
######################################
###
### Now we'll do something called "piping". Piping allows you to run a
### command on a file, then send the output of that command to a new
### command, and possibly on to a new command. Let's give it a try.
### We saw above that "grep" allows you to extract all lines that
### match a certain character string. We also saw that "wc -l" counts
### the number of lines. Can we combine these two commands?
###
### Let's extract all the variants that are "GG"
grep 'GG' hu916767_20170324191934.txt
### OK, that didn't work so well: the output just kept flooding our
### screen. Instead, we'll use a pipe to send that output to wc -l
grep 'GG' hu916767_20170324191934.txt | wc -l
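###
### A pipe can feed any command, not just wc. For example, piping into
### "head" shows only the first few matches. Also, grep can count
### matching lines by itself with "-c". The sketch below tries both on
### a tiny made-up file (invented IDs, not real genotype data).

```shell
toy=$(mktemp)   # temporary file to hold the made-up data
printf 'rsFAKE1\t1\t101\tGG\n' >  "$toy"
printf 'rsFAKE2\t1\t202\tAG\n' >> "$toy"
printf 'rsFAKE3\t2\t303\tGG\n' >> "$toy"
printf 'rsFAKE4\t2\t404\tGG\n' >> "$toy"
printf 'rsFAKE5\t3\t505\tCT\n' >> "$toy"

grep 'GG' "$toy" | head -n 2        # show only the first two matches
piped=$(grep 'GG' "$toy" | wc -l)   # count matches using a pipe
direct=$(grep -c 'GG' "$toy")       # let grep do the counting itself
echo "pipe: $piped, grep -c: $direct"
rm "$toy"                           # clean up the temporary file
```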
### How about all SNPs that are homozygous?
grep -E 'GG|CC|TT|AA' hu916767_20170324191934.txt | wc -l
### How about all the variants that are homozygous?
grep -E 'GG|CC|TT|AA|II|DD' hu916767_20170324191934.txt | wc -l
### How many indels are there?
grep -E 'II|DD|ID|DI' hu916767_20170324191934.txt | wc -l
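###
### Longer pipelines can tally every genotype in one go. The sketch
### below is one possible approach, not the only one. It assumes the
### file is tab-separated, that the genotype is the 4th column, and
### that header lines start with "#". Check these against the output of
### "head" on the real file. The toy data is made up.

```shell
toy=$(mktemp)   # temporary file to hold the made-up data
printf '# rsid\tchromosome\tposition\tgenotype\n' >  "$toy"
printf 'rsFAKE1\t1\t101\tGG\n'                    >> "$toy"
printf 'rsFAKE2\t1\t202\tAG\n'                    >> "$toy"
printf 'rsFAKE3\t2\t303\tGG\n'                    >> "$toy"
printf 'rsFAKE4\t2\t404\tDD\n'                    >> "$toy"
printf 'rsFAKE5\t3\t505\tDI\n'                    >> "$toy"

# Drop header lines, keep column 4, sort so equal genotypes sit next to
# each other, then count each run of identical lines
grep -v '^#' "$toy" | cut -f 4 | sort | uniq -c

# Number of distinct genotypes, and number of lines that are exactly GG
kinds=$(grep -v '^#' "$toy" | cut -f 4 | sort | uniq | wc -l)
n_gg=$(grep -v '^#' "$toy" | cut -f 4 | grep -c '^GG$')
echo "distinct genotypes: $kinds, GG lines: $n_gg"
rm "$toy"       # clean up the temporary file
```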
### Now, a little trickier. How many variants are on chromosome 1? Is
### this command going to work? Why or why not?
grep -E '1' hu916767_20170324191934.txt
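###
### Once you have thought about it, here is one way to check your
### reasoning on a tiny made-up file. The sketch assumes the file is
### tab-separated with the chromosome in the 2nd column; verify that
### with "head" on the real file. It uses "awk", a tool not covered
### above, to test a single column instead of the whole line.

```shell
toy=$(mktemp)   # temporary file to hold the made-up data
printf 'rsFAKE1\t1\t205\tGG\n'  >  "$toy"
printf 'rsFAKE2\t10\t300\tAG\n' >> "$toy"
printf 'rsFAKE3\t2\t1500\tCT\n' >> "$toy"
printf 'rsFAKE4\t3\t222\tTT\n'  >> "$toy"

# grep looks for "1" anywhere in the line: IDs, chromosome 10, positions
anywhere=$(grep -c '1' "$toy")

# awk splits each line on tabs and keeps lines whose 2nd field is exactly "1"
exact=$(awk -F '\t' '$2 == "1"' "$toy" | wc -l)

echo "lines containing a 1 anywhere: $anywhere"
echo "lines on chromosome 1: $exact"
rm "$toy"       # clean up the temporary file
```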