BG Lab 1

### PSYCH 3102 Behavioral Genetics
### Lab 1 -- downloading and exploring a genome
### Author: Scott Vrieze


######################################
### STEP 1, get a terminal working ###
######################################
###
### If you have a PC running Windows, install Cygwin
###  https://www.cygwin.com/
###  By default, the 64-bit installer puts Cygwin in C:/cygwin64/
###  (older 32-bit installs use C:/cygwin/)
###
### On a Mac or Linux computer, open up a terminal
###

#################################################
### STEP 2, make a directory in which to work ###
#################################################
### On a Windows PC, open up Cygwin.
###
### On a Mac or Linux computer, open a terminal.
###
### A window with text will open up. This box is BY FAR the most
###  powerful thing on your computer. The trick is learning how to use
###  it.
###
### Let's practice!

### Let's see where you are by listing the contents of the directory
ls

### Let's create a new directory called "bg", where we can work.
mkdir bg

### Check that you created that directory by running ls again
ls

### Now, move into the bg directory
cd bg

### You should be in the bg directory. Run ls to see what's in here
ls

### The result should come up blank, because there's nothing in this
### directory yet! Let's find something interesting to put in here.
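
### A few related commands are worth knowing at this point. This is an
### optional sketch; "bg_practice" is a made-up directory name used only
### for illustration. "pwd" prints the directory you are currently in,
### "mkdir -p" creates a directory without complaining if it already
### exists, "cd .." moves up one level, and "rmdir" removes an empty
### directory.
pwd
mkdir -p bg_practice
cd bg_practice
pwd
cd ..
rmdir bg_practice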


############################################
### STEP 3, download the practice genome ###
############################################
###
### The dataset is in our Google Drive folder, with the following
###  direct link:
###  https://drive.google.com/file/d/0B608ps4vtHUaWFNOWXJqZ0tDMXc/
###
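### Download the file, then move it into your bg directory. On a
###  Windows computer with a default Cygwin install, that is:
###   C:/cygwin64/home/<username>/bg/
###  (C:/cygwin/home/<username>/bg/ on older 32-bit installs)
### On a Mac:
###   /Users/<username>/bg/
### On Linux:
###   /home/<username>/bg/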


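### Let's check that you got it right. Open a terminal and run the
### command below ("~" is shorthand for your home directory, so this
### works no matter which directory you are currently in)
cd ~/bg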

### Then list the contents of the directory
ls
###  You should see something like the following output:
###    $ ls
###    hu916767_20170324191934.txt
###
###  If that's what you saw, congratulations, you put the file in the
###  right place!


################################################
### STEP 4, look at the contents of the file ###
################################################
###
### In your terminal, go to the bg folder, then type
less hu916767_20170324191934.txt

### That should open the file in your terminal. You can scroll up or
### down using the arrow keys. To scroll faster you can press the
### space bar. To close the "less" session, press "q".

### What if we just want to look at the first few lines?
head hu916767_20170324191934.txt

### The last few lines?
tail hu916767_20170324191934.txt
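
### By default, head and tail show 10 lines. The "-n" option sets how
### many lines to show. A minimal sketch on a made-up 20-line file
### (toy_lines.txt is a hypothetical name, created here only for
### practice). The first command should print 1, 2, 3 and the second
### should print 19, 20.
seq 1 20 > toy_lines.txt
head -n 3 toy_lines.txt
tail -n 2 toy_lines.txt
### The same option works on the genome file, for example:
###   head -n 30 hu916767_20170324191934.txt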

### How many variants are there? Try "wc -l". This will give you the
### number of lines in the file, which is approximately the number of
### variants.
wc -l hu916767_20170324191934.txt
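
### The line count is only approximate because files like this often
### begin with a few comment lines that describe the data rather than
### list variants. The sketch below ASSUMES those header lines start
### with "#" (check yours with "head" first). It uses a small made-up
### file, toy_header.txt, so you can verify the answer by eye: 4 lines
### in total, 2 of which are variants. "grep -v" selects lines that do
### NOT match, and "-c" counts them instead of printing them.
printf '# toy header line\n# rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t1000\tAA\nrs0000002\t2\t2000\tCT\n' > toy_header.txt
wc -l toy_header.txt
grep -c -v '^#' toy_header.txt
### If the assumption holds for the genome file, the same idea applies:
###   grep -c -v '^#' hu916767_20170324191934.txt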

### That's a lot of variants. How can I extract a certain variant,
### without scrolling through the whole file?
grep 'rs9430244' hu916767_20170324191934.txt

### Try another one
grep 'rs8176719' hu916767_20170324191934.txt

### Huh, what does DD mean? I thought nucleotides could be A, C, T, or
### G.  Also -- Google that variant. What phenotype does it affect? What
### phenotype does this person have?

### We can also grab both variants at once, if we want to
grep -E 'rs8176719|rs9430244' hu916767_20170324191934.txt
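
### One caveat: grep matches the string anywhere in a line, so a short
### rsID can also match longer rsIDs that contain it. A minimal sketch
### using a made-up file (toy_ids.txt, with invented rsIDs): the first
### command matches both rs123 and rs1234, while "-w" asks grep to
### match whole words only and returns just rs123.
printf 'rs123\t1\t1000\tAA\nrs1234\t1\t2000\tCT\nrs456\t2\t3000\tGG\n' > toy_ids.txt
grep 'rs123' toy_ids.txt
grep -w 'rs123' toy_ids.txt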


######################################
### STEP 5, join commands together ###
######################################
###
### Now we'll do something called "piping". Piping allows you to run a
### command on a file, then send the output of that command to a second
### command, and possibly on to a third. Let's give it a try.

### We saw above that "grep" allows you to extract all lines that
### match a certain character string. We also saw that "wc -l" counts
### the number of lines. Can we combine these two commands?
###
### Let's extract all the variants that are "GG"
grep 'GG' hu916767_20170324191934.txt

### OK, that didn't work so well; the output just kept scrolling past
### on our screen. Instead, we'll use a pipe to send that output to wc -l
grep 'GG' hu916767_20170324191934.txt | wc -l

### How about all SNPs that are homozygous?
grep -E 'GG|CC|TT|AA' hu916767_20170324191934.txt | wc -l

### How about all the variants that are homozygous?
grep -E 'GG|CC|TT|AA|II|DD' hu916767_20170324191934.txt | wc -l

### How many indels are there?
grep -E 'II|DD|ID|DI' hu916767_20170324191934.txt | wc -l
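
### Pipes can be chained more than once. The sketch below tallies how
### often each genotype occurs in a small made-up file (toy_geno.txt).
### It ASSUMES tab-separated columns with the genotype in column 4 and
### header lines starting with "#"; check your own file with "head"
### before relying on that. "cut -f 4" keeps only column 4, "sort"
### groups identical genotypes together, and "uniq -c" counts each
### group. In the toy file, AA appears twice, CT and DD once each.
printf '# toy header\nrs0000001\t1\t1000\tAA\nrs0000002\t1\t2000\tCT\nrs0000003\t2\t3000\tAA\nrs0000004\t2\t4000\tDD\n' > toy_geno.txt
grep -v '^#' toy_geno.txt | cut -f 4 | sort | uniq -c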

### Now, a little trickier. How many variants are on chromosome 1? Is
### this command going to work? Why or why not?
grep -E '1' hu916767_20170324191934.txt
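
### Once you have thought about it: grep '1' matches a "1" anywhere on
### the line, including inside rsIDs and positions. One possible
### approach is to test only the chromosome column. This sketch ASSUMES
### tab-separated columns with the chromosome in column 2 and uses a
### made-up file (toy_chr.txt). In the toy file, all 3 lines contain a
### "1" somewhere, but only 1 variant is on chromosome 1 (chromosome 11
### should not count).
printf 'rs0000001\t1\t1000\tAA\nrs0000002\t11\t2100\tCT\nrs0000003\t2\t3001\tGG\n' > toy_chr.txt
grep -c '1' toy_chr.txt
awk -F '\t' '$2 == "1"' toy_chr.txt | wc -l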