Difference between revisions of "Homework 7: Functional Annotation"
Jump to navigation
Jump to search
(5 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
==== Notes ==== | ==== Notes ==== | ||
− | |||
==== Code for Homework ==== | ==== Code for Homework ==== | ||
Line 17: | Line 16: | ||
# Some modules you'll need | # Some modules you'll need | ||
module load tabix_0.2.6 | module load tabix_0.2.6 | ||
− | module load | + | module load anno_1.0 |
− | |||
# I encourage you to annotate your own genome, but for purposes of this | # I encourage you to annotate your own genome, but for purposes of this | ||
Line 36: | Line 34: | ||
-o chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz \ | -o chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz \ | ||
-r /Users/scvr9332/reference_data/gotcloud.ref/hs37d5.fa \ | -r /Users/scvr9332/reference_data/gotcloud.ref/hs37d5.fa \ | ||
− | -g /Users/scvr9332/reference_data/anno/ | + | -g /Users/scvr9332/reference_data/anno/refFlat_hg19.txt.gz \ |
-p /Users/scvr9332/reference_data/anno/priority.txt \ | -p /Users/scvr9332/reference_data/anno/priority.txt \ | ||
-c /Users/scvr9332/reference_data/anno/codon.txt | -c /Users/scvr9332/reference_data/anno/codon.txt | ||
Line 62: | Line 60: | ||
### The following command tells me at how many sites this individual | ### The following command tells me at how many sites this individual | ||
### has a heterozygous stop gain allele: | ### has a heterozygous stop gain allele: | ||
− | zgrep '\s0|1:\|\s1|0:' stop_gains.vcf | wc -l | + | zgrep '\s0|1:\|\s1|0:' stop_gains.vcf.gz | wc -l |
###----------------------------------### | ###----------------------------------### |
Latest revision as of 21:22, 3 December 2015
Notes
Code for Homework
# PSYC 7102 -- Statistical Genetics
# Homework #7: Functional Annotation
# Due: December 10th
# Overview: In this homework I ask you to annotate genes in your genome
# and interpret the consequences of a few annotations.
# Try not to frequently run commands on the head node, use an interactive
# session by running the following command:
qsub -I -l walltime=23:00:00 -q short -l mem=2gb -l nodes=1:ppn=2
# Some modules you'll need
module load tabix_0.2.6
module load anno_1.0
# I encourage you to annotate your own genome, but for purposes of this
# homework I ask that you conduct all analyses of this file, which contains
# a single person from 1000 Genomes. Please copy both files to a directory
# of your choice.
/Users/scvr9332/annotation/chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz
/Users/scvr9332/annotation/chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz.tbi
###############
### STEP 1 ###
###############
# Please run the following command to annotate your genome.
anno -i chrALL.filtered.PASS.beagled.HG00096.rsID.vcf.gz \
-o chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz \
-r /Users/scvr9332/reference_data/gotcloud.ref/hs37d5.fa \
-g /Users/scvr9332/reference_data/anno/refFlat_hg19.txt.gz \
-p /Users/scvr9332/reference_data/anno/priority.txt \
-c /Users/scvr9332/reference_data/anno/codon.txt
### Take a look at all the output files (especially the top.anno.frq
### file).
###
###----------------------------------###
###----QUESTION 1, a-b (2 points)----###
###----------------------------------###
### a) How many stop gain variants are there?
### b) How many UTR5 variants?
##############
### STEP 2 ###
##############
### Recall from the 1000 Genomes paper that stop gain variants tend to
### be rare. Let's compare their frequency to that of synonymous
### variants.
zgrep '^#\|Stop_Gain' chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz | bgzip -c > stop_gains.vcf.gz
zgrep '^#\|Synonymous' chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz | bgzip -c > synonymous.vcf.gz
### The following command tells me at how many sites this individual
### has a heterozygous stop gain allele:
zgrep '\s0|1:\|\s1|0:' stop_gains.vcf.gz | wc -l
###----------------------------------###
###----QUESTION 2, a-e (5 points)----###
###----------------------------------###
### Modify the zgrep command immediately above to answer:
### a) At how many sites does this individual have a heterozygous synonymous variant?
### b) At how many sites does this individual have a homozygous stop gain?
###
### Now Take the first homozygous stop gain.
### c) What gene does this variant affect?
### d) How many transcripts of this gene does this variant affect?
### e) What other functional consequences does this variant have in
### this gene? (e.g., splice site)
##############
### STEP 3 ###
##############
### Spencer's favorite gene is TLR4. Let's take a look.
tabix chrALL.filtered.PASS.beagled.HG00096.rsID.anno.vcf.gz 9:12000000-121000000 | grep 'TLR4' | less -S
###----------------------------------###
###----QUESTION 3, a-d (4 points)----###
###----------------------------------###
### a) How many variants are annotated to lie within some component of TLR4?
### b) How many are stop gain or essential splice site?
### c) Look at rs78848399. Descibe the amino acid change that occurs
### as a result of having an alternate allele at this site.
### Now use a tabix command to extract variant rs564925636.
### d) Describe the amino acid change that occurs as a result of
### having the alternate allele at this site. How does strand impact
### your answer to this question compared to your answer to c)?