BG Lab 2
== Lab assignment 2 ==
<syntaxhighlight lang="bash">
### Lab 2 assignment
### Assigned: 4/20/2017
### Due: 4/27/2017 at the beginning of class. Late assignments (even by 5 minutes)
### will not be accepted!
###
### Note: all questions should be answered with respect to the
### genotypes from hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz

### Question 1 (4 points)
### a) Extract a variant from the vcf and show me the command you used
###    and the output of the command. Tell me what the individual's
###    genotype is at this site.

### Question 2 (4 points)
### How many variants did 23andMe genotype in exons; that is, in protein coding sequences?
### Show me the commands you used to figure this out.

### Question 3 (8 points)
### a) Is this individual likely to be lactose intolerant? Show me the
###    steps you used to figure this out.
### b) Pick one of the variants you used to determine lactose
###    intolerance. What is the geographical distribution of this
###    variant's allele frequency?
</syntaxhighlight>

Example full credit answers

1. Most of you got this one right. The most common mistake was to include too much information and too many steps (although that generally did not cost you any points).
<syntaxhighlight lang="bash">
zgrep -w 'rs671' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz
12 112241766 rs671 G A . . ANN=A|missense_variant|MODERATE|ALDH2|ENSG00000111275|transcript|ENST00000261733|protein_coding|12/13|c.1510G>A|p.Glu504Lys|1571/2018|1510/1554|504/517||,A|missense_variant|MODERATE|ALDH2|ENSG00000111275|transcript|ENST00000416293|protein_coding|11/12|c.1369G>A|p.Glu457Lys|1465/1572|1369/1413|457/470||,A|3_prime_UTR_variant|MODIFIER|ALDH2|ENSG00000111275|transcript|ENST00000548536|nonsense_mediated_decay|13/14|c.*1386G>A|||||22035|,A|3_prime_UTR_variant|MODIFIER|ALDH2|ENSG00000111275|transcript|ENST00000549106|nonsense_mediated_decay|3/4|c.*89G>A|||||89|WARNING_TRANSCRIPT_NO_START_CODON GT 0/0
</syntaxhighlight>
The GT field is 0/0, so this individual has 0 alternate alleles and two reference alleles. The reference allele is G, so their genotype is G/G.

2. There are multiple ways to answer this. One of the most straightforward was as follows, although we could quibble over whether it should have included any splicing variants.
<syntaxhighlight inline lang="bash">
zgrep 'synonymous\|missense\|start_gain\|start_lost\|stop_gain\|stop_lost\|3_prime_UTR_variant\|5_prime_UTR_variant' hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz | wc -l
52772
</syntaxhighlight>

3. One good answer was:

"a) This person is likely to be lactose intolerant. One primary variant for lactose intolerance is rs4988235, while another one is rs182549. For both of these, the genotype to be lactose intolerant is C/C. This person had the C/C genotype for both sites.
<syntaxhighlight lang="bash">
zgrep rs182549 hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz
2 136616754 rs182549 C T . . ANN=T|intron_variant|MODIFIER|MCM6|ENSG00000076003|transcript|ENST00000264156
|protein_coding|9/16|c.1362+117G>A||||||,T|intron_variant|MODIFIER|MCM6
|ENSG00000076003|transcript|ENST00000492091|processed_transcript|2/5
|n.181+3423G>A|||||| GT 0/0

zgrep rs4988235 hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz
2 136608646 rs4988235 G A . . ANN=A|intron_variant|MODIFIER|MCM6|ENSG00000076003|transcript|ENST00000264156|protein_coding
|13/16|c.1917+326C>T||||||,A|intron_variant|MODIFIER|MCM6|ENSG00000076003|transcript|ENST00000492091
|processed_transcript|3/5|n.343+326C>T||||||,A|intron_variant|MODIFIER|MCM6|ENSG00000076003
|transcript|ENST00000483902|retained_intron|1/1|n.544+326C>T|||||| GT 0/0
</syntaxhighlight>
Though it looks like at the second site the genotype is G/G, this is reading from the positive strand. The negative strand [which is used on SNPpedia, and is the transcribed strand], would be C/C.

b) Geographical distribution for the allele frequency of rs182549

The Minor allele frequency is 0 in Africa, and southern Europe and Asia, while the minor allele is more prevalent (even becomes major) in Eastern Europe and Western US. Minor allele is slightly prevalent in northern South America."
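The manual steps in the answers above (reading GT as alleles, counting annotated records, and flipping a genotype to the other strand) can be sketched as small shell helpers. This is only an illustration, not part of the graded answers: the function names <code>vcf_genotype</code>, <code>count_annotated</code>, and <code>complement</code> are hypothetical, the input is assumed to be a tab-separated, single-sample VCF with diploid calls, and the two demo records are shortened copies of the rs671 and rs4988235 lines shown above (ANN truncated).

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume a tab-separated, single-sample VCF on stdin.

# Print "ID<TAB>allele/allele" for every record by decoding the GT field.
vcf_genotype() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      split("", allele)                  # clear alleles from the previous record
      allele[0] = $4                     # GT index 0 is REF
      n = split($5, alt, ",")            # ALT may list several alleles
      for (i = 1; i <= n; i++) allele[i] = alt[i]
      split($10, sample, ":")            # GT is the first colon-separated subfield
      split(sample[1], gt, "[/|]")       # unphased "/" or phased "|"
      a = (gt[1] in allele) ? allele[gt[1]] : "."   # "." for missing calls
      b = (gt[2] in allele) ? allele[gt[2]] : "."
      print $3 "\t" a "/" b
    }'
}

# Count records (not header lines) matching an extended regex of effect names.
count_annotated() {
  grep -v '^#' | grep -c -E "$1"
}

# Complement each base to express a genotype on the opposite strand.
complement() {
  printf '%s\n' "$1" | tr 'ACGT' 'TGCA'
}

# Demo data: shortened copies of two records from this page (ANN truncated).
demo=$(printf '##fileformat=VCFv4.1\n12\t112241766\trs671\tG\tA\t.\t.\tANN=A|missense_variant|MODERATE|ALDH2\tGT\t0/0\n2\t136608646\trs4988235\tG\tA\t.\t.\tANN=A|intron_variant|MODIFIER|MCM6\tGT\t0/0\n')

printf '%s\n' "$demo" | vcf_genotype                        # rs671 G/G, rs4988235 G/G
printf '%s\n' "$demo" | count_annotated 'missense_variant'  # 1
complement 'G/G'                                            # C/C
```

Against the real file, the same functions could be fed with <code>gunzip -c hu916767_20170324191934.1kgALTallele.withHeader.snpEff.vcf.gz | vcf_genotype</code>. Filtering out lines that start with <code>#</code> before counting guards against header lines that happen to mention an effect name, which a bare <code>zgrep ... | wc -l</code> would include.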