UK Biobank/Downloading the data

From Vrieze Wiki
Latest revision as of 18:09, 15 March 2016

These procedures were all derived from the UK Biobank documentation and are recorded here for reference. Researchers should not need to repeat these steps.

Phenotypic data

  1. The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email.
  2. All of the utilities from the UK Biobank download page were retrieved.
  3. The key, k1234.key, was saved from the PI's email.
  4. This command was run to decrypt the downloaded phenotype file:
    $ ./ukb_unpack ukb1234.enc k1234.key
    
    which produced the file ukb1234.enc_ukb.
  5. Once the file was decrypted, the following commands were run to convert it into usable formats:
    $ ./ukb_conv ukb1234.enc_ukb bulk -eencoding.ukb
    $ ./ukb_conv ukb1234.enc_ukb docs -eencoding.ukb
    $ ./ukb_conv ukb1234.enc_ukb r -eencoding.ukb
    
    1. bulk produces a list of IDs for use with the ukbfetch utility.
    2. docs produces an HTML file documenting the variables in this dataset.
    3. r produces a tab-delimited data file and an R script that labels the variables and assigns levels to them.
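The decrypt-and-convert steps above can be wrapped in one Bash function so that a failing step stops the run instead of leaving half-converted output. This is a sketch, not the procedure that was actually run: the function name `ukb_decrypt_and_convert` and the `UKB_BIN` variable are hypothetical, and it assumes `ukb_unpack`, `ukb_conv` and `encoding.ukb` are laid out as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: decrypt a UK Biobank phenotype file, then run ukb_conv
# in the bulk, docs and r modes. UKB_BIN (hypothetical) is the directory
# holding ukb_unpack and ukb_conv; it defaults to the current directory.
# encoding.ukb must be in the current directory, as in the commands above.
set -euo pipefail   # stop at the first failing step

ukb_decrypt_and_convert() {
  local enc="$1" key="$2" bin="${UKB_BIN:-.}"

  "$bin/ukb_unpack" "$enc" "$key"

  # ukb_unpack writes its output as <input>_ukb (ukb1234.enc -> ukb1234.enc_ukb)
  local unpacked="${enc}_ukb"
  if [[ ! -f "$unpacked" ]]; then
    echo "ukb_unpack did not produce $unpacked" >&2
    return 1
  fi

  local fmt
  for fmt in bulk docs r; do
    "$bin/ukb_conv" "$unpacked" "$fmt" -eencoding.ukb
  done
}

# Usage, with the file names from the steps above:
#   ukb_decrypt_and_convert ukb1234.enc k1234.key
```

Looping over the three modes keeps the `-eencoding.ukb` argument in one place, so the three conversions cannot drift apart.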

Genotypic data

  1. Genetic data were downloaded following the instructions on the UK Biobank site.
  2. All chromosomes were downloaded with scripted commands, for the genotype calls (cal) and the imputed data (imp), such as:
    $ seq 1 26 | parallel -j1 ./gfetch cal {}
    $ seq 1 26 | parallel -j1 ./gfetch imp {}
    
  3. A single sample map (impv1.sample) for the imputed data was also downloaded:
    $ ./gfetch imp 1 -m
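Because `parallel -j1` runs only one job at a time, the same download can also be written as a plain Bash loop, which makes it easy to retry a chromosome whose transfer fails and to report what is still missing. This is a sketch: `fetch_all_chromosomes`, `GFETCH_DIR` and the default of three attempts are hypothetical choices, and the loop simply mirrors the `seq 1 26` range used above.

```shell
#!/usr/bin/env bash
# Sketch only: sequential download of chromosomes 1..26 for one data type,
# equivalent to `seq 1 26 | parallel -j1 ./gfetch <type> {}` plus a retry.
# GFETCH_DIR (hypothetical) is the directory holding gfetch (default: .).
# The retry count is illustrative, not a UK Biobank recommendation.
set -euo pipefail

fetch_all_chromosomes() {
  local type="$1" max_tries="${2:-3}" bin="${GFETCH_DIR:-.}"
  local chr try failed=()

  for chr in $(seq 1 26); do
    for (( try = 1; try <= max_tries; try++ )); do
      if "$bin/gfetch" "$type" "$chr"; then
        break                       # success: move on to the next chromosome
      fi
      echo "gfetch $type $chr failed (attempt $try/$max_tries)" >&2
    done
    # The inner loop only runs past max_tries when every attempt failed.
    if (( try > max_tries )); then
      failed+=("$chr")
    fi
  done

  if (( ${#failed[@]} > 0 )); then
    echo "chromosomes still missing for $type: ${failed[*]}" >&2
    return 1
  fi
}

# Usage:
#   fetch_all_chromosomes cal && fetch_all_chromosomes imp
```

A non-zero return lists the chromosomes to fetch again by hand, rather than silently continuing as a bare pipeline would.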
    

Quality Control

    We identified lists of individuals and positions to exclude, based on information in the UKB data and in the unimputed Axiom Array genotypes.
  1. All files can be found on RC at: /work/KellerLab/UKBiobank/genetics/raw/Quality_Control
  2. UKB and Affymetrix performed a number of QC analyses to identify questionable positions and individual samples for exclusion. Additional PDFs from UK Biobank are in: /work/KellerLab/UKBiobank/genetics/raw/Quality_Control/UK_Biobank_Axiom_Array
  3. Additional Affymetrix and UKB information is available online: UK Biobank Axiom Array (http://www.ukbiobank.ac.uk/scientists-3/uk-biobank-axiom-array/), UKB-Genetics Archive (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS)
  4. A list of 1068 individuals to exclude is in Exclude_individuals.poorQC.UKB_Affy_sex.id on RC.
  5. A list of 8010 positions to exclude is in duplicate.positions.excludesnps.txt on RC.
  6. A README.txt file located on RC contains the steps used and additional information.
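The exclusion lists only help once they are applied to a dataset. As a minimal sketch, assuming each list holds one ID (or position) per line in its first column (an assumption to check against README.txt, since the file formats are not described here), a single `awk` pass can drop the listed entries from any whitespace-delimited file keyed on its first column. `filter_excluded` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: remove entries listed in an exclusion file from another
# whitespace-delimited file. ASSUMPTION: both files carry the ID (or
# position) in their first column, one entry per line; confirm this
# against README.txt before use. filter_excluded is a hypothetical name.
set -euo pipefail

filter_excluded() {
  local exclude_file="$1" input_file="$2"
  # While reading the first file (ARGV[1]), store its first-column values.
  # Comparing FILENAME rather than NR == FNR keeps this correct even when
  # the exclusion file is empty. For the second file, print only lines
  # whose first column was not stored.
  awk 'FILENAME == ARGV[1] { drop[$1] = 1; next } !($1 in drop)' \
    "$exclude_file" "$input_file"
}

# Usage (samples.txt is a hypothetical ID-keyed file):
#   filter_excluded Exclude_individuals.poorQC.UKB_Affy_sex.id samples.txt > samples.qc.txt
```

Using a hash lookup in `awk` keeps the pass linear in the size of both files, which matters little for 1068 IDs but scales to the full sample list.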