UK Biobank/Downloading the data
Latest revision as of 18:09, 15 March 2016
All of these procedures were derived from the UK Biobank documentation (http://biobank.ctsu.ox.ac.uk/showcase/exinfo.cgi?src=accessing_data_guide). This page is kept as a record and reference; researchers should not need to repeat these steps.
Phenotypic data
- The phenotype file was downloaded from UK Biobank by the project PI as instructed in the data accessibility email.
- All of the utilities on the UK Biobank download page (http://biobank.ctsu.ox.ac.uk/showcase/download.cgi) were retrieved.
- The key, k1234.key, was saved from the PI's email.
- This command was run to decrypt the downloaded phenotype file:
$ ./ukb_unpack ukb1234.enc k1234.key
which produced the file ukb1234.enc_ukb.
- Once decrypted, the following commands were run to extract the data into useful formats:
$ ./ukb_conv ukb1234.enc_ukb bulk -eencoding.ukb
$ ./ukb_conv ukb1234.enc_ukb docs -eencoding.ukb
$ ./ukb_conv ukb1234.enc_ukb r -eencoding.ukb
  - bulk produces a list of IDs for use with the ukbfetch utility.
  - docs produces an HTML file containing documentation of the variables in this dataset (https://ibg.colorado.edu/~lessem/ukb6395.html).
  - r produces a tab-delimited file and an R script that labels the variables and sets their factor levels.
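The decryption and conversion steps above can be gathered into one reusable function. This is a minimal sketch, not the procedure that was actually run: it assumes ukb_unpack, ukb_conv and encoding.ukb sit in the working directory, and the function name `unpack_and_convert` and the `UKB_UNPACK`/`UKB_CONV` overrides are hypothetical conveniences, not part of the UK Biobank utilities.

```bash
#!/usr/bin/env bash
# Sketch: decrypt a UK Biobank phenotype file, then convert it to the
# bulk, docs and r formats. Hypothetical helper; tool paths can be
# overridden through UKB_UNPACK and UKB_CONV.
set -euo pipefail

UKB_UNPACK="${UKB_UNPACK:-./ukb_unpack}"
UKB_CONV="${UKB_CONV:-./ukb_conv}"

unpack_and_convert() {
  local enc="$1"              # encrypted download, e.g. ukb1234.enc
  local key="$2"              # key saved from the PI's email, e.g. k1234.key
  local enc_ukb="${enc}_ukb"  # ukb_unpack writes ukb1234.enc_ukb from ukb1234.enc
  local f fmt

  # Stop early if any required input is missing.
  for f in "$enc" "$key" encoding.ukb; do
    if [[ ! -f "$f" ]]; then
      echo "missing required file: $f" >&2
      return 1
    fi
  done

  # Decrypt only if the decrypted file is not already present.
  if [[ ! -f "$enc_ukb" ]]; then
    "$UKB_UNPACK" "$enc" "$key"
  fi

  # The same three conversions as above, each using encoding.ukb.
  for fmt in bulk docs r; do
    "$UKB_CONV" "$enc_ukb" "$fmt" -eencoding.ukb
  done
}
```

To use it, one could append `unpack_and_convert ukb1234.enc k1234.key` to the script and run it with bash. Skipping decryption when ukb1234.enc_ukb already exists lets the conversions be rerun without decrypting again.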
Genotypic data
- Genetic data were downloaded following the instructions on the UK Biobank site (http://biobank.ctsu.ox.ac.uk/showcase/exinfo.cgi?src=AccessingGeneticData).
- Scripted downloads of all chromosomes were done using commands such as:
$ seq 1 26 | parallel -j1 ./gfetch cal {}
$ seq 1 26 | parallel -j1 ./gfetch imp {}
- A single sample map (impv1.sample) for the imputed data was also downloaded:
$ ./gfetch imp 1 -m
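Because `parallel -j1` runs one job at a time, the downloads above are effectively sequential, so a plain loop does the same work. The sketch below adds a retry for failed downloads and a sample count for the sample map. The retry policy, the helper names and the `GFETCH`/`MAX_TRIES` variables are assumptions for illustration, and `count_samples` assumes impv1.sample uses the standard Oxford .sample layout (a line of column names and a line of column types, then one line per sample).

```bash
#!/usr/bin/env bash
# Sketch: sequential gfetch downloads with simple retries, plus a sample
# count for the imputed-data sample map. Names and retry settings are hypothetical.
set -euo pipefail

GFETCH="${GFETCH:-./gfetch}"
MAX_TRIES="${MAX_TRIES:-3}"

# Run one gfetch download, retrying up to MAX_TRIES times on failure.
fetch_with_retry() {
  local attempt
  for (( attempt = 1; attempt <= MAX_TRIES; attempt++ )); do
    if "$GFETCH" "$@"; then
      return 0
    fi
    echo "gfetch $* failed (attempt $attempt of $MAX_TRIES)" >&2
  done
  return 1
}

# Download one data type ("cal" or "imp") for chromosomes 1-26, one at a
# time, like `seq 1 26 | parallel -j1 ./gfetch <type> {}`.
fetch_all_chromosomes() {
  local type="$1" chr
  for chr in $(seq 1 26); do
    fetch_with_retry "$type" "$chr" || return 1
  done
}

# Count samples in an Oxford-format .sample file: two header lines
# (column names, then column types) followed by one line per sample.
count_samples() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines - 2 ))
}
```

For example, `fetch_all_chromosomes cal` and `fetch_all_chromosomes imp` would repeat the downloads above, and `count_samples impv1.sample` would report how many samples the imputed data cover.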
Quality Control
We identified lists of individuals and positions to exclude, based on information in the UKB data and in the unimputed Axiom Array genotypes.
- All files can be found on RC at: /work/KellerLab/UKBiobank/genetics/raw/Quality_Control
- UKB and Affymetrix performed a number of QC analyses to identify questionable positions and individual samples for exclusion. Additional PDFs from UK Biobank are in /work/KellerLab/UKBiobank/genetics/raw/Quality_Control/UK_Biobank_Axiom_Array
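- Additional Affymetrix and UKB information can be found on their websites: UK Biobank Axiom Array (http://www.ukbiobank.ac.uk/scientists-3/uk-biobank-axiom-array/), UKB-Genetics Archive (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS)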
- A list of 1068 individuals to exclude is in Exclude_individuals.poorQC.UKB_Affy_sex.id on RC.
- A list of 8010 positions to exclude is in duplicate.positions.excludesnps.txt on RC.
- A README.txt file located on RC contains the steps used and additional information.
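Before using the exclusion lists, it may be worth confirming that they still hold the documented numbers of entries (1068 individuals, 8010 positions). This is a minimal sketch, assuming each file lists one individual or position per line with no header, and that both files sit directly in the Quality_Control directory; `check_count`, `check_qc_lists` and `QC_DIR` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: confirm the QC exclusion lists contain the documented number of
# entries. Assumes one entry per line and no header; names are hypothetical.
set -euo pipefail

QC_DIR="${QC_DIR:-/work/KellerLab/UKBiobank/genetics/raw/Quality_Control}"

# Return 1 (with a message) if FILE does not have EXPECTED non-empty lines.
check_count() {
  local file="$1" expected="$2" actual
  actual=$(grep -c . "$file" || true)
  if [[ "$actual" -ne "$expected" ]]; then
    echo "$file: expected $expected entries, found ${actual:-none}" >&2
    return 1
  fi
}

# Check both exclusion lists described above.
check_qc_lists() {
  check_count "$QC_DIR/Exclude_individuals.poorQC.UKB_Affy_sex.id" 1068 &&
    check_count "$QC_DIR/duplicate.positions.excludesnps.txt" 8010
}
```

Calling `check_qc_lists` on RC would print a message and return a non-zero status if either list has changed size; blank lines are ignored in the count.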