Difference between revisions of "GSCAN dbGaP"

From Vrieze Wiki
Jump to navigation Jump to search
Line 2: Line 2:
 
===ID Mapping===
 
===ID Mapping===
 
The following snppit describes how the PLINK genotype file IDs (which use "sampid") were mapped to the dbGaP_Subject_ID contained in the available phenotype files. The snippet only contains one of Joyce Ling's phenotype files for illustration only.
 
The following snppit describes how the PLINK genotype file IDs (which use "sampid") were mapped to the dbGaP_Subject_ID contained in the available phenotype files. The snippet only contains one of Joyce Ling's phenotype files for illustration only.
 +
 +
This script can be found in /work/KellerLab/GSCAN/dbGaP/Framingham/ID_mapping.r
  
 
  ### ID mapping file from dbGaP_Subject_ID to SAMPID  
 
  ### ID mapping file from dbGaP_Subject_ID to SAMPID  

Revision as of 23:12, 24 August 2016

Framingham

ID Mapping

The following snppit describes how the PLINK genotype file IDs (which use "sampid") were mapped to the dbGaP_Subject_ID contained in the available phenotype files. The snippet only contains one of Joyce Ling's phenotype files for illustration only.

This script can be found in /work/KellerLab/GSCAN/dbGaP/Framingham/ID_mapping.r

### ID mapping file from dbGaP_Subject_ID to SAMPID 
ID.map <- read.table(gzfile("/work/KellerLab/GSCAN/dbGaP/Framingham/PhenoGenotypeFiles/RootStudyConsentSet_phs000007.Framingham.v28.p10.c1.HMB-IRB-MDS/PhenotypeFiles/phs000007.v28.pht001415.v16.p10.Framingham_Sample.MULTI.txt.gz"), header=T, sep="\t", stringsAsFactors=F)

### Genotype files
genotype.IDs <-  read.table("/work/KellerLab/GSCAN/dbGaP/Framingham/PhenoGenotypeFiles/ChildStudyConsentSet_phs000342.Framingham.v16.p10.c1.HMB-IRB-MDS/GenotypeFiles/phg000006.v9.FHS_SHARe_Affy500K.genotype-calls-matrixfmt.c1/subject_level_PLINK_sets/FHS_SHARe_Affy500K_subjects_c1.fam", header=F)
names(genotype.IDs) <- c("famid", "SAMPID", "patid", "matid", "sex", "phenotype")

length(which(genotype.IDs$SAMPID %in% ID.map$SAMPID))
## 6954 

x <- merge(genotype.IDs, ID.map, by="SAMPID", all.x=T)
x <- x[,c(1:5,7)]

### One of Joyce's phenotype ID files (eventually all phenotypes for all available participants were merged in, but this is a good example.)
phenotypes <- read.table("OffC_Exam_1.txt", header=T, sep="\t")

xx <- merge(x, phenotypes, by="dbGaP_Subject_ID", all=TRUE)

xx <- xx[,c(2:ncol(xx),1)] ## Reorder so the dbGaP_Subject_ID is the last column 

write.table(xx, file="framingham_GSCAN_phenotypesCovariates.ped", quote=FALSE, sep="\t", row.names=F)