Difference between revisions of "GSCAN dbGaP"
Revision as of 23:12, 24 August 2016
Framingham
ID Mapping
The following snippet describes how the PLINK genotype file IDs (which use "sampid") were mapped to the dbGaP_Subject_ID contained in the available phenotype files. For illustration, the snippet uses only one of Joyce Ling's phenotype files.
This script can be found in /work/KellerLab/GSCAN/dbGaP/Framingham/ID_mapping.r
### ID mapping file from dbGaP_Subject_ID to SAMPID
ID.map <- read.table(gzfile("/work/KellerLab/GSCAN/dbGaP/Framingham/PhenoGenotypeFiles/RootStudyConsentSet_phs000007.Framingham.v28.p10.c1.HMB-IRB-MDS/PhenotypeFiles/phs000007.v28.pht001415.v16.p10.Framingham_Sample.MULTI.txt.gz"), header=T, sep="\t", stringsAsFactors=F)

### Genotype files
genotype.IDs <- read.table("/work/KellerLab/GSCAN/dbGaP/Framingham/PhenoGenotypeFiles/ChildStudyConsentSet_phs000342.Framingham.v16.p10.c1.HMB-IRB-MDS/GenotypeFiles/phg000006.v9.FHS_SHARe_Affy500K.genotype-calls-matrixfmt.c1/subject_level_PLINK_sets/FHS_SHARe_Affy500K_subjects_c1.fam", header=F)
names(genotype.IDs) <- c("famid", "SAMPID", "patid", "matid", "sex", "phenotype")

length(which(genotype.IDs$SAMPID %in% ID.map$SAMPID)) ## 6954

x <- merge(genotype.IDs, ID.map, by="SAMPID", all.x=T)
x <- x[,c(1:5,7)]

### One of Joyce's phenotype ID files (eventually all phenotypes for all available participants were merged in, but this is a good example.)
phenotypes <- read.table("OffC_Exam_1.txt", header=T, sep="\t")

xx <- merge(x, phenotypes, by="dbGaP_Subject_ID", all=TRUE)
xx <- xx[,c(2:ncol(xx),1)] ## Reorder so the dbGaP_Subject_ID is the last column

write.table(xx, file="framingham_GSCAN_phenotypesCovariates.ped", quote=FALSE, sep="\t", row.names=F)
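The first merge in the script above can be tried on toy data to see how all.x=TRUE treats genotype IDs that have no entry in the mapping file. This is only a minimal sketch: the data frames and every value in them are invented for illustration, and the toy ID.map has just two columns, which may not match the layout of the real dbGaP sample file. Selecting columns by name rather than by position is a choice made for this sketch, not what ID_mapping.r does.

```r
# Toy stand-ins for the real .fam and sample-mapping files.
# All IDs and values below are made up for illustration.
genotype.IDs <- data.frame(
  famid     = c(1, 1, 2),
  SAMPID    = c(101, 102, 103),
  patid     = c(0, 0, 0),
  matid     = c(0, 0, 0),
  sex       = c(1, 2, 1),
  phenotype = c(-9, -9, -9)
)
ID.map <- data.frame(
  dbGaP_Subject_ID = c(9001, 9002),
  SAMPID           = c(101, 103)   # SAMPID 102 is deliberately missing
)

# Count the genotype IDs that appear in the mapping file
n.matched <- sum(genotype.IDs$SAMPID %in% ID.map$SAMPID)

# Left join: keep every genotype row, fill unmatched mapping columns with NA
x <- merge(genotype.IDs, ID.map, by = "SAMPID", all.x = TRUE)

# Keep the ID columns plus sex and the dbGaP ID, selected by name
x <- x[, c("SAMPID", "famid", "patid", "matid", "sex", "dbGaP_Subject_ID")]

# Genotyped samples that could not be mapped to a dbGaP_Subject_ID
unmapped <- x$SAMPID[is.na(x$dbGaP_Subject_ID)]
print(x)
print(unmapped)
```

Checking the unmapped vector before the second merge is one way to confirm that the number of mapped samples agrees with the count reported by the %in% check.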