Methods

Note to all users: the software running this new version of PDGene still has some errors and instabilities that we are in the process of fixing; we appreciate your patience! If you encounter a problem or have suggestions on how to improve the site, please contact us and we will get back to you as soon as possible. We are also currently curating the "Top Results" list familiar to users of the old PDGene version; it will become available at the beginning of August 2014.

1. Aim of the PDGene database

The aim of the PDGene database is to provide a comprehensive compendium of genetic association results in the field of Parkinson’s disease (PD; Lill et al, 2012). In earlier versions of PDGene, this was achieved by identifying individual genetic association studies via literature screening and manually combining published results by meta-analysis. For this updated version of PDGene, we follow a much more extensive approach by providing details of the most recent and largest meta-analysis of genome-wide association studies (GWAS) in populations of European descent (as published in Nalls et al, 2014; see below for more details). We plan to include results from additional PD GWAS and other large-scale genotyping efforts in European as well as other populations in the future. Authors of such studies are encouraged to contact us if they are willing to share their large-scale genotyping data for inclusion in PDGene. In addition, we will consider including specific “rare variants” (i.e. those with a minor allele frequency <0.1%, which are therefore not covered in Nalls et al, 2014).

A non-maintained, older version of the PDGene database can be found at http://archive.pdgene.org. That archive version includes details for all studies, data, and results originally published in Lill et al, 2012. Note, however, that the new version of PDGene (i.e. the one you are currently visiting) provides much more comprehensive results for essentially all polymorphisms also displayed in the previous version!

2. Current content of the PDGene database

The data currently available on PDGene include all results pertaining to the discovery phase of the GWAS meta-analysis by Nalls et al, 2014. This comprises data on 7,782,514 genetic variants in up to 13,708 PD cases and 95,282 controls from 15 independent GWAS datasets of European descent. Variants were imputed using the August 2010 release of the 1000 Genomes Project European-ancestry haplotype reference set and filtered according to standard quality control criteria. In line with the criteria applied in the published study (Nalls et al, 2014), only variants with a minor allele frequency ≥0.1% that were assessed in at least 3 of the 15 datasets are displayed in PDGene. The database also includes association results from genotyping data generated on the “NeuroX chip”, a semi-custom genotyping array, in 5,353 PD cases and 5,551 controls of European descent. These data cover the most significantly associated polymorphisms from the discovery phase (i.e. across the 26 loci showing genome-wide significant association (p <5x10^-8) with PD risk in the discovery phase) as well as 6 additional, previously reported GWAS signals. Details on the included datasets, all genotyping procedures, and the statistical analyses can be found in our original publication (Nalls et al, 2014).
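The two display criteria (minor allele frequency ≥0.1% and assessment in at least 3 of the 15 datasets) amount to a simple filter. The following Python sketch is illustrative only: the `Variant` record, its fields, and the variant names are hypothetical, the minor allele frequency is treated as a single pooled value (an assumption), and the actual quality-control pipeline of Nalls et al, 2014 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str         # hypothetical variant identifier
    maf: float        # minor allele frequency as a fraction (0.001 == 0.1%); pooled value assumed
    n_datasets: int   # number of the 15 discovery datasets in which the variant was assessed


MIN_MAF = 0.001       # 0.1%
MIN_DATASETS = 3


def passes_display_filter(v: Variant) -> bool:
    """True if the variant meets both display criteria described above."""
    return v.maf >= MIN_MAF and v.n_datasets >= MIN_DATASETS


variants = [
    Variant("snp_a", 0.25, 15),
    Variant("snp_b", 0.0005, 12),  # too rare
    Variant("snp_c", 0.10, 2),     # assessed in too few datasets
    Variant("snp_d", 0.001, 3),    # exactly at both thresholds
]
kept = [v.name for v in variants if passes_display_filter(v)]
print(kept)  # ['snp_a', 'snp_d']
```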

To protect the privacy of individuals who participated in the GWAS contributing to the meta-analysis results posted on PDGene (see Craig et al, 2011, and references therein for an introduction to the topic), we have devised the data-sharing strategy described below. We believe that this approach provides meaningful summaries of all genome-wide association results while at the same time protecting the participants’ identity:

For the 10,000 most significant GWAS meta-analysis results, detailed information including summary odds ratios, 95% confidence intervals, and p values is available on PDGene. This includes visual summaries of the aggregated data (“forest plots”) showing the dataset-specific results. The underlying meta-analyses are based on a fixed-effect model assuming additive effects; forest plots were created using a customized R script (see Lill et al, 2012).
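A fixed-effect meta-analysis of additive (per-allele) odds ratios is commonly computed by inverse-variance weighting of the log odds ratios. The Python sketch below assumes that standard approach; it is not the customized R script used by PDGene, and the function name `fixed_effect_meta` is hypothetical. It also reports I2 (as defined in the abbreviation list) from Cochran's Q.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def fixed_effect_meta(odds_ratios, standard_errors):
    """Inverse-variance fixed-effect meta-analysis of per-dataset odds ratios.

    odds_ratios: per-dataset ORs for allele 1 vs. allele 2 (additive model)
    standard_errors: standard errors of the corresponding log ORs
    """
    if len(odds_ratios) != len(standard_errors) or not odds_ratios:
        raise ValueError("need one standard error per odds ratio")
    betas = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum  # pooled log OR
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    # Cochran's Q and I2: share of between-dataset variation beyond chance (in %).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "or": math.exp(beta),
        "ci95": (math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)),
        "p": p,
        "i2": i2,
    }


# Two hypothetical datasets with identical estimates (OR 1.2, SE of log OR 0.1).
res = fixed_effect_meta([1.2, 1.2], [0.1, 0.1])
```

Each dataset is weighted by the inverse of its variance, so larger datasets (smaller standard errors) dominate the summary estimate; I2 stays at 0% when all datasets agree.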

All remaining results, i.e. those not falling into the "top 10,000" category described above, are displayed at substantially reduced precision: meta-analysis p values are binned into broad categories, each spanning more than one order of magnitude (see below), and only the direction of effect is given (i.e. “>1” for risk effect estimates and “<1” for protective effect estimates of allele 1 vs. reference allele 2, as indicated in the corresponding meta-analysis tables).

To this end, the following p value bins have been defined for meta-analysis results not included in the “top 10,000” category:

“≥0.05” corresponds to p values ≤1 and ≥0.05
“<0.05” corresponds to p values <0.05 and ≥1x10^-4
“<1x10^-4” corresponds to p values <1x10^-4 and ≥1x10^-6
“<1x10^-6” corresponds to p values <1x10^-6 and ≥5x10^-8
“<5x10^-8” corresponds to p values <5x10^-8
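The bin boundaries listed above, together with the sign-only reporting of effect direction, can be sketched as two small Python functions. The function names are hypothetical, and the handling of an odds ratio of exactly 1 (not covered by the text) is an assumption of this sketch.

```python
def bin_p_value(p: float) -> str:
    """Map a meta-analysis p value to its display bin (bins as listed above)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p value out of range: {p}")
    if p >= 0.05:
        return "≥0.05"
    if p >= 1e-4:
        return "<0.05"
    if p >= 1e-6:
        return "<1x10^-4"
    if p >= 5e-8:
        return "<1x10^-6"
    return "<5x10^-8"


def effect_direction(odds_ratio: float) -> str:
    """Report only the sign of the effect of allele 1 vs. reference allele 2."""
    if odds_ratio > 1.0:
        return ">1"   # risk effect
    if odds_ratio < 1.0:
        return "<1"   # protective effect
    return "=1"       # assumption: an OR of exactly 1 is not covered by the text


# Example: a binned result as it might be displayed.
print(bin_p_value(3e-5), effect_direction(0.87))
```

Each bin is closed at its lower bound and open at its upper bound (except the top bin, which includes 1), matching the "<" and "≥" conventions in the list.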

Owing to data-sharing restrictions, results for the two GWAS datasets contributed by “23andMe” could only be shared on PDGene for the “top 10,000” polymorphisms. For all remaining (“binned”) meta-analysis results, these datasets had to be removed before online display. This explains why some meta-analysis results beyond the “top 10,000” category differ between PDGene and our original publication (Nalls et al, 2014), and why the number of meta-analysis results differs slightly.

We will continuously review the scientific literature on privacy protection in genomic studies and will adjust our data-sharing strategy should we or the members of the scientific advisory board deem this necessary.

3. How to search the database

The universal search field on PDGene's homepage and in the main menu can be used to search for official gene symbols and commonly used aliases, NCBI dbSNP identifiers ("rs-numbers"), and chromosomal locations based on the current human genome build (GRCh37/hg19). Chromosomal locations can be entered for entire chromosomes (e.g. "chr4"), chromosomal intervals (e.g. "chr4:89000000-91000000") or single base-pair positions (e.g. "chr4:89704960").
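The four query types accepted by the search field can be told apart by their syntax. The Python sketch below classifies a query string under that assumption; the regular expressions, the function name `classify_query`, and the upper-casing of gene symbols are illustrative choices, not PDGene's actual search parser.

```python
import re

# Illustrative patterns for the query types described above (hg19 coordinates).
# They are assumptions for this sketch, not PDGene's actual search parser.
_CHROM = r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y)"
_INTERVAL = re.compile(rf"^({_CHROM}):(\d+)-(\d+)$", re.IGNORECASE)
_POSITION = re.compile(rf"^({_CHROM}):(\d+)$", re.IGNORECASE)
_CHROMOSOME = re.compile(rf"^({_CHROM})$", re.IGNORECASE)
_RSID = re.compile(r"^rs\d+$", re.IGNORECASE)


def classify_query(query: str) -> tuple:
    """Return (query type, parsed fields) for a search-box entry."""
    q = query.strip()
    if m := _INTERVAL.match(q):
        start, end = int(m.group(2)), int(m.group(3))
        if end < start:
            raise ValueError("interval end precedes start")
        return ("interval", m.group(1), start, end)
    if m := _POSITION.match(q):
        return ("position", m.group(1), int(m.group(2)))
    if m := _CHROMOSOME.match(q):
        return ("chromosome", m.group(1))
    if _RSID.match(q):
        return ("rsid", q.lower())
    # Anything else is treated as a gene symbol or alias to be looked up.
    return ("gene", q.upper())


for example in ["chr4", "chr4:89000000-91000000", "chr4:89704960", "rs356182", "SNCA"]:
    print(example, "->", classify_query(example))
```

The interval pattern is tried before the single-position pattern so that a range is never misread as a position.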

4. The "Top Results" list

The “Top Results” list is currently being curated and will become available shortly.

To facilitate the identification of the most promising meta-analysis results available in PDGene, a continuously updated list of the most strongly associated variants/loci (“Top Results”) is available on the website. This list includes genes/loci containing at least one variant that (i) shows genome-wide significant (p <5x10^-8) association with PD in the GWAS meta-analysis and (ii), if replication on the NeuroX chip was attempted, showed significant (p <0.05) association in that validation phase. Loci are ranked by their most significant meta-analysis result. While we believe that the "Top Results" list represents an up-to-date summary of established as well as particularly promising PD candidate genes, we cannot exclude the possibility that a subset of these results may still be false-positive findings.
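The two inclusion criteria and the per-locus ranking can be sketched in Python as follows. The record layout, locus and variant names are hypothetical; the thresholds are kept as parameters, with the default genome-wide significance level set to p <5x10^-8 as used in Section 2. Ranking each locus by its most significant *qualifying* variant is one possible reading of the text and is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    locus: str                 # locus/gene label (hypothetical)
    variant: str               # variant identifier (hypothetical)
    p_meta: float              # discovery-phase GWAS meta-analysis p value
    p_neurox: Optional[float]  # NeuroX replication p value; None if not attempted


def qualifies(r: Result, gws: float = 5e-8, replication: float = 0.05) -> bool:
    """Criterion (i): genome-wide significance; criterion (ii): replication if attempted."""
    if r.p_meta >= gws:
        return False
    return r.p_neurox is None or r.p_neurox < replication


def top_results(results, gws: float = 5e-8, replication: float = 0.05):
    """Keep the best qualifying variant per locus and rank loci by its p value."""
    best = {}
    for r in results:
        if not qualifies(r, gws, replication):
            continue
        if r.locus not in best or r.p_meta < best[r.locus].p_meta:
            best[r.locus] = r
    return sorted(best.values(), key=lambda r: r.p_meta)


results = [
    Result("LOCUS_A", "snp1", 1e-10, 0.01),
    Result("LOCUS_A", "snp2", 1e-12, None),  # no replication attempted
    Result("LOCUS_B", "snp3", 2e-9, 0.30),   # failed replication
    Result("LOCUS_C", "snp4", 1e-6, None),   # not genome-wide significant
    Result("LOCUS_D", "snp5", 3e-8, None),
]
for r in top_results(results):
    print(r.locus, r.variant, r.p_meta)
```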

Of note, SNPs are automatically annotated to the nearest gene within an interval of ±50 kb based on the Ensembl database (the distance is indicated in parentheses after the gene name); if this region does not contain any known gene, only the SNP identifier ("rs-number") is listed. When interpreting the association results, note that the nearest gene is not always the most compelling functional candidate underlying the statistical association signal.
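The nearest-gene rule can be sketched as follows. This Python example is illustrative: gene coordinates would come from Ensembl in practice, but here the gene records and names are hypothetical, distance is measured to the nearest gene boundary (0 inside a gene), and the "(x kb)" label format is an assumption.

```python
from dataclasses import dataclass

WINDOW_BP = 50_000  # +/- 50 kb


@dataclass
class Gene:
    symbol: str  # hypothetical gene symbol
    chrom: str
    start: int   # hg19 coordinates, 1-based inclusive (assumed)
    end: int


def distance_to_gene(chrom: str, pos: int, gene: Gene):
    """Distance in bp from a SNP to a gene's boundary; 0 inside the gene, None on another chromosome."""
    if gene.chrom != chrom:
        return None
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def annotate(rsid: str, chrom: str, pos: int, genes, window: int = WINDOW_BP) -> str:
    """Label a SNP with its nearest gene within the window, else with its rs-number only."""
    best = None
    for g in genes:
        d = distance_to_gene(chrom, pos, g)
        if d is not None and d <= window and (best is None or d < best[1]):
            best = (g.symbol, d)
    if best is None:
        return rsid
    return f"{best[0]} ({best[1] / 1000:.1f} kb)"


genes = [Gene("GENE_A", "chr4", 1000, 2000), Gene("GENE_B", "chr4", 100000, 120000)]
print(annotate("rs_example", "chr4", 60000, genes))
```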

5. Mendelian PD genes

The PDGene database does not include or display reports of causal mutations leading to PD with classic Mendelian inheritance (such as disease-causing mutations in SNCA or LRRK2). Instead, it focuses on results from genetic association studies assessing polymorphisms with a minor allele frequency of ≥0.1%. For a collection of mutations in Mendelian PD genes, please consult the PDmutDB, curated by the VIB Department of Molecular Genetics of the University of Antwerp, and the Parkinson’s disease mutation database, curated by the Parkinson's Institute from the Leiden University Medical Center. Note that meta-analysis results for polymorphisms with a minor allele frequency of ≥0.1% in any of the established Mendelian PD genes are still displayed on PDGene.

6. UCSC genome browser

In addition to being summarized directly on PDGene, all meta-analysis p values are also displayed on a customized UCSC genome browser track, which allows users to interpret the meta-analysis results alongside other features available for the same region of interest in this genome browser. This custom track can be accessed by searching the PDGene database for a genetic term (e.g. a gene name such as “SNCA”, an rs-number such as “rs356182”, or a chromosomal region such as “chr4:89000000-91000000” in hg19 coordinates) and clicking the “Browse” button on the PDGene homepage. In addition, several other elements throughout PDGene (e.g. chromosomal locations for rs-numbers or genetic regions) are cross-linked to the custom track on the UCSC genome browser.
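PDGene's own links load the custom track in a way not documented on this page, so the sketch below does not reproduce them. Under that caveat, it only shows how a plain UCSC genome browser view of an hg19 region can be addressed through the browser's `db` and `position` URL parameters; the helper name `ucsc_region_url` is hypothetical.

```python
from urllib.parse import urlencode

UCSC_TRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_region_url(position: str, assembly: str = "hg19") -> str:
    """Build a UCSC genome browser URL for a region, gene name or rs-number (no custom track)."""
    return f"{UCSC_TRACKS}?{urlencode({'db': assembly, 'position': position})}"


print(ucsc_region_url("chr4:89000000-91000000"))
# https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr4%3A89000000-91000000
```

The colon in the position is percent-encoded by `urlencode`; the browser decodes it back to `chr4:89000000-91000000`.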

The sharing of p values on the UCSC genome browser follows the same approach as elsewhere on PDGene: detailed p values are only displayed for polymorphisms in the “top 10,000” category, while all others are “binned” and displayed with the value of the lower bound of the respective bin (see above). To help distinguish the two types of results, polymorphism names (“rs-numbers”) on the UCSC custom track are shown in green for SNPs with fully resolved p values and in black for those with binned p values.

7. The “MyMeta” function

In this version of PDGene, users can create customized meta-analyses for selected polymorphisms among the “top 10,000” SNPs, using the same meta-analysis script that was used to compute and visualize all other meta-analysis results on PDGene (i.e. a fixed-effect model assuming additive effects). For instance, this functionality allows users to add their own genetic data (i.e. genotype summary data or additive ORs and standard errors, e.g. from an as yet unpublished genetic association study) to the meta-analysis of any of the “top 10,000” SNPs to calculate an updated summary result. Alternatively or in addition, users can exclude specific datasets included in the default PDGene meta-analyses and recalculate the summary OR. Please note that all meta-analyses computed by the “MyMeta” function are generated "on the fly": data submitted to PDGene are not stored or tracked internally at any time. We explicitly encourage users to include the resulting meta-analysis forest plots in their own presentations and/or publications, provided they adhere to the citation rules (see How to cite us).
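The add/exclude logic of "MyMeta" can be sketched in Python, assuming inverse-variance fixed-effect pooling of log odds ratios. This is not PDGene's R script: the function and dataset names are hypothetical, and genotype counts are converted to a per-allele estimate using the allelic (allele-count) odds ratio as an approximation to the additive model, a substitution made for this sketch.

```python
import math


def allelic_log_or(case_counts, control_counts):
    """Per-allele log OR and its SE from genotype counts (n_11, n_12, n_22).

    Allele 1 is the effect allele. Uses the allelic odds ratio as an
    approximation to an additive-model estimate.
    """
    def alleles(g):
        n11, n12, n22 = g
        return 2 * n11 + n12, n12 + 2 * n22  # (allele 1 count, allele 2 count)

    a, b = alleles(case_counts)
    c, d = alleles(control_counts)
    if 0 in (a, b, c, d):
        raise ValueError("zero allele count; a continuity correction would be needed")
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def pooled(estimates):
    """Inverse-variance fixed-effect pooling of {dataset: (log_or, se)}."""
    weights = {k: 1 / se ** 2 for k, (_, se) in estimates.items()}
    total = sum(weights.values())
    beta = sum(weights[k] * estimates[k][0] for k in estimates) / total
    se = math.sqrt(1 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided p value
    return math.exp(beta), se, p


def my_meta(default, add=None, exclude=()):
    """Recompute the summary OR after excluding and/or adding datasets."""
    data = {k: v for k, v in default.items() if k not in exclude}
    if add:
        data.update(add)
    if not data:
        raise ValueError("no datasets left to pool")
    return pooled(data)


default = {"dataset_1": (math.log(1.2), 0.1), "dataset_2": (math.log(1.5), 0.1)}
user = {"my_study": allelic_log_or((30, 50, 20), (20, 50, 30))}
summary_or, summary_se, summary_p = my_meta(default, add=user, exclude=("dataset_2",))
```

Nothing in the sketch is written to disk; the inputs exist only for the duration of the call.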

8. Abbreviations/conventions used throughout PDGene

1000G = data derived from the 1000 Genomes project

A vs. B (allele contrast) = indicates the direction of effect, where A denotes the effect allele (with a protective or risk odds ratio) and B the reference allele (odds ratio of 1)

C = Caucasian ethnicity

CEU = Utah residents with ancestry from northern and western Europe (one HapMap population used as part of the 1000 Genomes project)

CHB = Han Chinese in Beijing, China (one HapMap population used as part of the 1000 Genomes project)

CI = confidence interval

GWAS = genome-wide association study

hg19 = human genome build 19 (GRCh37/hg19; currently the most commonly used genome build and the one on which variant positions and gene annotations in PDGene are based)

I2 = proportion of the variation between study-specific results that is due to heterogeneity rather than chance

ICD = International Classification of Diseases

JPT = Japanese in Tokyo, Japan (one HapMap population used as part of the 1000 Genomes project)

OR = odds ratio

PD = Parkinson’s disease

SNP = single-nucleotide polymorphism

UKPDSBB = UK PD Society Brain Bank criteria for diagnosis of probable Parkinson's disease

9. References

Craig DW, Goor RM, Wang Z, Paschall J, Ostell J, Feolo M, Sherry ST, Manolio TA. Assessing and managing risk when sharing aggregate genetic variant data. Nat Rev Genet 2011;12(10):730-6.

Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, Schjeide BM, Schjeide LM, Meissner E, Zauft U, Allen NC, Liu T, Schilling M, Anderson KJ, Beecham G, Berg D, Biernacka JM, Brice A, Destefano AL, Do CB, Eriksson N, Factor SA, Farrer MJ, Foroud T, Gasser T, Hamza T, Hardy JA, Heutink P, Hill-Burns EM, Klein C, Latourelle JC, Maraganore DM, Martin ER, Martinez M, Myers RH, Nalls MA, Pankratz N, Payami H, Satake W, Scott WK, Sharma M, Singleton AB, Stefansson K, Toda T, Tung JY, Vance J, Wood NW, Zabetian CP; 23andMe, The Genetic Epidemiology of Parkinson's Disease (GEO-PD) Consortium; The International Parkinson's Disease Genomics Consortium (IPDGC); The Parkinson's Disease GWAS Consortium; The Wellcome Trust Case Control Consortium 2 (WTCCC2), Young P, Tanzi RE, Khoury MJ, Zipp F, Lehrach H, Ioannidis JP, Bertram L. Comprehensive research synopsis and systematic meta-analyses in Parkinson's disease genetics: the PDGene database. PLoS Genet 2012;8(3):e1002548.

Nalls MA, Pankratz N, Lill CM, Do CB, Hernandez DG, Saad M, DeStefano AL, Kara E, Bras J, Sharma M, Schulte C, Keller MF, Arepalli S, Letson C, Edsall C, Stefansson H, Liu X, Pliner H, Lee JH, Cheng R, International Parkinson's Disease Genomics Consortium (IPDGC), Parkinson’s Study Group (PSG) Parkinson’s Research: The Organized GENetics Initiative (PROGENI), 23andMe, GenePD, NeuroGenetics Research Consortium (NGRC), Hussman Institute of Human Genomics (HIHG), The Ashkenazi Jewish Dataset Investigator, Cohorts for Health and Aging Research in Genetic Epidemiology (CHARGE), North American Brain Expression Consortium (NABEC), United Kingdom Brain Expression Consortium (UKBEC), Greek Parkinson’s Disease Consortium, Alzheimer Genetic Analysis Group, Ikram MA, Ioannidis JPA, Hadjigeorgiou GM, Bis JC, Martinez M, Perlmutter JS, Goate A, Marder K, Fiske B, Sutherland M, Xiromerisiou G, Myers RH, Clark LN, Stefansson K, Hardy JA, Heutink P, Chen H, Wood NW, Houlden H, Payami H, Brice A, Scott WK, Gasser T, Bertram L, Eriksson N, Foroud T, Singleton AB. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease. Nat Genet 2014 Jul 27. [Epub ahead of print]