Updated 8 September 2009
1. Introduction
Parkinson's disease (PD) is a complex and heterogeneous disorder in both nosology and genetics (Forman, 2005). To date, mutations in several genes have been identified that cause predominantly early-onset parkinsonism, which typically follows Mendelian inheritance (Farrer, 2006; Gasser, 2009). PD without obvious familial aggregation ("idiopathic PD"), on the other hand, is likely governed by a variety of genetic and non-genetic factors that define an individual's risk of developing the disorder, as well as onset age, clinical presentation and progression (Pardo, 2005). It is still a matter of debate to what extent genetic factors contribute to PD risk in cases with onset beyond 50 years (e.g. Tanner, 1999; Sveinbjornsdottir, 2000; Maher, 2002; de la Fuente-Fernandez, 2003; Wirdefeldt, 2004; Lin, 2005).
In the past decade, hundreds of reports have been published claiming or refuting genetic association between putative PD genes and disease risk, onset age variation, or other phenotypic variables. Currently, 40 to 80 PD genetic association studies are being published annually by research groups worldwide. For the public, and also for the PD genetics research community, this wealth of information is becoming increasingly difficult to follow and evaluate, let alone interpret.
2. Database Organization and Methods
Overview
The goal of the PDGene database is to serve as a comprehensive, unbiased, publicly available and regularly updated collection of published genetic association studies performed on PD. Eligible publications are identified through systematic searches of scientific literature databases, as well as the tables of contents of journals in genetics, neurology, and psychiatry. Data selected for display summarize key characteristics of the investigated study cohorts (e.g., gene overview), as well as genotype distributions in cases and controls (e.g., polymorphism details). For polymorphisms with genotype data in at least four case-control samples, continuously updated random-effects meta-analyses are presented (see meta-analysis methods). Note that data obtained from family-based studies are not included in the meta-analyses, as crude odds ratios cannot be readily calculated from overall genotype distributions. However, these studies and their qualitative results are still listed on the gene-summary pages of the PDGene website (see Table 2 for example).
To ensure the highest degree of scientific objectivity, only studies published in peer-reviewed journals available in English are considered for inclusion in the database. In particular, this precludes the inclusion of data presented only in abstract form, e.g. at scientific meetings. We encourage authors of original reports fulfilling the above criteria to submit their data as soon as their work is accepted for publication.
Meta-Analysis Methods
For all polymorphisms with minor allele frequencies in healthy controls >1%, and for which case-control genotype data are available in four or more independent samples, crude odds ratios (ORs) and 95% confidence intervals (CIs) are calculated from the reported allele distributions for each study. Summary ORs and 95% CIs are calculated using the DerSimonian and Laird (1986) random-effects model, which utilizes weights that incorporate both within-study and between-study variance. This procedure is done including all studies irrespective of ethnicity (denoted by "All studies" on the meta-analysis figures), and repeated after exclusion of the initial study ("All excl initial") and after exclusion of studies in which a deviation from Hardy-Weinberg equilibrium (HWE) was detected in controls ("All excl HWE deviations"). In addition, ethnicity-specific meta-analyses are performed whenever genotype data are available from at least three independent case-control populations. Visually, the results of these meta-analyses are displayed for each polymorphism in the form of forest plots. Overlapping samples (of which usually only the largest is included), studies with missing data, and control samples deviating from HWE are indicated on these graphs. Note that when only a few studies are included in the meta-analyses (i.e. fewer than about 10), the random-effects model may yield summary ORs and confidence bounds that are slightly anti-conservative.
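The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the PDGene implementation: the function names and the allele counts are hypothetical, the per-study variance uses Woolf's formula (an assumption, since the text does not state how study variances are obtained), and the 95% CI uses the normal quantile 1.96.

```python
import math


def allele_log_or(case_minor, case_major, ctrl_minor, ctrl_major):
    """Crude allele-based log odds ratio and its variance (Woolf's formula, assumed)."""
    log_or = math.log((case_minor * ctrl_major) / (case_major * ctrl_minor))
    variance = 1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    return log_or, variance


def dersimonian_laird(effects, variances):
    """Pool per-study log ORs; return (pooled log OR, standard error, tau^2)."""
    k = len(effects)
    w = [1 / v for v in variances]  # within-study (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # weights with both variance components
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


def summary_or(allele_counts):
    """allele_counts: list of (case_minor, case_major, ctrl_minor, ctrl_major)."""
    effects, variances = zip(*(allele_log_or(*counts) for counts in allele_counts))
    pooled, se, tau2 = dersimonian_laird(effects, variances)
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), tau2)


# Hypothetical allele counts for four independent case-control samples.
samples = [(120, 380, 90, 410), (60, 240, 55, 245),
           (200, 600, 150, 650), (45, 155, 50, 150)]
or_all, ci_low, ci_high, tau2 = summary_or(samples)
print(f"summary OR = {or_all:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), tau^2 = {tau2:.3f}")
```

When the estimated between-study variance is zero, the weights reduce to the usual inverse-variance weights and the result coincides with a fixed-effect analysis. The "All excl initial" and "All excl HWE deviations" variants would correspond to calling `summary_or` on the appropriately reduced list of samples.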
Along with the forest plots, which depict summary ORs only for the most current set of available data, the results of cumulative meta-analyses are displayed for each polymorphism with data available in at least four independent case-control datasets. These graphs display summary ORs (for the "All studies" paradigm using the same allele-based random-effects analyses as outlined above) recalculated after each study. Thus, these graphs allow an evaluation of the estimated summary effects over time. Note that the most current summary OR (listed at the very top of the cumulative meta-analysis plots) is identical to the "All studies" results depicted on the forest plots (see above).
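As a rough illustration of the cumulative procedure, the sketch below re-pools the data each time a study is added. It assumes that studies are added in order of publication year (the text only says "after each study"), and the study labels, years, log ORs and variances are invented for the example; the pooling helper is a compact DerSimonian-Laird estimator written for this sketch.

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled log OR and its standard error."""
    k = len(effects)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star))


def cumulative_meta_analysis(studies):
    """studies: list of (label, year, log_or, variance).

    Returns one row per added study: (label, number of studies pooled,
    summary OR, lower 95% bound, upper 95% bound).
    """
    ordered = sorted(studies, key=lambda s: s[1])  # assumed ordering: publication year
    rows = []
    for n in range(1, len(ordered) + 1):
        effects = [s[2] for s in ordered[:n]]
        variances = [s[3] for s in ordered[:n]]
        pooled, se = pool_random_effects(effects, variances)
        rows.append((ordered[n - 1][0], n, math.exp(pooled),
                     math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))
    return rows


# Hypothetical per-study log ORs and variances.
studies = [
    ("Study B", 2005, math.log(1.10), 0.03),
    ("Study A", 2003, math.log(1.60), 0.08),
    ("Study C", 2007, math.log(0.95), 0.02),
    ("Study D", 2008, math.log(1.05), 0.01),
]
for label, n, or_n, low, high in cumulative_meta_analysis(studies):
    print(f"after {label} (n={n}): OR = {or_n:.2f} ({low:.2f}-{high:.2f})")
```

The last row is, by construction, the same estimate as a single random-effects analysis of all studies, which mirrors the statement that the most current cumulative OR equals the "All studies" result.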
Inclusion of Genome-wide Association Studies (GWAS)
The systematic inclusion of data from GWAS and other large-scale studies represents a conceptual and computational challenge for any genetic database. We have devised the following step-wise protocol, which we believe allows us to capture the most relevant genetic information without the need to include every data-point from these studies. Note that this feature of PDGene is new and still under development. Please visit this page to see a summary of all published large-scale studies currently included in PDGene.
Stage I: Represents the inclusion of genes and polymorphisms “featured” or highlighted by the authors of the GWAS or large-scale study, usually because they show some degree of genetic association after completion of all analyses, e.g. testing multiple independent samples. These genes and polymorphisms probably represent the most important findings of each large-scale analysis and are therefore included here with highest priority. Genomic loci that do not map within any known gene are represented by a surrogate name specifying the cytogenetic location (e.g. “GWA_1q25.12”). This stage has already been implemented in the current version of PDGene (e.g. for the SEMA5A gene featured in the GWAS by Maraganore et al. [2005]).
For GWAS/large-scale studies that have made their genotype data publicly available, we will also make use of “non-featured” genotype distributions, i.e. of polymorphisms not believed to be associated with PD in the original publications:
Stage II: Will add GWAS genotype data for polymorphisms already available in PDGene, i.e. usually derived from candidate gene studies. GWAS data for such overlapping polymorphisms will be added to the gene-specific entries and, if genotype data are then available in a total of at least four independent case-control samples, included and displayed in the meta-analyses. This stage adds valuable information to the existing PDGene meta-analyses as it is derived from assessments that are largely unbiased with respect to gene function, in contrast to most conventional candidate gene studies. See the entries for LINGO1 as an example of such "Stage II" analyses. Here, the original data published by Vilarino-Guell, 2009 on two independent cohorts would not, on their own, have allowed meta-analysis. However, one (rs9652490) of the two tested polymorphisms was also included in the GWAS by Fung, 2006 and Pankratz, 2008, which were added to the gene entry and then allowed meta-analysis of this SNP across all four datasets.
Stage III: Focuses on published meta-analyses of the existing GWAS datasets. Genes and loci resulting from these analyses are treated equivalently to the "featured genes" of Stage I (above). Genotype data on the gene-specific pages will be extracted from the primary GWAS (if their data are publicly available) and displayed alongside a "GWAS meta-analysis" entry. For an example of this stage, please consult the entry for TPRG1. This locus was originally identified in the GWAS meta-analysis by Evangelou, 2007. The genotype data used for this GWAS meta-analysis can be found in the respective entries from the original GWAS, i.e. Maraganore, 2005 (family-based and case-control), and Fung, 2006. This stage also entails the inclusion of more complex genetic analyses, e.g. those jointly analyzing large numbers of polymorphisms at different loci based on assumptions regarding the functional interconnection of these loci, e.g. in the form of "pathways". To the degree that it can be achieved in this context, these pathway-based results are labeled as such and stored in a separate "unmapped" section of the database (see this entry as an example).
Association studies on mitochondrial genes
Studies assessing a potential association between PD and genetic variants in the mitochondrial (mt) genome are subject to the same inclusion criteria as studies investigating markers from the nuclear genome, and are displayed on a separate "chromosome graph" (which is adapted from imagery on the "Mito Map" website [http://www.mitomap.org/]). Owing to the specific characteristics of human mt-inheritance (e.g. its multicopy nature and the high frequency of somatic mutation events) and the innate heterogeneity of mt-association studies, however, genotype data from these studies are not included on PDGene and therefore not subject to meta-analysis.
For more details on inclusion criteria, literature searches, data-management procedures, statistical analyses, and online database structure, please see Bertram et al. (2007).
3. Summary of Meta-analysis Highlights: The "Top Results" List
In an effort to facilitate the identification of the most promising meta-analysis results available in PDGene, a continuously updated list displaying the most strongly associated genes ("Top Results") has been added to the homepage. The list is ranked by effect size, and only includes genes that contain at least one variant showing a nominally significant summary OR in the analysis of all ethnic groups (“All”), or in ethnicity-specific meta-analyses with at least three available datasets (see above). While we believe that this list represents an up-to-date summary of particularly promising PD candidate genes that warrant follow-up with high priority, we note that many of these may represent false-positive findings, in particular those based on a small number (<10) of independent samples.
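The selection rule for this list can be illustrated with a short sketch. It rests on several assumptions that are not stated in the text: "nominally significant" is taken to mean a 95% CI that excludes 1, "effect size" is measured as the absolute log OR so that risk and protective effects are ranked on the same scale, and each gene is represented by its strongest eligible variant. The gene names and numbers are invented.

```python
import math

MIN_DATASETS_ALL = 4     # four or more samples for the "All" analysis
MIN_DATASETS_ETHNIC = 3  # three or more samples for ethnicity-specific analyses


def is_eligible(result):
    """True if the meta-analysis has enough datasets and its 95% CI excludes 1."""
    needed = MIN_DATASETS_ALL if result["analysis"] == "All" else MIN_DATASETS_ETHNIC
    ci_low, ci_high = result["ci"]
    significant = ci_low > 1.0 or ci_high < 1.0  # assumed definition of "nominally significant"
    return result["n_datasets"] >= needed and significant


def top_results(results):
    """Return (gene, OR) pairs ranked by |log OR| of each gene's strongest eligible variant."""
    best = {}
    for r in results:
        if not is_eligible(r):
            continue
        strength = abs(math.log(r["or"]))
        if r["gene"] not in best or strength > best[r["gene"]][0]:
            best[r["gene"]] = (strength, r)
    ranked = sorted(best.values(), key=lambda item: item[0], reverse=True)
    return [(r["gene"], r["or"]) for _, r in ranked]


# Hypothetical meta-analysis summaries.
results = [
    {"gene": "GENE_A", "analysis": "All", "n_datasets": 5, "or": 1.30, "ci": (1.10, 1.54)},
    {"gene": "GENE_B", "analysis": "All", "n_datasets": 6, "or": 0.70, "ci": (0.55, 0.89)},
    {"gene": "GENE_C", "analysis": "All", "n_datasets": 4, "or": 1.50, "ci": (0.95, 2.37)},
    {"gene": "GENE_D", "analysis": "Asian", "n_datasets": 2, "or": 2.00, "ci": (1.20, 3.30)},
]
for gene, odds_ratio in top_results(results):
    print(gene, odds_ratio)
```

In this example GENE_C is dropped because its interval includes 1, and GENE_D because it has too few datasets for an ethnicity-specific analysis.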
4. Mendelian PD Genes
Note that the PDGene database will not sample and catalogue reports of rare causal mutations leading to PD or parkinsonism of classic Mendelian inheritance (such as disease-causing mutations in α-synuclein or parkin), but will only focus on genetic association studies using polymorphisms with a minor-allele frequency of 1% or greater in at least one of the studied control populations. For a collection of Mendelian PD genes, please consult the Parkinson's disease Mutation Database curated by the Parkinson's Institute from Leiden University Medical Center, or the Mutation Database for Parkinson's Disease curated by the Institute for Infocomm Research in Singapore. Note that despite the exclusion of disease-causing mutations, data from bona fide association studies using common polymorphisms in any of the Mendelian PD genes will be included and meta-analyzed for PDGene (e.g. see the entries for SNCA or LRRK2).
References
Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE. (2007) "Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database." Nat Genet 39(1): 17-23.
de la Fuente-Fernandez, R. (2003). "A note of caution on correlation between sibling pairs." Neurology 60(9): 1561; author reply 1561.
DerSimonian R, Laird N. "Meta-analysis in clinical trials." Control Clin Trials. 1986 Sep;7(3):177-88.
Egger M, Davey Smith G, Schneider M, Minder C. "Bias in meta-analysis
detected by a simple, graphical test." BMJ. 1997 Sep 13;315(7109):629-34.
Evangelou E, Maraganore DM, Ioannidis JP. "Meta-analysis in genome-wide association datasets: strategies and application in Parkinson disease." PLoS ONE. 2007 Feb 7;2:e196.
Farrer, M.J. (2006). "Genetics of Parkinson disease: paradigm shifts and future prospects." Nat Rev Genet 7(4):306-18.
Forman MS, Lee VM, Trojanowski JQ. (2005). "Nosology of Parkinson's disease: looking for the way out of a quagmire." Neuron 47(4):479-82.
Fung HC, Scholz S, Matarin M, Simon-Sanchez J, Hernandez D, et al. "Genome-wide genotyping in Parkinson's disease and neurologically normal controls: first stage analysis and public release of data." Lancet Neurol. 2006 Nov;5(11):911-6.
Gasser T. "Molecular pathogenesis of Parkinson disease: insights from genetic studies." Expert Rev Mol Med. 2009 Jul 27;11:e22.
Lin MT, Simon DK. (2005). Comment on "No evidence for heritability of Parkinson disease in Swedish twins." Neurology 64(5):932.
Maher, N. E., L. I. Golbe, et al. (2002). "Epidemiologic study of 203 sibling pairs with Parkinson's disease: the GenePD study." Neurology 58(1): 79-84.
Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ, Rocca WA, Pant PV, Frazer KA, Cox DR, Ballinger DG. "High-resolution whole-genome association study of Parkinson disease." Am J Hum Genet. 2005 Nov;77(5):685-93.
Pardo, L.M., van Duijn, C.M. (2005). "In search of genes involved in neurodegenerative disorders." Mutat Res 30;592(1-2):89-101.
Tanner, C. M., R. Ottman, et al. (1999). "Parkinson disease in twins: an etiologic study." JAMA 281: 341-6.
Sveinbjornsdottir S, Hicks AA, et al (2000). "Familial aggregation of Parkinson's disease in Iceland." N Engl J Med 343(24):1765-70.
Wirdefeldt K, Gatz M, et al. (2004). "No evidence for heritability of Parkinson disease in Swedish twins." Neurology 63(2):305-11.