Genome-Wide Association Study (GWAS)
(Modified from the Wikipedia article)
A genome-wide association study is a novel and rapidly advancing approach in molecular genetics that has led to exciting discoveries and holds great promise for delineating how our genomic information shapes who we are in health and disease. This practical introduction describes the concept of the GWA study, its approach and methods, its clinical applications, its limitations and its future trends.
What is a genome-wide association study?
A genome-wide association study (GWAS), also known as a whole genome association study (WGA study, or WGAS), is an examination of many common genetic variants in different individuals to see whether any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits such as major diseases.
Any two human genomes differ in millions of ways, ranging from variations in individual nucleotides (SNPs) to variations caused by deletions, insertions and copy number variations. Any of these may alter an individual's traits, or phenotype, which can be anything from disease risk to physical properties such as height. Before the introduction of GWA studies, the primary method of investigation was the study of genetic linkage through inheritance in families. This approach proved highly useful for identifying single-gene disorders. However, for common and complex diseases, the results of genetic linkage studies proved hard to reproduce. Genetic association studies were then developed as an alternative to linkage studies that is better at detecting weak genetic effects. A genetic association study asks whether an allele of a genetic variant is found more often than expected in individuals with the phenotype of interest (e.g. those with the disease being studied).
GWA studies normally compare the DNA of two groups of participants: people with the disease (cases) and similar people without it (controls). Each person gives a DNA sample, from which millions of genetic variants are read using SNP arrays. If one version of a variant (one allele) is more frequent in people with the disease, the SNP is said to be "associated" with the disease. The associated SNPs are then considered to mark a region of the human genome that influences the risk of disease. In contrast to methods that specifically test one or a few genetic regions, GWA studies investigate the entire genome. GWA studies identify SNPs and other variants in DNA that are associated with a disease, but cannot on their own specify which genes are causal.
The advent of biobanks (repositories of human biological material), the International HapMap Project, which from 2003 identified a large number of the common SNPs, and methods to genotype all these SNPs on genotyping arrays greatly facilitated the development and progress of GWA studies. Not every SNP needs to be interrogated in a GWA study: because nearby SNPs tend to be inherited together (linkage disequilibrium), GWA studies can focus on a subset of key SNPs that describe most of the variation.
How are GWA studies actually conducted?
The most common GWA study design is the case-control setup, which compares two large groups of individuals: a healthy control group and a case group affected by the disease. All individuals in each group are genotyped for the majority of common known SNPs. The number of SNPs interrogated depends on the genotyping technology, but is typically one million or more.
From the genotype results, the allele frequency is then derived for each SNP: the number of copies of a particular allele divided by the number of copies of all alleles at that genetic location (locus) in the population. Whether the allele frequency differs significantly between the case and control groups is determined from the odds ratio and a statistical test. In such setups, the fundamental unit for reporting effect sizes is the odds ratio. In GWA studies, the odds ratio is the ratio of two odds: the odds of carrying a specific allele in the case group divided by the odds of carrying the same allele in the control group. Additionally, a P-value for the significance of the odds ratio is typically calculated using a simple chi-squared test. Finding odds ratios that are significantly different from 1 is the objective of a GWA study, because such a result shows that a SNP is associated with the disease. Calculations are typically done using bioinformatics software such as PLINK, which also supports a range of alternative association statistics.
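The allele counting and testing steps described above can be sketched as follows. This is a minimal illustration, assuming a biallelic SNP whose genotypes are summarized as counts of the two homozygotes and the heterozygote, and an allelic test in which each allele copy is counted separately; the function names and counts are invented. The P-value uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(x/2)).

```python
import math

def allele_counts(n_hom_ref, n_het, n_hom_alt):
    """Convert genotype counts into (ref allele copies, alt allele copies)."""
    return 2 * n_hom_ref + n_het, 2 * n_hom_alt + n_het

def allele_frequency(ref, alt):
    """Frequency of the alt allele among all allele copies at the locus."""
    return alt / (ref + alt)

def allelic_test(case_ref, case_alt, ctrl_ref, ctrl_alt):
    """Allelic odds ratio and 1-df Pearson chi-squared test on a 2x2 table
    of allele counts (assumes every count is non-zero)."""
    # Odds of the alt allele in cases divided by its odds in controls.
    odds_ratio = (case_alt / case_ref) / (ctrl_alt / ctrl_ref)
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_ref + case_alt + ctrl_ref + ctrl_alt
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Invented genotype counts (hom ref, het, hom alt) for 1,000 cases and 1,000 controls.
case_ref, case_alt = allele_counts(400, 450, 150)   # (1250, 750)
ctrl_ref, ctrl_alt = allele_counts(500, 400, 100)   # (1400, 600)
freq_cases = allele_frequency(case_ref, case_alt)   # 0.375
freq_ctrls = allele_frequency(ctrl_ref, ctrl_alt)   # 0.3
odds_ratio, chi2, p = allelic_test(case_ref, case_alt, ctrl_ref, ctrl_alt)
# odds_ratio is about 1.4 and chi2 about 25.16
```

An odds ratio of 1.4 with a P-value below 10^-6 would be a strong single-SNP signal, although, as discussed below, it would still not reach a typical genome-wide threshold once the millions of tests are taken into account.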
In addition to calculating association, it is common to take into account several variables that could confound the results. Common examples include the sex, age, and geographic and ethnic background of participants.
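One simple way to account for a categorical confounder such as sex is to compute the association within each stratum and pool the strata with the Mantel-Haenszel odds ratio, sum(a*d/n) / sum(b*c/n). This is a swapped-in illustration rather than the source's method; many GWA analyses instead include such variables as covariates in a logistic regression. The stratum counts below are invented so that the allele is more common in men and men are over-represented among cases, while within each sex there is no association at all.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """Allele counts for one stratum (e.g. one sex) of a case-control study."""
    case_alt: int
    case_ref: int
    ctrl_alt: int
    ctrl_ref: int

def crude_odds_ratio(strata):
    """Odds ratio after pooling all strata into one 2x2 table (ignores the confounder)."""
    a = sum(s.case_alt for s in strata)
    b = sum(s.case_ref for s in strata)
    c = sum(s.ctrl_alt for s in strata)
    d = sum(s.ctrl_ref for s in strata)
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = denominator = 0.0
    for s in strata:
        n = s.case_alt + s.case_ref + s.ctrl_alt + s.ctrl_ref
        numerator += s.case_alt * s.ctrl_ref / n
        denominator += s.case_ref * s.ctrl_alt / n
    return numerator / denominator

strata = [
    Stratum(case_alt=30, case_ref=70, ctrl_alt=60, ctrl_ref=140),  # women: OR = 1
    Stratum(case_alt=80, case_ref=120, ctrl_alt=20, ctrl_ref=30),  # men: OR = 1
]
crude = crude_odds_ratio(strata)               # about 1.23, a spurious association
adjusted = mantel_haenszel_odds_ratio(strata)  # 1.0 once sex is accounted for
```

Ignoring sex here would suggest that the allele raises disease risk by about 23%, when the apparent effect comes entirely from the different sex composition of cases and controls.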
After odds ratios and P-values have been calculated for all SNPs, a common approach is to create a Manhattan plot. In the context of GWA studies, this plot shows the negative logarithm of the P-value as a function of genomic location. Thus the SNPs with the most significant associations stand out on the plot, usually as stacks of points because of haploblock structure. Importantly, the P-value threshold for significance is corrected for multiple testing. The exact threshold varies by study, but P-values typically must be very low (around 10^-7 or 10^-8) to be considered significant, given the millions of SNPs tested. Modern GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort.
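The threshold arithmetic and the Manhattan-plot transform can be sketched in a few lines, assuming a Bonferroni correction (the simplest multiple-testing correction; the source does not say which correction a given study uses). Dividing an overall error rate of 0.05 by one million tests gives 5 × 10^-8, the value commonly quoted as the genome-wide significance level. The SNP positions and P-values below are hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-SNP P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def manhattan_points(results):
    """Turn (chromosome, position, p_value) rows into (chromosome, position, -log10 p)
    points; smaller P-values become taller points on a Manhattan plot."""
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in results]

alpha, n_tests = 0.05, 1_000_000
threshold = bonferroni_threshold(alpha, n_tests)  # 5e-8, about 7.3 on the -log10 scale
# Without correction, about alpha * n_tests SNPs would pass by chance alone.
expected_false_positives = alpha * n_tests        # about 50,000

# Hypothetical results for three SNPs.
results = [("1", 1_250_000, 3e-9), ("1", 2_000_000, 0.2), ("2", 500_000, 1e-5)]
points = manhattan_points(results)
significant = [(chrom, pos) for chrom, pos, p in results if p < threshold]
```

Only the first SNP passes: a P-value of 10^-5 would look impressive in a single-gene study, but among a million tests it is expected to occur about ten times by chance.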
What are the clinical applications of GWA studies?
The results of GWA studies are expected to help us understand the molecular genetics underlying human diseases and our biological responses to external stimuli, such as disease-causing agents and therapies. The first successful GWA study, published in 2005, investigated patients with age-related macular degeneration. It found two SNPs whose allele frequencies were significantly altered compared with healthy controls. As of 2011, GWA studies tested hundreds or thousands of individuals each, and over 1,200 human GWA studies had examined over 200 diseases and traits, finding almost 4,000 SNP associations. Another success illustrates the prognostic value of GWA studies in predicting therapeutic response: GWA studies have identified genetic variants associated with response to anti-hepatitis C virus treatment. For genotype 1 hepatitis C treated with pegylated interferon-alpha-2a or pegylated interferon-alpha-2b combined with ribavirin, a GWA study showed that SNPs near the human IL28B gene, which encodes interferon lambda 3, are associated with significant differences in response to treatment. A later report demonstrated that the same genetic variants are also associated with natural clearance of genotype 1 hepatitis C virus.
What are the limitations of GWA studies?
A central point of debate on GWA studies has been that most of the SNP variations they find are associated with only a small increase in disease risk and have little predictive value. The median odds ratio is 1.33 per risk-SNP, with only a few showing odds ratios above 3.0. These effects are considered small because they do not explain much of the heritable variation. This heritable variation is known from heritability studies, such as those based on monozygotic twins. For example, it is known that 80–90% of the variance in height is heritable. This means that if 29 cm separates the tallest 5% from the shortest 5% of the population, then genetics account for about 27 cm. Of these 27 cm, however, GWA studies account for only a minority: in the height example 6 cm, and for most other major complex phenotypes a similarly small fraction. A small effect ultimately translates into poor separation of cases and controls, and thus only a small improvement in prognostic accuracy. This leads to a more fundamental criticism of GWA studies, aimed at the assumption that common genetic variation plays a large role in explaining the heritable variation of common disease. In addition, GWA studies identify risk-SNPs, not risk-genes, and identifying the genes involved is one step closer to actionable goals.
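The step from "80–90% heritable" and a 29 cm spread to roughly 27 cm can be reconstructed as follows, under assumptions that are ours rather than the source's: height is normally distributed, heritability is the share of variance due to genetic effects, and genetic and environmental effects are independent. The gap between the top and bottom 5% of a normal distribution is proportional to its standard deviation, and a variance share h² corresponds to a standard-deviation share of sqrt(h²), so the genetic part of the gap is sqrt(h²) × 29 cm. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def percentile_gap(sd, tail=0.05):
    """Distance between the top-`tail` and bottom-`tail` cut-offs of a normal
    distribution with standard deviation `sd` (about 3.29 * sd for 5% tails)."""
    z = NormalDist().inv_cdf(1 - tail)
    return 2 * z * sd

def genetic_part_of_gap(total_gap, h2, tail=0.05):
    """Gap attributable to genetics if h2 is the genetic share of the variance."""
    total_sd = total_gap / percentile_gap(1.0, tail)  # implied SD of height
    genetic_sd = math.sqrt(h2) * total_sd             # variance share -> SD share
    return percentile_gap(genetic_sd, tail)

for h2 in (0.8, 0.85, 0.9):
    print(h2, round(genetic_part_of_gap(29.0, h2), 1))
# prints: 0.8 25.9, then 0.85 26.7, then 0.9 27.5
```

Under this reading the genetic share of the 29 cm gap falls between about 26 and 27.5 cm, consistent with the "about 27 cm" quoted above.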
Technically, the GWA approach can also be problematic because the massive number of statistical tests performed creates an unprecedented potential for false-positive results. In addition, poorly defined case and control groups, insufficient sample size, and inadequate control for multiple testing and for population stratification are common problems. However, these are generally considered preventable or correctable issues.
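Population stratification is listed above as a common problem. One widely used diagnostic, not described in the source, is the genomic inflation factor lambda: the median of the observed 1-df chi-squared association statistics divided by the median expected when no SNP is associated (about 0.455). A value well above 1 suggests stratification or another systematic bias, and the "genomic control" method divides each statistic by lambda. This is a minimal sketch, assuming two-sided P-values from 1-df tests; the evenly spaced P-values are stand-ins for null results, not real data.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-squared distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def p_to_chi2(p):
    """1-df chi-squared statistic matching a two-sided P-value (requires 0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(1 - p / 2)
    return z * z

def genomic_inflation(p_values):
    """Genomic inflation factor: median observed statistic / median expected under the null."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats, lam):
    """Deflate test statistics by lambda; leave them unchanged if lambda <= 1."""
    return [x / lam for x in chi2_stats] if lam > 1 else list(chi2_stats)

# Evenly spaced P-values mimic a study with no true associations and no bias.
null_p = [(i - 0.5) / 999 for i in range(1, 1000)]
lam_null = genomic_inflation(null_p)  # about 1.0
# Halving every P-value mimics systematic inflation, e.g. from stratification.
lam_inflated = genomic_inflation([p / 2 for p in null_p])  # well above 1
```

Because lambda is based on the median, a handful of genuine associations barely moves it, whereas a bias that shifts the whole distribution of test statistics does.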
What are the new trends and future directions?
Since these first landmark GWA studies, there have been two general trends. One has been towards ever larger sample sizes: at the end of 2011, the largest sample sizes were in the range of 200,000 individuals. The motivation is to reliably detect risk-SNPs that have smaller odds ratios and lower allele frequencies. The other trend has been towards more narrowly defined phenotypes, such as blood lipids, proinsulin or similar biomarkers. These are termed intermediate phenotypes, and their analysis has been suggested to be of value for functional research into biomarkers.
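Why smaller odds ratios and rarer alleles demand larger samples can be seen from a rough sample-size calculation. This sketch is an assumption-laden approximation, not the method of any particular study: it uses the allelic test, equal numbers of cases and controls (two allele copies per person), a normal approximation to log(OR) with Woolf's variance (the sum of reciprocal cell counts), a genome-wide alpha of 5 × 10^-8 and 80% power. The function names and example frequencies are hypothetical.

```python
import math
from statistics import NormalDist

def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (plus as many controls) for the allelic test."""
    p0 = control_freq
    p1 = case_allele_freq(control_freq, odds_ratio)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = std_normal.inv_cdf(power)
    # With n cases and n controls (2n allele copies each), Woolf's variance is
    # Var(log OR) ~ [1/(p1(1-p1)) + 1/(p0(1-p0))] / (2n).
    per_allele = 1 / (p1 * (1 - p1)) + 1 / (p0 * (1 - p0))
    # Solve |log OR| / sqrt(Var) = z_alpha + z_power for n.
    n = (z_alpha + z_power) ** 2 * per_allele / (2 * math.log(odds_ratio) ** 2)
    return math.ceil(n)

for control_freq, odds_ratio in [(0.3, 1.33), (0.3, 1.1), (0.05, 1.33)]:
    print(control_freq, odds_ratio, cases_needed(control_freq, odds_ratio))
```

Under these assumptions, moving from the median odds ratio of 1.33 to 1.1, or from a common allele to a rare one, multiplies the required number of cases several-fold, which is the pressure behind the growth in sample sizes.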
The rapidly decreasing price of complete genome sequencing has also provided a realistic alternative to array-based GWA studies. High-throughput sequencing has the potential to side-step some of the shortcomings of array-based GWA studies. There is also increasing interest in the association between risk-SNPs and the expression of nearby genes, in so-called expression quantitative trait loci (eQTL) studies. Such eQTL studies can be seen as a logical extension of GWA studies, which identify risk-SNPs but not risk-genes. In fact, major GWA studies of 2011 typically included extensive eQTL analysis. One of the strongest eQTL effects observed for a GWA-identified risk SNP is at the SORT1 locus. Functional follow-up studies of this locus using small interfering RNA and gene knock-out mice have shed light on the metabolism of low-density lipoproteins, with important clinical implications for cardiovascular disease. Will such hybrid approaches still be referred to as GWA studies?
References / Suggested Reading
Wikipedia: Genome-wide association study. (Accessed: February 2013)
B. E. Stranger et al.: Progress and Promise of Genome-Wide Association Studies for Human Complex Trait Genetics. Genetics 187 (2): 367-383, 2011.
Z. K. Stadler et al.: Genome-Wide Association Studies of Cancer: Principles and Potential Utility. Oncology 24 (7): 1-2, 2010.
W. G. Feero & A. E. Guttmacher: Genomewide Association Studies and Assessment of the Risk of Disease. N Engl J Med 363 (2): 166-176, 2010.
J. Hardy & A. Singleton: Genomewide Association Studies and Human Disease. N Engl J Med 360 (17): 1759-1768, 2009.