Studying the epigenome using next generation sequencing

2011, Journal of Medical Genetics

Downloaded from http://jmg.bmj.com/ on November 3, 2016 - Published by group.bmj.com

Methods

REVIEW

Chee Seng Ku,1,2 Nasheen Naidoo,2 Mengchu Wu,1 Richie Soong1

1 Cancer Science Institute of Singapore, National University of Singapore, Singapore
2 Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore

Correspondence to Chee Seng Ku, Cancer Science Institute of Singapore, National University of Singapore, Singapore; [email protected]

Received 5 June 2011. Revised 24 June 2011. Accepted 25 June 2011. Published Online First 8 August 2011.

ABSTRACT

Advances in next generation sequencing (NGS) technologies have had a significant impact on epigenomic research. The arrival of NGS technologies has enabled a more powerful sequencing based method, ChIP-Seq, to interrogate genome-wide histone modifications, improving on the conventional microarray based method (ChIP-chip). Similarly, the first human DNA methylome was mapped using NGS technologies. More importantly, studies of DNA methylation and histone modification using NGS technologies have yielded new discoveries and improved our knowledge of human biology and diseases. The concept that cytosine methylation is restricted to CpG dinucleotides has only recently been challenged by new data generated from sequencing the DNA methylome. Approximately 25% of all cytosine methylation identified in stem cells was in a non-CG context. The non-CG methylation was enriched in gene bodies and depleted in protein binding sites and enhancers. Recent developments in third generation sequencing technologies have shown promising results in directly sequencing methylated nucleotides and in differentiating between 5-methylcytosine and 5-hydroxymethylcytosine.
The importance of 5-hydroxymethylcytosine remains largely unknown, but it has been found in various tissues. In DNA from human brain frontal lobe tissue, 5-hydroxymethylcytosine was particularly enriched at promoters and in intragenic regions (gene bodies) but was largely absent from non-gene regions. The presence of 5-hydroxymethylcytosine in gene bodies was more positively correlated with gene expression levels. The importance of studying 5-methylcytosine and 5-hydroxymethylcytosine separately for their biological roles will become clearer when more efficient methods to distinguish them are available.

J Med Genet 2011;48:721–730. doi:10.1136/jmedgenet-2011-100242

INTRODUCTION

Epigenetics refers to the mechanisms that regulate cell type or tissue specific transcription or gene expression levels without altering the DNA sequence, through biochemical modifications such as the addition of a methyl group to cytosines and post-translational modifications of histone proteins. These epigenetic mechanisms play a critical role in the normal stages of cellular development and in processes such as embryogenesis, cell differentiation (cell lineage specification), inactivation of the X chromosome, and genomic imprinting, through modulation of transcriptional regulation in a tissue specific manner. Abnormalities in these epigenetic mechanisms have been linked to a wide range of diseases.1–6

The importance of exploring the epigenetics of human complex diseases and traits is now being increasingly recognised. Despite the success of genome-wide association studies (GWAS) in identifying thousands of new genetic variants or loci for complex diseases and traits, most of these variants confer modest effect sizes, with an odds ratio (OR) <1.5. Consequently, these genetic variants collectively account for only a small fraction of the heritability of complex phenotypes.7–9 Epigenetic mechanisms are a potential source of the missing heritability.
Theoretically, only germline heritable epigenetic events (inherited through meiosis) may contribute towards the missing heritability, as opposed to non-germline heritable components (somatic epigenetic events, inherited through mitosis). However, it is challenging to determine which epigenetic events are germline heritable. Distinguishing between germline and somatic epigenetic events requires comparing the epigenetic profile of the ‘disease epigenome’, such as the cancer epigenome, with the epigenome of constitutional DNA from the same individual as a reference. This approach was also applied previously in cancer genome sequencing studies for a proper assessment of somatic mutations.10 11

In addition, twin studies are a powerful design for investigating epigenetic heritability.12–14 Molecular mechanisms of heritability may not be limited to DNA sequence differences.12 The finding that monozygotic twins have very similar epigenetic profiles has indicated a high epigenetic heritability.15 Twin studies in epigenetics would also be able to: (1) address the extent of and variation in epigenetic heritability across the genome; and (2) investigate whether non-germline heritable (or somatic) epigenetic events contribute to complex phenotypes in monozygotic twins discordant for the phenotypes. Epigenetic studies of disease discordant monozygotic twins offer several advantages in detecting disease related epigenetic differences over a study design involving unrelated disease cases and controls.16–21

The arrival of next generation sequencing (NGS) technologies in 2005 led to a paradigm shift in the approaches to investigating functional genomics in the human genome.22–24 Functional genomics aims to interrogate the functional elements and regulatory mechanisms in the genome, including DNA methylation and histone modifications.
Before 2005, genome-wide epigenetic (epigenomic) studies depended mainly on DNA microarray methods, such as the ChIP-chip method (chromatin immunoprecipitation based on microarray hybridisation of the immunoprecipitated DNA fragments) to study histone modifications, and ‘DNA methylation microarrays’. However, NGS technologies were swiftly integrated into epigenomic studies, and several new and innovative sequencing based methods have been developed together with bioinformatic and analytical tools.25 26 For example, sequencing based approaches such as ChIP-Seq (chromatin immunoprecipitation based on sequencing of the immunoprecipitated DNA fragments) have replaced microarray experiments in the study of histone modifications. Sequencing of human whole genome DNA methylation (the DNA methylome) after bisulfite conversion has also become feasible and has replaced microarrays targeting preselected CpG sites.27–30 Over the past several years, NGS technologies have contributed significantly to advances in epigenomics and provided new alternatives to study the human epigenome at a much greater resolution.31–35

The aim of this paper is to review the recent advances in epigenomics using NGS technologies and their impact on our understanding of human biology and diseases. We focus mainly on DNA methylation and histone modification, which are relatively well studied epigenetic mechanisms. We also discuss the extent to which germline heritable epigenetic mechanisms explain the missing heritability, and the challenges faced by studies attempting to examine this largely unexplored epigenetic component in complex diseases and traits. Finally, we highlight the new opportunities offered by third generation sequencing (TGS) technologies in studying epigenomics.

DNA METHYLATION AND HISTONE MODIFICATIONS

Outlining genome-wide DNA methylation and histone modifications has important implications for our understanding of normal biology, as well as of how their molecular aberrations contribute to the development of human diseases. The most extensively studied epigenetic mechanism is DNA methylation or, more specifically, cytosine methylation, which is the addition of a methyl group at the carbon 5 position of cytosine by DNA methyltransferase (DNMT) enzymes in the human genome. These enzymes (DNMT3A and DNMT3B) catalyse the de novo covalent addition of a methyl group to cytosine in newly synthesised DNA. Cytosine methylation plays an important role in transcriptional regulation, and hypermethylation of the CpG islands (regions with a high density of CpG dinucleotides) in promoter regions is frequently associated with gene silencing. In differentiated cells, cytosine methylation occurs almost exclusively in CpG dinucleotides, and most CpG sites in the genome are methylated. However, CpG islands in the promoter regions of the majority of human genes are not methylated, indicating a transcriptionally active state (except in imprinted genes, which are silenced by DNA methylation).27 35 Aberrations in DNA methylation can result in disease; for example, the cancer genome is usually characterised by global hypomethylation leading to genomic instability, and by focal hypermethylation, such as in the promoter CpG islands of tumour suppressor genes, leading to gene silencing.5 6 36 37 The concept that cytosine methylation is restricted to CpG dinucleotides has only recently been challenged by new data generated from sequencing the DNA methylome.38

Histone modifications are another important epigenetic mechanism in transcriptional regulation. The nucleosome, the fundamental unit of chromatin, is composed of two copies of each of the four core histones (ie, H2A, H2B, H3, and H4), which are wrapped around by 146 bp of DNA. The N-terminal tails of histone polypeptides can be modified by more than 100 different post-translational modifications including methylation, acetylation, phosphorylation, and ubiquitination (collectively known as histone modifications). Similarly to DNA methylation, histone modifications can regulate transcription through modification of the chromatin structure or through chromatin condensation. Although the role, function, and relationship to transcriptional regulation of most of these histone modifications remain poorly understood, considerable progress has been achieved in recent years through studies applying the ChIP-chip and ChIP-Seq approaches. For example, methylation of histone H3 lysine 4 (H3K4) and H3 lysine 36 is associated with transcription activation. In contrast, methylation of H3 lysine 9 (H3K9), H3 lysine 27 (H3K27), and H4 lysine 20 (H4K20) is correlated with repression of transcription.31–34 Figure 1 illustrates the epigenetic mechanisms.

An important prerequisite for further understanding of the role and function of epigenetics in normal biology and the development of disease is the ability to investigate comprehensively the pattern and distribution of epigenetic markers in the whole genome from multiple tissue types, which requires newer and more powerful methods. In the following sections, we briefly discuss the traditional methods used to interrogate epigenetic mechanisms in order to appreciate the improvements made by sequencing based approaches enabled by NGS and TGS technologies.

TRADITIONAL METHODS IN STUDYING EPIGENOMICS

Methods developed to study DNA methylation include the use of methylation sensitive restriction enzymes, affinity enrichment using antibodies specific to 5-methylcytosine (5mC), and bisulfite conversion.
However, a number of limitations are associated with each of these methods. For example, enzyme digestion methods are restricted to restriction enzyme recognition sites, and as a result only a very small subset of all methylation sites can be interrogated. In comparison, methods that rely on affinity enrichment using antibodies are biased towards enrichment of sites containing relatively high levels of cytosine methylation, namely, CpG islands. These methods were coupled with microarray based methods to enable genome-wide analysis of DNA methylation and histone modifications based on the ChIP-chip method.27 31 39 40

A limitation of microarray based methods is that they do not allow a truly ‘comprehensive’ interrogation of DNA methylation and histone modifications throughout the whole genome, as synthesising the probes for microarrays requires prior knowledge of the regions to be targeted. Thus only those genomic regions that are probed by the microarrays will be interrogated. This is evident from conventional ChIP-chip studies, where immunoprecipitated DNA fragments associated with a specific histone modification cannot be detected unless there are probes covering the corresponding genomic regions. In contrast, sequencing based approaches such as ChIP-Seq are, in theory, able to capture all the DNA fragments that are isolated by immunoprecipitation if the sequencing depth or coverage is sufficient.24 28 29

Similarly, the current DNA methylation microarrays are only able to interrogate a small fraction of the entire DNA methylome. High density microarrays such as the Infinium Human Methylation 450 BeadChip allow researchers to investigate >450 000 CpG sites out of the approximately 28 million CpG sites in the human genome (<2%).41 42 There is also ascertainment bias in selecting these >450 000 CpG sites to be interrogated. This type of microarray is widely accepted as a genome-wide tool for DNA methylation, yet it is still restricted to preselected CpG sites. This incomplete interrogation of DNA methylation patterns in the human genome will reduce the power of discoveries. For example, methylation in the non-CG context, revealed through sequencing of the entire DNA methylome, would be overlooked when using microarrays.38 Table 1 summarises the pros and cons of the common approaches to mapping DNA methylation and histone modifications, and figure 2 illustrates the ChIP-chip and ChIP-Seq methods. Readers are referred to other reviews for a more comprehensive account of the different methodologies for studying DNA methylation and histone modifications.27–29

HIGH THROUGHPUT SEQUENCING TECHNOLOGIES

Next generation sequencing technologies (ie, Roche 454 GS FLX, Illumina GA and HiSeq, and Life Technologies SOLiD) have become important tools in epigenomic research. These platforms are characterised by the ability to sequence a very large number of sequence reads in parallel, that is, massively parallel sequencing. However, the Roche 454 GS FLX can only generate approximately one million longer sequence reads (~400 bp) per instrument run, in comparison to the Illumina and Life Technologies sequencing machines, which produce several hundred million shorter sequence reads (<150 bp). Thus, the Illumina and Life Technologies sequencing platforms offer an advantage for ChIP-Seq experiments, which require a high coverage of sequence reads to detect the enrichment of DNA fragments specific to a particular histone modification after immunoprecipitation.43–45 NGS technologies have enabled an unprecedented scale of sequencing success in epigenomics.
For example, Lister et al (2009) sequenced shotgun libraries prepared from bisulfite treated genomic DNA on the Illumina sequencing platform for the genomes of stem cells and fibroblasts. They were able to cover 94% of all cytosines in the genome and generated 87.5 and 91.0 gigabases of sequencing data, respectively. This amount of data is well beyond the capacity of Sanger sequencing.38 It clearly shows that the throughput of NGS technologies has enabled DNA methylation studies to be performed at a much larger scale. However, sequencing of DNA methylation still relies on the conventional approach of bisulfite conversion to differentiate methylated from unmethylated cytosines, which has several disadvantages (see ‘New opportunities from TGS technologies’). Third generation sequencing technologies, such as SMRT sequencing, offer new opportunities to sequence methylated cytosines directly without bisulfite conversion.46 47

NEW BIOLOGICAL INSIGHTS

DNA methylation

The arrival of NGS technologies has led to the completion of a number of DNA methylome studies at single base resolution.38 48 The prevailing view that DNA methylation occurs predominantly at CpG dinucleotides in the human genome has been challenged by new findings from recent studies harnessing the power of NGS technologies. The comparison of the DNA methylome between human stem cells and fetal fibroblasts has revealed substantial differences in the composition and pattern of cytosine methylation between the two genomes. Approximately 25% of all cytosine methylation identified in stem cells was in a non-CG context, whereas in fibroblasts almost all of the cytosine methylation was in the CG context. The substantial fraction of non-CG methylation was first revealed through this study.
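To make the CG versus non-CG distinction concrete, the following is a minimal sketch, not the method used in the cited study. It assumes the common subdivision of non-CG sites into CHG and CHH contexts (H being A, C, or T), looks at the forward strand only, and uses a hypothetical toy sequence and hypothetical function names.

```python
from collections import Counter


def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at 0-based index i as 'CG', 'CHG' or 'CHH'.

    H stands for A, C or T. Returns 'NA' when the downstream bases are
    missing or ambiguous (for example at the end of the sequence).
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    downstream = seq[i + 1:i + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return "NA"
    return "CHG" if downstream[1] == "G" else "CHH"


def non_cg_fraction(seq: str, methylated_positions) -> float:
    """Fraction of methylated cytosines (with a known context) that are non-CG."""
    counts = Counter(cytosine_context(seq, i) for i in methylated_positions)
    known = counts["CG"] + counts["CHG"] + counts["CHH"]
    if known == 0:
        raise ValueError("no methylated cytosines with a known context")
    return (counts["CHG"] + counts["CHH"]) / known


# Hypothetical toy sequence; cytosines sit at indices 1, 4, 7 and 10.
toy_seq = "ACGTCAGCTTCG"
toy_methylated = [1, 4, 7, 10]
print([cytosine_context(toy_seq, i) for i in toy_methylated])
print(non_cg_fraction(toy_seq, toy_methylated))
```

In a real methylome the same tally would be made over both strands and over millions of called sites; the sketch only shows how a per-site context label leads to a summary such as the proportion of methylation in a non-CG context.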
This suggests that embryonic stem cells may use different DNA methylation mechanisms in transcriptional regulation to maintain their pluripotency compared to differentiated cells. The methylation in the non-CG context also showed different patterns; that is, non-CG methylation was more enriched in gene bodies and depleted in protein binding sites and enhancers. This, coupled with the finding that non-CG methylation in gene bodies was positively correlated with gene expression, offers insights into the mechanisms regulating gene expression.38 More interestingly, the non-CG methylation disappeared after differentiation of the stem cells, clearly suggesting that non-CG methylation is an important epigenetic mechanism occurring specifically in embryonic stem cells, as well as in induced pluripotent stem cells, to maintain pluripotency. This also suggests that changes in epigenetic mechanisms take place during the stages of cell differentiation. The non-CG methylation was restored in induced pluripotent stem cells derived from differentiated cells.38 Dynamic changes in the human methylome during differentiation were also demonstrated by another study using whole genome bisulfite sequencing, where the developmental stage was reflected in both the level of global methylation and the extent of non-CpG methylation. The total level of global methylation and the degree of non-CpG methylation are inversely correlated with the level of differentiation.49

Figure 1 The well studied epigenetic mechanisms are DNA methylation and histone modification. DNA methylation or, more specifically, cytosine methylation is the addition of a methyl group at the carbon 5 position of cytosine. The N-terminal tails of histone polypeptides can be modified by more than 100 different post-translational modifications (collectively known as histone modifications). Both DNA methylation and histone modification are important in transcriptional regulation, and aberrations of these epigenetic mechanisms are associated with various diseases such as cancer and autoimmune disease.

Table 1 Common approaches to mapping DNA methylation and histone modifications

Method: Bisulfite treatment
Pros: Effectively converts an epigenetic difference into a genetic difference, easily detectable by sequencing; single base pair resolution
Cons: Incomplete conversion; degradation of DNA; not easily adapted to array hybridisation
Most suitable application: High resolution study of DNA methylation at small or large scale

Method: Methylation sensitive restriction enzymes digestion
Pros: Highly sensitive, simple
Cons: Limited by methylation-sensitive restriction enzyme cutting sites
Most suitable application: Targeted, site specific study of DNA methylation

Method: Affinity enrichment by 5-methylcytosine antibodies
Pros: Powerful tool for comprehensive profiling of DNA methylation in complex genomes; rapid and efficient genome-wide assessment of DNA methylation
Cons: No information on individual CpG dinucleotides; varying CpG density at different regions of the genome requires computational adjustments; possibility of antibody cross-reactivity
Most suitable application: Rapid, large scale, low resolution study of DNA methylation

Method: ChIP-chip
Pros: Productive for genome-wide mapping of histone modifications; bioinformatically and analytically less challenging
Cons: Requires a priori knowledge of the regions to be probed; high signal-to-noise ratios; limited dynamic range; cross-hybridisation between similar sequences
Most suitable application: Histone modification and other DNA–protein interactions

Method: ChIP-seq techniques
Pros: Quantification of the signal from ChIP is based on counting the sequence reads; wider dynamic range; bar coding allows sample multiplexing for NGS
Cons: Difficulty in the ambiguous aligning of short reads in repetitive regions; bioinformatically and analytically more challenging
Most suitable application: Histone modification and other DNA–protein interactions

The information in this table is summarised from three excellent review papers by Laird, Hurd and Nelson, and Hirst and Marra.27–29 NGS, next generation sequencing.

Figure 2 Genome-wide mapping of histone modifications and other DNA–protein interactions has relied on (A) chromatin immunoprecipitation (ChIP) coupled with microarrays (ie, ChIP-chip) and (B) next generation sequencing (NGS) technologies (ie, ChIP-Seq), which have provided more precise and comprehensive landscapes of histone modifications in the entire genome. Reprinted by permission from Macmillan Publishers Ltd: Nat Rev Genet (9:179-91), copyright (2008).

Figure 3 Cytosine methylation (5-methylcytosine) is catalysed by DNA methyltransferase (DNMT) enzymes. DNMT3A and DNMT3B catalyse the de novo covalent addition of a methyl group to cytosine in newly synthesised DNA, and DNMT1 maintains 5-methylcytosine during mitosis. 5-hydroxymethylcytosine is generated by TET (Tet) proteins through oxidation of 5-methylcytosine.
Reprinted from Clin Chim Acta, 412, Dahl C, Grønbæk K, Guldberg P, Advances in DNA methylation: 5-hydroxymethylcytosine revisited, 831–6, Copyright (2011), with permission from Elsevier.

Some differences in the composition and pattern of cytosine methylation are likely to be found between the genomes of cells from different developmental stages; however, these differences remain poorly understood. Most targeted sequencing approaches and microarrays have focused on CG methylation, so the subsequent discoveries could not have been made without sequencing the entire DNA methylome. These discoveries will also provide new directions for studying the prevalence and pattern of non-CG methylation in the different adult stem cells found in various tissue types and in differentiated cell types. Such studies will also elucidate whether non-CG methylation is exclusively confined to embryonic stem cells, or whether it also occurs in adult stem cells. Knowledge of the roles of non-CG methylation in maintaining pluripotency and inducing the differentiation of adult stem cells will have important implications for regenerative medicine.

Although Lister et al (2009) completed sequencing of the DNA methylome of stem cells and differentiated fibroblasts, their study used bisulfite sequencing and was thus unable to distinguish between 5mC and 5-hydroxymethylcytosine (5hmC). Therefore it is unclear whether and to what extent there were differences in the composition of 5mC and 5hmC between the genomes of stem cells and fibroblasts. Furthermore, methylation beyond the CG and non-CG contexts, such as 5-methyladenine or methylation of other nucleotides, also cannot be studied. These are the major limitations of the current methods using bisulfite conversion and NGS to study DNA methylation.
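The limitation described above can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions (a single strand, complete conversion, no sequencing errors): unmodified cytosines are read as T after bisulfite treatment, whereas both 5mC and 5hmC are still read as C. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq: str, modified: dict) -> str:
    """Simulate the read obtained after bisulfite treatment of one strand.

    `modified` maps 0-based positions to a label ('5mC' or '5hmC').
    Unmodified cytosines are read as T; modified cytosines stay C,
    whichever of the two labels they carry.
    """
    read = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in modified:
            read.append("T")
        else:
            read.append(base)
    return "".join(read)


def protected_cytosines(reference: str, read: str) -> list:
    """Positions where a reference C is still read as C (called as modified)."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read))
        if ref_base == "C" and read_base == "C"
    ]


# Hypothetical reference with cytosines at indices 1, 4 and 5.
reference = "ACGTCCGA"
read_a = bisulfite_convert(reference, {1: "5mC", 5: "5hmC"})
read_b = bisulfite_convert(reference, {1: "5hmC", 5: "5mC"})

print(read_a)                                  # same string as read_b
print(protected_cytosines(reference, read_a))  # modified sites, type unknown
print(read_a == read_b)
```

Swapping the 5mC and 5hmC labels leaves the simulated read unchanged, which is why bisulfite sequencing alone reports that a cytosine is modified but not which modification it carries.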
The ability to investigate 5hmC and methylation of other nucleotides will be important and is anticipated to provide further insights into human biology and diseases.50

The absence of non-CG methylation in differentiated cells was also corroborated by sequencing the entire DNA methylome of peripheral blood mononuclear cells. The investigators found that <0.2% of non-CG sites were methylated, suggesting that non-CG methylation is minor in human peripheral blood mononuclear cells.48 In addition, this study investigated allele specific methylation (ie, paternal and maternal alleles can exhibit different methylation patterns) between the two haploid methylomes by integrating the methylome data with whole genome sequencing data that had been generated previously.51 This led to the identification of 599 haploid differentially methylated regions covering 287 genes and demonstrated that allele specific methylation is highly correlated with allele specific expression.48

The assumption that DNA methylation regulates gene expression mainly through its effects at 5′ promoters has also been challenged by recent studies applying NGS based approaches.52 A recent study has shown that most tissue specific gene regulation mediated by DNA methylation occurs at alternative promoters located in gene bodies instead of 5′ promoters. More specifically, the majority of methylated CpG islands were shown to be in intragenic and intergenic regions. In contrast, <3% of CpG islands in 5′ promoters were methylated. This was found by generating a map of DNA methylation from the human brain encompassing 24.7 million of the 28 million CpG sites. Again, these distribution patterns across the genome would not have been unravelled if the studies had been restricted to interrogating a proportion of CpG sites.
The study showed that intragenic DNA methylation plays a role in regulating transcription from alternative promoters, and this is consistent with transcriptomic studies which have shown that many genes have alternative promoters within gene bodies.52

NGS and sequencing based methods have contributed to advances in the epigenomics of stem cell research and to the knowledge of stem cell molecular biology. For example, whole genome profiling of DNA methylation at single base resolution in five human induced pluripotent stem cell lines, together with embryonic stem cells, somatic cells, and differentiated induced pluripotent stem cells, found that induced pluripotent stem cells show significant reprogramming variability, including aberrant reprogramming of DNA methylation.53 Comparisons of the chromatin modification profiles and DNA methylomes of stem cells and primary fibroblasts also revealed that the epigenomic landscapes in embryonic stem cells and lineage committed cells are vastly different. This has provided new insights into the epigenetic mechanisms modulating the properties of pluripotency and cell fate commitment.54 However, the bioethical issues surrounding the use of human embryonic stem cells in research have to be given serious consideration. Adult stem cells and induced pluripotent stem cells generated from somatic cells provide an alternative source and should be considered for use in stem cell research.55 56

In addition to the ‘conventional’ DNA methylation studies (ie, 5mC), research on 5hmC is also gaining impetus. 5hmC is generated by TET proteins through oxidation of 5mC and is present at low levels in diverse cell types in mammals (figure 3).57 58 Currently, information on the genome-wide distribution of 5hmC is limited. However, two novel and specific approaches to profile the whole genome localisation of 5hmC have been developed recently.
These two methods were developed to enable the precipitation of 5hmC in genomic DNA. The first approach, termed GLIB (glucosylation, periodate oxidation, biotinylation), uses a combination of enzymatic and chemical steps to isolate DNA fragments containing as few as one 5hmC, and entails the addition of a glucose molecule to each 5hmC. The second approach involves the conversion of 5hmC to cytosine 5-methylenesulphonate (CMS) by treatment of genomic DNA with sodium bisulfite, followed by immunoprecipitation of CMS-containing DNA with a specific antiserum to CMS. Both methods are specific to DNA containing 5hmC.59 Applying these methods to 5hmC-containing DNA from mouse embryonic stem cells showed strong enrichment within exons and near transcriptional start sites. The enrichment of 5hmC at the transcriptional start sites suggested a role for 5hmC in transcriptional regulation. Additionally, 5hmC was especially enriched at the start sites of genes whose promoters bear dual histone 3 lysine 27 trimethylation (H3K27me3) and histone 3 lysine 4 trimethylation (H3K4me3) marks. Genes with 5hmC at their start sites were disproportionately likely to contain bivalent H3K27 and H3K4 trimethylation at their promoters. Likewise, a majority (~60%) of genes reported to contain bivalent H3K27 and H3K4 trimethylation have 5hmC at their start sites. Collectively, these findings indicate that 5hmC, similar to 5mC, has a probable role in transcriptional regulation, but support a model in which 5mC and 5hmC have different roles in transcription.59

New insights into 5hmC were also gained from other studies. 5hmC is mostly associated with euchromatin, and whereas 5mC is under-represented at gene promoters and CpG islands, 5hmC is enriched and is associated with increased transcriptional levels.
Most, if not all, 5hmC in the genome depends on pre-existing 5mC, and the balance between these two modifications differs between genomic regions.60 Other studies have provided further data supporting the important roles of TET proteins. Williams et al (2011) recently showed that TET1 binds throughout the genome of embryonic stem cells, with the majority of binding sites located at transcription start sites of CpG-rich promoters and within genes. The hydroxymethylcytosine modification is found in gene bodies and, in contrast to methylcytosine, is also enriched at CpG-rich transcription start sites.61 Wu et al also showed in mouse embryonic stem cells that Tet1 is preferentially bound to CpG-rich sequences at promoters of both transcriptionally active and Polycomb-repressed genes.62

Histone modifications

The first genome-wide mapping of histone modifications using NGS technologies identified a number of activation marks, such as mono-methylations of H3K27, H3K9, H4K20, H3K79, and H2BK5, and histone marks that are linked to transcriptional repression, such as trimethylations of H3K27, H3K9, and H3K79. This has provided new insights into the function of histone modifications in transcriptional regulation.63

Cellular differentiation involves the gradual loss of pluripotency and acquisition of cell type specific features. This process is precisely controlled by cell type specific epigenetic programmes. Understanding these processes requires genome-wide analysis of epigenetic and gene expression profiles.
New discoveries have been made through applications of NGS technologies for profiling histone and DNA methylation, as well as gene expression patterns, of normal human mammary progenitor enriched and luminal lineage committed cells.64 Integrative analysis of the gene expression, DNA methylation, and histone H3 K4 and K27 trimethylation profiles of progenitor enriched and more differentiated luminal epithelial cell populations from multiple individuals was performed to understand better the regulation of human mammary epithelial cell type specification. Significant differences in histone H3 lysine 27 trimethylation (H3K27me3) enrichment and DNA methylation of genes expressed in a cell type specific manner were observed, suggesting their regulation by epigenetic mechanisms and a dynamic interplay between the two processes that together define developmental potential. The analysis further identified key regulators of mammary epithelial and luminal lineage commitment. The epigenetically regulated genes identified will accelerate the dissection of human mammary epithelial lineage commitment and luminal differentiation. Additionally, the list of genes epigenetically regulated in a cell type specific manner provides a rich resource for the further analysis of human breast development and the role of epigenetic mechanisms in breast tumorigenesis.64

This study has generated the first comprehensive epigenomic profile of human mammary epithelial cells by analysing gene expression and DNA and histone methylation patterns of progenitor and luminal lineage enriched cells through applications of NGS.64 However, a major limitation of the study was that the DNA methylation data were limited to a fraction of the genome, as only the methylation status of the recognition site of the BssHII enzyme was evaluated (an inherent limitation of using restriction enzymes for DNA methylation).
The other limitation was that the cell fractions used in the study were not homogeneously pure. However, these limitations could be overcome through whole genome sequencing of bisulfite treated genomic DNA isolated from single cells (see ‘New opportunities from TGS technologies’ for analysis of DNA methylation patterns in single cells).

Chromatin state dynamics in nine human cell types have been mapped and analysed in a recent study using ChIP-Seq.65 More specifically, nine chromatin marks across nine cell types were mapped to systematically characterise regulatory elements, their cell type specificities, and their functional interactions. Chromatin states showed distinct associations with transcriptional start sites, transcripts, evolutionarily conserved non-coding regions, DNase hypersensitive sites, binding sites for the regulators c-Myc (MYC) and NF-κB, and inactive genomic regions associated with the nuclear lamina. Chromatin states drastically reduced the large combinatorial space of chromatin datasets to a manageable set of biologically interpretable annotations, thus providing an efficient and robust way to track coordinated changes across cell types. This allowed the systematic identification and comparison of more than 100 000 promoter and enhancer elements.

This study is significant because chromatin profiling has emerged as a powerful means of genome annotation and detection of regulatory activity, as it provides a systematic means of detecting cis-regulatory elements. Chromatin profiling is therefore especially well suited to characterising the non-coding portions of the genome, which remain largely uncharacterised. The results also have implications for the interpretation of GWASs. Disease variants were frequently found to coincide with enhancer elements specific to a relevant cell type.
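The coincidence of disease variants with cell type specific enhancers is, at its core, an interval overlap question. The following is a minimal sketch of that idea in Python, not the method used in the cited study: the function name, the coordinates, the SNP identifiers, and the cell type labels are all hypothetical, and positions are assumed to be 1-based with inclusive interval ends.

```python
from collections import defaultdict


def overlap_snps_with_enhancers(snps, enhancers):
    """Map each SNP to the cell types whose enhancer intervals contain it.

    snps:      iterable of (snp_id, chrom, pos), 1-based position.
    enhancers: iterable of (chrom, start, end, cell_type), 1-based inclusive.
    Returns a dict {snp_id: sorted list of cell types}.
    """
    # Group enhancer intervals by chromosome so each SNP is only
    # compared against intervals on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, cell_type in enhancers:
        by_chrom[chrom].append((start, end, cell_type))

    hits = {}
    for snp_id, chrom, pos in snps:
        hits[snp_id] = sorted(
            cell_type
            for start, end, cell_type in by_chrom[chrom]
            if start <= pos <= end
        )
    return hits


# Hypothetical annotations and variants, for illustration only.
enhancers = [
    ("chr9", 1000, 1500, "cell_type_A"),
    ("chr9", 1400, 2000, "cell_type_B"),
    ("chr8", 500, 900, "cell_type_A"),
]
snps = [
    ("snp1", "chr9", 1450),  # falls in both chr9 enhancers
    ("snp2", "chr9", 3000),  # falls in no enhancer
    ("snp3", "chr8", 500),   # sits on an interval boundary
]
hits = overlap_snps_with_enhancers(snps, enhancers)
```

A linear scan per chromosome is enough for a toy example; with hundreds of thousands of elements, a sorted index or an interval tree would be the more usual choice.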
As an example of such overlap, 33 enhancers were found in the 9p21 region, where multiple single nucleotide polymorphisms (SNPs) were associated with coronary artery disease and type 2 diabetes. More specifically, the coronary artery disease risk alleles of SNPs rs10811656 and rs10757278 are located in one of the enhancers and disrupt a binding site for STAT1.66 In addition, the 8q24 cancer risk allele (rs6983267) has also been shown to increase prostate enhancer activity in vivo relative to the non-risk allele.67 Intersecting these chromatin annotations with non-coding SNPs from GWAS datasets has suggested potential mechanistic explanations for disease variants, that is, how disease variants lead to the observed disease phenotypes, either through their presence within cell type specific enhancer states or through their effect on binding motifs for predicted regulators.65

HERITABLE EPIGENETICS AND COMPLEX DISEASES AND TRAITS

‘Heritable epigenetics’ refers to germline inherited epigenetic markers. It has been proposed that, in addition to the germline genetic component, heritable epigenetics also plays an important role in and contributes to the heritability of complex diseases and traits. Epigenetic heritability has been well demonstrated in twin studies.12-15 Monozygotic twins are epigenetically indistinguishable during the early years of life, whereas older monozygotic twins exhibit large differences in the overall content and genomic distribution of 5mC and histone acetylation.15 This indicates that monozygotic twins inherit identical epigenetic profiles from their parents and acquire different somatic epigenetic markers throughout their lifetime. The role of epigenetic changes in the mechanism of complex phenotype loci remains largely unexplored.
However, in the context of GWAS there is growing evidence supporting the importance of epigenetics in complex phenotypes. For example, the finding of disease associated SNPs located within 500 kb of known imprinted genes has provided some evidence to support the importance of epigenetics in complex diseases. A total of five SNPs associated with breast cancer, basal cell carcinoma, and type 2 diabetes were found to have parental origin specific associations. These SNPs were located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes.68 An SNP association in the imprinted region of chromosome 14q32.2 was also found for type 1 diabetes.69 However, the importance of the parental origin of sequence variants in association with complex diseases has been largely understudied in GWAS because unrelated samples were investigated. Pedigree or family information would be needed to investigate parent-of-origin effects.

In addition, a study investigating monozygotic twins who were discordant for autoimmune disease (specifically systemic lupus erythematosus) identified widespread changes in the DNA methylation status of a significant number of genes. Subsequent gene ontology analysis revealed enrichment in categories associated with immune function. Thus, this study also supports the concept that epigenetic changes may be critical in the manifestation of autoimmune disease.20 Data are accumulating to support the roles of epigenetic differences in monozygotic twins discordant for diseases.16-19 By contrast, Baranzini et al (2010) did not find evidence for epigenetic (as well as genetic and transcriptomic) differences that explained discordance for multiple sclerosis in monozygotic twins.21

The biological effect of the SNPs associated with epigenetic markers is mediated through a variant specific epigenetic change.
The associations of these SNPs with complex phenotypes can be identified by GWAS.48 70 71 DNA methylation associated with genetic variation was studied in HapMap cell lines, and association analyses of methylation levels with more than 3 million SNPs identified 180 CpG sites in 173 genes that were associated with nearby SNPs (usually within 5 kb).70 Integration of whole genome sequencing data with whole genome DNA methylation48 and histone modification data should facilitate the identification of DNA sequence variations associated with epigenetic markers, so that the contribution of these epigenetic markers to complex phenotypes can be captured indirectly through SNPs by conventional genetic association studies or GWAS. These epigenetic markers may be germline heritable and thus account for heritability. Mapping allele specific DNA methylation or SNPs associated with DNA methylation can help to extract maximum information from GWAS.72-75

If epigenetic markers (or, similarly, copy number variants (CNVs)) are tagged by SNPs as surrogate markers in GWAS, then they would not, in theory, account for the missing heritability. However, the effect sizes of the SNPs (surrogate markers) found to be associated with complex phenotypes in GWAS could be underestimated; if the true causal variants (epigenetic markers or CNVs) were identified, their larger effect sizes could then account for the missing heritability. However, present data are still rudimentary, and it is unclear to what extent epigenetic markers can be tagged by the SNPs genotyped in GWAS and to what extent the effect size will be found to be higher when the true causal variant is identified.

By contrast, germline heritable epigenetic markers that cannot be tracked by DNA sequence variations would be missed by GWAS.
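The argument that a surrogate SNP can understate the effect of the causal marker it tags can be made concrete with a short derivation. This is a sketch under assumptions that are not stated in the text: a single causal marker C, a single genotyped tag SNP T, a linear effect of C on the phenotype Y, and residual noise uncorrelated with T. The symbols and the numbers in the worked example are illustrative only.

```latex
% Assumptions (illustrative): one causal marker C, one tag SNP T,
% linear effect of C on phenotype Y, noise uncorrelated with T.
\begin{align}
Y &= \beta C + \varepsilon, \qquad \operatorname{Cov}(T, \varepsilon) = 0 \\
\operatorname{Cov}(T, Y) &= \beta \operatorname{Cov}(T, C) \\
\operatorname{Corr}(T, Y) &= \operatorname{Corr}(T, C)\,\operatorname{Corr}(C, Y) \\
R^2_{T} &= r^2_{TC}\, R^2_{C}
\end{align}
% Worked example with made-up numbers:
% if the causal marker explains R^2_C = 0.02 of phenotypic variance and
% the tag SNP has r^2_{TC} = 0.5 with it, the tag SNP explains only
% R^2_T = 0.5 \times 0.02 = 0.01.
```

Under these assumptions the variance explained by the tag SNP is the variance explained by the causal marker scaled by their squared correlation, so any imperfect tagging (r² below 1) lowers the apparent effect.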
It is currently unclear to what extent heritable epigenetic markers can be detected through DNA sequence variations and the genotyping of surrogate markers. This implies that these heritable epigenetic markers have to be studied directly. Further studies are needed to address these uncertainties.

In summary, studies of germline heritable epigenetic effects on complex phenotypes are still rudimentary; however, high throughput sequencing technologies will undoubtedly contribute to advances in epigenomic studies of complex phenotypes, either directly or indirectly through the identification of DNA sequence variations tagging epigenetic markers. These technological developments have overcome the technical hurdles in mapping and characterising epigenetic markers across the whole genome. However, there are issues and challenges in studying somatic epigenetic events in complex phenotypes, for example, the need for DNA samples to be extracted from the specific tissue related to the disease.

ISSUES AND CHALLENGES IN STUDYING EPIGENETICS OF COMPLEX DISEASES AND TRAITS

The study of somatic epigenetics of complex diseases and traits is challenging, despite the availability of advanced sequencing technologies, as somatic epigenetic alterations or aberrations are tissue specific and undergo dynamic changes in response to the cellular environment and various other stimuli.76-78 These characteristics create two substantial challenges when studying complex diseases: (1) the need for a specific tissue (related to the particular disease) to investigate; and (2) the need for multiple tissue types to be collected at different time points to interrogate the temporal changes in order to distinguish ‘cause’ from ‘consequence’. Ideally, the relevant tissue should be collected before the onset of the disease in order to establish a causal relationship.
However, tissue samples are usually collected after the disease has occurred, and thus it is unclear whether the aberrant epigenetic changes are the cause or a consequence of the disease. Experiments to profile epigenomic changes are now technologically feasible. The greatest challenge therefore lies in determining and collecting the most appropriate tissue to be studied for each complex disease and trait. Animal models would be very useful for collecting tissues to investigate diseases; however, not all human diseases have a suitable animal model.

Epigenetic studies are more feasible for certain diseases such as cancer, because cancer tissue is accessible after surgical resection or from diagnostic biopsies, from which DNA can then be extracted. However, tissue heterogeneity is still a significant problem in epigenetic studies of primary cancer tissue.5 36 37 In other diseases such as type 2 diabetes, the pancreatic β-cell is likely the most appropriate cell type in which to interrogate how aberrant epigenetics contributes to the impairment in insulin production. Nonetheless, this is only feasible in animal models at present. Type 2 diabetes is a systemic disease caused by impairment in insulin production and also by insulin resistance in peripheral tissues, such as muscle, which leads to impaired blood glucose uptake. As a result, multiple peripheral tissues, in addition to pancreatic β-cells, are required to provide a more complete picture of how epigenetic aberrations contribute to the pathogenesis of type 2 diabetes. This is in contrast to studying germline inherited epigenetic events, where DNA samples derived from any tissue can be used.
Further adding to the complexity is the uncertainty over which tissue to prioritise when multiple cell types or tissues are involved in a complex disease, and whether it is feasible in terms of sample collection, cost effectiveness, and time to study all tissue types.76-78 Furthermore, it is also unclear whether disease associated epigenetic changes can be revealed by a simple case-control study design (like conventional genetic association studies) or whether other more robust study designs need to be developed. It is also unclear what sample size is required to achieve adequate statistical power to detect disease associated epigenetic changes.

NEW OPPORTUNITIES FROM TGS TECHNOLOGIES

The ‘gold standard’ for detection of cytosine methylation is sodium bisulfite conversion of DNA followed by sequencing. However, several shortcomings of this method must be noted. First, this method cannot distinguish between 5mC and 5hmC. This also means that previous studies have grouped both types of cytosine modification into one group for analysis.79 Second, it is unable to detect methyladenine.
Third, sodium bisulfite damages DNA, resulting in fragmentation of DNA molecules. This limits the method to sequencing shorter DNA sequences and subsequently hinders the ability to study the haplotype (a combination of multiple methylcytosines at adjacent loci) or specific patterns (or differences in patterns) of DNA methylation in the parental chromosomes.27 In addition to bisulfite conversion, methods based on methylated DNA immunoprecipitation (MeDIP) with anti-5mC antibody and methods based on proteins that bind to methylated CpG sequences do not detect 5hmC.80

The importance of 5hmC remains largely unknown, but it has been found in various tissues.57 58 81 Jin et al (2011) examined the distribution of 5hmC in DNA from human brain frontal lobe tissue and found that 5hmC was particularly enriched at promoters and in intragenic regions (gene bodies), but was largely absent from non-gene regions. Moreover, the presence of 5hmC in gene bodies was more positively correlated with gene expression levels.81

Single molecule real time (SMRT) sequencing technologies overcome these problems by directly sequencing the methylated nucleotides without the need for bisulfite conversion. This is accomplished by monitoring the kinetics of nucleotide incorporation into newly synthesised DNA strands by the polymerase.47 Nanopore sequencing technologies also have the ability to detect methylated cytosine directly.82 However, in comparison with SMRT sequencing, nanopore sequencing technologies may take several years to become commercially available to end users.

In addition to directly sequencing the methylated nucleotides (which overcomes the limitations of bisulfite conversion of DNA), TGS technologies also hold promise for studying the single cell epigenome.
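The first limitation of bisulfite conversion that direct sequencing avoids, namely that 5mC and 5hmC give the same readout, can be illustrated with a toy model in Python. This is a simplified sketch, not a real analysis pipeline: the token names, the example sequence, and both function names are hypothetical, and it assumes a single read on the same strand as the reference, complete conversion, and no sequencing errors.

```python
def bisulfite_read(template):
    """Return the base sequence observed after bisulfite conversion and PCR.

    template: list of tokens. Ordinary bases are "A", "C", "G", "T";
    modified cytosines are written "5mC" or "5hmC".
    Unmodified C is deaminated to U and read as T, whereas 5mC and 5hmC
    both resist conversion and are read as C.
    """
    observed = []
    for base in template:
        if base == "C":
            observed.append("T")
        elif base in ("5mC", "5hmC"):
            observed.append("C")
        else:
            observed.append(base)
    return "".join(observed)


def call_modification(reference, read):
    """Label each reference cytosine (0-based index) from the converted read.

    A cytosine still read as C is called "modified"; one read as T is
    called "unmodified". The call cannot say which modification is present.
    """
    calls = {}
    for index, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            calls[index] = "modified" if read_base == "C" else "unmodified"
    return calls


# Hypothetical molecule: one 5mC, one unmodified C, one 5hmC.
template = ["A", "5mC", "G", "C", "G", "5hmC", "G", "T"]
reference = "ACGCGCGT"

read = bisulfite_read(template)
calls = call_modification(reference, read)
```

In this model the 5mC at index 1 and the 5hmC at index 5 both come back as "modified", which is why bisulfite based studies pool the two marks.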
DNA samples are usually derived from a collection of the same cells (a homogeneous collection) or different cells (a heterogeneous collection such as primary cancer tissue), in which the DNA methylation patterns (ie, distributions and levels) may vary between the cells. The ability to interrogate the epigenome of a single cell will directly overcome the problem of tissue heterogeneity, especially for primary cancer tissue. Experimental data have shown that tumour content significantly influences the interpretation of methylation levels.83 The rationale for single cell analysis is that DNA methylation profiles could be highly variable across individual cells, even within the same tissue. Methods have previously been developed for the analysis of DNA methylation patterns in single cells, thus addressing the problems of cell heterogeneity in epigenetics research.84 More powerful sequencing technologies in the future will accelerate research by providing greater resolution at the single cell level,84 the single molecule DNA sequencing level,85 and the single nucleotide level.

Third generation sequencing technologies are characterised by single molecule DNA sequencing without amplification. This simplifies library preparation for sequencing and minimises sample manipulation in experiments. For example, the Helicos sequencing platform was used to sequence chromatin immunoprecipitated DNA directly.86 87 A further advantage is that only a small quantity of immunoprecipitated DNA (approximately 50 pg) is required, in contrast to the nanogrammes of DNA required for NGS platforms. The requirement for only a small amount of DNA is particularly crucial for single cell sequencing and for clinical samples with a limited amount of DNA.
In a proof-of-principle study using TGS for ChIP-Seq experiments, good agreement was obtained between the ChIP-Seq data produced by the Helicos and Illumina sequencing platforms,87 suggesting that TGS yields comparable results and is able to overcome the limitations associated with NGS in ChIP-Seq experiments.

FUTURE DIRECTIONS AND CONCLUSIONS

The tissue specificity of the epigenome creates substantial challenges for researchers. Furthermore, epigenetic events are dynamic, responsive to cellular environments, and subject to change, as opposed to the human genome, which has ‘static’ genetic variations in its DNA sequence. Thus, generating a comprehensive human epigenomic map will involve multiple tissue samples and ‘time points’ (to study the temporal changes), requiring large scale international initiatives. The US National Institutes of Health Roadmap Epigenomics Mapping Consortium was established in response to this.88 The primary aim of the Consortium is to provide a publicly accessible resource of epigenomic maps in stem cells and primary ex vivo tissue. The Consortium will leverage new experimental approaches that use NGS technologies to explore and map the epigenome, covering not only DNA methylation and histone modifications but also chromatin accessibility and small RNA transcripts in stem cells and primary ex vivo tissue. The somatic epigenomic profile varies from one tissue to another, and it is important to study the ‘correct or appropriate’ tissue for each specific disease. Therefore, the tissues studied by the Consortium will be chosen to represent the normal counterparts of tissues and organ systems frequently involved in human disease.

The arrival of NGS and TGS technologies has caused a paradigm shift in the approaches to epigenomic studies and has also improved our knowledge of the impact of the epigenome on human biology and diseases.
Comprehensive studies of epigenetics will also enhance our understanding of the interactions and relationships between DNA methylation and histone modification, with the importance and implications of these relationships on normal biology and diseases being increasingly recognised.89 These technologies and methods will continue to explore the presently unanswered questions in epigenomics.

Competing interests None.

Contributors CSK wrote this review paper and did the literature search. NN, WM and RS were involved in editing and critical review of the manuscript. CSK and RS had final responsibility for the decision to submit the paper for publication.

Provenance and peer review Not commissioned; externally peer reviewed.

REFERENCES

1. Feinberg AP. Phenotypic plasticity and the epigenetics of human disease. Nature 2007;447:433-40.
2. Portela A, Esteller M. Epigenetic modifications and human disease. Nat Biotechnol 2010;28:1057-68.
3. Urdinguio RG, Sanchez-Mut JV, Esteller M. Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies. Lancet Neurol 2009;8:1056-72.
4. Ballestar E. Epigenetic alterations in autoimmune rheumatic diseases. Nat Rev Rheumatol 2011;7:263-71.
5. Esteller M. Epigenetics in cancer. N Engl J Med 2008;358:1148-59.
6. Rodríguez-Paredes M, Esteller M. Cancer epigenetics reaches mainstream oncology. Nat Med 2011;17:330-9.
7. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature 2009;461:747-53.
8. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet 2010;11:446-50.
9. Clarke AJ, Cooper DN. GWAS: heritability missing in action? Eur J Hum Genet 2010;18:859-61.
10. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, Pohl C, Smith S, Hawkins A, Abbott S, Locke D, Hillier LW, Miner T, Fulton L, Magrini V, Wylie T, Glasscock J, Conyers J, Sander N, Shi X, Osborne JR, Minx P, Gordon D, Chinwalla A, Zhao Y, Ries RE, Payton JE, Westervelt P, Tomasson MH, Watson M, Baty J, Ivanovich J, Heath S, Shannon WD, Nagarajan R, Walter MJ, Link DC, Graubert TA, DiPersio JF, Wilson RK. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008;456:66-72.
11. Lee W, Jiang Z, Liu J, Haverty PM, Guan Y, Stinson J, Yue P, Zhang Y, Pant KP, Bhatt D, Ha C, Johnson S, Kennemer MI, Mohan S, Nazarenko I, Watanabe C, Sparks AB, Shames DS, Gentleman R, de Sauvage FJ, Stern H, Pandita A, Ballinger DG, Drmanac R, Modrusan Z, Seshagiri S, Zhang Z. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010;465:473-7.
12. Kaminsky ZA, Tang T, Wang SC, Ptak C, Oh GH, Wong AH, Feldcamp LA, Virtanen C, Halfvarson J, Tysk C, McRae AF, Visscher PM, Montgomery GW, Gottesman II, Martin NG, Petronis A. DNA methylation profiles in monozygotic and dizygotic twins. Nat Genet 2009;41:240-5.
13. Petronis A. Epigenetics and twins: three variations on the theme. Trends Genet 2006;22:347-50.
14. Bell JT, Spector TD. A twin approach to unraveling epigenetics. Trends Genet 2011;27:116-25.
15. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML, Heine-Suñer D, Cigudosa JC, Urioste M, Benitez J, Boix-Chornet M, Sanchez-Aguilera A, Ling C, Carlsson E, Poulsen P, Vaag A, Stephan Z, Spector TD, Wu YZ, Plass C, Esteller M. Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci USA 2005;102:10604-9.
16. Singh SM, Murphy B, O’Reilly R. Epigenetic contributors to the discordance of monozygotic twins. Clin Genet 2002;62:97-103.
17. Poulsen P, Esteller M, Vaag A, Fraga MF. The epigenetic basis of twin discordance in age-related diseases. Pediatr Res 2007;61:38R-42.
18. Haque FN, Gottesman II, Wong AH. Not really identical: epigenetic differences in monozygotic twins and implications for twin studies in psychiatry. Am J Med Genet C Semin Med Genet 2009;151C:136-41.
19. Ballestar E. Epigenetics lessons from twins: prospects for autoimmune disease. Clin Rev Allergy Immunol 2010;39:30-41.
20. Javierre BM, Fernandez AF, Richter J, Al-Shahrour F, Martin-Subero JI, Rodriguez-Ubreva J, Berdasco M, Fraga MF, O’Hanlon TP, Rider LG, Jacinto FV, Lopez-Longo FJ, Dopazo J, Forn M, Peinado MA, Carreño L, Sawalha AH, Harley JB, Siebert R, Esteller M, Miller FW, Ballestar E. Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res 2010;20:170-9.
21. Baranzini SE, Mudge J, van Velkinburgh JC, Khankhanian P, Khrebtukova I, Miller NA, Zhang L, Farmer AD, Bell CJ, Kim RW, May GD, Woodward JE, Caillier SJ, McElroy JP, Gomez R, Pando MJ, Clendenen LE, Ganusova EE, Schilkey FD, Ramaraj T, Khan OA, Huntley JJ, Luo S, Kwok PY, Wu TD, Schroth GP, Oksenberg JR, Hauser SL, Kingsmore SF. Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis. Nature 2010;464:1351-6.
22. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet 2008;24:133-41.
23. Morozova O, Marra MA. Applications of next generation sequencing technologies in functional genomics. Genomics 2008;92:255-64.
24. Werner T. Next generation sequencing in functional genomics. Brief Bioinform 2010;11:499-511.
25. Horner DS, Pavesi G, Castrignanò T, De Meo PD, Liuni S, Sammeth M, Picardi E, Pesole G. Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing. Brief Bioinform 2010;11:181-97.
26. Huss M. Introduction into the analysis of high-throughput-sequencing based epigenome data. Brief Bioinform 2010;11:512-23.
27. Laird PW. Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet 2010;11:191-203.
28. Hurd PJ, Nelson CJ. Advantages of next-generation sequencing versus the microarray in epigenetic research. Brief Funct Genomic Proteomic 2009;8:174-83.
29. Hirst M, Marra MA. Next generation sequencing based approaches to epigenomics. Brief Funct Genomics 2010;9:455-65.
30. Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009;10:669-80.
31. Schones DE, Zhao K. Genome-wide approaches to studying chromatin modifications. Nat Rev Genet 2008;9:179-91.
32. Zhou VW, Goren A, Bernstein BE. Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet 2011;12:7-18.
33. Izzo A, Schneider R. Chatting histone modifications in mammals. Brief Funct Genomics 2010;9:429-43.
34. Wang Z, Schones DE, Zhao K. Characterization of human epigenomes. Curr Opin Genet Dev 2009;19:127-34.
35. Suzuki MM, Bird A. DNA methylation landscapes: provocative insights from epigenomics. Nat Rev Genet 2008;9:465-76.
36. Esteller M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet 2007;8:286-98.
37. Jones PA, Baylin SB. The epigenomics of cancer. Cell 2007;128:683-92.
38. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009;462:315-22.
39. Schumacher A, Weinhäusl A, Petronis A. Application of microarrays for DNA methylation profiling. Methods Mol Biol 2008;439:109-29.
40. Lister R, Ecker JR. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 2009;19:959-66.
41. Bibikova M, Le J, Barnes B, Saedinia-Melnyk S, Zhou L, Shen R, Gunderson KL. Genome-wide DNA methylation profiling using Infinium assay. Epigenomics 2009;1:177-200.
42. Bibikova M, Fan JB. Genome-wide DNA methylation profiling. Wiley Interdiscip Rev Syst Biol Med 2010;2:210-23.
43. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet 2010;11:31-46.
44. Mardis ER. A decade’s perspective on DNA sequencing technology. Nature 2011;470:198-203.
45. Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Hum Mol Genet 2010;19:R227-40.
46. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, Lundquist P, Ma C, Marks P, Maxham M, Murphy D, Park I, Pham T, Phillips M, Roy J, Sebra R, Shen G, Sorenson J, Tomaney A, Travers K, Trulson M, Vieceli J, Wegener J, Wu D, Yang A, Zaccarin D, Zhao P, Zhong F, Korlach J, Turner S. Real-time DNA sequencing from single polymerase molecules. Science 2009;323:133-8.
47. Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 2010;7:461-5.
48. Li Y, Zhu J, Tian G, Li N, Li Q, Ye M, Zheng H, Yu J, Wu H, Sun J, Zhang H, Chen Q, Luo R, Chen M, He Y, Jin X, Zhang Q, Yu C, Zhou G, Sun J, Huang Y, Zheng H, Cao H, Zhou X, Guo S, Hu X, Li X, Kristiansen K, Bolund L, Xu J, Wang W, Yang H, Wang J, Li R, Beck S, Wang J, Zhang X. The DNA methylome of human peripheral blood mononuclear cells. PLoS Biol 2010;8:e1000533.
49. Laurent L, Wong E, Li G, Huynh T, Tsirigos A, Ong CT, Low HM, Kin Sung KW, Rigoutsos I, Loring J, Wei CL. Dynamic changes in the human methylome during differentiation. Genome Res 2010;20:320-31.
50. Dahl C, Grønbæk K, Guldberg P. Advances in DNA methylation: 5-hydroxymethylcytosine revisited. Clin Chim Acta 2011;412:831-6.
51. Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, Guo Y, Feng B, Li H, Lu Y, Fang X, Liang H, Du Z, Li D, Zhao Y, Hu Y, Yang Z, Zheng H, Hellmann I, Inouye M, Pool J, Yi X, Zhao J, Duan J, Zhou Y, Qin J, Ma L, Li G, Yang Z, Zhang G, Yang B, Yu C, Liang F, Li W, Li S, Li D, Ni P, Ruan J, Li Q, Zhu H, Liu D, Lu Z, Li N, Guo G, Zhang J, Ye J, Fang L, Hao Q, Chen Q, Liang Y, Su Y, San A, Ping C, Yang S, Chen F, Li L, Zhou K, Zheng H, Ren Y, Yang L, Gao Y, Yang G, Li Z, Feng X, Kristiansen K, Wong GK, Nielsen R, Durbin R, Bolund L, Zhang X, Li S, Yang H, Wang J. The diploid genome sequence of an Asian individual. Nature 2008;456:60-5.
52. Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D’Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y, Turecki G, Delaney A, Varhol R, Thiessen N, Shchors K, Heine VM, Rowitch DH, Xing X, Fiore C, Schillebeeckx M, Jones SJ, Haussler D, Marra MA, Hirst M, Wang T, Costello JF. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 2010;466:253-7.
53. Lister R, Pelizzola M, Kida YS, Hawkins RD, Nery JR, Hon G, Antosiewicz-Bourget J, O’Malley R, Castanon R, Klugman S, Downes M, Yu R, Stewart R, Ren B, Thomson JA, Evans RM, Ecker JR. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature 2011;471:68-73.
54. Hawkins RD, Hon GC, Lee LK, Ngo Q, Lister R, Pelizzola M, Edsall LE, Kuan S, Luu Y, Klugman S, Antosiewicz-Bourget J, Ye Z, Espinoza C, Agarwahl S, Shen L, Ruotti V, Wang W, Stewart R, Thomson JA, Ecker JR, Ren B. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 2010;6:479-91.
55. Condic ML, Rao M. Alternative sources of pluripotent stem cells: ethical and scientific issues revisited. Stem Cells Dev 2010;19:1121-9.
56. Hyun I. The bioethics of stem cell research and therapy. J Clin Invest 2010;120:71-5.
57. Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, Agarwal S, Iyer LM, Liu DR, Aravind L, Rao A. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 2009;324:930-5.
58. Kriaucionis S, Heintz N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 2009;324:929-30.
59. Pastor WA, Pape UJ, Huang Y, Henderson HR, Lister R, Ko M, McLoughlin EM, Brudno Y, Mahapatra S, Kapranov P, Tahiliani M, Daley GQ, Liu XS, Ecker JR, Milos PM, Agarwal S, Rao A. Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells. Nature 2011;473:394-7.
60. Ficz G, Branco MR, Seisenberger S, Santos F, Krueger F, Hore TA, Marques CJ, Andrews S, Reik W. Dynamic regulation of 5-hydroxymethylcytosine in mouse ES cells and during differentiation. Nature 2011;473:398-402.
61. Williams K, Christensen J, Pedersen MT, Johansen JV, Cloos PA, Rappsilber J, Helin K. TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity. Nature 2011;473:343-8.
62. Wu H, D’Alessio AC, Ito S, Xia K, Wang Z, Cui K, Zhao K, Eve Sun Y, Zhang Y. Dual functions of Tet1 in transcriptional regulation in mouse embryonic stem cells. Nature 2011;473:389-93.
63. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. High-resolution profiling of histone methylations in the human genome. Cell 2007;129:823-37.
64. Maruyama R, Choudhury S, Kowalczyk A, Bessarabova M, Beresford-Smith B, Conway T, Kaspi A, Wu Z, Nikolskaya T, Merino VF, Lo PK, Liu XS, Nikolsky Y, Sukumar S, Haviv I, Polyak K. Epigenetic regulation of cell type-specific expression patterns in the human mammary epithelium. PLoS Genet 2011;7:e1001369.
65. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, Ku M, Durham T, Kellis M, Bernstein BE. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011;473:43-9.
66. Harismendy O, Notani D, Song X, Rahim NG, Tanasa B, Heintzman N, Ren B, Fu XD, Topol EJ, Rosenfeld MG, Frazer KA. 9p21 DNA variants associated with coronary artery disease impair interferon-γ signalling response. Nature 2011;470:264-8.
67. Wasserman NF, Aneas I, Nobrega MA. An 8q24 gene desert variant associated with prostate cancer risk confers differential in vivo activity to a MYC enhancer. Genome Res 2010;20:1191-7.
68. Kong A, Steinthorsdottir V, Masson G, Thorleifsson G, Sulem P, Besenbacher S, Jonasdottir A, Sigurdsson A, Kristinsson KT, Jonasdottir A, Frigge ML, Gylfason A, Olason PI, Gudjonsson SA, Sverrisson S, Stacey SN, Sigurgeirsson B, Benediktsdottir KR, Sigurdsson H, Jonsson T, Benediktsson R, Olafsson JH, Johannsson OT, Hreidarsson AB, Sigurdsson G, Ferguson-Smith AC, Gudbjartsson DF, Thorsteinsdottir U, Stefansson K; DIAGRAM Consortium. Parental origin of sequence variants associated with complex diseases. Nature 2009;462:868-74.
69. Wallace C, Smyth DJ, Maisuria-Armer M, Walker NM, Todd JA, Clayton DG. The imprinted DLK1-MEG3 gene region on chromosome 14q32.2 alters susceptibility to type 1 diabetes. Nat Genet 2010;42:68-71.
70. Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol 2011;12:R10.
71. Verlaan DJ, Berlivet S, Hunninghake GM, Madore AM, Larivière M, Moussette S, Grundberg E, Kwan T, Ouimet M, Ge B, Hoberman R, Swiatek M, Dias J, Lam KC, Koka V, Harmsen E, Soto-Quiros M, Avila L, Celedón JC, Weiss ST, Dewar K, Sinnett D, Laprise C, Raby BA, Pastinen T, Naumova AK. Allele-specific chromatin remodeling in the ZPBP2/GSDMB/ORMDL3 locus associated with the risk of asthma and autoimmune disease. Am J Hum Genet 2009;85:377-93.
72. Tycko B. Mapping allele-specific DNA methylation: a new tool for maximizing information from GWAS. Am J Hum Genet 2010;86:109-12.
73. Kerkel K, Spadola A, Yuan E, Kosek J, Jiang L, Hod E, Li K, Murty VV, Schupf N, Vilain E, Morris M, Haghighi F, Tycko B. Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation. Nat Genet 2008;40:904-8.
74. Zhang Y, Rohde C, Reinhardt R, Voelcker-Rehage C, Jeltsch A. Non-imprinted allele specific DNA methylation on human autosomes. Genome Biol 2009;10:R138.
75. Shoemaker R, Deng J, Wang W, Zhang K. Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome. Genome Res 2010;20:883-9.
76. Petronis A. Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature 2010;465:721-7.
77. Bell CG, Beck S. The epigenomic interface between genome and environment in common complex diseases. Brief Funct Genomics 2010;9:477-85.
78. Maunakea AK, Chepelev I, Zhao K. Epigenome mapping in normal and disease states. Circ Res 2010;107:327-39.
79. Huang Y, Pastor WA, Shen Y, Tahiliani M, Liu DR, Rao A. The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS One 2010;5:e8888.
80. Jin SG, Kadam S, Pfeifer GP. Examination of the specificity of DNA methylation profiling techniques towards 5-methylcytosine and 5-hydroxymethylcytosine. Nucleic Acids Res 2010;38:e125.
81. Jin SG, Wu X, Li AX, Pfeifer GP. Genomic mapping of 5-hydroxymethylcytosine in the human brain. Nucleic Acids Res 2011;39:5015-24.
82. Clarke J, Wu HC, Jayasinghe L, Patel A, Reid S, Bayley H. Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol 2009;4:265-70.
83. Loh M, Liem N, Lim PL, Vaithilingam A, Cheng CL, Salto-Tellez M, Yong WP, Soong R. Impact of sample heterogeneity on methylation analysis. Diagn Mol Pathol 2010;19:243-7.
84. Kantlehner M, Kirchner R, Hartmann P, Ellwart JW, Alunni-Fabbroni M, Schumacher A. A high-throughput DNA methylation analysis of a single cell. Nucleic Acids Res 2011;39:e44.
85. Thompson JF, Milos PM. The properties and applications of single-molecule DNA sequencing. Genome Biol 2011;12:217.
86. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z. Single-molecule DNA sequencing of a viral genome. Science 2008;320:106-9.
87. Goren A, Ozsolak F, Shoresh N, Ku M, Adli M, Hart C, Gymrek M, Zuk O, Regev A, Milos PM, Bernstein BE. Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat Methods 2010;7:47-9.
88. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 2010;28:1045-8.
89. Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet 2009;10:295-304.

J Med Genet 2011;48:721-730. doi:10.1136/jmedgenet-2011-100242
Studying the epigenome using next generation sequencing. Chee Seng Ku, Nasheen Naidoo, Mengchu Wu and Richie Soong. J Med Genet 2011;48:721-730. Originally published online August 8, 2011. doi:10.1136/jmedgenet-2011-100242. Updated information and services can be found at: http://jmg.bmj.com/content/48/11/721