Clonal Sequencing for Genetics and Cancer in a Research and Diagnostic Setting
Graham Taylor

Leeds People
Leeds University
• Joanne Morgan
• David Parry
• Claire Logan
• David Bonthron
• Colin Johnson
• Eamonn Sheriden
• Chris Inglehearn
• Ian Carr
• Alex Markham
Leeds NHS
• Nick Camm
• Helen Lindsay
• Antigone Tzika
• Josie Hayes
• Christopher Watson
• Lampros Mavrogiannis
• Ruth Charlton
• Paul Roberts
• Leeds Health Stars

Why bother…
On résiste à l'invasion des armées; on ne résiste pas à l'invasion des idées.
Victor Hugo, Histoire d'un Crime (1852)
There is one thing stronger than all the armies in the world, and that is an idea whose time has come.

Sequencing in 2007

The scale of activity

The rate of change
(courtesy Stephanie Cohen & Steve Brenner)

Hype Cycle

Cost‐effective Genetic tests

$1,000 genome

Projects
Leeds University
• Gene dosage
• Targeted re‐sequencing
• ChIP‐seq
• Transcriptome
• Tumour resequencing
Leeds NHS
• Re‐sequencing long PCR products
• Gene dosage
• Targeted resequencing
• Tumour resequencing

The Pathway to Terabase sequencing
• Generation 1:
  – Sanger/Capillary Format/Nanomolar scale
  – 96 reads per run
• Generation 2
  – Clonal/array format 454, Illumina, SOLiD
• Generation 2.1
  – Higher cluster density Clonal/Array Format/Zeptomolar scale
  – 1 billion reads per run
• Generation 2.1b
  – Mini versions of 2.1
• Generation 3:
  – Single molecule

Generation 2.1 sequencers: 10^12 base sequence capacity
• Reagent cost approx £3,000
• Average genome coverage >250
• Average exome coverage >20,000
  – £300 (reagent) exome at 2,000‐fold coverage
• aCGH will not compete
  – Reduction of CNV‐seq costs by at least 10‐fold
  – Ability to identify translocations
  – Ability to report SNPs (uniparental disomy, autozygosity)
• Further reduction in the cost of single/few gene analyses will not have much impact, as the major costs are now outside the sequencing workflow

2.1b (mini‐sequencers)
• 10^9 to 10^10 base capacity
• Roche Junior (14 Gbase/£3,000)
• Illumina MiSeq (1.5 Gbase/£500)
• Ion Torrent (1 Gbase/?)
• Mid‐range sequencers would replace most conventional sequencing applications but would not achieve the economies of scale of the large‐scale sequencers (e.g. exomes, CNV‐seq, mapping re‐arrangements)

Approx cost per base Q1 2012

Whole Genome CNV‐seq
• Resolution comparable to that obtained from array‐CGH platforms (240Kb for ICSA 8x60Kb, 225Kb for NGRL 4x44Kb [19]) can be achieved using 2 million reads per sample
• With increased reads, resolution to the level of the base pair is feasible with NGS
• Read pairs will map translocations to the exact position

Multiplex NGS for pre‐ and post‐natal genetic diagnosis of copy number variation in phenotypically‐abnormal constitutional cases
Antigoni Tzika, Kelly Cohen, Paul Roberts
[Figure panels: 0.5Mb BlueGnome BAC array; Aneuploidy detection; CNV‐seq; 5X multiplex; 10X multiplex; 80X multiplex]

Multiplexing and effect on resolution
• Increasing multiplexing decreases resolution
• Basic patterns remain visible, with higher throughput and lower cost

Next Generation Sequencing (NGS) and Genetic Diagnostics
• NGS diagnostics is an inevitable trend because:
  – Staffing and reagent costs are reduced
  – Data output is increased and automated
• There will gradually be a transition in diagnostics from testing one or two genes to several, many and possibly all genes
• Since February 2010 the Leeds Genetics service has delivered BRCA1 & BRCA2 sequence using CPA‐accredited NGS based on the Illumina GAIIx
  – Improved reporting times
  – Reduced costs
  – Reduced retest rate

The case for gene‐centric analysis
(or, “We have run out of money, we shall have to think” ‐ Rutherford)

Coverage
• 1 flow cell (GAIIx) = 7 channels
• 1 channel = 40 million reads
• 1 read = 36 – 300 bases
• 1 channel = 1 – 12 Gigabases
• 1 flow cell = 24 haploid genome equivalents
• At mean coverage of 100, one channel can cover 120 Megabases (NB exome is approx 50 Mbases)

Reagent Costs
• 1 genome (x24) $10,000 (1‐plex)
• 1 exome (x50) $1,000 (1‐plex)
• 1 cardiome (1 Mbase) $100 (10‐plex)*
• 1 gene (e.g. BRCA1) $15 (96‐plex)*
*Library preparation weighting not fully costed

Multiplexing for cost efficiency
[Figure labels: GGTGGC; Standard adaptor sequence; Library insert; Barcode sequence; Sequencing primer]
• Sample tagging
  – 6 base barcodes (potentially 1024 variations)

BRCA1 & BRCA2 analysis
• 671 variants (326 BRCA1, 345 BRCA2)
• 77 variants identified previously were all detected
• All pathogenic variants were detected

NGS in the Leeds Genetics lab
• Familial breast/ovarian cancer
  – NGS replaced existing Sanger service for BRCA1 and BRCA2 for all diagnostic referrals in February 2010
  – First UK NGS diagnostic reports issued in March 2010
• Hereditary non-polyposis colorectal cancer
  – NGS replaced existing Sanger service for MLH1, MSH2 and MSH6 for all diagnostic referrals in October 2010
• Hypertrophic cardiomyopathy
  – MYBPC3, MYH7, TNNI3, TNNT2
• Pheochromocytoma & paraganglioma
  – PRKAR1A, RET, SDH5, SDHB, SDHC, SDHD, TMEM127, VHL
• Marfan syndrome
  – FBN1
New services introduced May 2011

Results / benefits
• Multi‐gene testing
• Reduced lab costs
• Increased capacity
• Increased reliability
• Improvement in turnaround times
• Improvements in patient care pathways
• Close working relationship with research groups

Average BRCA reporting times
[Figure: reporting time (working days), axis 0 to 60, by month from Nov-09 to Feb-11]

The deliverable, the desirable and the fundable
• $1K genomes/exomes
• Targeted resequencing
  – Incremental revision of existing services
  – Stratified medicine
    • Germline
    • Somatic
• Syndrome‐omes
  – Cardiome
  – Retinome
  – Ciliome
  – “Cancer chip”
• Screening
  – Prenatal
  – Carrier
    • Autozygous
    • General recessive

What does this mean for clinical diagnostics?
• The currency of genetic analysis will become DNA sequence
  – Sequence count
  – Sequence variation
  – Sequence arrangement
• The currency is dropping to commodity prices
• Skill set required is less lab, more informatics biased
• genes < pathways < “omes” < genomes

Genome Informatics
• Use the co‐ordinates of the reference genome
• Use the reference sequence
• Use the annotations to the reference sequence
• Overlay experimental data
• Filter and output results
• Use scripts, e.g. Perl, Java, Python
• Use web applications, e.g. Galaxy

Handling the data locally
• Python driven Web front end for Genome Informatics
• Now getting up to speed with NGS tools

Array capture
• Switches from over 20 PCRs to cover two genes to a single PCR that enriches over 24 genes (pathway‐ or syndrome‐driven testing)

Phenome
• Cardiome
• Ciliome
• Retinome
• Kinome
• …
Samples need to be multiplexed to be cost‐effective, but multiplexing reduces capture efficiency

Autozygome
Bradford has a population of 470,000, with 6,000 live births in 2006. Although only 18% of the Bradford population is of south Asian origin, 50% of births are to south Asian families.
Autosomal recessive (AR) diseases occur when a child inherits two copies of a gene, one from each parent, both copies carrying a harmful mutation. The chance of having a child with an AR condition is increased if both parents are blood relatives. In communities in which consanguineous marriage is common, there is a significant increase in the prevalence of AR disease. A 1993 study from Birmingham recorded a 16‐fold increase in AR diseases in the offspring of consanguineous Pakistani couples, compared to non‐consanguineous couples.

Exome
• One exome: 22,000 variants
• Filtering results
• Unknown success rate
• Exome sequencing is now within reach of Regional Genetics Services
• Recent experience in Leeds: 3 exomes, 3 pathogenic variants

PCR‐ome
Preliminary(!) data on accuracy
• <50 bases: 1/700, proportional to read length
• >50 bases: less accurate than 1/700, proportional to read length and square of read length
Detection and quantification of rare mutations with massively parallel sequencing. Isaac Kinde, Jian Wu, Nick Papadopoulos, Kenneth W. Kinzler, and Bert Vogelstein

NGS Genotypes
• Generated using a custom Perl script that converts FASTA or FASTQ files to an xls genotype report
• Good signal:noise (noise <1%)
• Direct reporting of genotypes
• Forward and reverse reads support QC
• Unmatched reads available for further processing

Scale up: from 5 amplicons, 10 cases to 350 amplicons, 10 cases

De novo variants
• KRAS codon 12: 6 base insertion
• KRAS codon 11: C>T
A custom Perl script groups and counts unmatched reads, which are then used in BLASTn queries against the reference genome build

How do I transfer cheap sequencing into high value diagnostics?
• Demand/Targets
  – Phenotype driven
  – Public Health driven
  – Clinical demand driven
  – Clinical utility driven
• Commissioners
• Providers
  – State funded or commercial?
  – Staff training and accreditation needs

Consequences
• Any large scale re‐sequencing effort will find many variants of unknown clinical significance
• These need to be recorded and to be searchable
• In this way, over time, we will learn more about their frequency and clinical significance
• But only if we have the database!
• Support your local/national/international mutation database

Thanks to
Academic
• Joanne Morgan
• David Parry
• Claire Logan
• David Bonthron
• Colin Johnson
• Eamonn Sheriden
• Chris Inglehearn
• Ian Carr
• Alex Markham
Service
• Nick Camm
• Helen Lindsay
• Antigone Tzika
• Josie Hayes
• Christopher Watson
• Lampros Mavrogiannis
• Ruth Charlton
• Paul Roberts
• Leeds Health Stars
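
Supplementary sketch: grouping unmatched reads
The NGS Genotypes and de novo variants slides describe a custom Perl script that groups and counts unmatched reads before they are used in BLASTn queries. That script is not shown in the talk, so the following is only a minimal Python sketch of the general idea, not the Leeds implementation. It assumes that "unmatched" means a read that is not an exact match to any expected amplicon sequence; the function names, the toy reads and the expected sequence are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Set


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def count_unmatched(reads: Iterable[str], expected: Set[str]) -> Counter:
    """Group identical reads that match no expected sequence and count them.

    Assumption: exact string matching stands in for whatever matching
    rule the original script applied.
    """
    return Counter(read for read in reads if read not in expected)


# Toy FASTQ records (hypothetical data, for illustration only).
fastq = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTTCGT
+
IIIIIIII
@r3
ACGTTCGT
+
IIIIIIII
@r4
ACGAACGT
+
IIIIIIII
""".splitlines()

expected = {"ACGTACGT"}  # hypothetical reference amplicon sequence
unmatched = count_unmatched(fastq_sequences(fastq), expected)

# Most frequent unmatched groups first: candidates for a BLASTn query.
for seq, n in unmatched.most_common():
    print(seq, n)
```

Ranking the groups by count is one way to separate recurrent unmatched reads, which may represent real variants, from one-off reads that are more likely to be sequencing noise.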