Human Genome Sequence and Variability Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen, Hungary,


Human Genome Sequence and Variability Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen, Hungary, May 2006

Lecture overview
1. Genome sequencing strategies, sequencing informatics
2. Genome annotation, functional and structural features in the human genome
3. Genome variability: DNA nucleotide, structural, and epigenetic variations

1. The Human genome sequence

The nuclear genome (chromosomes)

The genome sequence is the primary template on which to outline the functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)

Completed genomes (genome sizes shown: ~1 Mb, ~100 Mb, >100 Mb, ~3,000 Mb)

Main genome sequencing strategies: clone-based shotgun sequencing (Human Genome Project); whole-genome shotgun sequencing (Celera Genomics, Inc.)

Hierarchical genome sequencing: BAC library construction → clone mapping → shotgun subclone library construction → sequencing/read processing → sequence reconstruction (sequence assembly). Lander et al. Nature 2001

Clone mapping – “sequence ready” map

Shotgun subclone library construction (figure labels: BAC primary clone, cloning vector, sequencing vector, subclone insert)

Sequencing

Robotic automation (Lander et al. Nature 2001)

Base calling (PHRED): base = A, Q = 40
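The quality value on this slide can be read as an error probability. The following is a minimal sketch, assuming the standard Phred definition Q = -10 · log10(P_error); the function names are illustrative and are not part of the PHRED program itself.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with quality Q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Quality value Q corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# Q = 40, as on the slide, corresponds to roughly 1 miscall in 10,000 bases.
for q in (10, 20, 30, 40):
    print(f"Q = {q}: P(error) = {phred_to_error_prob(q):.4g}")
```

Under this definition, every 10-point increase in Q means a tenfold drop in the estimated error rate.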

Vector clipping

Sequence assembly: PHRAP

Repetitive DNA may confuse assembly
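Why repeats are a problem can be illustrated with a toy example. In the sketch below the genome, the repeat, and the reads are all made up for illustration. A read that falls entirely inside a repeated segment matches more than one location, so an assembler cannot tell from the read alone where it belongs.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start position at which the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping matches
    return positions

# Hypothetical genome: unique flank + repeat + unique spacer + repeat + unique flank.
REPEAT = "GATTACAGCT"
GENOME = "ACCGTA" + REPEAT + "GGATCC" + REPEAT + "TTGACA"

read_in_repeat = "TTACAG"   # lies entirely inside the repeat
read_with_flank = "GTAGAT"  # spans a unique flank and the start of the repeat

print(placements(GENOME, read_in_repeat))   # two possible placements
print(placements(GENOME, read_with_flank))  # one placement
```

Reads that extend into unique flanking sequence resolve the ambiguity, which is one reason clone-by-clone assembly confines the problem to a single BAC at a time.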

Sequence completion (finishing) with CONSED, AUTOFINISH: closing gaps and regions of low sequence coverage and/or quality

2. Human genome annotation

Genome annotation – Goals: protein coding genes, RNA genes, repetitive elements, GC content

The starting material AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT
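GC content, one of the annotation goals, can be computed directly from raw sequence like this. A minimal sketch follows; the function name is illustrative, and skipping non-ACGT symbols (such as the X characters in the slide's sequence) is an assumption about how ambiguous bases should be treated.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted

# First 45 bases of the sequence shown on the slide.
fragment = "AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA"
print(f"GC content: {gc_content(fragment):.2%}")
```

In practice GC content is usually reported over sliding windows along a chromosome rather than for a single fragment.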

Coding genes – ab initio predictions: ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA (figure labels: start codon, open reading frame (ORF), stop codon, polyA signal)
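The start codon / ORF / stop codon idea can be sketched on the slide's own example sequence. This is a deliberately simplified illustration, not how the gene finders named in this lecture work: it scans only the forward strand, takes the first ATG, and reads in frame to the first stop codon of the standard genetic code. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_first_orf(seq: str):
    """Return (start, end, orf) for the first ATG followed by an in-frame stop codon.

    Coordinates are 0-based and end-exclusive; returns None if no ORF is found.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Step through codons in the reading frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3, seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None

SLIDE_SEQ = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
print(find_first_orf(SLIDE_SEQ))
```

On the slide's sequence the ORF runs from the ATG at the first base to the in-frame TAG, nine codons in total including the stop.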

Ab initio predictions Gene structure

Ab initio predictions: …AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG… (figure labels: splice donor site, splice acceptor site)
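Candidate splice sites in the slide's fragment can be listed with a simple scan. The sketch below assumes the canonical GT...AG rule (introns typically begin with GT at the donor site and end with AG at the acceptor site). Real ab initio gene finders score sites with statistical models rather than bare dinucleotide matches, so this only shows where candidates could lie. The function names and the minimum intron length are illustrative.

```python
def find_all(seq: str, motif: str) -> list[int]:
    """Return every 0-based start position of motif in seq."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits

def candidate_introns(seq: str, min_len: int = 10) -> list[tuple[int, int]]:
    """Pair each GT (donor) with each downstream AG (acceptor).

    Returns (start, end) intervals, end-exclusive, of at least min_len bases.
    """
    donors = find_all(seq, "GT")
    acceptors = find_all(seq, "AG")
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]

FRAGMENT = "AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG"  # from the slide, ellipses removed
for start, end in candidate_introns(FRAGMENT):
    print(start, end, FRAGMENT[start:end])
```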

Ab initio predictions: Genscan, Grail, Genie, GeneFinder, Glimmer, etc. Transcript (EST/cDNA) alignment tools: EST_genome, Sim4, Spidey, EXALIN

Homology based predictions: ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA (genomic sequence); ACGGAAGTCT (known coding sequence from another organism); GGACTATAAA (expressed sequence). Genes predicted by homology: Genomescan, Twinscan, etc.

Consolidation – gene prediction systems (Otto, Ensembl) and the tools and data they draw on (FgenesH, Genscan, Grail, Genewise, Sim4, dbEST)

ncRNA genes: prediction is based on structure (e.g. tRNAs); for other novel ncRNAs, only homology-based predictions have been successful

Repeat annotations: repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library

The landscape of the human genome

Gene annotations – number of coding genes (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Gene annotations – gene length (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Gene annotations – gene function (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

GC content and coding potential (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

ncRNAs (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Segmental duplications (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Repeat elements (Lander et al., Initial sequencing and analysis of the human genome, Nature, 2001)

Genes and repeats

Physical vs. genetic map (Mb/cM): genetic distances 0.4 cM, 1.3 cM, 0.7 cM; physical distances 0.4 Mb, 0.7 Mb, 0.3 Mb
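The Mb/cM ratio named in the slide title can be worked out from the listed distances. In the sketch below, pairing the genetic and physical values in the order they appear is an assumption, since the original figure is not available. The function name is illustrative.

```python
# Values as they appear on the slide; pairing them in order is an assumption.
genetic_cm = [0.4, 1.3, 0.7]
physical_mb = [0.4, 0.7, 0.3]

def mb_per_cm(physical: list[float], genetic: list[float]) -> list[float]:
    """Physical distance (Mb) spanned by one centimorgan in each interval."""
    return [mb / cm for mb, cm in zip(physical, genetic)]

ratios = mb_per_cm(physical_mb, genetic_cm)
for mb, cm, ratio in zip(physical_mb, genetic_cm, ratios):
    print(f"{mb} Mb / {cm} cM = {ratio:.2f} Mb/cM")
```

A lower Mb/cM value means more recombination per unit of physical distance, so the ratio varies along a chromosome rather than being a single constant.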

3. Human genome variability

DNA sequence variations: the reference human genome sequence is 99.9% common to each human being; sequence variations make our genetic makeup unique. The most abundant human variations are single-nucleotide polymorphisms (SNPs); 10 million SNPs are currently known
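The figures on this slide can be turned into rough per-genome numbers. The sketch below is back-of-the-envelope arithmetic, assuming a genome size of about 3,000 Mb (the figure given for the largest completed genome earlier in the lecture). The variable names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed: ~3,000 Mb
IDENTITY = 0.999                # "99.9% common to each human being"
KNOWN_SNPS = 10_000_000         # "10 million SNPs are currently known"

# Expected number of positions at which a genome differs from the reference.
differing_sites = round(GENOME_SIZE_BP * (1 - IDENTITY))

# Average spacing between catalogued SNPs along the genome.
bp_per_known_snp = GENOME_SIZE_BP // KNOWN_SNPS

print(f"~{differing_sites:,} differing positions per genome")
print(f"~1 known SNP every {bp_per_known_snp} bp on average")
```

Under these assumptions, 0.1% divergence corresponds to about three million differing positions, and the known SNPs average one per 300 bp.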

DNA sequence variations: insertion-deletion (INDEL) polymorphisms

Structural variations (Speicher & Carter, NRG 2005)

Structural variations (Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi: /nrg1767)

Detection of structural variants (Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi: /nrg1767)

Epigenetic changes: chromatin structure (Sproul, NRG 2005)

Epigenetic changes: DNA methylation (Laird, NRC 2003)