Introduction to genomes & genome browsers

Presentation transcript:

Introduction to genomes & genome browsers
Content
– Introduction
– The human genome
– Human genetic variation: SNPs, CNVs, alternative splicing
– Browsing the human genome
Celia van Gelder, CMBI, UMC Radboud, December 2012

Exponential Growth in Genomic Sequence Data
Figure: number of completed genomes over time, with these milestones marked:
– First 2 bacterial genomes complete
– First eukaryote complete (yeast)
– First metazoan complete (roundworm, C. elegans)

Exponential Growth in Genomic Sequence Data © Pevzner 2011

The cow genome
– Houston Chronicle: Houston scientists milk cow genome for its secrets
– Weekly Times Now: Bovine genome to revolutionise food production
– National Geographic: Cow Genome Decoded -- Cheaper Beef for Everybody?
– BBC News: Cow genome 'to transform farming'

The pig genome

The human genome
– Genome: the entire sequence of DNA in a cell
– 3 billion base pairs (3 Gb)
– 22 chromosome pairs + X and Y chromosomes
– Chromosome length varies from ~50 Mb to ~250 Mb
– About protein-coding genes (average gene length is 3000 bases, but the largest known gene, dystrophin, is 2.4 Mb)
– The human genome is 99.9% identical among individuals
– This means that any two people differ at about 3 million nucleotides
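The 3 million figure follows directly from the two numbers on the slide. A minimal sketch of that arithmetic in Python; the function and constant names are illustrative, not part of the original slides:

```python
# Rough arithmetic behind "two people differ at about 3 million nucleotides".
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, as stated on the slide
IDENTITY = 0.999                # 99.9% identical among individuals


def expected_differences(genome_size_bp: int, identity: float) -> int:
    """Number of positions expected to differ between two genomes."""
    return round(genome_size_bp * (1.0 - identity))


if __name__ == "__main__":
    print(expected_differences(GENOME_SIZE_BP, IDENTITY))
```

This treats every position as equally likely to differ, which is a simplification; real variation is unevenly distributed along the genome.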

Eukaryotic Genomes: more than collections of genes
Genes & regulatory sequences make up 5% of the genome
– Protein-coding genes
– RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
– Structural DNA (centromeres, telomeres)
– Regulation-related sequences (promoters, enhancers, silencers, insulators)
– Parasite sequences (transposons)
– Pseudogenes (non-functional gene-like sequences)
– Simple sequence repeats

The human genome (continued)
– Only 1.2% codes for proteins
– Long introns, short exons
– Large spaces between genes
– More than half consists of repetitive DNA
– Alu repeat: ~300 bp, more than a million copies
From: Molecular Biology of the Cell, 4th edition (Alberts et al., 2002)
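To put these percentages in base pairs, here is a small back-of-the-envelope sketch in Python. It assumes the ~3 Gb genome size given earlier in the talk and reads the Alu copy number as a lower bound of one million; all names are illustrative:

```python
# Back-of-the-envelope sizes for the coding fraction and the Alu repeats.
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb (assumed from the earlier slide)
CODING_FRACTION = 0.012         # "only 1.2% codes for proteins"
ALU_LENGTH_BP = 300             # approximate length of one Alu repeat
ALU_COPIES = 1_000_000          # lower bound: "more than a million copies"


def coding_bases(genome_size_bp: int, coding_fraction: float) -> int:
    """Approximate number of protein-coding bases."""
    return round(genome_size_bp * coding_fraction)


def repeat_fraction(length_bp: int, copies: int, genome_size_bp: int) -> float:
    """Fraction of the genome occupied by a repeat family."""
    return (length_bp * copies) / genome_size_bp


if __name__ == "__main__":
    print(coding_bases(GENOME_SIZE_BP, CODING_FRACTION))
    print(repeat_fraction(ALU_LENGTH_BP, ALU_COPIES, GENOME_SIZE_BP))
```

Under these assumptions a single repeat family occupies several times more sequence than all protein-coding bases combined.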

Variation along genome sequence
Nucleotide usage varies along chromosomes
– Protein-coding regions tend to have high GC levels
Genes are not equally distributed across the chromosomes
– Housekeeping genes are generally in gene-dense areas
– Gene-poor areas tend to have many tissue-specific genes
Karyotype:
– Gene-rich areas = light
– Gene-poor areas = dark
From: Ensembl
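Variation in nucleotide usage is usually quantified as GC content in windows along the sequence. A minimal sketch in Python, using a made-up toy sequence rather than real genomic data; the function names are illustrative:

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC content of each full window, reported with its 0-based start."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: an AT-rich block, a GC-rich block, then a mixed block.
    toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
    for start, gc in windowed_gc(toy, window=10, step=10):
        print(start, gc)
```

On real chromosomes the windows would be thousands of bases wide, and regions with high GC content would be candidates for gene-dense areas.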

Chromosome organisation
– DNA is packed in chromatin
– Non-active genes (genes that are OFF) are often in densely packed chromatin (30-nm fiber)
– Active genes (genes that are ON) are in less dense chromatin (beads-on-a-string)
– Gene regulation works by changing chromatin density and by methylation/acetylation of the histones
From: Lodish (4th edition)

Introduction to genomes & genome browsers
Content
– Introduction
– The human genome
– Human genetic variation: CNVs, SNPs, alternative splicing
– Browsing the human genome

Human Genetic Variation
Every human has essentially the same set of genes, but there are different forms of each gene, known as alleles.
Genetic variation explains some of the differences among people, such as:
– Blood group
– Eye color
– Skin color
– Hair color
– Higher or lower risk for getting particular diseases: cystic fibrosis, sickle cell disease, diabetes, cancer, arthritis, asthma, stroke, heart disease, Alzheimer's disease, Parkinson's disease, depression, alcoholism

Variations in the Genome
Common sequence variations (figure, shown on a chromosome): polymorphism, deletions, translocations, insertions

Today's focus
1. Single Nucleotide Polymorphisms (SNPs)
2. Copy number variations (CNVs)
3. Alternative transcripts

Single Nucleotide Polymorphisms (SNPs)
– SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered.
– For a variation to be considered a SNP, it must occur in at least 1% of the population.
– SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome.
– SNPs can occur in coding (gene) and noncoding regions of the genome; <1% alter the protein sequence.
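The 1% threshold and the "every 100 to 300 bases" spacing can both be turned into simple calculations. A minimal sketch in Python; the function names are illustrative, and the frequency check works on allele counts in a sample, which is one common way to approximate "1% of the population":

```python
from typing import Tuple

GENOME_SIZE_BP = 3_000_000_000  # the 3-billion-base human genome


def is_snp(variant_allele_count: int, total_alleles: int,
           threshold: float = 0.01) -> bool:
    """True if the variant allele frequency reaches the 1% threshold."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return variant_allele_count / total_alleles >= threshold


def snp_count_range(genome_size_bp: int, min_spacing: int = 100,
                    max_spacing: int = 300) -> Tuple[int, int]:
    """(low, high) estimate of SNP count for one SNP per 100 to 300 bases."""
    return genome_size_bp // max_spacing, genome_size_bp // min_spacing


if __name__ == "__main__":
    print(is_snp(15, 1000))                 # variant seen in 1.5% of alleles
    print(snp_count_range(GENOME_SIZE_BP))  # rough genome-wide SNP count
```

With the slide's numbers, the spacing implies roughly 10 to 30 million SNP positions genome-wide.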

SNPs & medicine
Although more than 99% of human DNA sequences are the same, variations in DNA sequence can have a major impact on how humans respond to:
– disease;
– environmental factors such as bacteria, viruses, toxins, and chemicals;
– drugs (& side-effects).
This makes SNPs valuable for biomedical research and for developing pharmaceutical products or medical diagnostics.

SNP & disease, Alzheimer
Alzheimer's disease (AD) & apolipoprotein E
– The APOE gene encodes the protein apolipoprotein E, a cholesterol carrier that is found in the brain and other organs.
– Its exact role in the development of AD is unclear.
– Several studies have indicated a role of APOE in amyloid beta aggregation and clearance, influencing the onset of amyloid beta deposition.

SNP & disease, Alzheimer (2)
Two SNPs, three APOE variants
APOE contains 2 SNPs that result in 3 possible alleles: E2, E3, E4.

Variant   rs        rs7412
E2        T         T
E3        T         C
E4        C         C

A person who inherits at least one E4 allele will have a greater chance of developing AD.
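The table above is a lookup from a pair of SNP alleles to an APOE variant. A minimal sketch of that lookup in Python. The identifier of the first SNP is incomplete in the transcript, so it is only called `first_snp` here; that name and the function names are hypothetical:

```python
from typing import Iterable, Optional, Tuple

# (allele at the first SNP, allele at rs7412) -> APOE variant, from the table.
APOE_VARIANTS = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
}


def apoe_variant(first_snp: str, rs7412: str) -> Optional[str]:
    """APOE variant for one chromosome, or None if not in the table."""
    return APOE_VARIANTS.get((first_snp.upper(), rs7412.upper()))


def carries_e4(chromosomes: Iterable[Tuple[str, str]]) -> bool:
    """True if at least one of a person's two chromosomes carries E4."""
    return any(apoe_variant(a, b) == "E4" for a, b in chromosomes)


if __name__ == "__main__":
    person = [("T", "C"), ("C", "C")]  # one E3 allele and one E4 allele
    print([apoe_variant(a, b) for a, b in person])
    print(carries_e4(person))
```

The sketch assumes phased data, meaning it is known which two alleles sit on the same chromosome; unphased genotypes would need an extra inference step.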

Today's focus
1. Single Nucleotide Polymorphisms (SNPs)
2. Copy number variations (CNVs)
3. Alternative transcripts

Copy Number Variation
– People do not vary only at the nucleotide level (SNPs).
– Copy Number Variations (CNVs): gains and losses of large chunks of DNA sequence (10 kb to 5 Mb).
– When there are genes in the CNV areas, this can lead to variations in the number of gene copies between individuals.
– CNVs contribute to our uniqueness.
– CNVs can also influence the susceptibility to disease.
– CNVs may either be inherited or caused by de novo mutation.

Copy Number Variation
Figure: copy number (CN) states
– Normal cell: CN=2
– Deletion: CN=0, CN=1
– Amplification: CN=3, CN=4
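The copy number states in this figure map onto a simple classification against the normal diploid value of 2. A minimal sketch in Python; the function name is illustrative, and the default assumes an autosomal region in a diploid cell:

```python
def classify_copy_number(cn: int, normal: int = 2) -> str:
    """Label a copy number relative to the normal (diploid) state."""
    if cn < 0:
        raise ValueError("copy number cannot be negative")
    if cn < normal:
        return "deletion"
    if cn > normal:
        return "amplification"
    return "normal"


if __name__ == "__main__":
    for cn in range(5):
        print(cn, classify_copy_number(cn))
```

For regions on the X or Y chromosome in males, the `normal` argument would be 1 rather than 2.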

CNVs & disease
Many inherited genetic diseases result from CNVs:
– Gene copy number can be elevated in cancer cells
– Autism
– Schizophrenia (dept. human genetics)
– Mental retardation (dept. human genetics)
– Parkinson's disease
There are CNVs that protect against HIV infection and malaria.
The contribution of CNVs to the common, complex diseases, such as diabetes and heart disease, is currently less well understood.

Today's focus
1. Copy number variations (CNVs)
2. Single Nucleotide Polymorphisms (SNPs)
3. Alternative transcripts

Alternative splicing

Defects of the machinery of alternative splicing have been implicated in many diseases, including:
– neuropathological conditions such as Alzheimer disease
– cystic fibrosis
– those involving growth and developmental defects
– many human cancers, e.g. BRCA1 in breast cancer
– beta-globin in beta-thalassemia

Introduction to genomes & genome browsers
Content
– Introduction
– The human genome
– Human genetic variation: CNVs, SNPs, alternative splicing
– Browsing the human genome

Annotating the genome
A genome sequence is of limited use without functional annotation.
Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. identifying elements on the genome
2. attaching biological information to these elements
Annotating the genome – Bioinformatics!
The genome browser is a tool for visualizing genome annotation. It provides context to understand genomic regions of interest.

Basic & Advanced Genome Annotation
Basic:
– Genomic location
– Gene features: exons, introns, UTRs
– Transcript(s)
– Pseudogenes, non-coding RNA
– Protein(s)
– Links to other sources of information
Advanced:
– Cytogenetic bands
– Polymorphic markers
– Genetic variation, including SNPs & CNVs
– Repetitive sequences
– cDNAs or mRNAs from related species
– Genomic sequence variation
– Regulation sequences (enhancers, silencers, insulators)
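The basic annotation items (location, strand, exons, introns) fit naturally into a small record type. A minimal sketch in Python with a made-up transcript; the class, field names, and coordinates are hypothetical and do not follow any particular browser's schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass
class Transcript:
    """Basic annotation of one transcript: location, strand, and exons."""
    name: str
    chrom: str
    strand: str             # "+" (forward) or "-" (reverse)
    exons: List[Interval]   # genomic coordinates of each exon

    def introns(self) -> List[Interval]:
        """Introns are the gaps between consecutive exons."""
        ordered = sorted(self.exons)
        return [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        ]

    def spliced_length(self) -> int:
        """Total exon length, i.e. the length of the spliced transcript."""
        return sum(end - start + 1 for start, end in self.exons)


if __name__ == "__main__":
    toy = Transcript("toy_tx", "chr1", "+", [(100, 199), (300, 349), (500, 599)])
    print(toy.introns())
    print(toy.spliced_length())
```

Introns are derived rather than stored, so they cannot get out of sync with the exon list.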

Possible research questions P. Schattner, Genomics 93 (2009):

[Human] Genome Browsers
– EBI Ensembl
– NCBI Map Viewer
– UCSC Genome Browser
Not limited to only human data

Other Ensembl Installations

Organized data based on chromosome location
Tracks: genes & predictions; variations & repeats; cross-species comparative data; & many more types of data, from expression & regulation to mRNA and ESTs…
Gene X: description, transcript data, structure, Gene Ontology, pathway data, homologous genes, expression data, etc.

Ensembl Genes – biological basis
All Ensembl transcripts are based on proteins and mRNAs in:
– UniProt/Swiss-Prot (manually curated)
– UniProt/TrEMBL
– NCBI RefSeq (manually curated)

Ensembl Homepage

HGNC – a unique name and symbol for every gene in human
– ENSG###: Ensembl Gene ID
– ENST###: Ensembl Transcript ID
– ENSP###: Ensembl Peptide ID
– ENSE###: Ensembl Exon ID
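The prefix of an Ensembl identifier tells you what kind of feature it names, so the list above can be applied mechanically. A minimal sketch in Python; the function name is illustrative, and the pattern (prefix, 11 digits, optional version suffix) is an assumption about the human ID format, not something stated on the slide:

```python
import re
from typing import Optional

_FEATURE_BY_LETTER = {
    "G": "gene",
    "T": "transcript",
    "P": "peptide",
    "E": "exon",
}

# Assumed human format: ENS + feature letter + 11 digits + optional ".version".
_HUMAN_ID = re.compile(r"^ENS([GTPE])\d{11}(?:\.\d+)?$")


def ensembl_feature_type(identifier: str) -> Optional[str]:
    """Feature type of a human Ensembl ID, or None if it does not match."""
    match = _HUMAN_ID.match(identifier.strip())
    if match is None:
        return None
    return _FEATURE_BY_LETTER[match.group(1)]


if __name__ == "__main__":
    for example in ["ENSG00000139618", "ENST00000380152", "BRCA2"]:
        print(example, ensembl_feature_type(example))
```

IDs from other species carry an extra species code after `ENS`, so this pattern would need widening before being used outside human data.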

Ensembl: An Example
Figure labels: tracks; click for more details

Direction of transcription
– Above blue line: forward strand
– Below blue line: reverse strand
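Strand matters when you extract sequence: a gene on the reverse strand reads as the reverse complement of the forward-strand sequence that a browser displays. A minimal sketch in Python; the function names are illustrative and the input is a toy sequence:

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sense_sequence(forward_seq: str, strand: str) -> str:
    """Sequence as read along the gene, given its forward-strand sequence."""
    if strand == "+":
        return forward_seq
    if strand == "-":
        return reverse_complement(forward_seq)
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    region = "ATGCCGTA"
    print(sense_sequence(region, "+"))
    print(sense_sequence(region, "-"))
```

Bases other than A, C, G, and T (for example N) pass through `translate` unchanged, which is usually acceptable for display purposes.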

Synopsis: What can I do with Ensembl?
View, examine & explore annotated information for any chromosomal region:
– Genes
– ESTs, mRNAs, alternative transcripts
– Proteins
– SNPs, and SNPs across strains (rat, mouse), populations (human), or even breeds (dog)
– Homologues and phylogenetic trees across more than 40 species
– Whole genome alignments
– Conserved regions across species
– Gene expression profiles
Upload your own data and use BLAST/BLAT against any Ensembl genome
Export sequence, or create a table of gene information