1 Inside the Genome. 2 2001: The Human Genome Venter et al., Science 291:1304-1351 (2001) International Human Genome Sequencing Consortium, Nature, 409:

Presentation transcript:

1 Inside the Genome

2 2001: The Human Genome
Venter et al., Science 291:1304-1351 (2001)
International Human Genome Sequencing Consortium, Nature, 409: (2001)
The club resident JD Watson Back2back with DJ. Venter and

3 Prologue: the RNA world, the dark matter of genomics
- How many coding genes are there in the human genome?
  - The Bet of 2000:
    - Mean
    - Range: 30,000 to 150,000
  - By the end of the genome project, the estimated number of human protein-coding genes had declined to only ~25,000
  - What is the source of that discrepancy?
- EST-based estimation vs. whole-genome annotation

4 RNA revolution
- The majority of the transcriptional output comes from non-coding RNA
  - On average, 10% of the human genome (compared with ~1.5% exonic sequence) is transcribed [Cheng et al. 2005]
  - Or even more: 62% of the mouse genome is transcribed [FANTOM3: Science 2005]

5 Various RNAs: a partial list
- Messenger RNA (mRNA)
- Ribosomal RNA (rRNA)
- Transfer RNA (tRNA)
- Small nuclear RNA (snRNA)
- Small nucleolar RNA (snoRNA)
- Short interfering RNA (siRNA)
- Micro RNA (miRNA)

6 RNAs are not merely the intermediary cousins of proteins: the central dogma of molecular biology revisited
[Figure: genome, then transcription to RNA (transcriptome), then translation to protein (proteome), with feedback labeled "regulation by proteins" and "regulation by RNA" (miRNA)]

7 Research in biology is complex...
- Deciphering biological systems
  - The advantage (what makes this quest feasible) and the hindrance (what makes this quest inherently difficult) are both explained by evolution.

8 The hindrance: topological entanglement of functional interconnections
- The difficulties in our research fundamentally owe their complexity to the designer: natural selection.
- What is it, a "robot" or a "UFO"?
  - The reason lies in the profound difference between systems "designed" by natural selection and those designed by intelligent engineers [Langton 1989, Artificial Life].

9
- Bottom line: we investigate an outrageously complex weave of interconnections
  - The "textbook networks" represent only the tip of the iceberg.
- miRNAs and "regulomics"
  - microRNAs are expected to represent ~1% of predicted genes [Lim et al., 2003]
  - Lewis et al. (2003) estimate an average of five targets per miRNA
  - Many targets are transcription factors: miRNAs regulate the regulators

10 The advantage: universal homology, which enables comparative biology
- Bottom line: research in biology advances through a reductionist approach, using simple model organisms to infer the function of homologous systems.

11 Human genome statistics
- 3 billion base pairs
- 24,000 protein-coding genes (>30,000 non-coding genes ???)
- 1.5% exons (127 nucleotides)
- 24% introns (~3,000 nucleotides)
- 75% intergenic (no genes)
- Repetitive elements rule (~45% dispersed repeats)
- Average size of a gene is 27,894 bases
- A gene contains an average of 8.8 exons (*titin contains 234 exons)
- Average of 4 different proteins per gene (alternative splicing)
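The figures on this slide can be cross-checked against each other with a little arithmetic. The sketch below is only an illustration: it assumes a genome size of 3 billion bp (the value quoted on the C-value slide) and reads the "127 nucleotides" figure as an average exon length, which the slide does not state explicitly. All variable names are made up for this example.

```python
# Figures quoted on the slide. GENOME_BP and the reading of 127 nt as an
# *average* exon length are assumptions made for this illustration.
GENOME_BP = 3_000_000_000
EXON_FRACTION = 0.015
INTRON_FRACTION = 0.24
PROTEIN_CODING_GENES = 24_000
AVG_GENE_BP = 27_894
AVG_EXONS_PER_GENE = 8.8
AVG_EXON_BP = 127

# Absolute amounts of exonic and intronic sequence implied by the percentages.
exonic_bp = GENOME_BP * EXON_FRACTION
intronic_bp = GENOME_BP * INTRON_FRACTION

# Exonic sequence per average gene, and its share of the average gene length.
exonic_bp_per_gene = AVG_EXONS_PER_GENE * AVG_EXON_BP
exonic_share_of_gene = exonic_bp_per_gene / AVG_GENE_BP

# Fraction of the genome covered by genes if every gene had the average size.
genic_fraction = PROTEIN_CODING_GENES * AVG_GENE_BP / GENOME_BP

print(f"exonic sequence:   {exonic_bp / 1e6:.0f} Mb")
print(f"intronic sequence: {intronic_bp / 1e6:.0f} Mb")
print(f"exonic bp per gene: {exonic_bp_per_gene:.1f} "
      f"({exonic_share_of_gene:.1%} of an average gene)")
print(f"genome covered by genes: {genic_fraction:.1%}")
```

Under these assumptions the gene count times the average gene size covers a bit over a fifth of the genome, which is in the same range as the 1.5% exon plus 24% intron figures on the slide.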

12 Detecting genes in the human genome
Gene finding methods:
- Ab initio: uses general knowledge of gene structure (rules and statistics). The challenge: small exons in a sea of introns.
- Homology-based. The problem: will not detect novel genes.

13 Genscan (ab initio)
- Based on a probabilistic model of gene structure
- Takes into account:
  - promoters
  - gene composition (exons/introns)
  - GC content
  - splice signals
- Goes over all 6 reading frames
Burge and Karlin, 1997, Prediction of complete gene structure in human genomic DNA, J. Mol. Biol. 268
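To make "goes over all 6 reading frames" concrete, here is a minimal sketch that enumerates the three forward and three reverse-complement frames of a DNA string and reports the longest ATG-to-stop stretch in each. This is not Genscan and does not use its probabilistic model; the function names are hypothetical and the example handles only the letters A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Yield (strand, offset, codons) for the six reading frames of seq."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Only complete codons are kept; a trailing 1-2 bases are dropped.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            yield strand, offset, codons


def longest_orf(codons):
    """Length in codons of the longest ATG...stop run, stop codon included."""
    best = 0
    start = None
    for i, codon in enumerate(codons):
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            best = max(best, i - start + 1)
            start = None
    return best


def orf_summary(seq):
    """Map (strand, offset) to the longest ORF length found in that frame."""
    return {(strand, offset): longest_orf(codons)
            for strand, offset, codons in six_frames(seq)}


if __name__ == "__main__":
    for frame, length in orf_summary("CCATGAAATAGCC").items():
        print(frame, length)
```

A real ab initio gene finder scores promoters, splice signals and composition jointly; scanning for open reading frames alone fails on exactly the case the previous slide names, small exons in a sea of introns.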

14 Splicing

15 Eukaryotic splice sites Poly-pyrimidine tract

16 CpG islands: another signal
- CpG islands are regions of the genome with a higher frequency of CG dinucleotides (not base pairs!) than the rest of the genome
- CpG islands often occur near the beginning of genes
- Possibly related to the binding of the transcription factor Sp1
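One common way to quantify "a higher frequency of CG dinucleotides" is the observed/expected CpG ratio of a window, together with its GC content. The sketch below computes both for a single sequence. The function name is hypothetical, and the thresholds mentioned in the comment (GC content of at least 0.5 and a ratio of at least 0.6 over 200 bp or more) are widely used conventions, not values taken from the slide.

```python
def cpg_stats(seq):
    """Return (gc_content, observed/expected CpG ratio) for a DNA string.

    Expected CpG count is (#C * #G) / length, i.e. what independent C and G
    frequencies would predict. A window is often called island-like when
    gc_content >= 0.5 and the ratio >= 0.6 over >= 200 bp (assumed
    convention, not stated in the slides).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CG on one strand, not a C:G base pair
    gc_content = (c_count + g_count) / n
    expected = c_count * g_count / n
    ratio = cpg_count / expected if expected else 0.0
    return gc_content, ratio


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATATAT"))
```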

17 Gene Ontology
- GO describes proteins in terms of:
  - biological process (e.g. induction of apoptosis by external signals)
  - cellular component (e.g. membrane fraction)
  - molecular function (e.g. protein kinase)
[Figure labels: cell, nucleus, nuclear chromosome]

18 Comparative proteome analysis Functional categories based on GO

19 Comparative proteome analysis
- Humans have more proteins involved in cytoskeleton, immune defense, and transcription

20 Evolutionary conservation of human proteins ???

21 Horizontal (lateral) gene transfer
- Lateral gene transfer (LGT) is any process in which an organism transfers genetic material to another organism that is not its offspring

22 Mechanisms:
- Transformation
- Transduction (phages/viruses)
- Conjugation

23 Bacteria to vertebrate LGT detection
- E-value of bacterial homolog X9 better than eukaryal homolog
Human query:
Hit              E-value
Frog             4e-180
Mouse            1e-164
E. coli          7e-124
Streptococcus    9e-71
Worm             0.1
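The detection rule on this slide can be sketched as a comparison of BLAST E-values. The example below uses the hits from the slide and flags the query as an LGT candidate when the best bacterial hit is better than the best non-vertebrate eukaryotic hit by a chosen number of orders of magnitude. Reading the slide's "X9" as "9 orders of magnitude" is an assumption, as are the group labels and all function names.

```python
import math

# Hits from the slide as (organism, group, E-value).
# The group labels are assigned here for illustration.
HITS = [
    ("Frog", "vertebrate", 4e-180),
    ("Mouse", "vertebrate", 1e-164),
    ("E. coli", "bacteria", 7e-124),
    ("Streptococcus", "bacteria", 9e-71),
    ("Worm", "non-vertebrate eukaryote", 0.1),
]

E_VALUE_FLOOR = 1e-300  # avoids log10(0) when a tool reports an E-value of 0


def best_e_value(hits, group):
    """Smallest E-value among hits of the given group, or None if absent."""
    values = [e for _, g, e in hits if g == group]
    return min(values) if values else None


def orders_of_magnitude_better(hits):
    """How many orders of magnitude the best bacterial hit beats the best
    non-vertebrate eukaryotic hit by (negative if it is worse)."""
    bacterial = best_e_value(hits, "bacteria")
    eukaryal = best_e_value(hits, "non-vertebrate eukaryote")
    if bacterial is None:
        return None
    if eukaryal is None:
        return math.inf
    return (math.log10(max(eukaryal, E_VALUE_FLOOR))
            - math.log10(max(bacterial, E_VALUE_FLOOR)))


def is_lgt_candidate(hits, min_orders=9.0):
    """Apply the assumed threshold of min_orders orders of magnitude."""
    gap = orders_of_magnitude_better(hits)
    return gap is not None and gap >= min_orders


if __name__ == "__main__":
    print(orders_of_magnitude_better(HITS), is_lgt_candidate(HITS))
```

A hit pattern like this one only nominates a candidate; missing eukaryotic homologs in the database produce the same signature.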

24 Bacteria to vertebrate LGT
[Figure labels: vertebrates, bacteria, non-vertebrates]

25

26 Bacteria to vertebrate LGT??
- Hundreds of sequenced bacterial genomes vs. a handful of eukaryotes
- Gene finding in bacteria is much easier than in eukaryotes
- On the practical side: rigid mechanical barriers to LGT in eukaryotes (nucleus, germ line)

27 Repetitive Elements in the Human Genome

28 Repeat statistics
- The human genome is ~45% dispersed repeats:
  - 20% LINEs (AT rich)
  - 13% SINEs (11% Alu) (GC rich)
  - 8% LTR elements (retrovirus-like)
  - 2% DNA transposons
- Another 3% is tandem simple sequence repeats (e.g. triplets)
- And another 3-5% is segmentally duplicated at high similarity (over 1 kb, over 90% identity)
- Identifying and screening these out is essential to avoid fake matches
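As a small illustration of "identifying and screening out" repeats, the sketch below soft-masks (lowercases) tandem simple sequence repeats with a regular expression. It is a toy under stated assumptions: the unit length of 1 to 6 bases and the minimum of four copies are arbitrary choices for this example, the function name is hypothetical, and dispersed repeats such as LINEs and SINEs need library-based tools rather than a pattern like this.

```python
import re

# A unit of 1-6 bases followed by at least three more copies of itself,
# i.e. four or more tandem copies in total (thresholds are assumptions).
TANDEM_REPEAT = re.compile(r"([ACGT]{1,6}?)\1{3,}")


def mask_tandem_repeats(seq):
    """Return seq in uppercase with tandem simple repeats in lowercase."""
    return TANDEM_REPEAT.sub(lambda m: m.group(0).lower(), seq.upper())


if __name__ == "__main__":
    example = "GATTACA" + "CAG" * 4 + "TTGACT"
    print(mask_tandem_repeats(example))
```

Lowercasing keeps the sequence length and coordinates intact, so downstream search tools can skip the masked stretch when seeding matches without losing positional information.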

29 LINEs and SINEs
- Highly successful elements in eukaryotes
- LINE: Long Interspersed Nuclear Element (>5,000 bp)
- SINE: Short Interspersed Nuclear Element (<500 bp)
- SINEs are free riders on the backs of LINEs and encode no proteins

30 The C-value paradox
- Genome size does not correlate with organism complexity
                   Amoeba        Rice          Human        Yeast
Genome size        670 billion   4.3 billion   3 billion    12 million
Number of genes    ?             ~30,          ,000         6,275

31 Repetitive elements
- The C-value mystery was partially resolved when it was found that large portions of genomes consist of repetitive elements

32 Are Alus functional??
- SINEs are transcribed under stress
- SINE RNAs may bind a protein kinase and so promote translation under stress
- They need to be in regions which are highly transcribed
- Role in alternative splicing

33 Segmental duplications
- 1077 segmental duplications detected
- Several genes in the duplicated regions are associated with diseases (may be related to homologous recombination)
- Most are recent duplications (conservation of the entire segment, versus conservation of coding sequences only)

34 Genome-wide studies

35 Sequenced genomes

36
- 481 segments > 200 bp are absolutely conserved (100% identity) between human, rat and mouse
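The definition on this slide, a stretch longer than a length cutoff with 100% identity in all three species, can be sketched as a scan over a multiple alignment. The example below assumes the sequences are already aligned to equal length with "-" for gaps. The function name and the half-open (start, end) output convention are choices made for this illustration, not the method used in the original study.

```python
def conserved_runs(aligned, min_len=200):
    """Find runs of identical, gap-free alignment columns.

    aligned: list of equal-length aligned sequences ('-' marks a gap).
    Returns half-open (start, end) column intervals of length >= min_len.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    reference = aligned[0]
    runs = []
    start = None
    # Iterate one column past the end so a run touching the end is closed.
    for i in range(length + 1):
        identical = (
            i < length
            and reference[i] != "-"
            and all(s[i] == reference[i] for s in aligned)
        )
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    human = "ACGTACGTAC"
    rat = "ACGTACGTTC"
    mouse = "TCGTACGTAC"
    # A toy cutoff of 5 columns stands in for the 200 bp used on the slide.
    print(conserved_runs([human, rat, mouse], min_len=5))
```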

37 Comparison with a neutral substitution rate
- Compare the substitution rate in any 1 Mb region
- Probability of obtaining 1 ultraconserved element (UE) by chance
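A back-of-the-envelope version of this calculation treats alignment columns as independent, each identical across the three species with probability p under neutral evolution. The chance that a given window of L columns is fully identical is then p^L, and the expected number of such windows in a genome of G bases is at most about (G - L + 1) * p^L. The sketch below works in log10 to avoid floating-point underflow. The value p = 0.7 is a purely hypothetical input, not a figure from the slides, and the independence model is a simplification.

```python
import math


def log10_expected_chance_runs(p_identical, run_length, genome_bp):
    """log10 of the expected number of fully identical windows by chance.

    Assumes independent columns, each identical with probability
    p_identical (a simplifying model chosen for this illustration).
    """
    if not 0.0 < p_identical <= 1.0:
        raise ValueError("p_identical must be in (0, 1]")
    if run_length > genome_bp:
        raise ValueError("run_length cannot exceed genome_bp")
    windows = genome_bp - run_length + 1
    return math.log10(windows) + run_length * math.log10(p_identical)


if __name__ == "__main__":
    # Hypothetical per-site identity probability; 200 bp windows; 3 Gb genome.
    value = log10_expected_chance_runs(0.7, 200, 3_000_000_000)
    print(f"expected chance occurrences ~ 10^{value:.1f}")
```

With any plausible neutral rate the expected count is many orders of magnitude below one, which is why hundreds of observed elements cannot be explained by chance conservation.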

38 UEs
[Figure: 111 UEs overlap a known mRNA (exonic UEs); the others show no overlap (non-exonic: 100 intronic, 156 intergenic) or are inconclusive]

39 Who are the genes?
- Type 1: exonic
- Type 2: genes which are near non-exonic UEs (???)

40 Intergenic UEs
- Genes which flank intergenic UEs are enriched for early developmental genes
- Are UEs distal enhancers of these genes?

41 Gene enhancer
- A short region of DNA, usually quite distant from a gene (due to chromatin complex folding), which binds an activator
- An activator recruits transcription factors to the gene

42 Experimental studies of UEs
- Tested 167 UEs (both mouse-human UEs and fish-human UEs) for enhancer activity: each was cloned before a reporter gene to test its activity
- 45% functioned as enhancers

43 A bioinformatic success
- Ultraconservation can predict highly important function!

44 Ahituv PLoS Biol Sep;5(9):e234
- Chose 4 UEs which are near specific genes: genes which show a specific phenotype when knocked out
- Performed complete deletion of these UEs
- ... the mice were viable and did not show any different phenotype
- BUT ...

45 Conclusions...
- Ultraconservation can be indicative of important function
- ...
- And sometimes not:
  - gene redundancy
  - long-range phenotypes
  - laboratories cannot mimic life