1 Inside the Genome

2 2001: The Human Genome
- Venter et al., Science 291:1304-1351 (2001)
- International Human Genome Sequencing Consortium, Nature 409:860-921 (2001)
(Image caption: "The club resident JD Watson, Back2back with DJ Venter and …")

3 Prologue: the RNA world, the dark matter of genomics
- How many protein-coding genes are there in the human genome?
  - The bet of 2000: mean 61,710; range 30,000-150,000
  - By the end of the genome project, the estimated number of human protein-coding genes had declined to only ~25,000
  - What is the source of this discrepancy?
- EST-based estimation vs. whole-genome annotation

4 RNA revolution
- The majority of the transcriptional output comes from non-coding RNA
  - On average, ~10% of the human genome (compared with ~1.5% exonic sequence) is represented in transcripts [Cheng et al. 2005]
  - Or even more: 62% of the mouse genome is transcribed [FANTOM3, Science 2005]

5 Various RNAs: a partial list
- Messenger RNA (mRNA)
- Ribosomal RNA (rRNA)
- Transfer RNA (tRNA)
- Small nuclear RNA (snRNA)
- Small nucleolar RNA (snoRNA)
- Short interfering RNA (siRNA)
- MicroRNA (miRNA)

6 RNAs are not merely the intermediary cousins of proteins: the central dogma of molecular biology revisited
[Figure: Genome → (transcription) → RNA / transcriptome → (translation) → protein / proteome; regulation by proteins, and regulation by RNA (miRNA)]

7 Research in biology is complex…
- Deciphering biological systems
  - The advantage (what makes this quest feasible) and the hindrance (what makes it inherently difficult) are both explained by evolution.

8 The hindrance: topological entanglement of functional interconnections
- The difficulties in our research fundamentally owe their complexity to the designer: natural selection.
- What is it, a "robot" or a "UFO"?
  - The reason lies in the profound difference between systems "designed" by natural selection and those designed by intelligent engineers [Langton 1989, Artificial Life].

9
- Bottom line: we investigate an outrageously complex weave of interconnections
  - The "textbook networks" represent only the tip of the iceberg.
- miRNAs and "regulomics"
  - MicroRNAs are expected to represent ~1% of predicted genes [Lim et al., 2003]
  - Lewis et al. (2003) estimate an average of five targets per miRNA
  - Many targets are transcription factors: miRNAs regulate the regulators
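Target predictions of the kind cited here are commonly based on "seed" complementarity between the 5' end of the miRNA and the mRNA 3' UTR. A minimal Python sketch of exact seed matching, assuming the seed is nucleotides 2-8 of the miRNA and requiring a perfect Watson-Crick match; real predictors also weigh conservation and site context. The function names and the example sequences are illustrative only.

```python
# Translation table for RNA complementation (A<->U, C<->G).
COMP = str.maketrans("ACGU", "UGCA")

def seed_site(mirna, start=1, end=8):
    """Motif complementary to the miRNA seed, written 5'->3' as it appears in
    the mRNA. Assumption: seed = 0-based slice [start, end), i.e. nt 2-8."""
    seed = mirna.upper().replace("T", "U")[start:end]
    return seed.translate(COMP)[::-1]  # complement, then reverse

def find_sites(utr, mirna):
    """0-based start positions in a 3' UTR (RNA or DNA alphabet) that
    exactly match the seed-complementary site."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example 22-nt sequence
print(seed_site(mirna), find_sites("AAACTACCTCAAA", mirna))
```

Counting such sites across all annotated 3' UTRs gives a (noisy) list of candidate targets per miRNA.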

10 The advantage: universal homology, which enables comparative biology
- Bottom line: research in biology advances through a reductionist approach, using simple model organisms to infer the function of homologous systems.

11 Human genome statistics
- 2.91 billion base pairs
- 24,000 protein-coding genes (>30,000 non-coding genes ???)
- 1.5% exons (~127 nucleotides per exon)
- 24% introns (~3,000 nucleotides per intron)
- 75% intergenic (no genes)
- Repetitive elements rule (~45% dispersed repeats)
- The average gene spans 27,894 bases and contains an average of 8.8 exons (*titin contains 234 exons)
- An average of 4 different proteins per gene (alternative splicing)
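The averages on this slide can be cross-checked with back-of-the-envelope arithmetic. A minimal Python sketch, assuming that 127 nt and ~3,000 nt are per-exon and per-intron lengths and that a gene with 8.8 exons has 7.8 introns: the reconstructed gene length (~24.5 kb) is of the same order as the quoted 27,894 bases, and 24,000 genes × 27,894 bases cover about 23% of 2.91 billion bases, roughly in line with the 1.5% + 24% quoted for exons plus introns.

```python
# Slide values (the per-exon / per-intron reading of 127 and 3,000 is an assumption).
GENOME_BP = 2.91e9
N_GENES = 24_000
MEAN_GENE_BP = 27_894
MEAN_EXONS = 8.8
EXON_BP = 127
INTRON_BP = 3_000

def gene_length_from_parts(n_exons, exon_bp, intron_bp):
    """Exons plus the (n_exons - 1) introns that separate them."""
    return n_exons * exon_bp + (n_exons - 1) * intron_bp

def genic_fraction(n_genes, mean_gene_bp, genome_bp):
    """Fraction of the genome covered by genes (exons + introns), ignoring overlaps."""
    return n_genes * mean_gene_bp / genome_bp

est = gene_length_from_parts(MEAN_EXONS, EXON_BP, INTRON_BP)
frac = genic_fraction(N_GENES, MEAN_GENE_BP, GENOME_BP)
print(f"gene length from parts: {est:,.0f} bp (slide: {MEAN_GENE_BP:,} bp)")
print(f"genic fraction: {frac:.1%} (slide: 1.5% exons + 24% introns)")
```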

12 Detecting genes in the human genome
Gene finding methods:
- Ab initio: uses general knowledge of gene structure (rules and statistics)
  - The challenge: small exons in a sea of introns
- Homology-based
  - The problem: it will not detect novel genes
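The simplest "rule" an ab initio approach can use is the open reading frame. A minimal Python sketch (not how Genscan works) that scans both strands in all three frames for ATG...stop ORFs above a minimum length; the 30-codon default is an arbitrary assumption. In spliced eukaryotic genes a single ORF rarely covers a whole gene, which is exactly the "small exons in a sea of introns" challenge.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=30):
    """Return (strand, frame, start, end) for ATG...stop ORFs.
    Coordinates refer to the scanned strand (the reverse complement for
    strand '-'); end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs

print(find_orfs("ATGAAATGA", min_codons=1))
```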

13 Genscan (ab initio)
- Based on a probabilistic model of gene structure
- Takes into account:
  - promoters
  - gene composition: exons/introns
  - GC content
  - splice signals
- Scans all 6 reading frames
Burge and Karlin, 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol. 268
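Genscan's probabilistic model is a generalized hidden Markov model with many states (different exon types, introns, promoters, UTRs) and explicit length distributions. A toy illustration of the underlying idea, far simpler than Genscan: a two-state HMM in which an "exon-like" state emits G/C slightly more often and an "intron-like" state emits A/T more often, decoded with the Viterbi algorithm. All probabilities are made-up illustration values, and the input must contain only A/C/G/T.

```python
import math

# Toy two-state HMM; "E" = GC-rich exon-like, "I" = AT-rich intron-like.
# All numbers are illustrative assumptions, not Genscan parameters.
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def viterbi(seq):
    """Most probable state path, computed in log space to avoid underflow."""
    seq = seq.upper()
    v = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            # best previous state leading into s
            prev, score = max(
                ((p, v[-1][p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda x: x[1],
            )
            col[s] = score + math.log(EMIT[s][ch])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # trace back from the best final state
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("GCGCGCGCGCATATATATAT"))
```

The switch penalty in TRANS is what keeps the decoded segments long instead of flipping state at every base; Genscan achieves a similar effect, much more realistically, with explicit exon and intron length distributions.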

14 Splicing

15 Eukaryotic splice sites
[Figure label: polypyrimidine tract]
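Most eukaryotic introns begin with GT at the 5' (donor) splice site and end with AG at the 3' (acceptor) splice site, with a pyrimidine-rich tract shortly upstream of the AG. A naive Python sketch that lists GT...AG candidates whose upstream window is mostly C/T; the window size, pyrimidine threshold and minimum intron length are assumptions, and the all-pairs search is quadratic, so it is only meant for short sequences.

```python
def pyrimidine_fraction(s):
    """Fraction of C/T bases in s (0.0 for an empty string)."""
    return sum(c in "CT" for c in s) / len(s) if s else 0.0

def candidate_introns(seq, min_len=20, ppt_window=10, min_ppt=0.7):
    """Return (start, end) pairs, end exclusive, such that seq[start:start+2]
    is 'GT', seq[end-2:end] is 'AG', and the ppt_window bases just before the
    AG are mostly pyrimidines. All thresholds are illustrative assumptions."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [j + 2 for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    hits = []
    for d in donors:
        for a in acceptors:
            if a - d < min_len:
                continue
            # candidate polypyrimidine tract, kept inside the intron
            tract = seq[max(d + 2, a - 2 - ppt_window):a - 2]
            if pyrimidine_fraction(tract) >= min_ppt:
                hits.append((d, a))
    return hits

print(candidate_introns("AAAGT" + "A" * 10 + "CT" * 5 + "AGGGG"))
```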

16 CpG islands: another signal
- CpG islands are regions of the genome with a higher frequency of CG dinucleotides (a C followed by a G on the same strand, not C-G base pairs!) than the rest of the genome
- CpG islands often occur near the beginning of genes
- This may be related to binding of the transcription factor Sp1
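A common way to quantify this signal is the CpG observed/expected ratio, (number of CG dinucleotides × window length) / (number of C × number of G). The Python sketch below scans fixed windows and applies widely used thresholds (200 bp window, GC content above 50%, observed/expected above 0.6); treat these parameter values as assumptions, since published definitions vary.

```python
def cpg_stats(seq):
    """GC fraction and CpG observed/expected ratio for one window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    gc_frac = (c + g) / n if n else 0.0
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp

def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Start positions of windows that pass both (assumed) thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            hits.append(start)
    return hits

print(cpg_stats("CGCGCGCG"), cpg_island_windows("CG" * 150, step=50))
```

Adjacent passing windows would then be merged into islands; that step is omitted here.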

17 Gene Ontology
- GO describes proteins in terms of:
  - biological process (e.g. induction of apoptosis by external signals)
  - cellular component (e.g. membrane fraction)
  - molecular function (e.g. protein kinase)
[Figure: part of the cellular-component hierarchy: cell > nucleus > nuclear chromosome]
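GO terms form a directed acyclic graph, and a protein annotated to a specific term is implicitly annotated to all of that term's ancestors, so counting proteins per functional category usually involves propagating annotations upward. A minimal Python sketch using the simplified cell > nucleus > nuclear chromosome hierarchy from the slide; the protein IDs are hypothetical, and real GO relation types (is_a, part_of, ...) are collapsed into a single parent map.

```python
from collections import Counter

# Simplified child -> parents map (assumption: real GO is much larger).
PARENTS = {
    "nuclear chromosome": ["nucleus"],
    "nucleus": ["cell"],
    "cell": [],
}

def ancestors(term, parents):
    """All terms reachable upward from `term`, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def category_counts(annotations, parents):
    """Proteins per term after propagation; each protein counts once per term."""
    counts = Counter()
    for terms in annotations.values():
        closure = set()
        for t in terms:
            closure |= ancestors(t, parents)
        counts.update(closure)
    return counts

annotations = {"P1": ["nuclear chromosome"], "P2": ["nucleus"], "P3": ["cell"]}
print(category_counts(annotations, PARENTS))
```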

18 Comparative proteome analysis
Functional categories based on GO

19 Comparative proteome analysis
- Humans have more proteins involved in cytoskeleton, immune defense, and transcription

20 Evolutionary conservation of human proteins ???

21 Horizontal (lateral) gene transfer
- Lateral gene transfer (LGT) is any process in which an organism transfers genetic material to another organism that is not its offspring

22 Mechanisms:
- Transformation
- Transduction (phages/viruses)
- Conjugation

23 Bacteria-to-vertebrate LGT detection
- Criterion: the E-value of the bacterial homolog is X9 better than that of the eukaryotic homolog
- Example hits for a human query:
  Hit            E-value
  Frog           4e-180
  Mouse          1e-164
  E. coli        7e-124
  Streptococcus  9e-71
  Worm           0.1
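The criterion can be applied mechanically to a list of BLAST-style hits. A minimal Python sketch using the hits from the slide; the group labels and the reading of "X9" as a 10^9-fold E-value difference are assumptions. Comparing log10 E-values avoids dividing tiny numbers, and a floor guards against reported E-values of 0.

```python
import math

# (organism, group, E-value) for one human query, as on the slide.
# Group labels are assumptions made for this sketch.
hits = [
    ("Frog", "vertebrate", 4e-180),
    ("Mouse", "vertebrate", 1e-164),
    ("E. coli", "bacteria", 7e-124),
    ("Streptococcus", "bacteria", 9e-71),
    ("Worm", "non-vertebrate eukaryote", 0.1),
]

def best_evalue(hits, group):
    """Smallest (best) E-value within a group, or None if the group has no hit."""
    vals = [e for _, g, e in hits if g == group]
    return min(vals) if vals else None

def lgt_candidate(hits, min_fold=1e9):  # assumed reading of "X9"
    bact = best_evalue(hits, "bacteria")
    euk = best_evalue(hits, "non-vertebrate eukaryote")
    if bact is None:
        return False
    if euk is None:
        return True  # no non-vertebrate eukaryotic hit at all
    floor = 1e-300  # E-values may be reported as 0.0; avoid log10(0)
    log_fold = math.log10(max(euk, floor)) - math.log10(max(bact, floor))
    return log_fold >= math.log10(min_fold)

print(lgt_candidate(hits))
```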

24 Bacteria-to-vertebrate LGT
[Figure panels: vertebrates, bacteria, non-vertebrates]

25

26 Bacteria-to-vertebrate LGT??
- Hundreds of sequenced bacterial genomes vs. a handful of eukaryotic ones
- Gene finding in bacteria is much easier than in eukaryotes
- On the practical side: rigid mechanical barriers to LGT in eukaryotes (nucleus, germ line)

27 Repetitive elements in the human genome

28 Repeat statistics
- The human genome is ~45% dispersed repeats:
  - 20% LINEs (AT-rich)
  - 13% SINEs, 11% of them Alu (GC-rich)
  - 8% LTR elements (retrovirus-like)
  - 2% DNA transposons
- Another 3% is tandem simple-sequence repeats (e.g. triplet repeats)
- Another 3-5% is segmentally duplicated at high similarity (over 1 kb, over 90% identity)
- Identifying and screening out these repeats is essential to avoid spurious matches
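Repeat screening is often done by masking. Many genome FASTA files are "soft-masked": bases annotated as repeats are written in lowercase (the convention used, for example, in UCSC genome downloads). A minimal Python sketch, assuming that convention, that reports the masked fraction and hard-masks repeats to N so that similarity searches do not produce spurious hits from them.

```python
def masked_fraction(seq):
    """Fraction of bases flagged as repeats (lowercase = soft-masked, by assumption)."""
    return sum(c.islower() for c in seq) / len(seq) if seq else 0.0

def hard_mask(seq, mask_char="N"):
    """Replace soft-masked (lowercase) bases with mask_char."""
    return "".join(mask_char if c.islower() else c for c in seq)

print(masked_fraction("ACGTacgt"), hard_mask("ACGTacgt"))
```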

29 LINEs and SINEs
- Highly successful elements in eukaryotes
- LINE: long interspersed nuclear element (>5,000 bp)
- SINE: short interspersed nuclear element (<500 bp)
- SINEs are free riders on the backs of LINEs: they encode no proteins and rely on LINE-encoded enzymes to retrotranspose

30 The C-value paradox
- Genome size does not correlate with organism complexity
  Organism   Genome size (bp)   Number of genes
  Amoeba     670 billion        ?
  Rice       ~430 million       ~30,000
  Human      3 billion          20,000-25,000
  Yeast      12 million         6,275

31 Repetitive elements
- The C-value mystery was partially resolved when it was found that large portions of genomes consist of repetitive elements

32 Are Alus functional??
- SINEs are transcribed under stress
- SINE RNAs may bind a protein kinase and thereby promote translation under stress
  - For this, they need to lie in regions that are highly transcribed
- Role in alternative splicing

33 Segmental duplications
- 1,077 segmental duplications detected
- Several genes in the duplicated regions are associated with diseases (possibly related to homologous recombination)
- Most are recent duplications (the entire segment is conserved, rather than only the coding sequences)

34 Genome-wide studies

35 Sequenced genomes

36
- 481 segments longer than 200 bp are absolutely conserved (100% identity) between human, rat and mouse
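Finding such segments in a multiple alignment amounts to finding long runs of columns in which every species has the same base. A minimal Python sketch, assuming the human, mouse and rat sequences are already aligned to equal length with '-' for gaps; coordinates are alignment columns, not genome positions.

```python
def ultraconserved_segments(aln, min_len=200):
    """Runs (start, end), end exclusive, of at least min_len alignment columns
    in which all sequences carry the same A/C/G/T base (no gaps, no N)."""
    length = len(aln[0])
    if any(len(s) != length for s in aln):
        raise ValueError("aligned sequences must have equal length")
    segments, start = [], None
    for i in range(length + 1):
        # an empty column at i == length closes any open run
        col = {s[i].upper() for s in aln} if i < length else set()
        identical = len(col) == 1 and col <= set("ACGT")
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments

toy = ["AAAAACGGG", "AAAAATGGG", "AAAAACGGG"]  # toy 3-way alignment
print(ultraconserved_segments(toy, min_len=3))
```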

37 Comparison with a neutral substitution rate
- Compare with the substitution rate in any 1 Mb region
- Probability of 10^-22 of obtaining even a single ultraconserved element (UE) by chance
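Why is 100% identity over 200 bp so surprising? Under a simple neutral model in which each site independently has probability p of having undergone at least one substitution somewhere on the tree, the chance that L consecutive sites are all unchanged is (1 - p)^L, which falls off exponentially with L. A sketch of that arithmetic in Python; the independence assumption and any value of p plugged in are illustrative, and this toy model is not the calculation behind the 10^-22 figure on the slide.

```python
import math

def log10_p_fully_conserved(length, p_sub):
    """log10 P(no substitution at any of `length` independent sites),
    where p_sub is the per-site probability of at least one substitution."""
    if not 0.0 <= p_sub < 1.0:
        raise ValueError("p_sub must be in [0, 1)")
    return length * math.log10(1.0 - p_sub)

def expected_conserved_windows(region_len, length, p_sub):
    """Rough expected number of fully conserved windows in a region; overlapping
    windows are not independent, so this is only an approximation."""
    return (region_len - length + 1) * 10 ** log10_p_fully_conserved(length, p_sub)

# e.g. a 1 Mb region and 200 bp windows, with an illustrative p of 0.3
print(log10_p_fully_conserved(200, 0.3), expected_conserved_windows(1_000_000, 200, 0.3))
```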

38 481 UEs
- 111 overlap a known mRNA: exonic UEs
- 256 have no overlap (non-exonic)
  - 100 intronic
  - 156 intergenic
- 114 inconclusive
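A breakdown like this comes from intersecting UE coordinates with gene annotations. A minimal Python sketch of that classification with half-open intervals; the gene record layout and the simple precedence rule (exon overlap first, then gene span) are assumptions, and the slide's "inconclusive" category is not modelled.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def classify_ue(ue, genes):
    """genes: list of dicts with 'span' (start, end) and 'exons' [(start, end), ...]
    (a hypothetical layout). Exonic if the UE touches any exon, intronic if it
    lies within a gene span without touching an exon, otherwise intergenic."""
    if any(overlaps(ue, ex) for g in genes for ex in g["exons"]):
        return "exonic"
    if any(overlaps(ue, g["span"]) for g in genes):
        return "intronic"
    return "intergenic"

genes = [{"span": (100, 1000), "exons": [(100, 200), (900, 1000)]}]
print([classify_ue(ue, genes) for ue in [(150, 350), (400, 600), (2000, 2200)]])
```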

39 Who are the genes?
- Type 1: exonic
- Type 2: genes that are near non-exonic UEs (???)

40 Intergenic UEs
- Genes that flank intergenic UEs are enriched for early developmental genes
- Are UEs distal enhancers of these genes?

41 Gene enhancer
- A short region of DNA that binds an activator; it is often far from its gene in the linear sequence, and chromatin folding brings it close to the gene
- An activator recruits transcription factors to the gene

42 Experimental studies of UEs
- 167 UEs (both mouse-human and fish-human UEs) were tested for enhancer activity by cloning each one upstream of a reporter gene
- 45% functioned as enhancers

43 A bioinformatic success
- Ultraconservation can predict highly important function!

44 Ahituv et al., PLoS Biol. 2007 Sep;5(9):e234
- Chose 4 UEs located near specific genes: genes that show a specific phenotype when knocked out
- Completely deleted these UEs … the mice were viable and showed no phenotypic difference
- BUT …

45 Conclusions…
- Ultraconservation can be indicative of important function
- …
- And sometimes not:
  - gene redundancy
  - long-range phenotypes
  - laboratory conditions cannot mimic real life

