1
1 Inside the Genome
2
2 2001: The Human Genome
Venter et al., Science 291:1304-1351 (2001)
International Human Genome Sequencing Consortium, Nature 409:860-921 (2001)
The club resident JD Watson Back2back with DJ. Venter and
3
3 Prologue
RNA world – the dark matter of genomics
How many coding genes in the human genome?
–The Bet of 2000:
–Mean 61,710
–Range: 30,000 – 150,000
–By the end of the genome project the estimated number of human protein-coding genes had declined to only ~25,000
–What is the source of that discrepancy? EST-based estimation vs. whole-genome annotation
4
4 RNA revolution
The majority of the transcriptional output comes from non-coding RNA
–On average, 10% of the human genome (compared with ~1.5% exonic sequence) is transcribed [Cheng et al. 2005]
–Or even more... 62% of the mouse genome is transcribed [FANTOM3: Science 2005]
5
5 Various RNAs – A partial list…
Messenger RNA (mRNA)
Ribosomal RNA (rRNA)
Transfer RNA (tRNA)
Small nuclear RNA (snRNA)
Small nucleolar RNA (snoRNA)
Short interfering RNA (siRNA)
Micro RNA (miRNA)
6
6 RNAs are not merely the intermediary cousins of proteins - The Central Dogma of molecular biology revisited
[Figure: central dogma diagram. Labels: Genome, Transcription, RNA, Transcriptome, Translation, Protein, Proteome, Regulation by proteins, Regulation by RNA (miRNA)]
7
7 Research in Biology is complex… Deciphering Biological Systems –The advantage (what makes this quest feasible) and the hindrance (what makes this quest inherently difficult) – both explained by evolution.
8
8 The difficulties in our research fundamentally owe their complexity to the designer – natural selection.
What is it - a “Robot” or a “UFO”?
–The reason lies in the profound difference between systems “designed” by natural selection and those designed by intelligent engineers [Langton 1989 Artificial Life].
The Hindrance – Topological Entanglement of functional interconnections
9
9 Bottom line: we investigate an outrageously complex weave of interconnections
–The “textbook networks” represent only the tip of the iceberg.
miRNAs and “Regulomics”
–microRNAs are expected to represent ~1% of predicted genes [Lim et al., 2003]
–Lewis et al. (2003) estimate an average of five targets per miRNA
–Many targets are transcription factors: miRNAs regulate the regulators
10
10 The advantage – universal homology, thus enabling comparative biology.
Bottom line: research in biology advances through a reductionist approach - using simple model organisms to infer the functionality of homologous systems.
11
11 Human genome statistics
2.91 billion base pairs
24,000 protein-coding genes (>30,000 non-coding genes ???)
1.5% exons (127 nucleotides)
24% introns (~3,000 nucleotides)
75% intergenic (no genes)
Repetitive elements rule (~45% dispersed repeats)
Average size of a gene is 27,894 bases
A gene contains an average of 8.8 exons
*Titin contains 234 exons.
Average of 4 different proteins per gene (alternative splicing)
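A quick arithmetic cross-check of the statistics on this slide, written as a minimal sketch. It multiplies the quoted gene count by the quoted average gene size and compares the result with the exon plus intron share (1.5% + 24% = 25.5%). The function name is hypothetical, and the calculation assumes that genes do not overlap and that the averages apply uniformly.

```python
# Figures quoted on the slide.
GENOME_BP = 2_910_000_000   # 2.91 billion base pairs
N_GENES = 24_000            # protein-coding genes
MEAN_GENE_BP = 27_894       # average gene size in bases


def genic_fraction(genome_bp, n_genes, mean_gene_bp):
    """Fraction of the genome covered by genes, assuming no overlaps."""
    return n_genes * mean_gene_bp / genome_bp


if __name__ == "__main__":
    frac = genic_fraction(GENOME_BP, N_GENES, MEAN_GENE_BP)
    print(f"Genic fraction from gene count x mean size: {frac:.1%}")
    print("Exon + intron share quoted on the slide: 25.5%")
```

The two estimates land in the same range (roughly a quarter of the genome), which suggests the slide's numbers are mutually consistent to within rounding.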
12
12 Detecting genes in the human genome
Gene finding methods:
Ab initio
–Uses general knowledge of gene structure: rules and statistics
–The challenge: small exons in a sea of introns
Homology-based
–The problem: will not detect novel genes
13
13 Genscan (ab initio)
Based on a probabilistic model of gene structure
Takes into account:
- promoters
- gene composition – exons/introns
- GC content
- splice signals
Goes over all 6 reading frames
Burge and Karlin, 1997, Prediction of complete gene structure in human genomic DNA, J. Mol. Biol. 268
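To make "all 6 reading frames" concrete, here is a minimal sketch that enumerates the codons in the three forward and three reverse-complement frames of a DNA string. This is not Genscan's probabilistic model, only the frame enumeration that any such scan relies on. The function names and frame labels ("+1" to "-3") are illustrative choices, and the input is assumed to contain only A, C, G and T.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Map each frame label ('+1'..'+3', '-1'..'-3') to its list of codons."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand in triplets, dropping a trailing partial codon.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for label, codons in six_frames("ATGAAATAG").items():
        print(label, codons)
```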
14
14 Splicing
15
15 Eukaryotic splice sites Poly-pyrimidine tract
16
16 CpG Islands: another signal
CpG islands are regions of the genome with a higher frequency of CG dinucleotides (not base pairs!) than the rest of the genome
CpG islands often occur near the beginning of genes
–possibly related to the binding of the transcription factor Sp1
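One common way to quantify "a higher frequency of CG dinucleotides" is the observed/expected CpG ratio together with GC content. The sketch below uses the widely cited Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, GC content above 50%, observed/expected above 0.6). These thresholds are not given on the slide and are an assumption here, as are the function names.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides on this strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content and obs/exp thresholds to one window."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_ratio


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

A genome scan would slide such a window along the sequence; that loop is omitted to keep the sketch short.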
17
17 Gene Ontology
GO describes proteins in terms of:
–biological process (e.g. induction of apoptosis by external signals)
–cellular component (e.g. membrane fraction)
–molecular function (e.g. protein kinase)
[Figure: example of the GO hierarchy: cell, nucleus, nuclear chromosome]
18
18 Comparative proteome analysis Functional categories based on GO
19
19 Comparative proteome analysis Humans have more proteins involved in cytoskeleton, immune defense, and transcription
20
20 Evolutionary conservation of human proteins ???
21
21 Horizontal (lateral) gene transfer
Lateral Gene Transfer (LGT) is any process in which an organism transfers genetic material to another organism that is not its offspring
22
22 Mechanisms:
Transformation
Transduction (phages/viruses)
Conjugation
23
23 Bacteria to vertebrate LGT detection
E-value of bacterial homolog X9 better than eukaryal homolog
Human query:
Hit              E-value
Frog             4e-180
Mouse            1e-164
E. coli          7e-124
Streptococcus    9e-71
Worm             0.1
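The detection rule on this slide can be sketched as a small filter over a table of hits. This is a minimal illustration under stated assumptions: the slide's "X9" is read here as a 9-fold better (smaller) E-value, the comparison is made against the best non-vertebrate eukaryotic hit, and a query with a bacterial hit but no non-vertebrate eukaryotic hit is also flagged. The function name, the group labels and the `min_fold` parameter are all hypothetical.

```python
def is_lgt_candidate(hits, min_fold=9.0):
    """Flag a query as a bacteria-to-vertebrate LGT candidate.

    hits: iterable of (group, e_value) pairs, where group is one of
    "vertebrate", "bacteria" or "nonvertebrate_eukaryote".
    min_fold: assumed reading of the slide's "X9" criterion.
    """
    best = {}
    for group, e_value in hits:
        # Keep the best (smallest) E-value seen for each group.
        best[group] = min(e_value, best.get(group, float("inf")))

    bacterial = best.get("bacteria")
    if bacterial is None:
        return False  # no bacterial homolog at all
    eukaryal = best.get("nonvertebrate_eukaryote")
    if eukaryal is None:
        return True  # assumption: bacterial hit with no non-vertebrate eukaryotic hit
    # "Better" means a smaller E-value by at least the required factor.
    return bacterial * min_fold <= eukaryal


if __name__ == "__main__":
    slide_hits = [
        ("vertebrate", 4e-180),              # Frog
        ("vertebrate", 1e-164),              # Mouse
        ("bacteria", 7e-124),                # E. coli
        ("bacteria", 9e-71),                 # Streptococcus
        ("nonvertebrate_eukaryote", 0.1),    # Worm
    ]
    print(is_lgt_candidate(slide_hits))
```

With the hit list from the slide, the bacterial E-value is far better than the worm hit, so the query is flagged as a candidate.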
24
24 Bacteria to vertebrate LGT
[Figure labels: vertebrates, bacteria, non-vertebrates]
26
26 Bacteria to vertebrate LGT??
Hundreds of sequenced bacterial genomes vs. a handful of eukaryotes
Gene finding in bacteria is much easier than in eukaryotes
On the practical side: rigid mechanical barriers to LGT in eukaryotes (nucleus, germ line)
27
27 Repetitive Elements in the Human Genome
28
28 Repeats statistics
The human genome is ~45% dispersed repeats:
–20% LINEs (AT rich)
–13% SINEs (11% Alu) (GC rich)
–8% LTR elements (retrovirus-like)
–2% DNA transposons
Another 3% is tandem simple sequence repeats (e.g. triplet repeats)
And another 3-5% is segmentally duplicated at high similarity (over 1 kb, over 90% identity)
Identifying and screening these out is essential to avoid spurious matches
29
29 LINEs and SINEs
Highly successful elements in eukaryotes
LINE - Long Interspersed Nuclear Element (>5,000 bp)
SINE - Short Interspersed Nuclear Element (<500 bp)
SINEs are free riders on the backs of LINEs: they encode no proteins
30
30 The C-value paradox
Genome size does not correlate with organism complexity
                   Amoeba         Rice           Human        Yeast
Genome size        670 billion    4.3 billion    3 billion    12 million
Number of genes    ?              ~30,000        20-25,000    6,275
31
31 Repetitive elements The C-value mystery was partially resolved when it was found that large portions of genomes contain repetitive elements
32
32 Are Alus functional??
SINEs are transcribed under stress
SINE RNAs may bind a protein kinase and promote translation under stress
Need to be in regions which are highly transcribed
Role in alternative splicing
33
33 Segmental duplications
1,077 segmental duplications detected
Several genes in the duplicated regions are associated with diseases (may be related to homologous recombination)
Most are recent duplications (conservation of the entire segment, versus conservation of coding sequences only)
34
34 Genome-wide studies
35
35 Sequenced genomes
36
36 481 segments > 200 bp absolutely conserved (100% identity) between human, rat and mouse
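The definition on this slide (segments longer than 200 bp with 100% identity across species) can be sketched as a scan over a multiple alignment for runs of identical, gap-free columns. This is a minimal illustration, not the pipeline used in the original study: the input is assumed to be equal-length aligned strings with "-" for gaps, and the function name and the `longer_than` parameter are hypothetical.

```python
def ultraconserved_runs(aligned, longer_than=200):
    """Return (start, end) column ranges where all sequences are identical.

    aligned: list of equal-length aligned sequences ('-' marks a gap).
    Only runs strictly longer than `longer_than` columns are reported;
    `end` is exclusive.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    # Iterate one column past the end so a run touching the end is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if conserved:
            if start is None:
                start = i
        else:
            if start is not None and i - start > longer_than:
                runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTTCGT", "ACGTACGT"]  # stand-ins for human, rat, mouse
    print(ultraconserved_runs(toy, longer_than=2))
```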
37
37 Comparison with a neutral substitution rate
Compare the substitution rate in any 1 Mb region
Probability of 10^-22 of obtaining even 1 ultraconserved element (UE) by chance
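To see why long perfectly conserved runs are so improbable under neutral evolution, consider a simple independent-site model: if each aligned column is identical across the species with probability p, a given window of L columns is fully conserved with probability p^L, and the expected number of such windows in a genome of G bases is roughly (G - L + 1) * p^L. The sketch below evaluates that expression in log10 to avoid floating-point underflow. The model, the function name and the example value of p are assumptions for illustration; this does not reproduce the published calculation behind the 10^-22 figure.

```python
import math


def log10_expected_count(genome_size, per_site_identity, length=200):
    """log10 of the expected number of fully conserved windows.

    Assumes sites are independent and each column is identical across
    all species with probability `per_site_identity` (hypothetical value).
    """
    windows = genome_size - length + 1
    return math.log10(windows) + length * math.log10(per_site_identity)


if __name__ == "__main__":
    # 2.91 billion bases as on the statistics slide; p = 0.7 is purely illustrative.
    value = log10_expected_count(2_910_000_000, 0.7)
    print(f"log10(expected conserved 200-bp windows) = {value:.1f}")
```

Because the window length sits in the exponent, even a modest per-site divergence drives the expected count many orders of magnitude below one.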
38
38 481 UEs
–111 UEs overlap a known mRNA: exonic UEs
–256: no overlap (non-exonic)
  –100 intronic
  –156 intergenic
–114: inconclusive
39
39 Who are the genes?
Type 1: exonic
Type 2: genes which are near non-exonic UEs (???)
40
40 Intergenic UEs
Genes which flank intergenic UEs are enriched for early developmental genes
Are UEs distal enhancers of these genes?
41
41 Gene enhancer
A short region of DNA that binds an activator; it can lie quite far from the gene it regulates (complex chromatin folding brings it close)
An activator recruits transcription factors to the gene
42
42 Experimental studies of UEs
Tested 167 UEs (both mouse-human UEs and fish-human UEs) for enhancer activity: each was cloned upstream of a reporter gene to test its activity
45% functioned as enhancers
43
43 A bioinformatic success Ultraconservation can predict highly important function!
44
44 Ahituv et al., PLoS Biol. 2007 Sep;5(9):e234
Chose 4 UEs near specific genes: genes that show a specific phenotype when knocked out
Performed complete deletion of these UEs …
the mice were viable and did not show any different phenotype
BUT …
45
45 Conclusions…
Ultraconservation can be indicative of important function …
And sometimes not:
- gene redundancy
- long-range phenotypes
- laboratories cannot mimic life