Presentation is loading. Please wait.

Presentation is loading. Please wait.

Comparative Genomics of the Eukaryotes

Similar presentations


Presentation on theme: "Comparative Genomics of the Eukaryotes"— Presentation transcript:

1 Comparative Genomics of the Eukaryotes
Ishay Ben-Zion Comparative Genomics of the Eukaryotes A paper by : Rubin, Yandell, Wortman,…

2 Motivation Evolution – Charles Darwin (1838)
Similarity between different species Model organisms A human shares 50% of his genes with a banana. How ? Humans and bananas are multi-cellular Other Similarities Humans share 23% of their genes with Yeast Could banana be a good model organism ?

3 Model Organisms Heavily Studied – used as examples for other species
Once it is studied enough – It is a good candidate Important requirements: Size Generation Time (for genetic research) Manipulation (genetic and not) Little “Junk DNA” (easy for sequencing) Money

4 This paper describes: A comparison between the genomes of 3 Eukaryotes: Eukaryote – Cell has inner structures with membranes (nucleus) 1) A fruit fly - Drosophila melanogaster 2) A worm – C. elegans 3) Yeast – S. cerevisiae Other model organisms (E. coli, mouse, Zebrafish, Arabidopsis)

5 Taxonomic classification
Cellular life Domain: Bacteria Archaea Eukaryota Kingdom: Animalia Plantae Protista Fungi H. influenzae Fly worm yeast Species:

6 Drosophila melanogaster
Popular model organism (for developmental biology) A trial for the human genome (sequenced at 2000) Easily induce mutations

7 Caenorhabditis elegans
Transparent, 1-mm long Simple – 959 cells (300 neurons) Eat, sleep & have sex (or self-fertilize) Hermaphrodites – 99.95%, Males – 0.05%

8 Caenorhabditis elegans
Good as a model organism for: Genetics: First multi-cellular sequenced genome Developmental biology: cell fate mapping Neurobiology: neurons connectivity map

9 Saccharomyces cerevisiae
Also called Baker’s yeast Single-celled Diameter: 5-10 μ Popular model organism Simplest Eukaryote First Eukaryotic sequenced genome

10 The 1st comparison Instead of counting genes - count gene families
What are gene families ? Paralogs = highly similar proteins in the same genome Similar functionality – but not always Remark: proteins = genes Sets of paralogs

11 Findings H. influenzae Yeast Fly worm Total # of genes 1700 6200
13,600 18,400 # of gene families 1400 4400 8100 9400 # of duplicates 300 1800 5,500 9000 Size of a family: one or more No. of families – not a good measure for complexity

12 The 2nd comparison Pool genes of large families of 3 species:
For each protein – search for orthologs Orthologs = Similar proteins in other species Among families found in flies and worms (but not yeast): Responsible for multi-cellular development Among families found only in flies: Responsible for immune response and fly specific

13 Methods – BLAST algorithm
Basic Local Alignment Search Tool For comparing biological sequences (to find Homology) Example: Proteins, DNA sequences Query Library of sequences (In the library – sequences of different lengths) In the paper: Paralogy, Orthology - kinds of Homology C G A C G T T C A C G A C G T

14 BLAST – Step 1 Separate query to k-letter words Example:
Proteins – Letters are Amino acids (L=Leucine) Query sequence: RPPQGLF (k=3) 3-letter words: RPP PPQ PQG QGL GLF

15 Use scoring matrix for two k-letter words
BLAST – Step 2 Take one k-letter word – PQG Search library for similar words – LGMCPQA, DPPEGVV Define similarity: High score for 2 words Have common ancestor PQG – PQA : PQG – PEG : 15 Save similar words above a threshold T (save positions) Repeat for all k-letter words in query Use scoring matrix for two k-letter words

16 BLAST – Step 3 Align at saved positions: - - - R P P Q G L F - - -
- - - D P P E G V V - - - Scores: Extend match right and left for positive score New pairs are called High-scoring Segment Pairs (HSP) Save significant HSPs (above a threshold S) Total: = 23

17 BLAST – Step 4 Align saved HSPs (with gaps)
Example: 2 Sequences with 2 HSPs Insert gap Compute total score (involves gap penalties) Report all matches above a threshold E . R P Q G L F T S A M K H Y . D P E G V - M K S F Y N C . D P E G V M K S F Y N C

18 BLAST – Whole process Separate query to k-letter words
Search library for similar k-letter words and save Extend to HSPs and save Align whole sequences and compute total score Return sequences with score above E These are homologous to query

19 The 3rd comparison Compare all genes of three species with length limitation (80% of length) 20% of the fly appear in worm and yeast They perform functions common to all eukaryotic cells

20 The 4th comparison Compare all genes of three species to mammalian sequences (without length limitation) 50% of the fly proteins appear in mammals 36% of the worm proteins appear in mammals Fly is closer to mammals Most of mammalian sequences used here were short The similarities reflect conserved domains

21 What are conserved domains ?
Domains – independent parts that construct proteins Appear in different combinations in different proteins Similarity to short sequences Conserved domains Closeness in evolution ABC ADEG

22 To conclude Significant similarity between genomes of ”distant” species (Man – Yeast 23%) Similarity increases for taxonomically close species ( ) No. of genes or gene families – bad measure for complexity Why ? More information that is not encoded in the genome (Protein interactions – e.g. physical proximity of genes) How to define complexity ?


Download ppt "Comparative Genomics of the Eukaryotes"

Similar presentations


Ads by Google