Supplementary Figure 1. Distributions of (A) exon and (B) intron lengths in O. sativa and A. thaliana genes. Green bars are used for O. sativa and orange bars for A. thaliana.

Our data suggest that both exons and introns are, on average, longer in rice than their counterparts in Arabidopsis thaliana (Supplementary Table 2). This tendency was especially pronounced for the first and last exons (Supplementary Table 2). Transposon insertions in the UTRs may have led to the observed differences in exon lengths between the two species. Another possibility is that the true average exon lengths are the same but the observed values differ: because the A. thaliana mRNA dataset contained a lower proportion of FLcDNAs than the O. sativa dataset, some A. thaliana mRNAs may have incomplete UTRs, which may have biased the average. However, the almost identical mean numbers of exons in the two species (Supplementary Table 2) suggest that the A. thaliana mRNA dataset was similar in composition to its rice counterpart.

The distributions of the predicted exon lengths were fairly similar between the two species (Supplementary Fig. 1A), whereas the predicted introns displayed different distributions (Supplementary Fig. 1B). This implies that only a limited number of exons were elongated in the rice genome or incomplete in the A. thaliana genome. It appears that the rice introns may have accepted more transposon insertions than the A. thaliana introns.
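As an illustration of how exon and intron lengths such as those summarized in this figure could be derived from a gene model, here is a minimal Python sketch. The function name `exon_intron_lengths` and the 1-based inclusive coordinate convention are assumptions made for this example; it is not the procedure used to produce the figure.

```python
from statistics import mean


def exon_intron_lengths(exons):
    """Return (exon_lengths, intron_lengths) for one transcript.

    `exons` is a list of (start, end) pairs in 1-based inclusive
    coordinates (an assumed convention for this sketch).
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron is the gap between two consecutive exons.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return exon_lengths, intron_lengths


# Hypothetical three-exon transcript, for illustration only.
exon_lens, intron_lens = exon_intron_lengths([(1, 100), (201, 300), (401, 450)])
print("exon lengths:", exon_lens)
print("intron lengths:", intron_lens)
print("mean exon length:", mean(exon_lens))
```

Pooling these per-transcript lists over all genes of a species would give the length distributions that are then binned into histograms.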

Supplementary Fig. 1 (Cont.) (A) Exon length

Supplementary Fig. 1 (Cont.) (B) Intron length

Supplementary Figure 2. Proportion of protein lengths in five categories. The first three categories were combined. Short proteins (<300 a.a.) appear to be enriched in Categories IV and V.

Supplementary Figure 3. Antisense npRNA in the rice genome. The Os08g0103700 (AK071064) npRNA, encoded on the forward strand of chromosome 8, overlaps two sense genes: Os08g0103600 (AK067168; BTB/POZ domain-containing protein gene) and Os08g0103900 (AK110611; NAM-like protein gene). Other features, such as A. thaliana mRNAs and expressed sequence tags (ESTs), are also shown. Figure labels: antisense npRNA candidate; BTB/POZ domain-containing protein; NAM-like protein. See the following URL: http://rappub.lab.nig.ac.jp/g-integra/cgi-bin/f_genemap.cgi?id=AK071064

Supplementary Figure 4. Distribution of evolutionary distances (p distance) of orthologs detected between O. sativa and A. thaliana.
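For readers unfamiliar with the p distance, it is the proportion of aligned sites at which two sequences differ. The following Python sketch shows one way to compute it for a pair of aligned protein sequences. The function name `p_distance`, the choice to skip gapped columns, and the example sequences are assumptions made for illustration; the legend does not state how gaps were handled.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned ortholog fragments, for illustration only.
print(p_distance("MKT-LLVA", "MRTALLIA"))
```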

Supplementary Figure 5. Distributions of evolutionary distances between paralogs in O. sativa (black bars) and A. thaliana (white bars). The distances were estimated by the Poisson-gamma correction with a shape parameter of 2.25.

Even though the distributions of gene duplicates were quite similar between O. sativa and A. thaliana (Fig. 2), the process of genome evolution in each species may have been quite different. If genes are duplicated and deleted on a purely random basis without selection pressure, an exponential decay in the number of duplicate genes over time should be observed. If a large-scale duplication event occurred, we would instead see a unimodal distribution that peaks at the time of the duplication event (Blanc, G. and Wolfe, K. H. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. The Plant Cell 16: 1667-1678).

We estimated the Poisson-gamma distances (shape parameter = 2.25) for duplicate protein pairs created after the divergence of O. sativa and A. thaliana. We used only paralog clusters with two members, because the evolutionary distance between the two members of such a cluster can be calculated unambiguously. The distribution for the O. sativa proteins appeared to be a combination of the two aforementioned distributions, whereas in A. thaliana there seems to be a single large peak, as noted by Blanc and Wolfe (2004), which may be characteristic of a large-scale duplication. Hence, different patterns of duplication events are likely to have led to the similar patterns of paralog cluster sizes observed (Fig. 2).
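The legend does not give the formula behind the Poisson-gamma correction. A commonly used form of the gamma distance for amino acid sequences is d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing sites and a is the gamma shape parameter. The Python sketch below assumes that this is the formula meant here; the function name `poisson_gamma_distance` is hypothetical.

```python
def poisson_gamma_distance(p, shape=2.25):
    """Gamma-corrected distance from a proportion of differing sites.

    Assumes the standard form d = a * ((1 - p) ** (-1 / a) - 1);
    shape=2.25 matches the value quoted in the legend.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


# The correction grows faster than p as sequences diverge.
for p in (0.1, 0.3, 0.5):
    print(p, round(poisson_gamma_distance(p), 4))
```

Under this assumption the corrected distance always exceeds the raw proportion for p > 0, because it accounts for multiple substitutions at the same site and for rate variation among sites.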

Supplementary Fig. 5 (Cont.)

Supplementary Figure 6. Numbers of lineage-specific and other proteins in five ORF categories.

Supplementary Figure 7. Distribution of protein lengths in lineage-specific and other proteins.