Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences.


1 Rice Sequence and Map Analysis Leonid Teytelman

2 Outline
Rice Genome Annotation: Sequence Alignments; Automation
Comparative Maps: Genetic Marker Correspondences; FPC Map; FPC I-Map
EnsEMBL Pipeline: Automated Annotation; Compute Farms

3 Rice Genome Annotation

4 Aligned Data Sets:
Rice CUGI BAC ends; Rice JRGP/Cornell RFLP Markers; Rice Cornell SSRs
Rice Coding Sequences: Rice Complete CDSs; Rice TIGR GIs; Rice BGI EST Clusters; Rice dbEST ESTs; Rice BGI ESTs
Non-Rice Coding Sequences: Maize Unigene Clusters; Maize TIGR GIs; Maize dbEST ESTs; Barley dbEST ESTs; Wheat dbEST ESTs; Sorghum dbEST ESTs

5 Alignment Tools:
BLAT: search & alignment
pslReps: filtering of low-quality matches
e-PCR: matches based on near-identity to the PCR primers, and correct order
(Diagram labels: Target, Queries)

7 Alignment Methods:
Rice Coding Sequences:
- BLAT search & alignment
- pslReps filtering of repetitive matches
- Accept based on percent of EST length matched
Non-Rice Coding Sequences:
- BLAT search & alignment
- pslReps filtering of repetitive matches
- Accept based on hit length and hit frequency
Rice BAC ends:
- BLAT search & alignment
- Accept based on gap length, percent of BAC end length matched, percent identity, and hit frequency
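The "accept based on percent of EST length matched" criterion can be sketched as a filter over BLAT's PSL output. This is a minimal illustration, not the script used in the talk: the 0.9 threshold, the helper names, and the example record are hypothetical, and "length matched" is interpreted here as the aligned query span divided by the query size.

```python
from typing import NamedTuple


class PslHit(NamedTuple):
    matches: int
    q_name: str
    q_size: int
    q_start: int
    q_end: int
    t_name: str


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL record (21 columns, header lines removed)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(
        matches=int(f[0]),   # column 1: number of matching bases
        q_name=f[9],         # column 10: query name
        q_size=int(f[10]),   # column 11: query length
        q_start=int(f[11]),  # column 12: alignment start in query
        q_end=int(f[12]),    # column 13: alignment end in query
        t_name=f[13],        # column 14: target name
    )


def fraction_of_query_matched(hit: PslHit) -> float:
    """Aligned query span as a fraction of the full query length."""
    return (hit.q_end - hit.q_start) / hit.q_size


MIN_FRACTION = 0.9  # hypothetical cutoff; the slides do not state the value


def accept(hit: PslHit, min_fraction: float = MIN_FRACTION) -> bool:
    return fraction_of_query_matched(hit) >= min_fraction


# Made-up record for illustration only.
EXAMPLE_LINE = "\t".join([
    "450", "5", "0", "0", "0", "0", "1", "120", "+",
    "est_example", "500", "10", "465",
    "clone_example", "150000", "1000", "1575",
    "2", "200,255,", "10,210,", "1000,1320,",
])

if __name__ == "__main__":
    hit = parse_psl_line(EXAMPLE_LINE)
    print(hit.q_name, round(fraction_of_query_matched(hit), 2), accept(hit))
```

The other criteria on the slide (hit length, hit frequency, gap length, percent identity) could be added as further predicates over the same parsed record.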

8 Alignment Methods:
Rice Markers:
- BLAT search & alignment
- Accept based on percent of marker length matched and, for genomic markers, the gap length
- Utilize genetic map information; accept those whose genetic & physical chromosome assignments are concordant
Rice SSRs:
- e-PCR with default parameters, allowing 0 mismatches in the primers
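The concordance check for rice markers (keep a hit only when the marker's genetic-map chromosome agrees with the chromosome of the clone it aligned to) can be sketched as follows. The function name, the dictionary-based lookups, and the identifiers are all hypothetical; the slides do not describe the actual data structures.

```python
from typing import Dict, List, Tuple


def concordant_hits(
    hits: List[Tuple[str, str]],
    genetic_chromosome: Dict[str, int],
    clone_chromosome: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Keep (marker, clone) hits whose chromosome assignments agree.

    Hits with a missing assignment on either side are dropped, which is
    an assumption of this sketch rather than a documented rule.
    """
    kept = []
    for marker, clone in hits:
        genetic = genetic_chromosome.get(marker)
        physical = clone_chromosome.get(clone)
        if genetic is not None and genetic == physical:
            kept.append((marker, clone))
    return kept


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    hits = [("markerA", "clone1"), ("markerA", "clone2"), ("markerB", "clone2")]
    genetic = {"markerA": 1, "markerB": 5}
    physical = {"clone1": 1, "clone2": 5}
    print(concordant_hits(hits, genetic, physical))
```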

9 February 2002 BAC/PAC Dataset
Total BACs/PACs: 1,847
Total bp: 250,879,896 (~250 Mb)
Phase 1: 78
Phase 2: 1,238
Phase 3: 531
Annotated Phase 3: 330
Annotated Genes: 8,034

10 Alignment Totals
DATASET                         TOTAL COMPARED   TOTAL MAPPED   % MAPPED
Rice Complete CDSs                       1,358            505        37%
Rice TIGR GIs                           12,354          6,290        51%
Rice BGI EST Clusters                   24,179         12,135        50%
Rice dbEST ESTs                        104,549         49,773        48%
Rice BGI ESTs                           86,623         40,049        46%
Maize Unigene Clusters                  10,678          3,972        37%
Maize TIGR GIs                          27,642          6,941        25%
Maize dbEST ESTs                       147,657         38,718        26%
Barley dbEST ESTs                      148,651         50,579        34%
Wheat dbEST ESTs                       166,513         49,146        29%
Sorghum dbEST ESTs                      84,711         28,044        33%
Rice CUGI BAC ends                      88,053         18,260        21%
Rice JRGP/Cornell RFLP Markers           2,682          1,320        49%
Rice Cornell SSRs                          524            228        44%

11

12 Automating Alignments:
For each group of data sets, there is a script to automatically:
- Run pslReps
- Load results into the database
- Discard low-quality matches
- Update documentation
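The four automated steps can be sketched as a small driver that runs them in order for one data set. This is an illustration under assumptions, not the scripts from the talk: the step functions are placeholders, the file names are hypothetical, and the only external detail relied on is the positional form of the pslReps command line (`pslReps in.psl out.psl out.psr`).

```python
from typing import Callable, List, Tuple


def psl_reps_command(in_psl: str, out_psl: str, out_psr: str) -> List[str]:
    """Build the pslReps argument list; options are omitted in this sketch."""
    return ["pslReps", in_psl, out_psl, out_psr]


Step = Tuple[str, Callable[[str], None]]


def run_dataset(name: str, steps: List[Step]) -> List[str]:
    """Run each step in order for one data set and return a simple log."""
    log = []
    for label, step in steps:
        step(name)  # a real step would raise on failure, stopping the run
        log.append(f"{name}: {label}")
    return log


if __name__ == "__main__":
    def show_psl_reps(name: str) -> None:
        # A real driver would pass this list to subprocess.run(..., check=True).
        print(" ".join(psl_reps_command(f"{name}.psl", f"{name}.best.psl", f"{name}.psr")))

    def placeholder(name: str) -> None:
        pass  # stand-in for database loading, filtering, documentation updates

    steps = [
        ("run pslReps", show_psl_reps),
        ("load results into the database", placeholder),
        ("discard low-quality matches", placeholder),
        ("update documentation", placeholder),
    ]
    for entry in run_dataset("example_ests", steps):
        print(entry)
```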

13

14

15 Comparative Maps

16 Map Correspondences
Same marker on multiple mapping studies:
- Name-identity
- Curated evidence
Sequence-based correspondences for JRGP and Cornell markers:
- BLAT search & alignment
- Utilize genetic mapping information, accepting matches on the same chromosome and less than 30 cM apart
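The acceptance rule for sequence-based correspondences (same chromosome, less than 30 cM apart) can be written as a single predicate. The sketch assumes the comparison is between the genetic positions of the two markers on their respective maps; the type and function names are hypothetical.

```python
from typing import NamedTuple


class MapPosition(NamedTuple):
    chromosome: int
    cm: float  # genetic position in centimorgans


MAX_DISTANCE_CM = 30.0  # cutoff stated on the slide


def is_correspondence(a: MapPosition, b: MapPosition,
                      max_cm: float = MAX_DISTANCE_CM) -> bool:
    """True when both positions share a chromosome and lie < max_cm apart."""
    return a.chromosome == b.chromosome and abs(a.cm - b.cm) < max_cm


if __name__ == "__main__":
    # Made-up positions for illustration only.
    print(is_correspondence(MapPosition(3, 42.0), MapPosition(3, 55.5)))
    print(is_correspondence(MapPosition(3, 42.0), MapPosition(4, 42.0)))
```

Comparing raw cM values across two different genetic maps is only approximate, since map lengths differ between studies, which may be why the cutoff is fairly loose.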

17 curator; same name; sequence-based

18 curator; same name

19 FPC data from CUGI, synchronized with the latest release.

20 Discordant

21 Cornell/JRGP markers mapped to sequenced clones were assigned positions on the FPC contigs.

22 Total: 2,272 4,417

23 EnsEMBL Pipeline in a Nutshell

24 EnsEMBL Pipeline Overview
- System for automated genome annotation
- Executes and keeps track of computational jobs
- Analysis job execution is serial, allowing stage dependencies
- Jobs are user-defined
- Can take advantage of a compute farm
Example analysis sequences:
RepeatMasker -> Genscan -> Blast -> GenomeBuilder -> Hmmer
RepeatMasker -> BLAT -> GeneWise -> Hmmer
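Stage dependencies of this kind can be sketched as a rule table mapping each analysis to the analyses that must finish first. This is not the EnsEMBL pipeline's own code (which is written in Perl); the function names are hypothetical, and the prerequisite relationships shown are assumed from the order in which the analyses appear on the slide.

```python
from typing import Dict, List


def runnable(rules: Dict[str, List[str]], done: List[str]) -> List[str]:
    """Analyses not yet run whose prerequisites have all completed."""
    return [
        analysis
        for analysis, needs in rules.items()
        if analysis not in done and all(n in done for n in needs)
    ]


def run_all(rules: Dict[str, List[str]]) -> List[str]:
    """Return an execution order that respects every dependency rule."""
    done: List[str] = []
    while len(done) < len(rules):
        ready = runnable(rules, done)
        if not ready:
            raise ValueError("circular or unsatisfiable dependencies")
        done.extend(ready)  # a real pipeline would execute the jobs here
    return done


if __name__ == "__main__":
    # Assumed rules: each analysis waits for the one listed before it.
    rules = {
        "RepeatMasker": [],
        "Genscan": ["RepeatMasker"],
        "Blast": ["Genscan"],
    }
    print(run_all(rules))
```

Keeping the rules as data rather than code matches the slide's point that jobs are user-defined: adding an analysis means adding a rule, not changing the runner.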

25 Organization
Utilizes and expands on the EnsEMBL-core modules and database schema
Database stores:
- analysis program names and parameters
- analysis results
- rules for job dependencies and progress status for each job
Perl modules:
- access the database
- execute specified analysis programs
- parse the analysis results and load them into the database

26 Cluster Utilization
How to split up tasks?
- Contig-by-contig approach
How to execute jobs on slave nodes?
- Load management and scheduling (LSF, PBS, etc.)
Management of management:
- Automatic job submission
- Error/completion checking
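The contig-by-contig split with automatic submission and error/completion checking can be sketched as below. The `Job` record, the status values, and the retry limit are hypothetical. The `run` callable stands in for whatever hands a job to the scheduler (LSF, PBS, or similar), and no scheduler commands are shown because the slides do not give them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    contig: str
    analysis: str
    status: str = "pending"  # pending, done, or failed
    attempts: int = 0


def make_jobs(contigs: List[str], analysis: str) -> List[Job]:
    """One job per contig, so contigs can be processed independently."""
    return [Job(contig, analysis) for contig in contigs]


def submit_pending(jobs: List[Job], run: Callable[[str, str], None],
                   max_attempts: int = 2) -> None:
    """Submit every unfinished job once, recording success or failure."""
    for job in jobs:
        if job.status == "done" or job.attempts >= max_attempts:
            continue
        job.attempts += 1
        try:
            run(job.contig, job.analysis)
            job.status = "done"
        except Exception:
            job.status = "failed"  # picked up again on the next pass


if __name__ == "__main__":
    def fake_run(contig: str, analysis: str) -> None:
        print(f"running {analysis} on {contig}")

    jobs = make_jobs(["contig_1", "contig_2"], "RepeatMasker")
    submit_pending(jobs, fake_run)
    print([job.status for job in jobs])
```

Calling `submit_pending` repeatedly gives a simple form of the "management of management" loop: finished jobs are skipped, failed ones are retried until the attempt limit is reached.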

