Rice Sequence and Map Analysis Leonid Teytelman. Rice Genome Annotation Sequence Alignments Automation Comparative Maps Genetic Marker Correspondences.

Presentation transcript:

Rice Sequence and Map Analysis Leonid Teytelman

Rice Genome Annotation
- Sequence Alignments
- Automation
Comparative Maps
- Genetic Marker Correspondences
- FPC Map
- FPC I-Map
EnsEMBL Pipeline
- Automated Annotation
- Compute Farms

Rice Genome Annotation

Aligned Data Sets:
- Rice CUGI BAC ends
- Rice JRGP/Cornell RFLP Markers
- Rice Cornell SSRs
- Rice Coding Sequences: Rice Complete CDSs, Rice TIGR GIs, Rice BGI EST Clusters, Rice dbEST ESTs, Rice BGI ESTs
- Non-Rice Coding Sequences: Maize Unigene Clusters, Maize TIGR GIs, Maize dbEST ESTs, Barley dbEST ESTs, Wheat dbEST ESTs, Sorghum dbEST ESTs

Alignment Tools:
- BLAT: search & alignment
- pslReps: filtering of low-quality matches
- e-PCR: matches based on near-identity to the PCR primers and on their correct order
Diagram labels: Target, Queries
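The e-PCR idea on this slide (primers found in the right order on the target) can be illustrated with a minimal Python sketch. This is not the e-PCR program itself. It covers only the strictest case, exact primer matches, and the function names, the size window, and the example sequences are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sts(target, fwd_primer, rev_primer, min_size, max_size):
    """Find (start, end) spans where the forward primer is followed, on the
    same strand, by the reverse complement of the reverse primer, and the
    implied product size lies within [min_size, max_size].

    Assumption: primers must match exactly (zero mismatches).
    """
    target = target.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    hits = []
    start = target.find(fwd)
    while start != -1:
        # Look for the reverse primer site downstream of the forward primer.
        rev_at = target.find(rev_rc, start + len(fwd))
        while rev_at != -1:
            end = rev_at + len(rev_rc)
            if min_size <= end - start <= max_size:
                hits.append((start, end))
            rev_at = target.find(rev_rc, rev_at + 1)
        start = target.find(fwd, start + 1)
    return hits


# Toy example: a 20 bp product starting at offset 2 of the target.
toy_target = "GG" + "ACGTAC" + "TTTTTTTT" + "GATTCA" + "CC"
toy_hits = find_sts(toy_target, "ACGTAC", "TGAATC", 10, 30)
```

The size window matters because a primer pair that matches in order but at an implausible distance is unlikely to be the intended marker.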

Alignment Methods:
Rice Coding Sequences:
- BLAT search & alignment
- pslReps filtering of repetitive matches
- Accept based on percent of EST length matched
Non-Rice Coding Sequences:
- BLAT search & alignment
- pslReps filtering of repetitive matches
- Accept based on hit length and hit frequency
Rice BAC ends:
- BLAT search & alignment
- Accept based on gap length, percent of BAC end length matched, percent identity, and hit frequency
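As an illustration of the "percent of EST length matched" criterion, here is a minimal Python sketch that reads BLAT PSL lines and keeps hits above a coverage cutoff. The slide gives no threshold, so the 0.9 cutoff is hypothetical. Treating "matched" as all aligned bases (matches + mismatches + repeat matches) is also an assumption.

```python
from collections import namedtuple

# Only the PSL columns needed here: 0 matches, 1 misMatches, 2 repMatches,
# 9 qName, 10 qSize, 13 tName (standard 21-column PSL layout).
PslHit = namedtuple("PslHit", "matches mismatches rep_matches q_name q_size t_name")


def parse_psl_line(line):
    """Parse one tab-separated PSL record (header lines must be skipped first)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(int(f[0]), int(f[1]), int(f[2]), f[9], int(f[10]), f[13])


def fraction_matched(hit):
    """Fraction of the query (EST) length covered by aligned bases."""
    return (hit.matches + hit.mismatches + hit.rep_matches) / hit.q_size


def accept_est_hits(lines, min_fraction=0.9):
    """Keep hits whose aligned fraction meets the (hypothetical) cutoff."""
    hits = (parse_psl_line(line) for line in lines if line.strip())
    return [h for h in hits if fraction_matched(h) >= min_fraction]
```

For the non-rice and BAC-end data sets, the same structure would apply with different acceptance functions (hit length, hit frequency, gap length, percent identity).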

Alignment Methods:
Rice Markers:
- BLAT search & alignment
- Accept based on percent of marker length matched and, in the case of genomic markers, the gap length
- Utilize genetic map information; accept those whose genetic & physical chromosome assignments are concordant
Rice SSRs:
- e-PCR with default parameters, allowing 0 mismatches in the primers
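The concordance check for rice markers can be sketched as follows. This is a minimal Python illustration: the data layout (a dict of genetic chromosome assignments and a list of physical hits) and the chromosome-name normalization are assumptions, not details given on the slide.

```python
def normalize_chromosome(name):
    """Reduce labels such as 'Chr01', 'chr1' or 1 to a common form ('1')."""
    name = str(name).strip().lower()
    if name.startswith("chr"):
        name = name[3:]
    return name.lstrip("0") or "0"


def concordant_hits(physical_hits, genetic_chromosome):
    """Keep (marker, physical_chromosome) hits that agree with the genetic map.

    physical_hits: iterable of (marker_name, chromosome of the aligned clone)
    genetic_chromosome: dict mapping marker_name -> chromosome on the genetic map
    Markers absent from the genetic map are dropped here; that policy is an
    assumption made for this sketch.
    """
    accepted = []
    for marker, phys_chr in physical_hits:
        gen_chr = genetic_chromosome.get(marker)
        if gen_chr is None:
            continue
        if normalize_chromosome(gen_chr) == normalize_chromosome(phys_chr):
            accepted.append((marker, phys_chr))
    return accepted
```

Using the genetic map this way removes alignments to paralogous or repetitive regions on the wrong chromosome, at the cost of discarding markers whose genetic assignment is itself wrong.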

February 2002 BAC/PAC Dataset
Total BACs/PACs: 1,847
Total bp: 250,879,896 (about 250 Mb)
Phase 1: 78
Phase 2: 1,238
Phase 3: 531
Annotated Phase 3: 330
Annotated Genes: 8,034

Alignment Totals
DATASET | TOTAL COMPARED | TOTAL MAPPED | % MAPPED
Rice Complete CDSs | 1, | | %
Rice TIGR GIs | 12,354 | 6,290 | 51%
Rice BGI EST Clusters | 24,179 | 12,135 | 50%
Rice dbEST ESTs | 104,549 | 49,773 | 48%
Rice BGI ESTs | 86,623 | 40,049 | 46%
Maize Unigene Clusters | 10,678 | 3,972 | 37%
Maize TIGR GIs | 27,642 | 6,941 | 25%
Maize dbEST ESTs | 147,657 | 38,718 | 26%
Barley dbEST ESTs | 148,651 | 50,579 | 34%
Wheat dbEST ESTs | 166,513 | 49,146 | 29%
Sorghum dbEST ESTs | 84,711 | 28,044 | 33%
Rice CUGI BAC ends | 88,053 | 18,260 | 21%
Rice JRGP/Cornell RFLP Markers | 2,682 | 1,320 | 49%
Rice Cornell SSRs | | | %

Automating Alignments:
For each group of data sets, there is a script to automatically:
- Run pslReps
- Load results into the database
- Discard low-quality matches
- Update documentation
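The per-group script described above might look roughly like the following Python sketch. The original scripts are not shown in the slides, so the file names, the `alignment` table, and the 0.5 cutoff are all hypothetical. An in-memory SQLite database stands in for whatever database was actually used. The pslReps invocation follows its usual `pslReps [options] in.psl out.psl out.psr` form but is only built here, not executed.

```python
import sqlite3


def pslreps_command(in_psl, out_psl, out_psr, min_cover=None):
    """Build a pslReps command line (to be passed to subprocess.run)."""
    cmd = ["pslReps"]
    if min_cover is not None:
        cmd.append(f"-minCover={min_cover}")
    return cmd + [in_psl, out_psl, out_psr]


def load_and_filter(conn, hits, min_fraction=0.5):
    """Load (query, target, aligned_bases, query_size) rows, then discard
    rows whose aligned fraction is below the (hypothetical) cutoff.
    Returns the number of rows remaining."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alignment "
        "(query TEXT, target TEXT, aligned INTEGER, query_size INTEGER)"
    )
    conn.executemany("INSERT INTO alignment VALUES (?, ?, ?, ?)", hits)
    conn.execute(
        "DELETE FROM alignment WHERE CAST(aligned AS REAL) / query_size < ?",
        (min_fraction,),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM alignment").fetchone()[0]
```

Keeping the filtering step in the database, after loading, mirrors the order of the steps listed on the slide.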

Comparative Maps

Map Correspondences
Same marker on multiple mapping studies:
- Name-identity
- Curated evidence
Sequence-based correspondences for JRGP and Cornell markers:
- BLAT search & alignment
- Utilize genetic mapping information, accepting matches on the same chromosome and less than 30 cM apart
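The acceptance rule for sequence-based correspondences (same chromosome, less than 30 cM apart) reduces to a small predicate. The following Python sketch assumes a hypothetical `MapPosition` record; only the 30 cM limit comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    """A marker's position on one genetic map (hypothetical record layout)."""
    marker: str
    chromosome: str
    position_cm: float


def is_accepted_correspondence(a, b, max_distance_cm=30.0):
    """Accept a sequence-based match only if both markers lie on the same
    chromosome and their genetic positions differ by less than the limit."""
    same_chromosome = a.chromosome == b.chromosome
    close_enough = abs(a.position_cm - b.position_cm) < max_distance_cm
    return same_chromosome and close_enough
```

Because positions on different genetic maps are not directly comparable, the 30 cM window is a coarse sanity check rather than a precise distance measure.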

Figure legend: curator, same name, sequence-based

Figure legend: curator, same name

FPC data from CUGI, synchronized with the latest release.

Discordant

Cornell/JRGP markers mapped to sequenced clones were assigned positions on the FPC contigs.

Total: 2,272 | 4,417

EnsEMBL Pipeline in a Nutshell

EnsEMBL Pipeline Overview
- System for automated genome annotation
- Executes and keeps track of computational jobs
- Analysis job execution is serial, allowing stage dependencies
- Jobs are user-defined
- Can take advantage of a compute farm
Example analysis sequences:
- RepeatMasker, Genscan, Blast, GenomeBuilder, Hmmer
- RepeatMasker, BLAT, GeneWise, Hmmer
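Serial execution with stage dependencies can be sketched in a few lines. The EnsEMBL pipeline itself is written in Perl, so this Python version is only an illustration of the idea. The `run_stages` function, the status strings, and the dependency chain in `EXAMPLE_STAGES` are hypothetical.

```python
# Each stage is (name, [names of stages that must finish first]).
# The analysis names come from the slide; the dependencies are assumed.
EXAMPLE_STAGES = [
    ("RepeatMasker", []),
    ("Genscan", ["RepeatMasker"]),
    ("Blast", ["Genscan"]),
]


def run_stages(input_id, stages, runners):
    """Run stages in order for one input (e.g. one contig) and track status.

    runners maps a stage name to a callable taking input_id.
    A stage is 'blocked' if any stage it depends on did not finish as 'done'.
    """
    status = {}
    for name, requires in stages:
        if any(status.get(dep) != "done" for dep in requires):
            status[name] = "blocked"
            continue
        try:
            runners[name](input_id)
            status[name] = "done"
        except Exception:
            # A real system would also record the error for later inspection.
            status[name] = "failed"
    return status
```

Recording a status per stage is what lets a failed job be rerun later without repeating the stages that already completed.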

Organization
Utilizes and expands on the EnsEMBL-core modules and database schema
Database stores:
- analysis program names and parameters
- analysis results
- rules for job dependencies and progress
- status for each job
Perl modules:
- access the database
- execute specified analysis programs
- parse the analysis results and load them into the database
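To make the database's role concrete, here is a minimal sketch of tables for analyses, dependency rules, and job status, with a query for what can run next. This is not the actual EnsEMBL schema. The table and column names are invented for illustration, and Python with SQLite is used instead of the Perl modules the slide mentions.

```python
import sqlite3

# Hypothetical, simplified schema (not the real EnsEMBL pipeline tables).
SCHEMA = """
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT NOT NULL, parameters TEXT);
CREATE TABLE rule (goal_analysis_id INTEGER NOT NULL, required_analysis_id INTEGER NOT NULL);
CREATE TABLE job (job_id INTEGER PRIMARY KEY, input_id TEXT NOT NULL,
                  analysis_id INTEGER NOT NULL, status TEXT NOT NULL);
"""

READY_QUERY = """
SELECT a.program FROM analysis a
WHERE a.analysis_id NOT IN
      (SELECT analysis_id FROM job WHERE input_id = ? AND status = 'done')
  AND NOT EXISTS (
      SELECT 1 FROM rule r
      WHERE r.goal_analysis_id = a.analysis_id
        AND r.required_analysis_id NOT IN
            (SELECT analysis_id FROM job WHERE input_id = ? AND status = 'done'))
ORDER BY a.analysis_id
"""


def open_pipeline_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def ready_analyses(conn, input_id):
    """Programs not yet done for input_id whose required analyses are all done."""
    return [row[0] for row in conn.execute(READY_QUERY, (input_id, input_id))]
```

Storing rules and status in tables, rather than in the scripts, means the same code can drive different user-defined analysis sequences.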

Cluster Utilization
How to split up tasks?
- Load management and scheduling (LSF, PBS, etc.)
- Contig-by-contig approach
How to execute jobs on slave nodes?
Management of management:
- Automatic job submission
- Error/completion checking
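The contig-by-contig approach with automatic submission and error checking could be sketched as below, assuming an LSF cluster. The `bsub` flags used (`-J`, `-o`, `-e`) are standard LSF options, but the script name, log layout, and job naming are hypothetical. The commands are only constructed, not submitted.

```python
def bsub_commands(contigs, analysis_script, log_dir="logs"):
    """Build one LSF submission command per contig.

    Each command names the job after the contig and sends stdout/stderr to
    per-contig log files so that failures can be traced afterwards.
    """
    commands = []
    for contig in contigs:
        commands.append([
            "bsub",
            "-J", f"annot_{contig}",
            "-o", f"{log_dir}/{contig}.out",
            "-e", f"{log_dir}/{contig}.err",
            analysis_script, contig,
        ])
    return commands


def contigs_to_resubmit(exit_codes):
    """Given {contig: exit_code}, return contigs whose job did not exit 0."""
    return sorted(c for c, code in exit_codes.items() if code != 0)
```

Splitting by contig keeps jobs independent of one another, so a failed contig can be resubmitted without touching the rest.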