The progress of Glossina genomics at RIKEN GSC Todd Taylor RIKEN Genomic Sciences Center, Yokohama, Japan (on behalf of Masahira Hattori)

Presentation transcript:

The progress of Glossina genomics at RIKEN GSC. Todd Taylor, RIKEN Genomic Sciences Center, Yokohama, Japan (on behalf of Masahira Hattori). December 15, 2006, IGGI, Sanger, UK

Background: Sequencing and analysis of human chromosomes 11, 18 and 21, contributing about 4-5% of the human genome sequence. Sequencing and analysis of chimpanzee genomic regions, including whole-genome BAC-end sequence analysis, chimpanzee chromosome 22 (differences, mostly minor, were found in nearly all of the coding genes between human and chimp) and the chimpanzee Y chromosome. Development of novel methods for gene and promoter prediction: identifying genes missed by other high-throughput methods and identifying unique regulatory mechanisms.

Phase III sequence-related activities: BAC ends, finished BAC clones, full-length cDNAs and whole-genome shotgun.

BAC end sequencing. Milestone: the first BAC library has been constructed (Yale) and 100,000 BAC end sequences are being produced (RIKEN). Status: not yet. We will be able to sequence the ends of up to 50,000 BACs (100,000 reads), or possibly more if fosmid ends are sequenced instead. Sequencing can start from April 2007 and will take about one month.
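Each BAC clone yields two end reads, so 50,000 BACs give the 100,000 reads quoted above. The two ends of one clone form a mate pair that can later be used for BES mapping. The following is a minimal Python sketch of grouping end reads into pairs. It assumes a hypothetical read-naming scheme `<clone_id>.F` / `<clone_id>.R`, and real trace names will differ:

```python
from collections import defaultdict


def pair_bac_ends(read_names):
    """Group BAC-end reads into mate pairs by clone ID.

    Assumes a hypothetical naming scheme '<clone_id>.<end>', where <end>
    is 'F' (forward end) or 'R' (reverse end).
    Returns (pairs, orphans): pairs maps clone ID -> (F read, R read);
    orphans lists reads whose partner end is missing.
    """
    ends = defaultdict(dict)
    for name in read_names:
        clone_id, _, end = name.rpartition(".")
        ends[clone_id][end] = name

    pairs = {}
    orphans = []
    for clone_id, reads in ends.items():
        if "F" in reads and "R" in reads:
            pairs[clone_id] = (reads["F"], reads["R"])
        else:
            orphans.extend(reads.values())
    return pairs, orphans


# Clone names taken from the finished-BAC slide; read suffixes are hypothetical.
pairs, orphans = pair_bac_ends(["97H16.F", "97H16.R", "39G22.F"])
print(pairs)    # {'97H16': ('97H16.F', '97H16.R')}
print(orphans)  # ['39G22.F']
```

Reads left as orphans usually come from failed or low-quality traces. They can still be placed individually, but they give no distance constraint.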

Finished BAC clone sequencing. Milestone: five BACs have been fully sequenced (RIKEN) and no serious 'issues' have arisen. Clones 97H16, 39G22, 36N9, 31O6 and 3E11 from the VMRC29 library (CHORI); total length 759,387 bp; GC level 38.89%; repeat content 6.10%, using the Drosophila (fruit fly genus) repeat library.
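A GC level such as the 38.89% above is the share of G and C among the bases of all five clones pooled together. The following is a minimal Python sketch that assumes the clone sequences are available as plain strings. The slide does not state whether ambiguous bases were excluded from its figure, so excluding them here is an assumption:

```python
def gc_percent(sequences):
    """GC level (%) pooled over several sequences.

    Counts only unambiguous A/C/G/T bases, so N runs and other IUPAC
    codes do not dilute the percentage (an assumed convention).
    """
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


print(gc_percent(["ATGC", "GGCCNN"]))  # 75.0
```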

RepeatMasker summary for the five finished BACs
file name: gmm_clones
sequences: 5
total length: 759,387 bp
GC level: 38.89 %
bases masked: 6.10 %
=====================================================
                                      number of   length     percentage
                                      elements    occupied   of sequence
Retroelements                                                  1.63 %
   SINEs:                                 0          0 bp      0.00 %
   Penelope                                                    0.38 %
   LINEs:                                                      1.01 %
    CRE/SLACS                             0          0 bp      0.00 %
    L2/CR1/Rex                                                 0.42 %
    R1/LOA/Jockey                                              0.15 %
    R2/R4/NeSL                            1         51 bp      0.01 %
   LTR elements:                                               0.62 %
    BEL/Pao                                                    0.03 %
    Gypsy/DIRS                                                 0.59 %
DNA transposons                                                0.57 %
   Tc1-IS630-Pogo                                              0.28 %
   Other (Mirage, P-element, Transib)                          0.02 %
Total interspersed repeats                                     2.20 %
Small RNA                                                      0.18 %
Simple repeats                                                 1.67 %
Low complexity                                                 2.05 %
The query species was assumed to be "Drosophila fruit fly genus". Other query species: Homo sapiens (4.08 %), Anopheles genus (4.52 %).
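The summary is internally consistent. Retroelements (1.63%) plus DNA transposons (0.57%) give the 2.20% of interspersed repeats. That figure plus small RNA (0.18%), simple repeats (1.67%) and low-complexity sequence (2.05%) adds up to the 6.10% of bases masked.

By default, RepeatMasker writes a `.masked` copy of the input in which repeats are replaced by N. The following is a minimal Python sketch of recovering a masked percentage by comparing the original and masked copies of one sequence. It assumes both copies are loaded as equal-length strings and uses total length as the denominator, which may differ slightly from RepeatMasker's own accounting:

```python
def percent_masked(original, masked):
    """Percentage of bases that masking turned into 'N'.

    Positions that were already 'N' in the original are not counted
    as masked, so pre-existing gaps are not mistaken for repeats.
    """
    if len(original) != len(masked):
        raise ValueError("original and masked sequences must have equal length")
    if not original:
        return 0.0
    newly_masked = sum(
        1
        for o, m in zip(original.upper(), masked.upper())
        if m == "N" and o != "N"
    )
    return 100.0 * newly_masked / len(original)


print(percent_masked("ACGTACGTAC", "ACNNNCGTAC"))  # 30.0
```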

Full-length cDNA sequencing. Milestone: full-length cDNAs for G. m. morsitans will be constructed (RIKEN), and Sanger will fully sequence a few hundred of these clones; RIKEN will do some 5′-end sequencing. Status: full-length cDNA libraries were prepared by Junichi Watanabe (Univ. Tokyo), and one-pass 5′ sequencing of 9,462 cDNA clones was recently completed.

Whole-genome shotgun sequencing. Milestone: RIKEN has applied to Japanese sources for funding for a further 3 million shotgun sequences (~3X coverage). Status: we failed to get the funding. At present, we have no money for WGS or additional BAC finishing. We will try for more; Japanese-African collaborative projects are looking somewhat hopeful.
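The ~3X figure follows from the usual redundancy estimate C = N × L / G, where N is the number of reads, L is the mean usable read length and G is the genome size. Under the Lander-Waterman model, a fraction of about 1 - e^(-C) of the genome is expected to be covered by at least one read.

The following is a minimal Python sketch. The slide gives neither read length nor genome size, so the 600 bp reads and 600 Mb genome in the usage lines are placeholder values chosen only to illustrate the arithmetic:

```python
import math


def shotgun_coverage(n_reads, read_length_bp, genome_size_bp):
    """Redundancy of coverage C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def fraction_covered(coverage):
    """Lander-Waterman expected fraction of bases covered by >= 1 read: 1 - e^(-C)."""
    return 1.0 - math.exp(-coverage)


# Placeholder read length and genome size (not from the slide).
c = shotgun_coverage(3_000_000, 600, 600_000_000)
print(c)                              # 3.0
print(round(fraction_covered(c), 3))  # 0.95
```

At ~3X, roughly 5% of the genome would still be expected to fall in gaps. This is why low-pass WGS is usually paired with BAC ends or other long-range data.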

Dataset containing ESTs and partial cDNA sequences:
Library    Sample information                        Sequences
TC         Fat body/milk gland                           3,059
GMSG       Salivary gland                                7,493
GMRE       Reproductive                                  1,502
GMM        Midgut                                        7,015
cDNA       Full-length cDNA sequences                      190
TUM/TUF    Tsetse fly whole-genome cDNA libraries        9,462
Total number of sequences                               28,721

Strategy and results obtained from preliminary analysis:
The 28,721 sequences were assembled with CAP3 into 3,857 contigs, leaving 10,213 singletons.
Contigs and singletons were translated in all six reading frames (Transeq), and continuous ORFs of at least 50 amino acids were selected. This gave 30,942 ORFs from the contigs and 57,860 ORFs from the singletons.
Homology was searched against the SwissProt and NR protein databases (BLAT, 33% sequence identity).
In total, 2,569 ORFs were annotated from the 3,857 contigs and 2,783 ORFs from the 10,213 singletons.
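The translation step can be sketched in plain Python. Each sequence is translated in three frames on each strand, and stop-codon-free stretches of at least 50 amino acids are kept. This is an illustration, not the Transeq (EMBOSS) code. Two assumptions are built in: "continuous ORFs" is read as stop-to-stop segments with no start codon required, and codons containing ambiguous bases are translated as X:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(seq, min_aa=50):
    """Return (strand, frame, peptide) for every stop-free stretch of >= min_aa residues."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement)):
        for frame in range(3):
            for peptide in translate(s[frame:]).split("*"):
                if len(peptide) >= min_aa:
                    orfs.append((strand, frame, peptide))
    return orfs


print(translate("ATGGCTTAA"))  # MA*
orfs = six_frame_orfs("ATG" + "GCT" * 60 + "TAA")
print(("+", 0, "M" + "A" * 60) in orfs)  # True
```

Because every frame is scanned, a single contig can contribute several ORFs. On the slide, 3,857 contigs gave 30,942 ORFs, about eight per contig.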

A large percentage of ORFs from tsetse fly contigs resemble those of the fruit fly: Drosophila (84%), Anopheles (2%), Aedes (3%), others (6%), Glossina (5%).

A large percentage of ORFs from tsetse fly singletons resemble those of the fruit fly: Drosophila (81%), Anopheles (2%), Aedes (5%), others (9%), Glossina (3%).
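Percentages like these can be produced by reducing each ORF's best homology hit to a genus and tallying the results. The following is a minimal Python sketch. It assumes a hypothetical input that maps ORF IDs to the genus of their top hit; the slides do not say how the "best hit" was chosen (for example by score or by identity). Because of rounding, the shares may not sum to exactly 100:

```python
from collections import Counter

# Genera reported individually on the slides; everything else is pooled.
GENERA_SHOWN = ("Drosophila", "Anopheles", "Aedes", "Glossina")


def best_hit_distribution(best_hit_genus):
    """Percentage of ORFs per best-hit genus, pooling unlisted genera as 'Others'.

    best_hit_genus maps ORF id -> genus of its top-scoring database hit.
    """
    counts = Counter(g if g in GENERA_SHOWN else "Others" for g in best_hit_genus.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: round(100.0 * n / total, 1) for genus, n in counts.most_common()}


hits = {"orf1": "Drosophila", "orf2": "Drosophila", "orf3": "Aedes", "orf4": "Apis"}
print(best_hit_distribution(hits))  # {'Drosophila': 50.0, 'Aedes': 25.0, 'Others': 25.0}
```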

METABROWSER: a resource to analyse the metagenome. Metagenome analysis pipeline: user input (genomic contigs and sequences) passes through gene prediction (GLIMMER, GENEMARK, GETORF, CRITICA, MetaGene) to give predicted genes. These then pass through functional annotation (BLAST, INTERPROSCAN, PLHOST, PROSITESCAN, COGs, Manatee (GO), FingerPRINTscan, JAFA ?, HT-GO-FAT, PubSearch, BLIMPS (BLOCKS), Pfam) to give annotated genes. Results can be queried and browsed in the Metagenome Data Browser. Advanced analysis: metabolic pathways, comparative genomics, phylogenetic classification, protein interaction, enzyme classification, 16S ribosomal RNA analysis, taxonomic classification, pathogenicity index, origin of replication, secondary structure prediction, fold prediction and other analysis.

Metagenome Data Browser: data from our internal projects. METABROWSER: a resource to analyse the metagenome. Data Browser sections: genes, proteins, novel pathways, comparative analysis, download sequence, novel genomes, novel proteins and other related information.

Current and future plans. Sequencing: more if funding allows. Analysis: we can contribute to the informatics of the Glossina genome, including cDNA analysis and annotation, but we don't want to duplicate anyone's efforts. Also BAC-end sequence (BES) mapping and comparative analysis with Drosophila, mosquito, etc. (?)

Acknowledgements. Informatics (RIKEN): Tulika Prakash Srivastava, Vineet K. Sharma, Todd D. Taylor. Sequencing and data access: Atsushi Toyoda (RIKEN), Junichi Watanabe (Univ. Tokyo), Hiroyuki Wakaguri (Univ. Tokyo), Yamashita (Kitasato Univ.), Serap Aksoy (Yale), Geoff Attardo (Yale). Other: Masahira Hattori (Univ. Tokyo/RIKEN), Yoshiyuki Sakaki (RIKEN).