The progress of Glossina genomics at RIKEN GSC
Todd Taylor
RIKEN Genomic Sciences Center, Yokohama, Japan
(on behalf of Masahira Hattori)
December 15, 2006, IGGI, Sanger, UK

Background
Sequencing and analysis of human chromosomes 11, 18 and 21
  Contributed about 4-5% of the human genome sequence
Sequencing and analysis of chimpanzee genomic regions, including
  Whole-genome BAC-end sequence analysis
  Chimpanzee chromosome 22
    Found differences (most minor) in nearly all of the coding genes between human and chimp
  Chimpanzee Y chromosome
Development of novel methods for gene and promoter prediction
  Identifying genes missed by other high-throughput methods
  Identification of unique regulatory mechanisms

Phase III sequence-related activities
  BAC ends
  Finished BAC clones
  Full-length cDNAs
  Whole-genome shotgun

BAC end sequencing
The first BAC library has been constructed (Yale) and 100,000 BAC end sequences are being produced (RIKEN)
  Not yet
  We will be able to sequence the ends of up to 50,000 BACs (100,000 reads)
    Or possibly more if we sequence fosmid ends instead?
  Can start from April 2007
  Will take about one month

Finished BAC clone sequencing
Five BACs have been fully sequenced (RIKEN) and no serious 'issues' have arisen.
  VMRC29 library (CHORI)
    97H16, 39G22, 36N9, 31O6, 3E11
  759,387 bp
  GC level: 38.89%
  Repeat content: 6.10%
    Using the Drosophila (fruit fly genus) repeat library

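As an illustration of how a GC level like the one above can be computed, here is a minimal sketch. It assumes the GC percentage is taken over unambiguous bases only (A, C, G, T); the function names are hypothetical, and RepeatMasker's own accounting may differ in detail.

```python
def gc_percent(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Assumption: ambiguous bases (for example N) are excluded from the
    denominator, so only A, C, G and T are counted.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    counted = gc + at
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / counted


def pooled_gc_percent(sequences) -> float:
    """GC percentage over several clones pooled together (length-weighted)."""
    return gc_percent("".join(sequences))
```

Pooling the clones before counting weights each clone by its length, which is what a single figure for five BACs of different sizes would need.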
Repeat Masker
file name: gmm_clones
sequences: 5
total length: bp
GC level: %
bases masked: bp ( 6.10 %)
=====================================================
                       number of   length      percentage
                       elements    occupied    of sequence
Retroelements                      bp          1.63 %
  SINEs:               0           0 bp        0.00 %
  Penelope                         bp          0.38 %
  LINEs:                           bp          1.01 %
    CRE/SLACS          0           0 bp        0.00 %
    L2/CR1/Rex                     bp          0.42 %
    R1/LOA/Jockey                  bp          0.15 %
    R2/R4/NeSL         1           51 bp       0.01 %
  LTR elements:                    bp          0.62 %
    BEL/Pao                        bp          0.03 %
    Gypsy/DIRS                     bp          0.59 %
DNA transposons                    bp          0.57 %
  Tc1-IS630-Pogo                   bp          0.28 %
  Other (Mirage,                   bp          0.02 %
  P-element, Transib)
Total interspersed repeats:        bp          2.20 %
Small RNA:                         bp          0.18 %
Simple repeats:                    bp          1.67 %
Low complexity:                    bp          2.05 %
The query species was assumed to be "Drosophila fruit fly genus".
Homo sapiens ( 4.08 %)
Anopheles genus ( 4.52 %)

Full-length cDNA sequencing
Full-length cDNAs for G. m. morsitans (RIKEN) will be constructed and Sanger will fully sequence a few hundred of these. RIKEN will do some 5' end sequencing.
  Full-length cDNA libraries were prepared by Junichi Watanabe (Univ. Tokyo)
  Sequencing of 9,462 cDNA clones (5' one pass) was recently completed

Whole-genome shotgun sequencing
RIKEN has applied to Japanese sources for funding for a further 3 million shotgun sequences (~3X coverage).
  We failed to get the funding
  At present, we have no money for WGS or additional BAC finishing
  Will try for more
    Japanese-African collaborative projects are looking somewhat hopeful

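The "3 million reads, ~3X" figure follows the usual relation coverage = reads × read length / genome size. A minimal sketch of that arithmetic follows; the read length (600 bp) and genome size (600 Mb) are illustrative placeholders chosen so that the numbers reproduce ~3X, not values stated in these slides.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(coverage * genome_size / read_length)


# Placeholder assumptions (NOT from the slides): 600 bp reads, 600 Mb genome.
ASSUMED_READ_LENGTH = 600
ASSUMED_GENOME_SIZE = 600_000_000

print(expected_coverage(3_000_000, ASSUMED_READ_LENGTH, ASSUMED_GENOME_SIZE))
```

Under these placeholder values, each million reads adds about one genome equivalent, which matches the ratio quoted in the funding application.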
Dataset containing ESTs and partial cDNA sequences

Library    Sample Information                        Sequences
TC         Fat Body/Milk Gland                       3,059
GMSG       Salivary Gland                            7,493
GMRE       Reproductive                              1,502
GMM        Midgut                                    7,015
cDNA       Full Length cDNA Sequences                190
TUM/TUF    Tsetse Fly Whole Genome cDNA Libraries    9,462
Total Number of Sequences                            28,721

Strategy and results obtained from preliminary analysis
  28,721 sequences were assembled into contigs, and singletons were identified (CAP3)
    Total contigs made = 3,857; total singletons = 10,213
  Contigs and singletons were translated into six reading frames (Transeq)
    3,857 contigs: 30,942 ORFs
    10,213 singletons: 57,860 ORFs
    Selected continuous ORFs containing at least 50 amino acids
  Homology was searched in the SwissProt and NR protein databases (BLAT, 33% sequence identity)
    Annotated 2,569 ORFs out of 3,857 contigs
    Annotated 2,783 ORFs out of 10,213 singletons

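The translation and ORF selection step can be sketched as follows. This is an illustration only, not the Transeq implementation: it assumes the standard genetic code and treats an ORF as any uninterrupted stretch between stop codons, keeping stretches of at least 50 amino acids. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA sequence; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna: str, min_length: int = 50):
    """Return (strand, frame, peptide) for stop-free stretches >= min_length."""
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            # Stop codons ('*') delimit candidate ORFs in this frame.
            for segment in protein.split("*"):
                if len(segment) >= min_length:
                    orfs.append((strand, offset + 1, segment))
    return orfs
```

Because ORFs are collected from all six frames, one contig can yield several candidates, which is consistent with the ORF counts being much larger than the contig and singleton counts.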
A large percentage of ORFs from tsetse fly contigs resemble those of the ‘fruit fly’
  Drosophila (84%)
  Glossina (5%)
  Aedes (3%)
  Anopheles (2%)
  Others (6%)

A large percentage of ORFs from tsetse fly singletons resemble those of the ‘fruit fly’
  Drosophila (81%)
  Aedes (5%)
  Glossina (3%)
  Anopheles (2%)
  Others (9%)

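A breakdown like the two above can be produced by tallying the genus of each ORF's best database hit. The sketch below assumes a plain list of genus names, one per annotated ORF; the input format and the function name are hypothetical, and the example data are made up purely to show the call.

```python
from collections import Counter

# Genera reported separately on the slides; everything else is "Others".
TRACKED_GENERA = ("Drosophila", "Anopheles", "Aedes", "Glossina")


def genus_percentages(top_hit_genera):
    """Return {genus: percentage} for a list of best-hit genus names."""
    labels = [g if g in TRACKED_GENERA else "Others" for g in top_hit_genera]
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: 100.0 * n / total for genus, n in counts.items()}


# Made-up example input, not data from the slides.
example = ["Drosophila"] * 3 + ["Aedes"]
print(genus_percentages(example))
```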
METABROWSER: a resource to analyse the metagenome
Metagenome Analysis Pipeline
  USER INPUT: Genomic Contigs & Sequences
  GENE PREDICTION (Predicted Genes): GLIMMER, GENEMARK, GETORF, CRITICA, MetaGene
  FUNCTIONAL ANNOTATION (Annotated Genes): BLAST, INTERPROSCAN, PLHOST, PROSITESCAN, COGs, Manatee (GO), FingerPRINTscan, JAFA ?, HT-GO-FAT, PubSearch, BLIMPS (BLOCKS), Pfam
  ADVANCED ANALYSIS: Metabolic Pathways, Comparative Genomics, Phylogenetic Classification, Protein Interaction, Enzyme Classification, 16S ribosomal RNA analysis, Taxonomic Classification, Pathogenicity index, Origin of Replication, Secondary Structure Prediction, Fold Prediction, Other Analysis
  BROWSE: Query the Metagenome Data Browser

METABROWSER: a resource to analyse the metagenome
Metagenome Data Browser: Data from our internal projects
  Genes
  Proteins
  Novel Pathways
  Comparative Analysis
  Download Sequence
  Novel Genomes
  Novel Proteins
  Other Related Information

Current & Future Plans
Sequencing
  More if funding allows
Analysis
  We can contribute to the informatics of the Glossina genome, including cDNA analysis and annotation
    But we don’t want to duplicate anyone’s efforts
  Also BES mapping and comparative analysis with Drosophila, mosquito, etc.
  ???

Acknowledgements
Informatics (RIKEN)
  Tulika Prakash Srivastava
  Vineet K. Sharma
  Todd D. Taylor
Sequencing & Data Access
  Atsushi Toyoda (RIKEN)
  Junichi Watanabe (Univ. Tokyo)
  Hiroyuki Wakaguri (Univ. Tokyo)
  Yamashita (Kitasato Univ.)
  Serap Aksoy (Yale)
  Geoff Attardo (Yale)
Other
  Masahira Hattori (Univ. Tokyo/RIKEN)
  Yoshiyuki Sakaki (RIKEN)