ABSTRACT
We conducted an extensive computational analysis of the Culex quinquefasciatus genome to find and annotate one specific group of TEs, the Class I non-long terminal repeat retrotransposons (non-LTRs), by building a semi-automated pipeline [9]. We first ran BLAST searches against the genome, using amino acid sequences of the Reverse Transcriptase (RT) of known non-LTRs as the starting queries [5,6]. The resulting BLAST hits (DNA sequences) were then combined and extracted with PERL scripts to obtain the Culex non-LTR candidates. These sequences were then assembled using the SEQMAN module of DNA-STAR, manually truncated, adjusted, and annotated. Annotation was done in two steps: (I) we compared all sequences against the nr database (NCBI) using BLAST and identified some of the Culex non-LTR consensus sequences as belonging to known non-LTR families; (II) we conducted a phylogenetic analysis of all Culex non-LTRs, which allowed us to further annotate our consensus sequences. Some elements were too degraded to be assigned to a specific clade. After completing the preliminary annotation, we determined the copy number of each element in the genome within the threshold. A comparison of Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus showed different non-LTR clade compositions, suggesting that these species followed different evolutionary paths.

INTRODUCTION
Culex quinquefasciatus is an important vector of human pathogens in the United States and worldwide, including those that cause West Nile encephalitis and lymphatic filariasis. Genomic analysis can help us better understand the capacity of this mosquito to adapt to various climatic environments and to the parasite. A significant part of any eukaryotic genome consists of various types of repeats, including DNA and RNA Transposable Elements (TEs). TEs make genomes difficult to assemble because of their repetitive nature and mobile activity.
Thus, annotating and characterizing TEs is one of the essential tasks of any genome project. The recent Culex quinquefasciatus genome sequencing project gave us an opportunity to identify and annotate non-LTR retrotransposons.

RESULTS
[Figure: phylogram produced with the PhyloDraw [7] visualization tool, using as input a multiple alignment file created by ClustalX [8] (neighbor-joining algorithm).]

[Table: number of elements per non-LTR clade in Aedes aegypti [3], Anopheles gambiae [2], and Culex quinquefasciatus. Clades listed: L1, CR1, L2, I, Jockey, LOA, RTE, Loner, R1, CM-gag, Outcast, R4. Only fragments of the cell values survive: 7 9, 626, 432, 2, 1511, 11.]

[Table: non-LTR clade, % of the genome, and copy number. Rows listed: Jockey, unclassified, CM-gag, RTE, CR1, L1, R1, Loner, LOA, L2, I, total.]

[Table: genome, non-LTR %, and total TE % for Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus. Only the Culex quinquefasciatus entry survives, as "4.827".]

[Figure labels: CR1, L2, Jockey, I, Loner, R1, RTE, L1, LOA.]

This work was supported by the US National Institute of Allergy and Infectious Diseases (NIAID) contract HHSN C.

REFERENCES
1. R. Holt, et al., The Genome Sequence of the Malaria Mosquito Anopheles gambiae. Science, 298:
2. J. Biedler, Z. Tu, Non-LTR Retrotransposons in the African Malaria Mosquito, Anopheles gambiae: Unprecedented Diversity and Evidence of Recent Activity. Molecular Biology and Evolution, 20(11):
3. V. Nene, et al., Genome Sequence of Aedes aegypti, a Major Arbovirus Vector. Science, 316:1718
4. D. Lawson, et al., VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Research, 37:D583-7
5. Repbase
6. TEfam
7. PhyloDraw
8. ClustalX2: clustalx win
9. VectorBase

DISCUSSION
When only protein sequences were used as starting queries in our semi-automated pipeline, a large portion of the elements (those for which protein sequences were not available from TEfam [6] or Repbase [5]) was overlooked. We fixed this problem by adding DNA sequences as BLAST queries to the pipeline, which allowed us to identify and classify most of the overlooked elements. There is a rich diversity of non-LTRs in the Culex quinquefasciatus genome. Although there is no evidence of Outcast or R4 clade members in the C.
quinquefasciatus genome, it does contain CM-gag, a unique Gag-only non-LTR retrotransposon, and LOA (which is not present in A. gambiae). Non-LTR clades vary widely in copy number: Jockey, CR1 and CM-gag have thousands of copies, while I, L2, LOA, Loner and R1 have only hundreds. Jockey contributes more to the genome size than any other non-LTR clade, at 1.76% of the genome. Together, non-LTRs make up 4.8% of the Culex genome.

CONCLUSIONS
Using a semi-automated pipeline approach, we identified 9 non-LTR clades in the Culex quinquefasciatus genome. Phylogenetic analysis classifies the representatives of the C. quinquefasciatus non-LTR clades in the same way as the semi-automated pipeline does, which supports the correctness of the pipeline. The L1, CR1, and Jockey clades have a wide variety of elements and a high copy number in the genome, which suggests recent non-LTR activity.

ACKNOWLEDGEMENTS
We thank James Biedler, Vladimir Kapitonov, Scott Christley, Karine Mouline, members of the Frank H. Collins and Nora J. Besansky labs, and VectorBase for helpful discussions and support.

Table captions: Comparison of the number of elements per clade within three mosquito genomes. Comparative contribution of TEs to the mosquito genome sizes. Culex quinquefasciatus non-LTR: genome distribution.

Fig. 1. Phylogenetic analysis classifies Culex quinquefasciatus non-LTR clades in the same way as the semi-automated pipeline does. (C. quinquefasciatus non-LTRs are indicated as light green leaves.)

FUTURE GOALS
Identify all possible protein sequences of the elements and conduct phylogenetic analysis. Identify, if possible, active non-LTRs in the Culex quinquefasciatus genome.

Bioinformatic detection and annotation of non-LTR retrotransposons in the Culex quinquefasciatus mosquito genome.
Maria F. Unger ×, Ryan C. Kennedy *, Jenica L. Abrudan ×, Peter Arensburger ¤, Greg Madey * ×, Frank H.
Collins × *
× Eck Institute of Global Health, University of Notre Dame
* Department of Computer Science & Engineering, University of Notre Dame
¤ Department of Entomology, University of California, Riverside
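
The hit-combining step mentioned in the abstract (BLAST hits were combined and extracted with PERL scripts) can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function name `merge_hits`, the `(contig, start, end)` tuple layout, and the `max_gap` parameter are all hypothetical, chosen only to show one plausible way of merging overlapping or nearby hits on the same contig into candidate regions.

```python
from collections import defaultdict


def merge_hits(hits, max_gap=0):
    """Merge hits that overlap or lie within max_gap bases on the same contig.

    hits: iterable of (contig, start, end) tuples (hypothetical layout).
    Returns a list of merged (contig, start, end) tuples, sorted by contig
    and start position.
    """
    by_contig = defaultdict(list)
    for contig, start, end in hits:
        # A hit may be listed with start > end (for example on the reverse
        # strand), so normalise each interval to (low, high).
        low, high = min(start, end), max(start, end)
        by_contig[contig].append((low, high))

    merged = []
    for contig in sorted(by_contig):
        intervals = sorted(by_contig[contig])
        cur_low, cur_high = intervals[0]
        for low, high in intervals[1:]:
            if low <= cur_high + max_gap:
                # Overlapping or close enough: extend the current region.
                cur_high = max(cur_high, high)
            else:
                merged.append((contig, cur_low, cur_high))
                cur_low, cur_high = low, high
        merged.append((contig, cur_low, cur_high))
    return merged


if __name__ == "__main__":
    example = [
        ("contig1", 100, 200),
        ("contig1", 150, 300),
        ("contig1", 500, 400),  # listed with start > end
        ("contig2", 10, 20),
    ]
    for region in merge_hits(example):
        print(region)
```

The merged coordinates could then be used to extract the underlying DNA sequences as non-LTR candidates. A suitable value for `max_gap` would depend on how fragmented the hits are, and is an assumption left open here.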