An Introduction to Bioinformatics: Molecular Biology Databases
AIMS
– To introduce the major databases: nucleotide and protein
– To explain how to search the appropriate databases
– To explain how to retrieve information from databases
OBJECTIVES
– Choose appropriate databases for information retrieval
– Use Boolean operators to search databases
– Retrieve nucleotide and protein sequence files
Introduction
– Hundreds of databases!
– Databases of databases!
– Acronym rich!
– Subcomponents: organisms, structure, metabolism, ...
– Searched by text or by sequence
Historically
– 1960s: Margaret Dayhoff, protein sequences (Eck, R. V., and M. O. Dayhoff, Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Silver Spring, Maryland)
– 1980s: explosion in DNA sequences
  – EMBL (European Molecular Biology Laboratory)
  – NIH (National Institutes of Health): GenBank
  – DDBJ (DNA Data Bank of Japan)
– 1988: agreed on international collaboration
Primary Databases
Experimentally determined nucleotide sequences and inferred protein sequences
– Nucleotides: EMBL, GenBank, DDBJ
– Proteins: GenPept, PIR (Protein Information Resource), SWISS-PROT
Which to choose?
Composite Databases
– SWISS-PROT + PIR + GenPept +
– SWISS-PROT, Swissnew, TrEMBL, TrEMBLnew, GenBank, PIR, Wormpep and PDB
Secondary Databases
– Analytical results of primary databases
– Searching for related patterns
  – PROSITE
  – Pfam
More on these later
Sub-Databases
– EST: Expressed Sequence Tags
– STS: Sequence Tagged Sites
– SNP: Single Nucleotide Polymorphisms
– OMIM: Online Mendelian Inheritance in Man
Searching and Retrieval
– Entrez: National Center for Biotechnology Information
– SRS: European Bioinformatics Institute
– DBGET: Japan’s GenomeNet
Each is capable of retrieving a specific nucleotide or protein sequence and provides links to additional related information.
Entrez
Entrez Tutorial
– Search for penicillin-binding genes
– Search for Mycobacterium tuberculosis
– Combine the searches
– Scan the output
Q: Are there any genes that code for penicillin binding in the Mycobacterium genome?
This is an example of a text-based search to identify genes that have already been annotated.
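The tutorial steps above use the Entrez web interface. As a rough sketch only, the same combined text search could be written as a request URL for NCBI's E-utilities `esearch` service. The helper name `build_esearch_url`, the choice of the `gene` database, the `retmax` value, and the exact query wording are assumptions made for illustration and are not part of the original tutorial. The code only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch service.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(terms, db="gene", retmax=20):
    """Join search terms with AND and return an esearch URL (hypothetical helper)."""
    # Parenthesise each term so multi-word terms stay grouped when combined.
    query = " AND ".join(f"({term})" for term in terms)
    params = {"db": db, "term": query, "retmax": retmax}
    return f"{ESEARCH_BASE}?{urlencode(params)}"


# Combine the two tutorial searches into a single query.
url = build_esearch_url(["penicillin-binding", "Mycobacterium tuberculosis"])
print(url)
```

Opening the printed URL in a browser should return an XML list of matching record identifiers, which plays the role of the "scan the output" step.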
#1 AND #2
SRS guide
Searching the Databases
– Subject
– Accession numbers, e.g. AF208262
– Author
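Retrieval by accession number can also be scripted. The sketch below builds a request URL for NCBI's E-utilities `efetch` service, assuming that AF208262 is a nucleotide record (hence the `nuccore` database) and that a FASTA-format text file is wanted. The helper name `build_efetch_url` is hypothetical, and the code only constructs the URL without downloading anything.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one accession number (hypothetical helper)."""
    params = {
        "db": db,            # assumed database: nucleotide records
        "id": accession,     # the accession number to retrieve
        "rettype": rettype,  # record format, here FASTA
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


fetch_url = build_efetch_url("AF208262")
print(fetch_url)
```

For a protein accession, the same helper could be called with `db="protein"`.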
Boolean Operators
– AND locates all records containing both words, e.g. human AND protease
– OR locates all records containing either word, not necessarily both, e.g. human OR protease
– NOT locates records containing one word but not the other, e.g. human NOT protease
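The three operators behave like set intersection, union, and difference. The following sketch illustrates this with a handful of toy records; the record IDs and keywords are invented for the example and do not come from any real database.

```python
# Toy records: ID -> words found in the record (invented for illustration).
records = {
    "rec1": {"human", "protease"},
    "rec2": {"human", "kinase"},
    "rec3": {"mouse", "protease"},
    "rec4": {"yeast", "kinase"},
}


def ids_containing(word):
    """Return the set of record IDs whose text contains the given word."""
    return {rec_id for rec_id, words in records.items() if word in words}


human = ids_containing("human")
protease = ids_containing("protease")

both = human & protease        # human AND protease
either = human | protease      # human OR protease
human_only = human - protease  # human NOT protease

print(sorted(both))        # ['rec1']
print(sorted(either))      # ['rec1', 'rec2', 'rec3']
print(sorted(human_only))  # ['rec2']
```

Note how AND narrows a search while OR widens it, which is why AND is the usual choice for combining two earlier searches.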