Lesson 3 Bioinformatics Laboratory Text-Based Searching 24-Feb-19
EMBnet (European Molecular Biology Network): In 1988, a network was established to link European laboratories that used bio-computing and bioinformatics in molecular biology research. In each country, a national node provides local bio-computing services. INN serves as Israel’s National Node.
Israel National Node: INN serves as Israel’s National Node, authorized by the Ministry of Science in 1990. INN is located at the Biological Computing Unit, Weizmann Institute of Science.
Bioinformatics Units at Universities: In Israel, Bioinformatics Units arose at universities in the mid-1990s to serve local needs. At TAU: http://www.tau.ac.il/lifesci/bioinfo
Database Interrogation: There are two ways to search databases. Database interrogation searches the textual information contained in the header sections of database entries. Database searching searches the sequence information using sequence queries.
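The difference between the two search modes can be illustrated with a toy example. This is only a sketch: the `Entry` class, the accessions, and the sequences are invented for illustration, and the exact substring match stands in for a real sequence-similarity search.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    accession: str  # hypothetical identifier, not a real database accession
    header: str     # free-text annotation (the "header section")
    sequence: str   # nucleotide sequence


# Invented toy records, for illustration only.
ENTRIES = [
    Entry("X0001", "human beta globin gene", "ATGGTGCACCTGACTCCTGAG"),
    Entry("X0002", "mouse insulin mRNA", "ATGGCCCTGTGGATGCGCCTC"),
]


def interrogate(entries, keyword):
    """Database interrogation: a text query against the entry headers."""
    kw = keyword.lower()
    return [e.accession for e in entries if kw in e.header.lower()]


def search_sequence(entries, query):
    """Database searching: a sequence query against the sequence data.

    Exact substring matching is used here as a simple stand-in for a
    similarity search.
    """
    q = query.upper()
    return [e.accession for e in entries if q in e.sequence.upper()]


print(interrogate(ENTRIES, "globin"))      # text query
print(search_sequence(ENTRIES, "CTGTGG"))  # sequence query
```

The two functions look at different fields of the same record, which is the essential distinction: interrogation never reads the sequence, and sequence searching never reads the annotation.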
Database Interrogation (Problem of EMBnet): There was no effective way of interrogating all the resources together at a particular site, since their formats differ. A research project was undertaken with EMBnet to address the problems inherent in interfacing complex environments. The result was SRS (Sequence Retrieval System), a network browser for databases in molecular biology.
SRS (Sequence Retrieval System): SRS allows any flat-file database to be indexed to any other. It is a powerful tool that lets users formulate queries across a range of different database types via a single interface, without having to worry about underlying data structures, query languages, etc.
SRS – List of Public SRS Servers
Searching SRS
SRS Tutorial
Search SRS Databases
SRS Standard Query Form
SRS Extended Query Form
NCBI (National Center for Biotechnology Information): Established in 1988 and located on the NIH campus as a subdivision of NLM (National Library of Medicine). Since 1992, one of NCBI’s major tasks has been the maintenance of GenBank.
Entrez: Entrez allows retrieval of molecular biology data and bibliographic citations from NCBI’s integrated databases. Unlike SRS, Entrez does not allow customization with an institute’s preferred databases.
Entrez: Most records are linked to other records, both within a given database and between databases. Sequence databases are linked to the Medline databases, so one can move seamlessly from paper to sequence and vice versa. “Neighboring” groups together related Medline papers with similar subjects, and related sequence entries found through BLAST searches.
Entrez at NCBI (Web Servers)
Entrez Pros and Cons: Pros: integrates the reference database with the sequence database seamlessly. Cons: very dependent on the network link, as the databases being searched are in the US.
GCG Software Package: The syntax is similar to Unix commands. Type GCG in every new window to start the program. The same principles apply to all programs: write the command and its arguments, choose parameters (or accept the default parameters), and receive an output (on screen and in a file).
Searching with GCG: Stringsearch is a simple text search through local databases. It searches either through definitions or through full annotations. The definitions contain a minimal amount of information for each entry: accession, organism name, gene name, sequence length, date.
Searching with GCG: The annotations contain the complete documentation for each entry in the sequence database, including journal and author names, sequence features, comments, etc. Annotations take much longer to search through.
Getting a sequence: Fetch copies a sequence file to your account using the accession number or the ID code. Example: fetch hum_hbb (fetches all the files with the given accession number). The search can be limited to a certain database using a database code. Example: fetch embl:u01613
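The two forms of the fetch argument, a bare name and a name prefixed with a database code, can be told apart by the colon. The helper below is a hypothetical illustration of that convention only; `parse_fetch_spec` is not part of GCG and does not reproduce how Fetch itself parses its arguments.

```python
def parse_fetch_spec(spec):
    """Split a fetch argument into (database_code, name).

    Hypothetical helper: returns None as the database code when the
    argument has no "db:" prefix, meaning "do not restrict the database".
    """
    database, separator, name = spec.partition(":")
    if not separator:
        # No colon present: the whole argument is the entry name.
        return None, spec
    return database, name


print(parse_fetch_spec("hum_hbb"))      # no database restriction
print(parse_fetch_spec("embl:u01613"))  # restricted to the embl code
```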
Sequence formats: Different applications use different sequence formats, for example GCG and FASTA/Pearson.
Changing file formats: Two GCG commands are used to convert file formats: tofasta and fromfasta. Similar commands exist for other formats (fromembl, topir, etc.).
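To make the FASTA/Pearson format concrete, here is a minimal sketch of writing and reading it: a header line beginning with ">" followed by lines of sequence. The function names and the 60-character line width are assumptions chosen for illustration; this is not how the GCG conversion commands are implemented, and the GCG format itself is not reproduced here.

```python
def write_fasta(name, description, sequence, width=60):
    """Return one record as FASTA text: a '>' header, then wrapped sequence."""
    header = f">{name} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Round trip on an invented toy sequence.
fasta_text = write_fasta("demo", "toy record", "ACGT" * 20)
print(fasta_text)
print(read_fasta(fasta_text))
```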