Databases: storage and retrieval


1 Databases: storage and retrieval

2 Different kinds of databases deal with biological information retrieved by various means: DNA; RNA; cDNA; ESTs; non-coding RNA; phenotype; DNA sequences (individual genes or complete genomes); protein sequences; translated nucleotide sequences; protein domains; protein structure; protein; diseases; polymorphism; gene expression; protein-protein interactions.

3 Common to all databases: A database is a structured collection of information. A database is composed of basic objects called records or entries. Each record is composed of fields, which hold defined data related to that record. Let’s consider the following database of students learning bioinformatics at HUJI.

4 Databases
A database can be thought of as a large table, where the rows represent records and the columns represent fields.
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female… | Comes from Cuba
03321/3 | Nurit | Sharon | female… | -
88924/5 | Yossi | Yarkon | male… | Father of sharon – must go home earlier
ID (Accession Numbers): Unique identifiers of the database records.
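The table on this slide can be sketched as a small data structure. This is only an illustration: the class name `Student`, the helper `build_index`, and the field names are hypothetical, the gender values are simplified to plain "female"/"male", and the missing comment is stored as an empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """One record (row); each attribute is a field (column)."""
    id: str          # unique identifier, like an accession number
    first_name: str
    last_name: str
    gender: str
    comments: str    # free text, the least well-defined field


RECORDS = [
    Student("0775523/7", "Sharon", "Asulin", "female", "Likes scuba diving"),
    Student("020304/4", "Nurit", "Niv", "female", "Comes from Cuba"),
    Student("03321/3", "Nurit", "Sharon", "female", ""),  # no comment recorded
    Student("88924/5", "Yossi", "Yarkon", "male",
            "Father of sharon - must go home earlier"),
]


def build_index(records):
    """Map each ID to its record, refusing duplicate identifiers."""
    index = {}
    for record in records:
        if record.id in index:
            raise ValueError(f"duplicate ID: {record.id}")
        index[record.id] = record
    return index


INDEX = build_index(RECORDS)
print(INDEX["020304/4"].last_name)
```

Looking a record up by its ID is unambiguous precisely because the identifier is unique, which is why `build_index` rejects duplicates.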

5 What can we learn about fields? Some fields are well defined (male/female), others less so (comments). A better database will try to store information in well-defined fields. Some records contain similar data in some of the fields. For some records there is only partial information: some fields contain no data (this reflects the quality of the database). Each record needs a unique identifier.

6 Data Retrieval The purpose of databases is not merely to collect and organize data, but mainly to allow advanced data retrieval. A query is a method to retrieve information from the database. The organization of each record into predetermined fields allows us to use queries on fields.

7 The best search strategy…

8 1. Think: phrase your scientific question.
2. Choose the appropriate database.
3. Phrase your query: Boolean operators, keywords, fields, syntax (today's topic).
4. Access additional entries discussing the same or similar entities by links to additional databases (DBXref).
5. Think, evaluate. The computer is just a machine. You are (hopefully) a thinking organism.

9 The secretary wants to locate the record of the student Sharon Asulin but does not remember the last name, so the search term is Sharon.
Query: Sharon[all fields] (NCBI syntax: the keyword Sharon followed by the field definition [all fields]). The search was not limited to a certain field.
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female… | Comes from Cuba
03321/3 | Nurit | Sharon | female… | Receives scholarship
88924/5 | Yossi | Yarkon | male… | Proud father of sharon
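An unrestricted search like Sharon[all fields] can be imitated in a few lines. This is a rough sketch, not how NCBI implements it: the dictionary keys, the helper names `words` and `search_all_fields`, and the choice of case-insensitive whole-word matching are all assumptions, and the gender values are simplified.

```python
import re

STUDENTS = [
    {"id": "0775523/7", "first name": "Sharon", "last name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first name": "Nurit", "last name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first name": "Nurit", "last name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first name": "Yossi", "last name": "Yarkon",
     "gender": "male", "comments": "Proud father of sharon"},
]


def words(text):
    """Lower-cased set of the words in a field value."""
    return set(re.findall(r"\w+", text.lower()))


def search_all_fields(records, keyword):
    """Return IDs of records where ANY field contains the keyword as a word."""
    key = keyword.lower()
    return [r["id"] for r in records
            if any(key in words(value) for value in r.values())]


hits = search_all_fields(STUDENTS, "Sharon")
print(hits)
```

Because no field is specified, the keyword is found as a first name, as a last name, and inside a comment, so three of the four records come back.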

10 OOPS !! Retrieved too many records that don’t match the required data - too much noise.

11 Evaluating Search Results
Rows: “scientific truth”; columns: search results.
 | Found (+) | Not found (-)
Related | True positive | False negative
Unrelated | False positive | True negative

12 ID | First Name | Last Name | Gender | Comments | Result
0775523/7 | Sharon | Asulin | female | Likes scuba diving | True positive
020304/4 | Nurit | Niv | female… | Comes from Cuba |
03321/3 | Nurit | Sharon | female… | Receives scholarship | False positive
88924/5 | Yossi | Yarkon | male… | Proud father of sharon | False positive
What can we do to reduce or eliminate false positives without reducing true positives?

13 Sensitivity: the ability of a method to detect positives, irrespective of how many false positives are reported. Selectivity: the ability of a method to reject negatives, irrespective of how many positives are wrongly rejected (false negatives). [Figure labels: Sensitivity, Selectivity]
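The two definitions can be turned into numbers. The slide gives no formulas, so the sketch below assumes the usual ones: sensitivity = TP / (TP + FN) and selectivity (often called specificity) = TN / (TN + FP). The counts are read off the Sharon[all fields] example: one true positive, two false positives, one true negative, no false negatives. The function names are hypothetical, and both functions assume a non-zero denominator.

```python
def sensitivity(tp, fn):
    """Fraction of related records that the search actually found."""
    return tp / (tp + fn)


def selectivity(tn, fp):
    """Fraction of unrelated records that the search correctly left out."""
    return tn / (tn + fp)


# Counts for the Sharon[all fields] search over the four student records.
tp, fp, fn, tn = 1, 2, 0, 1

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"selectivity = {selectivity(tn, fp):.2f}")
```

The search is fully sensitive (the wanted record was found) but poorly selective (two of the three unrelated records were retrieved as well).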

14 Let’s refine our search
Find all students whose first name is Sharon.
Query: Sharon[first name] (NCBI syntax: the keyword followed by the field definition)
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female… | Comes from Cuba
03321/3 | Nurit | Sharon | female… | Receives scholarship
88924/5 | Yossi | Yarkon | male… | Father of sharon – must go home earlier

15 ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharom | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female… | Comes from Cuba
03321/3 | Nurit | Sharon | female… | Receives scholarship
88924/5 | Yossi | Yarkon | male… | Father of sharon – must go home earlier
Now we don’t retrieve any answer (false negative?) and we are still not distracted by the noise. The original search phrase sharon[all fields] would have retrieved all the noise but not the required info.
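The effect described on this slide can be reproduced with a toy field-restricted search. As before this is only a sketch under assumptions: the function `search`, its `field` parameter, the dictionary keys, and whole-word case-insensitive matching are illustrative choices, not NCBI's behaviour. The first record deliberately carries the first name "Sharom", as in the table above.

```python
import re

STUDENTS = [
    {"id": "0775523/7", "first name": "Sharom", "last name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first name": "Nurit", "last name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first name": "Nurit", "last name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first name": "Yossi", "last name": "Yarkon",
     "gender": "male", "comments": "Father of sharon - must go home earlier"},
]


def words(text):
    """Lower-cased set of the words in a field value."""
    return set(re.findall(r"\w+", text.lower()))


def search(records, keyword, field=None):
    """Match the keyword in one named field, or in all fields if field is None."""
    key = keyword.lower()
    hits = []
    for record in records:
        values = record.values() if field is None else [record[field]]
        if any(key in words(value) for value in values):
            hits.append(record["id"])
    return hits


restricted = search(STUDENTS, "Sharon", field="first name")
unrestricted = search(STUDENTS, "sharon")
print(restricted, unrestricted)
```

The restricted query returns nothing, while the unrestricted one returns only the two unrelated records, matching the observation on the slide.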

16 Boolean Operators
1 AND 2: cell AND cycle
1 OR 2: cell OR cycle
1 NOT 2: cell NOT cycle
Phrase: “cell cycle”
Truncation: cell* (cell, cells, cellular, etc.)
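The Boolean operators behave like set operations on the lists of matching records. The sketch below uses five made-up sentences as a stand-in database; the texts, the numbering, and the helper names `match_term`, `match_prefix`, and `match_phrase` are all hypothetical.

```python
import re

# Hypothetical mini-database: record number -> text.
DOCS = {
    1: "The cell cycle is tightly regulated",
    2: "Each cell has a membrane",
    3: "The carbon cycle",
    4: "Cells undergo cellular respiration",
    5: "The cycle of a cell",
}


def tokens(text):
    """Lower-cased list of words, in order."""
    return re.findall(r"\w+", text.lower())


def match_term(term):
    """Records containing the exact word."""
    return {n for n, text in DOCS.items() if term in tokens(text)}


def match_prefix(prefix):
    """Records containing any word starting with the prefix (cell*)."""
    return {n for n, text in DOCS.items()
            if any(word.startswith(prefix) for word in tokens(text))}


def match_phrase(phrase):
    """Records containing the words of the phrase consecutively."""
    wanted = tokens(phrase)
    found = set()
    for n, text in DOCS.items():
        toks = tokens(text)
        if any(toks[i:i + len(wanted)] == wanted
               for i in range(len(toks) - len(wanted) + 1)):
            found.add(n)
    return found


cell, cycle = match_term("cell"), match_term("cycle")
print("cell AND cycle:", sorted(cell & cycle))   # set intersection
print("cell OR cycle: ", sorted(cell | cycle))   # set union
print("cell NOT cycle:", sorted(cell - cycle))   # set difference
print('"cell cycle":  ', sorted(match_phrase("cell cycle")))
print("cell*:         ", sorted(match_prefix("cell")))
```

Note how record 5 satisfies cell AND cycle but not the phrase search, and how record 4 is found only by the truncated term.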

17 The secretary wants to locate the record of the female student who comes from Cuba but does not remember her name.
Query: female[gender] AND *cuba*[comments] (NCBI syntax: keywords with field definitions, joined by a Boolean operator)
ID | First Name | Last Name | Gender | Comments | Result
0775523/7 | Sharon | Asulin | female | Likes scuba diving | False positive
020304/4 | Nurit | Niv | female… | Comes from Cuba | True positive
03321/3 | Nurit | Sharon | female… | Receives scholarship |
88924/5 | Yossi | Yarkon | male… | Proud father of sharon |
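The query on this slide, and the reason "scuba" slips through, can be demonstrated with Python's standard `fnmatch` wildcards. This is a sketch under assumptions: the helper names and dictionary keys are hypothetical, the gender values are simplified, and the whole-word variant at the end is a suggested refinement that does not appear on the slide.

```python
import re
from fnmatch import fnmatchcase

STUDENTS = [
    {"id": "0775523/7", "first name": "Sharon", "last name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first name": "Nurit", "last name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first name": "Nurit", "last name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first name": "Yossi", "last name": "Yarkon",
     "gender": "male", "comments": "Proud father of sharon"},
]


def field_matches(record, field, pattern):
    """Case-insensitive wildcard match of a pattern against one field."""
    return fnmatchcase(record[field].lower(), pattern.lower())


def wildcard_query(records):
    """female[gender] AND *cuba*[comments]"""
    return [r["id"] for r in records
            if field_matches(r, "gender", "female")
            and field_matches(r, "comments", "*cuba*")]


def whole_word_query(records):
    """Assumed refinement: require 'cuba' as a whole word in the comments."""
    return [r["id"] for r in records
            if field_matches(r, "gender", "female")
            and re.search(r"\bcuba\b", r["comments"], re.IGNORECASE)]


print(wildcard_query(STUDENTS))
print(whole_word_query(STUDENTS))
```

The wildcard on both sides of cuba also matches "scuba", which produces the false positive; dropping the wildcards, as in the second function, removes it here without losing the true positive.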

18 And the main thing, the main thing: not to be afraid at all

