Databases: storage and retrieval

Presentation transcript:

1 Databases: storage and retrieval

2 Different kinds of DBs dealing with biological information retrieved by various means. Central entities: DNA, RNA, protein, phenotype. Database contents: DNA sequences (individual genes or complete genomes); cDNA; ESTs; non-coding RNA; protein sequences; translated nucleotide sequences; protein domains; protein structure; protein-protein interactions; gene expression; polymorphism; diseases.

3 Common to all databases: a database is a structured collection of information. A database is composed of basic objects called records or entries. Each record is composed of fields, which hold defined data related to that record. Let’s consider the following database of students learning bioinformatics at HUJI.

4 Databases. A database can be thought of as a large table, where the rows represent records and the columns represent fields.
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | -
88924/5 | Yossi | Yarkon | male | Father of sharon – must go home earlier
ID (accession number): the unique identifier of a database record. Each record has a unique identifier. For some records there is only partial information: some fields contain no data (this reflects the quality of the DB). Some records contain similar data in some of the fields.
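As a rough sketch of the table above, the records can be modeled in Python as a list of dictionaries, with one dictionary per record and one key per field. The field names and the helper `find_by_id` are illustrative assumptions and do not appear in the slides.

```python
# Each record is a dict; each key is a field. Field names are illustrative.
STUDENTS = [
    {"id": "0775523/7", "first_name": "Sharon", "last_name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first_name": "Nurit", "last_name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first_name": "Nurit", "last_name": "Sharon",
     "gender": "female", "comments": None},  # a field that holds no data
    {"id": "88924/5", "first_name": "Yossi", "last_name": "Yarkon",
     "gender": "male", "comments": "Father of sharon - must go home earlier"},
]


def find_by_id(records, record_id):
    """Return the single record with this unique ID, or None if absent."""
    for record in records:
        if record["id"] == record_id:
            return record
    return None


# IDs (accession numbers) must be unique across the whole database.
ids = [r["id"] for r in STUDENTS]
ids_are_unique = len(ids) == len(set(ids))
```

Looking a record up by its ID is unambiguous because the ID is unique, whereas first names and last names repeat across records.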

5 Data Retrieval. The purpose of databases is not merely to collect and organize data, but mainly to allow advanced data retrieval. A query is a method to retrieve information from the database. The organization of each record into predetermined fields allows us to run queries on specific fields.

6 The best search strategy…

7 1. Think: phrase your scientific question. 2. Choose an appropriate database. 3. Phrase your query (keywords, fields, Boolean operators, syntax). 4. Access additional entries discussing the same or similar entities by following links to additional databases. 5. Think, evaluate. The computer is just a machine. You are (hopefully) a thinking organism.

8 Phrasing a query: terms/words for search [field] + (BOOLEAN OPERATOR) + terms/words for search [field]

9 Boolean operators (Venn diagrams of two sets, 1 and 2). 1 AND 2: cell AND cycle. 1 OR 2: cell OR cycle. 1 NOT 2: cell NOT cycle. Exact phrase: “cell cycle”. Wildcard: cell* (cell, cells, cellular, etc.)
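The Boolean operators correspond to set operations on the records matching each term. A minimal Python sketch, assuming a made-up index that maps each term to the set of record numbers containing it (the index contents and the `wildcard` helper are hypothetical):

```python
# Hypothetical index: term -> set of record numbers that contain the term.
INDEX = {
    "cell": {1, 2, 3},
    "cells": {3, 4},
    "cellular": {5},
    "cycle": {2, 3, 6},
}

cell_and_cycle = INDEX["cell"] & INDEX["cycle"]  # AND: records in both sets
cell_or_cycle = INDEX["cell"] | INDEX["cycle"]   # OR: records in either set
cell_not_cycle = INDEX["cell"] - INDEX["cycle"]  # NOT: in the first set only


def wildcard(index, prefix):
    """cell*-style truncation: union over every term starting with prefix."""
    hits = set()
    for term, records in index.items():
        if term.startswith(prefix):
            hits |= records
    return hits
```

AND narrows the result, OR widens it, and the wildcard widens it further by collecting every term that shares the prefix.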

10 The secretary wants to locate the record of the student Sharon Asulin but does not remember the last name, so she searches for Sharon. The search is not limited to a certain field: Sharon[all fields]
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Proud father of sharon
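A small Python sketch of what an unrestricted search does with this table. The matching rule (case-insensitive, whole words) and the function name `search_all_fields` are assumptions made for illustration; real database engines may match differently.

```python
STUDENTS = [
    {"id": "0775523/7", "first_name": "Sharon", "last_name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first_name": "Nurit", "last_name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first_name": "Nurit", "last_name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first_name": "Yossi", "last_name": "Yarkon",
     "gender": "male", "comments": "Proud father of sharon"},
]


def search_all_fields(records, term):
    """Return records in which any field contains the term as a whole word."""
    wanted = term.lower()
    hits = []
    for record in records:
        # Pool the words of every field, ignoring which field they came from.
        words = " ".join(record.values()).lower().split()
        if wanted in words:
            hits.append(record)
    return hits


hit_ids = [r["id"] for r in search_all_fields(STUDENTS, "Sharon")]
```

Because the field is ignored, the term matches a first name, a last name, and a comment alike, so three of the four records come back.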

11 OOPS !! Retrieved too many records that don’t match the required data - too much noise.

12 Evaluating Search Results
“Scientific truth” | Search result: Found (+) | Search result: Not found (-)
Related | True positive | False negative
Unrelated | False positive | True negative

13 Results of the search Sharon[all fields]:
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon (true positive) | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon (false positive) | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Proud father of sharon (false positive)
What can we do to reduce or eliminate false positives without reducing true positives?

14 Sensitivity: the ability of a method to detect positives, irrespective of how many false positives are reported. Selectivity: the ability of a method to reject negatives, irrespective of how many relevant records are rejected along with them (false negatives). [Diagram labels: sensitivity, selectivity]
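The two definitions can be made concrete with the counts from the Sharon[all fields] search in slide 13. This sketch assumes the standard formula for sensitivity, TP / (TP + FN), and assumes that "selectivity" as defined here corresponds to what is usually called specificity, TN / (TN + FP); the slides themselves give no formulas.

```python
def sensitivity(tp, fn):
    """Fraction of the related records that the search found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of the unrelated records that the search rejected.

    Assumed here to be the measure the slide calls selectivity.
    """
    return tn / (tn + fp)


# Counts for Sharon[all fields] on the four-student table:
# Sharon Asulin is found (TP), Nurit Sharon and Yossi Yarkon are noise (FP),
# Nurit Niv is correctly left out (TN), and nothing relevant is missed (FN).
tp, fp, tn, fn = 1, 2, 1, 0

search_sensitivity = sensitivity(tp, fn)
search_specificity = specificity(tn, fp)
```

Under these assumptions the unrestricted search is fully sensitive (it finds the one wanted record) but rejects only one of the three unrelated records.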

15 Let’s refine our search. Find all students whose first name is Sharon: Sharon[first name] (NCBI syntax: the keyword is followed by the field definition in square brackets).
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Father of sharon – must go home earlier

16 Suppose the first name in the wanted record was entered with a typo, as “Sharom”:
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharom | Asulin | female | Likes scuba diving
020304/4 | Nurit | Niv | female | Comes from Cuba
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Father of sharon – must go home earlier
Now the search Sharon[first name] retrieves no answer (a false negative?), but we are still not distracted by the noise. The original search phrase sharon[all fields] would have retrieved all the noise but not the required information.
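The contrast between the two searches on the misspelled record can be sketched in Python. The function names and the matching rules (exact, case-insensitive match for a field search; whole-word match for an all-fields search) are assumptions for illustration only.

```python
STUDENTS = [
    {"id": "0775523/7", "first_name": "Sharom", "last_name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},  # typo in name
    {"id": "020304/4", "first_name": "Nurit", "last_name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first_name": "Nurit", "last_name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first_name": "Yossi", "last_name": "Yarkon",
     "gender": "male", "comments": "Father of sharon - must go home earlier"},
]


def search_field(records, field, term):
    """Field-restricted search: the named field must equal the term."""
    return [r["id"] for r in records if r[field].lower() == term.lower()]


def search_all_fields(records, term):
    """Unrestricted search: the term may appear as a word in any field."""
    return [r["id"] for r in records
            if term.lower() in " ".join(r.values()).lower().split()]


by_first_name = search_field(STUDENTS, "first_name", "Sharon")
by_all_fields = search_all_fields(STUDENTS, "Sharon")
```

The field-restricted search returns nothing at all, while the unrestricted search returns only the two noise records; neither finds the student with the misspelled name.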

17 The secretary wants to locate the record of the female student who comes from Cuba but does not remember her name. Search: female[gender] AND *cuba*[comments] (NCBI syntax: keyword, field definition, Boolean operator).
ID | First Name | Last Name | Gender | Comments
0775523/7 | Sharon | Asulin | female | Likes scuba diving (false positive)
020304/4 | Nurit | Niv | female | Comes from Cuba (true positive)
03321/3 | Nurit | Sharon | female | Receives scholarship
88924/5 | Yossi | Yarkon | male | Proud father of sharon
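The false positive arises because *cuba* matches any text containing those letters, including "scuba". A Python sketch of this query follows, together with one possible stricter variant that matches "cuba" only as a whole word. The stricter variant is an illustrative suggestion, not something the slides propose, and both function names are hypothetical.

```python
import re

STUDENTS = [
    {"id": "0775523/7", "first_name": "Sharon", "last_name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first_name": "Nurit", "last_name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first_name": "Nurit", "last_name": "Sharon",
     "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "first_name": "Yossi", "last_name": "Yarkon",
     "gender": "male", "comments": "Proud father of sharon"},
]


def female_and_substring(records, fragment):
    """female[gender] AND *fragment*[comments]: substring match on comments."""
    return [r["id"] for r in records
            if r["gender"] == "female"
            and fragment.lower() in r["comments"].lower()]


def female_and_word(records, word):
    """Stricter variant: the comment must contain the term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [r["id"] for r in records
            if r["gender"] == "female" and pattern.search(r["comments"])]


wildcard_hits = female_and_substring(STUDENTS, "cuba")
whole_word_hits = female_and_word(STUDENTS, "cuba")
```

The wildcard query returns both the scuba diver and the student from Cuba, while the whole-word variant returns only the latter, at the cost of missing any record that spells the term differently.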

18 And the main thing, the main thing: do not be afraid at all.

