Dialog Databases Structure & Indexing Dr. Dania Bilal IS 530 Fall 2009.

1 Dialog Databases Structure & Indexing Dr. Dania Bilal IS 530 Fall 2009

2 Definition
A database is a collection of information organized so that a computer program can quickly retrieve desired pieces of data.

3 Database Components
Fields, records, files

4 Database Fields
Pieces of information a user can access: author, title, journal name, abstract, descriptors, and other fields.

5 Field Attributes
Fields can be numeric (e.g., accession number) or textual (e.g., author name).

6 Data Structure
A scheme for organizing related pieces of information. Basic types of data structures: files, records, trees, and tables.

7 Files
A file is a collection of records. In Dialog, a file also refers to a specific database. Every file/database has a number and/or a name; for example, ERIC is file no. 1 in Dialog.

8 Records
A record is a collection of fields that constitutes a complete set of information (author, title, journal name, abstract, etc.). A collection of records constitutes a file.
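The field/record/file hierarchy from slides 3 to 8 can be sketched in Python as follows. This is a minimal illustration, not Dialog's internal storage format: the class name `Record`, the variable `eric_file`, and the chosen field set are hypothetical, and the author and journal values are placeholders. The title, abstract, and descriptors come from document 101 used in the later slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One record: a collection of fields describing one document."""
    accession_no: int   # numeric field
    author: str         # textual fields from here on
    title: str
    journal: str
    abstract: str
    descriptors: List[str] = field(default_factory=list)


# A file is a collection of records. Author and journal values are
# hypothetical placeholders; the rest follows the slides' example.
eric_file: List[Record] = [
    Record(
        accession_no=101,
        author="Hypothetical, A.",
        title="The origins of Don Giovanni",
        journal="Hypothetical Journal",
        abstract="Discusses the history and sources Mozart used in his opera Don Giovanni.",
        descriptors=["Mozart", "Opera", "Historical Analysis"],
    ),
]

for record in eric_file:
    print(record.accession_no, record.title)  # prints: 101 The origins of Don Giovanni
```

Each field is a named attribute, a record groups the fields for one document, and the file is simply a list of records.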

9 Trees
Data is organized in a hierarchical structure. Each element may be attached to one or more elements directly beneath it. Connections between elements are called branches; elements at the bottom of a tree, with no elements below them, are called leaves. Example: the Yahoo directory.
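A small tree makes the branch and leaf terminology concrete. This is a sketch under stated assumptions: the `Node` class, the `leaves` function, and the category names are hypothetical, chosen only to resemble a directory-style hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree element; each child is reached through a branch."""
    name: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the names of elements with no elements beneath them."""
    if not node.children:
        return [node.name]
    result: List[str] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Hypothetical directory-style hierarchy (category names are illustrative only).
root = Node("Directory", [
    Node("Arts", [Node("Music"), Node("Literature")]),
    Node("Science", [Node("Computer Science", [Node("Databases")])]),
])

print(leaves(root))  # ['Music', 'Literature', 'Databases']
```

Reaching "Databases" means following branches down from the root through "Science" and "Computer Science", which is how a user navigates a subject directory.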

10 Tables
Data is organized in rows and columns (example: an Excel spreadsheet). Relational database management systems store data in the form of related tables. The Aleph system (Hodges online catalog) is based on a relational database management system called Oracle.
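Related tables can be illustrated with Python's built-in `sqlite3` module, which uses SQLite rather than Oracle. The schema, table names, and rows below are hypothetical and are not Aleph's actual design. They only show how a shared column links the rows of one table to the rows of another.

```python
import sqlite3

# In-memory database with two related tables (hypothetical schema, not Aleph's).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE titles (
    title_id  INTEGER PRIMARY KEY,
    title     TEXT,
    author_id INTEGER REFERENCES authors(author_id)
);
INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO titles VALUES (10, 'Title X', 1), (11, 'Title Y', 1), (12, 'Title Z', 2);
""")

# The shared author_id column relates rows in one table to rows in the other.
rows = conn.execute("""
    SELECT a.name, t.title
    FROM titles AS t JOIN authors AS a ON t.author_id = a.author_id
    ORDER BY t.title_id
""").fetchall()

for name, title in rows:
    print(name, "|", title)
```

Because each author is stored once and referenced by number, the author name does not have to be repeated in every title row.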

11 Dialog Database
Documents or surrogates are stored in a linear file. An example of linear organization is a cassette tape: access to songs on the tape is not “direct” or “random” in nature. In Dialog, the linear file is transformed into an inverted file.

12 Dialog Database Structure
Linear file: composed of document surrogates (abstracts) stored in their full, original form.
Inverted file: composed of all words included in the document surrogates, excluding stop words.

13 Problem with Linear File
Documents or surrogates have to be searched in their entirety to locate the specific information needed. This is slow and inefficient, and access to information may cause frustration.
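The cost of searching a linear file can be sketched as a full scan. In this minimal sketch, every surrogate is read word by word, and the count of words read grows with the size of the whole file rather than with the number of matches. The function name `linear_search`, the simple tokenizer, and the second surrogate are hypothetical.

```python
import re
from typing import List, Tuple


def linear_search(surrogates: List[str], term: str) -> Tuple[List[int], int]:
    """Scan every surrogate in full, as a linear file requires.

    Returns the positions of matching surrogates and the total number
    of words read, to show that the whole file is always examined.
    """
    term = term.lower()
    hits: List[int] = []
    words_read = 0
    for i, text in enumerate(surrogates):
        words = re.findall(r"[a-z0-9]+", text.lower())
        words_read += len(words)  # the entire surrogate is read
        if term in words:
            hits.append(i)
    return hits, words_read


surrogates = [
    "The origins of Don Giovanni. Discusses the history and sources "
    "Mozart used in his opera Don Giovanni.",
    "A hypothetical second surrogate about library catalogs.",
]
print(linear_search(surrogates, "Mozart"))  # ([0], 24)
```

Even though only one surrogate matches, all 24 words in the file are read.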

14 Inverted File
The list of words drawn from all document surrogates is searched instead of the whole text of the documents themselves. A music CD is an analogy to an inverted structure: it is divided into tracks, and random, direct access to each track is easy. The result is faster access to information.

15 Dialog Inverted File
A list of the words in each document surrogate is made. Each word is numbered by its position, and phrases are included. Stop words (the, a, an, etc.) are not indexed, although, as the example on slide 18 shows, they still count toward word positions. The numbered words are then alphabetized (numbers precede letters).
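The numbering and alphabetizing steps can be sketched as follows. The stop-word list is an assumption, since the slide only gives "the, a, an, etc.", and the function names are hypothetical. Positions are assigned before stop words are dropped, which reproduces the word sequence numbers in the slide 18 example (Origins = 2, Don = 4).

```python
import re
from typing import List, Tuple

# Assumed stop-word list for illustration; the slide gives "the, a, an, etc."
STOP_WORDS = {"the", "a", "an", "of", "and"}


def numbered_words(text: str) -> List[Tuple[str, int]]:
    """Number each word by position, then drop stop words.

    Stop words still use up a position, so the remaining words keep
    their original word sequence numbers.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [(w.lower(), pos) for pos, w in enumerate(words, start=1)
            if w.lower() not in STOP_WORDS]


def alphabetize(entries: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Ordinary string sorting places digits before letters ("2009" < "opera").
    return sorted(entries)


title = "The origins of Don Giovanni"
print(alphabetize(numbered_words(title)))
# [('don', 4), ('giovanni', 5), ('origins', 2)]
```

Lowercasing before sorting keeps capitalized and lowercase forms of the same word together in the alphabetized list.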

16 Dialog Inverted File
Each alphabetized entry is followed by the document number (based on the document's acquisition and addition to the database) and by the field or fields in which the entry appeared: Author, Title, Abstract, Descriptor, or other fields, as applicable.

17 Linear File: Example
101. The origins of Don Giovanni.
Discusses the history and sources Mozart used in his opera Don Giovanni.
DE: Mozart, Opera, Historical Analysis.

18 Inverted File
101. The origins of Don Giovanni. Discusses the history and sources Mozart used in his opera Don Giovanni. DE: Mozart, Opera, Historical Analysis.

Word        Doc no.  Field  Word sequence
Origins     101      Ti     2
Don         101      Ti     4
Giovanni    101      Ti     5
Discusses   101      Ab     1
History     101      Ab     3
Sources     101      Ab     5
Mozart      101      Ab     6
Used        101      Ab     7

19 Inverted File Cont’d.
101. The origins of Don Giovanni. Discusses the history and sources Mozart used in his opera Don Giovanni. DE: Mozart, Opera, Historical Analysis.

Word                 Doc no.  Field  Word sequence
Mozart               101      DE     1
Opera                101      DE     2
Historical           101      DE     3
Analysis             101      DE     4
Historical Analysis  101      DE     3,4
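The entries for document 101 can be reproduced with a small inverted file built as a dictionary from word to postings (doc no., field, word sequence). This is a sketch inferred from the example, not Dialog's implementation. The stop-word list, the function names, and the descriptor handling are assumptions. In particular, descriptor words are numbered continuously across the DE field, and each multi-word descriptor is also indexed as a phrase, as the table shows.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

STOP_WORDS = {"the", "a", "an", "of", "and"}  # assumed, illustrative only

Posting = Tuple[int, str, Tuple[int, ...]]  # (doc no., field, word sequence)


def words(text: str) -> List[str]:
    return re.findall(r"[A-Za-z0-9]+", text)


def index_record(doc_no: int, fields: Dict[str, str],
                 inverted: Dict[str, List[Posting]]) -> None:
    """Add one record's words (and descriptor phrases) to the inverted file."""
    for code, text in fields.items():
        if code == "DE":
            # Descriptors: number words continuously across the field and
            # also index each multi-word descriptor as a phrase.
            pos = 0
            for descriptor in text.split(","):
                parts = words(descriptor)
                positions = tuple(range(pos + 1, pos + len(parts) + 1))
                pos += len(parts)
                for w, p in zip(parts, positions):
                    if w.lower() not in STOP_WORDS:
                        inverted[w.lower()].append((doc_no, code, (p,)))
                if len(parts) > 1:
                    inverted[" ".join(parts).lower()].append((doc_no, code, positions))
        else:
            # Other fields: stop words take a position but are not indexed.
            for p, w in enumerate(words(text), start=1):
                if w.lower() not in STOP_WORDS:
                    inverted[w.lower()].append((doc_no, code, (p,)))


inverted: Dict[str, List[Posting]] = defaultdict(list)
index_record(101, {
    "Ti": "The origins of Don Giovanni.",
    "Ab": "Discusses the history and sources Mozart used in his opera Don Giovanni.",
    "DE": "Mozart, Opera, Historical Analysis",
}, inverted)

# Direct lookup replaces a scan of the linear file.
print(inverted["mozart"])               # [(101, 'Ab', (6,)), (101, 'DE', (1,))]
print(inverted["historical analysis"])  # [(101, 'DE', (3, 4))]
```

Iterating over `sorted(inverted)` gives the entries in alphabetical order, matching the alphabetized listing described on slide 15.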

20 Indexing Words (Keywords)
Every important word in a document is indexed. Example: "Historical analysis" is indexed as two separate words and as a phrase: Historical (word), Analysis (word), Historical analysis (phrase).

21 Google Indexing

22 Example 1. Google Phrase/Sentence Indexing.

23 Example 2. Google Phrase/Keywords Indexing.

24 Example 3. Google Natural Language Search and Retrieval???

25 Demos
Dialog (ERIC database) and EBSCO (ERIC database); discussion of differences in interface features.