1
Final Exam Review SIMS 202 Profs. Hearst & Larson UC Berkeley SIMS Fall 2000
2
Final Exam l Monday Dec 11 –9:30-12:30 –Room 202 and 205 l Bring –Pens/pencils –Calculator –Notes/Books (optional)
3
Final Exam l Topics –Comprehensive, but –Emphasis on materials since the midterm l Types of questions –Similar to those on homeworks and the midterm, but less time-consuming –Probably a design problem.
4
Relationships among Language, Concepts, and Categories
5
Symbols and Language l Abstract concepts are difficult to express in a computer. l Combinations of abstract concepts are even more difficult to express: –time –shades of meaning –social and psychological concepts –causal relationships
6
Symbols and Language As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls. What is the relation between the symbols and the meaning?
7
Symbols and Language l Language only hints at meaning. l Most meaning of text lies within our minds and common understanding. –“How much is that doggy in the window?” »how much: social system of barter and trade (not the size of the dog) »“doggy” implies childlike, plaintive, probably cannot do the purchasing on their own »“in the window” implies behind a store window, not really inside a window, requires notion of window shopping
8
Lexical Relations l Conceptual relations link concepts l Lexical relations link words l How do they differ? l How are they similar?
9
Major Lexical Relations l Synonymy l Polysemy l Metonymy l Hyponymy/Hyperonymy l Meronymy l Antonymy
10
Relationships among Meanings l Homonymy: same word, different meanings –bank (river bank) vs bank (financial institution) l Polysemy: same word, different senses of meaning –slightly different concepts expressed similarly –bank (institution vs building) l Synonyms: different words, related senses of meanings –different ways to express similar concepts –jail, prison, penitentiary
11
Defining Category Membership Necessary and Sufficient Conditions: –(This used to be a very influential definition of category membership; it is ok for math and logic but out-of-date for human categories) –Every condition must be met. –No other conditions can be required.
12
Can category membership be crisply defined? What are the necessary and sufficient conditions for something to be a game?
13
Properties of Categorization l Family Resemblance –Members of a category may be related to one another without all members having any property in common. »Instead, they may share a large subset of traits. »Some attributes are more likely given that others have been seen. –Example: feathers, wings, twittering,... »Likely to be a bird, but not all features apply to “emu” »Unlikely to see an association with “barks”
14
Properties of Categorization l Centrality –Some members of a category may be “better examples” than others. »Example: robins vs. chickens vs. emus »Example: soccer vs. gambling vs. hopscotch
15
Properties of Categorization l Characteristic Features –Perceived degree of category membership has to do with which features define the category. –Members usually do not have ALL the necessary features, but have some subset. –Those members that have more of the central features are seen as more central members. –People have conceptions of typical members.
16
Three Psychologically Primary Levels l SUPERORDINATE: animal, furniture l BASIC LEVEL: dog, chair l SUBORDINATE: terrier, rocker l Children take longer to learn superordinate l Superordinate not associated with mental images or motor actions l How related to –Hyponymy –Hyperonymy
17
Characteristics of Basic-level Categories Language –People name things more readily at basic level. –Name learned earliest in childhood. –Languages have simpler names at basic level. –Sounds like the “real name”. –Name used more frequently. »Strange to call a dime a coin, a metal object –Names used in neutral context. »There’s a dog on the porch. »There’s a terrier on the porch.
18
Characteristics of Basic-level Categories Concepts –Things perceived more holistically at the basic level (rather than by parts). –People interact with basic and more specific levels similarly. –Things are remembered more readily at basic level. –Folk biological categories correspond accurately to scientific biological categories only at the basic level.
19
Metadata
20
Metadata Topics l What is metadata? l Controlled vocabularies / indexing languages l Metadata standards –Dublin Core –XML –etc l Thesaurus creation and use l Classification structure –Descriptors vs subject headings –Hierarchies vs facets
21
Metadata l Metadata is: –“data about data” (the term as used in database systems) –Information about Information –Structures and Languages for the Description of Information Resources and their elements (components or features) –“Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)
22
Type of Metadata systems and standards l Naming and ID systems – URLs, ISBNs l Bibliographic description – MARC, Dublin Core, TEI, etc. l Music -- SMDL l Images and objects – CIMI, VRA Core Categories l Numeric Data – DDI, SDSM l Geospatial Data – FGDC l Collections – EAD
23
Types of Indexing Languages l Uncontrolled Keyword Indexing l Indexing Languages –Controlled, but not structured l Thesauri –Controlled and Structured l Classification Systems –Controlled, Structured, and Coded l Faceted Classification Systems
24
Controlled Vocabularies l Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.
25
What is a “Controlled Vocabulary” l “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden) l Similarly, there are too many ways of expressing or explaining the topic of a document. l Controlled vocabularies are sets of Rules for topic identification and indexing, and a THESAURUS, which consists of “lead-in vocabulary” and a limited and selective “Indexing Language”, sometimes with special coding or structures.
26
Uses of Controlled Vocabularies l Library Subject Headings, Classification and Authority Files. l Commercial Journal Indexing Services and databases l Yahoo, and other Web classification schemes l Online and Manual Systems within organizations –SunSolve –MacArthur
27
Indexing Languages l An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents. l An Indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.
28
The Indexing Process l Concept identification l term selection (via thesaurus) l term assignment
29
Application: The Indexing Process (Manual) [Flowchart adapted from ISO 5963, p. 5; arrows lost in extraction. Boxes: Start; Examine Document and Identify Significant Concepts; Consider First Concept; Establish Term Denoting Concept; Does Thesaurus contain term for Concept? (YES/NO); Preferred Term?; Select Preferred Term; Consider Preferred Term; Is Term suitable? (NO: Select Alternative term to represent Concept); Consider any associated terms in Thesaurus (NT, BT); Would Concept be better represented by one of these terms?; Prefer Alternative Term(s); Can Concept be expressed combining terms?; Consider Each of These Terms; Admit New Term Into Thesaurus; Assign Terms to Document; Is There Another Concept?; End]
30
Metadata Standards
31
The problem l Proliferation of the forms of names –Different names for the same person –Different people with the same names
32
Bibliographic Description l MARC (Machine Readable Cataloging) l DUBLIN CORE –Warwick Framework for Dublin Core Metadata l GILS (Government Information Locator Service) l RFC 1807 (Format for Bibliographic Records) l RDF (Resource Description Framework)
33
Images and Objects l Categories for the Description of Works of Art (Getty Art Institute) l Consortium for the Computer Interchange of Museum Information (CIMI) l RLG REACH Element Set (for Shared Description of Museum Objects) l VRA Core Categories (Visual Resources Association)
34
Collection Level Descriptors l EAD (Encoded Archival Description) l Z39.50 Profile for Access to Digital Collections l RSLP Collection Description (Research Support Libraries Programme)
35
Dublin Core l Simple metadata for describing internet resources. l For “Document-Like Objects” l 15 Elements.
36
Dublin Core Elements l Title l Creator l Subject l Description l Publisher l Other Contributors l Date l Resource Type l Format l Resource Identifier l Source l Language l Relation l Coverage l Rights Management
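A rough sketch of how a record built on these fifteen elements might be held in code. The element names come from the slide above; the sample values are taken from the Wynar bibliographic record shown later in this deck, and the two helper functions are hypothetical, not part of any Dublin Core tooling.

```python
# The 15 element names as listed on the slide.
DC_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Other Contributors", "Date", "Resource Type", "Format",
    "Resource Identifier", "Source", "Language", "Relation",
    "Coverage", "Rights Management",
]

# Illustrative record; values borrowed from the Wynar example in this deck.
record = {
    "Title": "Introduction to cataloging and classification",
    "Creator": "Wynar, Bohdan S.",
    "Other Contributors": "Taylor, Arlene G.",
    "Publisher": "Libraries Unlimited",
    "Date": "1992",
    "Language": "eng",
    "Resource Identifier": "ISBN 0872878112",
}


def missing_elements(rec):
    """Element names from the slide that the record does not fill in."""
    return [name for name in DC_ELEMENTS if name not in rec]


def unknown_elements(rec):
    """Keys in the record that are not among the 15 listed elements."""
    return [key for key in rec if key not in DC_ELEMENTS]
```

Calling `missing_elements(record)` lists the elements still to be described, which is one simple way to review a record for completeness.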
37
Source l Label: SOURCE l The work, either print or electronic, from which this resource is derived, if applicable. For example, an html encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.
38
The Same Item in Different Metadata Systems l ISBD l Dublin Core l RFC 1807 l TEI Header l MARC Record
39
ISBD Punctuation l Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers: terms of availability (qualifications).
40
Bibliographic Record l Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992. -- (Library science text series).
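The ISBD punctuation pattern can be sketched as a small formatter. This is a simplified illustration covering only the areas that appear in the record above (title and responsibility, edition, publication, series); the function name and parameters are assumptions made for the example.

```python
def isbd(title, responsibility, edition, place, publisher, date, series=None):
    """Assemble a simplified ISBD-style description from a few areas."""
    areas = [
        f"{title} / {responsibility}",     # title / statement of responsibility
        edition,                           # edition information
        f"{place} : {publisher}, {date}",  # place : publisher, date
    ]
    if series:
        areas.append(f"({series})")        # series title in parentheses
    # Areas are separated by ". -- " and the description ends with a full stop.
    return ". -- ".join(areas) + "."


wynar = isbd(
    title="Introduction to cataloging and classification",
    responsibility="Bohdan S. Wynar",
    edition="8th ed. / Arlene G. Taylor",
    place="Englewood, Colo.",
    publisher="Libraries Unlimited",
    date="1992",
    series="Library science text series",
)
```

With these inputs, `wynar` reproduces the bibliographic record on the slide character for character.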
41
MARC Record (display) l ID:DCLC9124851-B RTYP:c ST:p FRN: MS:c EL: AD:06-20-91 l CC:9110 BLT:am DCF:a CSC: MOD: SNR: ATC: UD:04-11-92 l CP:cou L:eng INT: GPC: BIO: FIC:0 CON:b l PC:s PD:1992/ REP: CPI:0 FSI:0 ILC:a II:1 l MMD: OR: POL: DM: RR: COL: EML: GEN: BSE: l 010 9124851 l 020 0872878112 (cloth) l 020 0872879674 (paper) l 040 DLC$cDLC$dDLC l 050 00 Z693$b.W94 1991 l 082 00 025.3$220 l 100 1 Wynar, Bohdan S. l 245 10 Introduction to cataloging and classification /$cBohdan S. Wynar. l 250 8th ed. /$bArlene G. Taylor. l 260 Englewood, Colo. :$bLibraries Unlimited,$c1992. l 300 xvii, 633 p. :$bill. ;$c24 cm. l 440 0 Library science text series l 504 Includes bibliographical references (p. 591-599) and index. l 650 0 Cataloging. l 650 0 Subject cataloging. l 650 0 Classification$xBooks. l 630 00 Anglo-American cataloguing rules. l 700 10 Taylor, Arlene G.,$d1941-
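In the display above, `$` followed by one character marks a subfield, and the text before the first `$` appears to be an implicit subfield `a`. A minimal sketch of splitting field data on that assumption (the function is hypothetical and handles only this display convention, not the MARC exchange format):

```python
def parse_subfields(data):
    """Split display-style field data into (code, value) pairs.

    Assumes the chunk before the first '$' is an implicit subfield 'a'.
    """
    first, *rest = data.split("$")
    subfields = []
    if first:
        subfields.append(("a", first))
    for chunk in rest:
        if chunk:
            # First character is the subfield code, the remainder its value.
            subfields.append((chunk[0], chunk[1:]))
    return subfields


imprint = parse_subfields("Englewood, Colo. :$bLibraries Unlimited,$c1992.")
dewey = parse_subfields("025.3$220")
```

Here `imprint` separates place, publisher, and date from field 260, and `dewey` separates the class number from the edition code in field 082.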
42
Conditions of Authorship? l Single person or single corporate entity l Unknown or anonymous authors –Fictitiously ascribed works l Shared responsibility l Collections or editorially assembled works l Works of mixed responsibility (e.g. translations) l Related Works
43
Name Authority Files ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91 Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 053 PR6005.R517 100 10 Creasey, John 400 10 Cooke, M. E. 400 10 Cooke, Margaret,$d1908-1973 400 10 Cooper, Henry St. John,$d1908-1973 400 00 Credo,$d1908-1973 400 10 Fecamps, Elise 400 10 Gill, Patrick,$d1908-1973 400 10 Hope, Brian,$d1908-1973 400 10 Hughes, Colin,$d1908-1973 400 10 Marsden, James 400 10 Matheson, Rodney 400 10 Ranger, Ken 400 20 St. John, Henry,$d1908-1973 400 10 Wilde, Jimmy 500 10 $wnnnc$aAshe, Gordon,$d1908-1973 Different names for the same person
44
Name Authority Files ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91 RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91 040 OCoLC$cOCoLC 100 10 Marric, J. J.,$d1908-1973 500 10 $wnnnc$aCreasey, John 663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John 670 OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric) 670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric) 670 Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)
45
Name authority files ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91 Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 100 10 Butler, William Vivian,$d1927- 400 10 Butler, W. V.$q(William Vivian),$d1927- 400 10 Marric, J. J.,$d1927- 670 His The durable desperadoes, 1973. 670 His The young detective's handbook, c1981:$bt.p. (W.V. Butler) 670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric) Different people writing with the same name
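The two problems (one person with many names, one name used by several people) can be sketched as a lookup from any name form to the set of authorized headings. This is a deliberately flattened illustration: dates are dropped, only a few variants are kept, and the see-also link between Marric and Creasey is treated like an ordinary variant. The data structure and function names are assumptions for the example.

```python
from collections import defaultdict

# Simplified from the authority records above: heading -> variant name forms.
# Dates are omitted, so "Marric, J. J." is ambiguous on purpose.
RECORDS = {
    "Creasey, John": ["Cooke, M. E.", "Marsden, James", "Marric, J. J."],
    "Butler, William Vivian": ["Butler, W. V.", "Marric, J. J."],
}


def build_index(records):
    """Map every name form (heading or variant) to the headings it may refer to."""
    index = defaultdict(set)
    for heading, variants in records.items():
        index[heading].add(heading)
        for variant in variants:
            index[variant].add(heading)
    return index


INDEX = build_index(RECORDS)


def lookup(name):
    """Return the sorted list of authorized headings for a name form."""
    return sorted(INDEX.get(name, set()))
```

`lookup("Cooke, M. E.")` resolves to a single heading, while `lookup("Marric, J. J.")` returns two, which is the case where dates or other qualifiers are needed to tell the authors apart.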
46
Other Types of Controlled Vocabularies l Gazetteers (Geographic Names) l Code lists (e.g. LC Language Codes) l Subject Heading Lists l Classification Schemes l Thesauri
47
What is SGML/XML? l A. SGML stands for Standard Generalized Markup Language –XML stands for Extensible Markup Language l B. What it is NOT: –Not a visual document description –Not an application specific markup –Not proprietary
48
What is SGML/XML? l What it is: –An international standard (SGML- ISO 8879:1986) –A generic language for describing the structure of documents, and markup that can be used for those documents –Intended for generating markup for content rather than form elements l XML is a simplified subset of SGML (being established by W3C)
49
XML l Extensible Markup Language –a simplification of SGML, the Standard Generalized Markup Language –instead of a fixed set of format-oriented tags like HTML, XML allows you to create the schema -- whatever set of tags are needed -- for your information type or application –this makes any XML instance “self-describing” and easily understood by computers and people l Version 1.0 ratified by W3C in 2/98; backed by Microsoft, Sun, Netscape, many others Source Dr. Robert J Glushko
50
HTML Airline Schedule Seen “By Computer” Airline Schedule Flight Information United Airlines #200 San Francisco 9:30 AM Honolulu 12:30 PM $368.50 Source Dr. Robert J Glushko
51
Airline Schedule in XML San Francisco 9:30 AM Honolulu 12:30 PM 368.50 Source Dr. Robert J Glushko
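The markup on this slide did not survive extraction, so the sketch below uses hypothetical element and attribute names to show the idea: once the values are wrapped in descriptive tags, a program can pull out the price or the departure city by name instead of guessing from layout.

```python
import xml.etree.ElementTree as ET

# Element and attribute names are invented for illustration; the slide's
# original tags were lost. The values are the ones shown on the slide.
SCHEDULE = """
<airlineSchedule>
  <flight airline="United Airlines" number="200">
    <departure city="San Francisco" time="9:30 AM"/>
    <arrival city="Honolulu" time="12:30 PM"/>
    <price>368.50</price>
  </flight>
</airlineSchedule>
""".strip()

root = ET.fromstring(SCHEDULE)
flight = root.find("flight")

origin = flight.find("departure").get("city")
destination = flight.find("arrival").get("city")
price = float(flight.findtext("price"))
```

The same values in presentation-oriented HTML would carry only formatting tags, so a program would have no reliable way to tell the price from the flight number.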
52
SGML/XML Structure l An SGML document consists of three parts: –The SGML Declaration –The Document Type Definition (DTD) –The Document Instance l An XML document requires only the document instance, but for effective processing a DTD is important.
53
Document Type Definitions l The DTD describes the structural elements and "shorthand" markup for a particular document type. It defines: –Names of "legal" elements –How many times elements can appear –The order of elements in a document –Whether markup can be omitted (SGML only) –Contents of elements (i.e., nested structures) –Attributes associated with elements –Names of "entities" –short-hand conventions for element tags. (SGML only)
54
DTD Components l The major components of a DTD are: –Entity Declarations –Element Declarations –Attribute Declarations
55
Thesauri l A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among Synonymous, Equivalent, Broader, Narrower and other Related Terms
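A minimal sketch of a thesaurus entry as a data structure, with the usual link types (UF, BT, NT, RT). The sample entry reuses the jail/prison/penitentiary example from earlier in the deck; the choice of preferred term and the broader and related terms are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    preferred: str
    used_for: list = field(default_factory=list)   # UF: synonyms / lead-in terms
    broader: list = field(default_factory=list)    # BT: broader terms
    narrower: list = field(default_factory=list)   # NT: narrower terms
    related: list = field(default_factory=list)    # RT: related terms


# Hypothetical entry; not taken from any published thesaurus.
THESAURUS = {
    "Prisons": Term(
        "Prisons",
        used_for=["Jails", "Penitentiaries"],
        broader=["Correctional institutions"],
        related=["Prisoners"],
    ),
}


def resolve(word):
    """Lead the indexer or searcher from any entry word to the preferred term."""
    for term in THESAURUS.values():
        if word == term.preferred or word in term.used_for:
            return term.preferred
    return None
```

`resolve("Jails")` and `resolve("Penitentiaries")` both lead to `"Prisons"`, which is the vocabulary-control behavior a thesaurus is meant to provide.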
56
Thesauri (cont.) l Examples: –The ERIC Thesaurus of Descriptors –The Art and Architecture Thesaurus –The Medical Subject Headings (MESH) of the National Library of Medicine
57
Why develop a thesaurus? l To provide a conceptual structure or “space” for a body of information –To make it possible to adequately describe the topical contents of informational objects at an appropriate level of generality or specificity –To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material).
58
Why develop a thesaurus? l To provide vocabulary (or terminological) control. –When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with.
59
Preliminary considerations l What is used now? –Continue using an existing thesaurus? –Ad hoc modification of existing thesaurus? –Develop a new well-structured thesaurus? l What is the scope and complexity of the subject field? l What kind of retrieval objects or data will be dealt with? l How exhaustive and specific is the desired description of objects?
60
Preliminary Considerations l The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus. –It is better to plan for a larger and more comprehensive system than a smaller system that rapidly will become inadequate as the database grows. l Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists.
61
Development of a Thesaurus l Term Selection. l Merging and Development of Concept Classes. l Definition of Broad Subject Fields and Subfields. l Development of Classificatory structure l Review, Testing, Application, Revision.
62
Flow of Work in Thesaurus Construction [Flowchart based on Soergel, pp. 327-333; arrows lost in extraction. Boxes: Select Sources; Assign codes; Select Terms; Record Selected Terms; Sort Terms; Merge identical Terms; Merge Terms in Same Concept class; Define Broad Subject Fields; Sort Terms into Broad Subject Fields; Define Subfields within one Subject Field; Work out detailed structure of the Subject Field; Select Preferred Terms; All Subfields of Broad Subject finished? (Yes/No); All Broad Subjects finished? (Yes/No); Print Classified Index and review; Improve Class Structure; Discuss with Experts and Users; Select descriptors and checklist items; Assign Notation; Produce Full Thesaurus and Check references; Review and Test; Many Modifications? (Yes/No); Revise as needed]
63
2. Merging and Development of Concept Classes l Sort Term DB into alphabetical order. l First Round: Merge information for Identical terms -- possibly pulling info from additional sources. l Second Round: Merge synonyms or terms in the same concept class.
64
3. Definition of Broad Subject Fields and Subfields l Define Broad Subject fields and sort terms into these broad fields l Define subfields within each broad field and sort terms into these subfields. l Work out the detailed structure –Select Preferred Terms –Merge information for terms in the same concept class l Repeat these steps –for each subfield within a broad field –and for each broad field –Until all terms have been consolidated and preferred terms selected
65
4. Development of Classificatory Structure l Produce preliminary version of classified index and update the working database. l Improve classificatory structure l Reality check: produce and distribute a version of the classified index. Distribute to users/experts.
66
5. Final Stages l Review l Testing l Application l Revision
67
Thesaurus Revision and Updates l There will always be new concepts, products, or expressions that need to be added to the thesaurus. –Set a regular schedule of reviews and revisions. –Collect complaints, problems, etc. and fold into revision of the thesaurus
68
Hierarchical vs. Faceted (Subject Heading vs. Descriptor) Category Systems
69
Assigning Headings vs. Descriptors l Subject headings –assign one (or a few) complex heading(s) to the document l Descriptors –Mix and match How would we describe recipes using each technique?
70
Subject Heading vs. Descriptor l WILSONLINE (subject headings) –Athletes –Athletes -- Health & Hygiene –Athletes -- Nutrition –Athletes -- Physical Exams –… –Athletics –Athletics -- Administration –Athletics -- Equipment -- Catalogs –… –Sports -- Accidents and injuries –Sports -- Accidents and injuries -- prevention l ERIC (descriptors) –Athletes –Athletic Coaches –Athletic Equipment –Athletic Fields –Athletics –… –Sports psychology –Sportsmanship
71
Subject Headings vs. Descriptors l Subject headings –Describe the contents of an entire document –Designed to be looked up in an alphabetical index »Look up document under its heading –Few (1-5) headings per document l Descriptors –Describe one concept within a document –Designed to be used in Boolean searching »Combine to describe the desired document –Many (5-25) descriptors per document
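Combining descriptors in Boolean searching can be sketched with sets of document identifiers. The descriptor names come from the ERIC examples in this deck; the document ids and postings are invented for illustration.

```python
# Hypothetical postings: descriptor -> ids of documents indexed with it.
POSTINGS = {
    "Athletes": {1, 2, 5},
    "Athletic Equipment": {2, 3},
    "Sports psychology": {5, 6},
}


def search_and(*descriptors):
    """Documents indexed with ALL of the given descriptors."""
    sets = [POSTINGS.get(d, set()) for d in descriptors]
    return set.intersection(*sets) if sets else set()


def search_or(*descriptors):
    """Documents indexed with ANY of the given descriptors."""
    return set().union(*(POSTINGS.get(d, set()) for d in descriptors))
```

For example, `search_and("Athletes", "Athletic Equipment")` narrows to the documents carrying both descriptors, which is the "mix and match" use that a single precoordinated subject heading does not support.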
72
Hierarchical Classification –Each category is successively broken down into smaller and smaller subdivisions –No item occurs in more than one subdivision –Each level divided out by a “character of division”. Also known as a feature. »Example: distinguish Literature based on: l Language l Genre l Time Period
73
Hierarchical Classification [Tree diagram] l Literature –first division by language: English, French, Spanish –each language divided by genre: Prose, Poetry, Drama –each genre divided by period: 16th, 17th, 18th, 19th, ...
74
Labeled Categories for Hierarchical Classification l LITERATURE –100 English Literature »110 English Prose l 111 English Prose 16th Century l 112 English Prose 17th Century l 113 English Prose 18th Century l... »120 English Poetry l 121 English Poetry 16th Century l 122 English Poetry 17th Century l... »130 English Drama l 131 English Drama 16th Century l … –200 French Literature
75
Faceted Classification l Create a separate, free-standing list for each characteristic of division (feature). l Combine features to create a classification.
76
Faceted Classification along with Labeled Categories l A Language –a English –b French –c Spanish l B Genre –a Prose –b Poetry –c Drama l C Period –a 16th Century –b 17th Century –c 18th Century –d 19th Century l Aa English Literature l AaBa English Prose l AaBaCa English Prose 16th Century l AbBbCd French Poetry 19th Century l BcCd Drama 19th Century
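A small sketch of decoding a faceted notation by reading it two characters at a time (facet letter, then value letter). The facet tables are the ones on the slide; the function name and the handling of a language-only notation are assumptions.

```python
# Facet letter -> (facet name, value letter -> label), as on the slide.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation):
    """Turn a notation such as 'AaBaCa' into its label."""
    if len(notation) % 2:
        raise ValueError("notation must consist of facet/value letter pairs")
    parts = []
    for i in range(0, len(notation), 2):
        facet, value = notation[i], notation[i + 1]
        parts.append(FACETS[facet][1][value])
    label = " ".join(parts)
    # Slide convention: a language on its own is labelled "<Language> Literature".
    if len(parts) == 1 and notation[0] == "A":
        label += " Literature"
    return label
```

Because each facet is a free-standing list, any facet can be left out: `decode("AaBaCa")` gives a fully specified class, while a notation with only genre and period describes a class that cuts across all languages.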
77
Important Question: How to use both types of classification structures? l How to look through them? l How to use them in search?
78
Design of Information Architecture
79
Web Site Design Issues
80
Iteration is the Key to UI Design [Cycle diagram: Design → Prototype → Evaluate] l Iteration earlier in the design process is more cost-effective
81
Design Process: Discovery [Stage diagram: Discovery → Conceptualization → Preliminary Design → Design → Implementation] l Assess needs –understand client’s expectations –determine scope of project –characteristics of users
82
Slide by Mark Newman Design Process: Conceptualization l Begin defining site –Take results from discovery and visualize solutions –Early information design
83
Slide by Mark Newman Design Process: Preliminary Design l Generate multiple (3-5) designs –one will be selected for development –navigation design –early graphic design
84
Slide by Mark Newman Design Process: Preliminary Design l Activities –Sketching designs –Creating mock-ups –Quick and rough l Deliverables –Schematics (a.k.a. templates) –Site maps –Mock-ups –Presentations
85
Slide by Mark Newman Design Process: Design l Iteration (Design → Prototype → Evaluate) happens at the level of the development process and within the design stage
86
Slide by Mark Newman Design Process: Implementation l Prepare design for handoff –Create final deliverable –Specifications and prototypes –As much detail as possible
87
Why Do We Prototype? l Get feedback on our design faster –saves money l Experiment with alternative designs l Fix problems before code is written l Keep the design centered on the user
88
Slide by James Landay Fidelity in Prototyping l Fidelity refers to the level of detail l High fidelity? –prototypes look like the final product l Low fidelity? –artists’ renditions with many details missing
89
Slide by James Landay Low-fidelity Sketches
90
Slide by James Landay Low-fidelity Sketches
91
Database Systems
92
Terms and Concepts l Database: –A collection of similar records with relationships between the records. (Rowley) –A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)
93
DBMS Benefits l Minimal Data Redundancy l Consistency of Data l Integration of Data l Sharing of Data l Ease of Application Development l Uniform Security, Privacy, and Integrity Controls l Data Accessibility and Responsiveness l Data Independence l Reduced Program Maintenance
94
Database Components [Diagram after Kroenke, Database Processing] l DBMS design tools: Table Creation, Form Creation, Query Creation, Report Creation, Procedural language compiler (4GL) l DBMS run time: Form processor, Query processor, Report Writer, Language Run time l User Interface Applications and Application Programs l Database contains: User’s Data, Metadata, Indexes, Application Metadata
95
Terms and Concepts l Records –The set of values for all attributes of a particular entity –AKA “tuples” or “rows” in relational DBMS l File –Collection of records –Usually a physical file on OS –May also be a “logical file” like a “Relation” or “Table” in relational DBMS
96
Terms and Concepts l Key –an attribute or set of attributes used to identify or locate records in a file l Primary Key –an attribute or set of attributes that uniquely identifies each record in a file
97
Terms and Concepts l Data Independence –Physical representation and location of data and the use of that data are separated »The application doesn’t need to know how or where the database has stored the data, but just how to ask for it. »Moving a database from one DBMS to another should not have a material effect on application program »Recoding, adding fields, etc. in the database should not affect applications
98
Terms and Concepts l Metadata –Data about data »In DBMS means all of the characteristics describing the attributes of an entity, E.G.: l name of attribute l data type of attribute l size of the attribute l format or special characteristics –Characteristics of files or relations »name, content, notes, etc.
99
Design l Determination of the needs of the organization l Development of the Conceptual Model of the database –Typically using Entity-Relationship diagramming techniques l Construction of a Data Dictionary l Development of the Logical Model
100
Entity l An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information –Persons (e.g.: customers in a business, employees, authors) –Things (e.g.: purchase orders, meetings, parts, companies) [Diagram: entity box labelled Employee]
101
Attributes l Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it. (This is the Metadata for the entities.) [Diagram: entity Employee with attributes Name (First, Middle, Last), SSN, Age, Birthdate, Projects]
102
Relationships l Relationships are the associations between entities. They can involve one or more entities and belong to particular relationship types
103
Relationships [ER diagram examples] l Student Attends Class l Supplier Supplies project parts (a relationship linking Supplier, Part, and Project)
104
Mapping to a Relational Model l Each entity in the ER Diagram becomes a relation. l A properly normalized ER diagram will indicate where intersection relations for many-to-many mappings are needed. l Relationships are indicated by common columns (or domains) in tables that are related. l We will examine the tables for the Acme Widget Company derived from the ER diagram
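A minimal sketch of these mapping rules using SQLite from Python. It uses the Student/Class "Attends" relationship from the earlier slide rather than the Acme Widget Company tables, which are not reproduced in this transcript; table names, column names, and data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each entity becomes a relation.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE class   (class_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Intersection relation for the many-to-many "Attends" relationship.
-- The relationship is carried by columns shared with the entity tables.
CREATE TABLE attends (
    student_id INTEGER REFERENCES student(student_id),
    class_id   INTEGER REFERENCES class(class_id),
    PRIMARY KEY (student_id, class_id)
);

INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO class   VALUES (10, 'SIMS 202');
INSERT INTO attends VALUES (1, 10), (2, 10);
""")

# Follow the common columns to list who attends a given class.
rows = conn.execute("""
    SELECT s.name
    FROM student s
    JOIN attends a ON a.student_id = s.student_id
    JOIN class   c ON c.class_id   = a.class_id
    WHERE c.title = 'SIMS 202'
    ORDER BY s.name
""").fetchall()
```

The composite primary key on `attends` keeps the same student/class pair from being recorded twice.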
105
Normalization l Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data l Normalization is a multi-step process beginning with an “unnormalized” relation –Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.
106
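Normalization [Diagram: steps from an unnormalized relation through successive normal forms] l First normal form: functional dependency of nonkey attributes on the primary key; atomic values only l Second normal form: full functional dependency of nonkey attributes on the primary key l Third normal form: no transitive dependency between nonkey attributes l Boyce-Codd normal form: all determinants are candidate keys l Higher normal forms (fourth): single multivalued dependency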
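Each normal form is stated in terms of functional dependencies, so it can help to see what checking one looks like. The sketch below tests whether a dependency holds in a sample of rows; the attribute names and data are invented, and a check on sample data can only refute a dependency, not prove it for the whole domain.

```python
def holds(rows, determinant, dependent):
    """True if each value of `determinant` maps to exactly one value of `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        val = tuple(row[a] for a in dependent)
        # Remember the first dependent value seen for this key; any later
        # row with the same key but a different value breaks the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical employee relation with primary key `emp`.
ROWS = [
    {"emp": 1, "dept": "A", "dept_phone": "111"},
    {"emp": 2, "dept": "A", "dept_phone": "111"},
    {"emp": 3, "dept": "B", "dept_phone": "222"},
]
```

Here `emp` determines `dept` and `dept` determines `dept_phone`, so `dept_phone` depends on the key only transitively; moving `dept` and `dept_phone` into their own relation is the kind of step that removes a transitive dependency.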
107
Relational Algebra Operations l Select l Project l Product l Union l Intersect l Difference l Join l Divide
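Three of these operations (Select, Project, Join) sketched over relations held as lists of dictionaries. The relation and attribute names are invented for illustration, and the join shown is a natural join on shared attribute names.

```python
def select(rel, pred):
    """Select: keep the rows that satisfy the predicate."""
    return [r for r in rel if pred(r)]


def project(rel, attrs):
    """Project: keep only the named attributes, dropping duplicate rows."""
    out = []
    for r in rel:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def join(left, right):
    """Natural join: pair rows that agree on every attribute name they share."""
    out = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                out.append({**l, **r})
    return out


SUPPLIER = [{"sno": 1, "city": "Oakland"}, {"sno": 2, "city": "Berkeley"}]
SUPPLIES = [{"sno": 1, "part": "bolt"}, {"sno": 1, "part": "nut"},
            {"sno": 2, "part": "bolt"}]

# Cities of suppliers that supply bolts: join, then select, then project.
bolt_cities = project(
    select(join(SUPPLIER, SUPPLIES), lambda r: r["part"] == "bolt"),
    ["city"],
)
```

Because every operation takes relations and returns a relation, the operations compose freely, as in the `bolt_cities` expression.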
108
Effectiveness and Efficiency Issues for DBMS l Focus on the relational model l Any column in a relational database can be searched for values. l To improve efficiency, indexes using storage structures such as BTrees and Hashing are used l But many useful functions are not indexable and require complete scans of the database
109
Advantages of RDBMS l Possible to design complex data storage and retrieval systems with ease (and without conventional programming). l Support for ACID transactions –Atomic –Consistent –Isolated –Durable
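A small sketch of the atomicity property using SQLite from Python: a transfer either applies both updates or neither. The table, account names, and amounts are invented; the `with conn:` block is the `sqlite3` module's way of committing on success and rolling back when an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to dst
        # made earlier in the same transaction is rolled back too.
        return False


overdraft_ok = transfer(conn, "a", "b", 500)
after_failed = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, both balances are unchanged even though the first update had already run inside the transaction.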
110
Advantages of RDBMS l Support for very large databases l Automatic optimization of searching (when possible) l RDBMS have a simple view of the database that conforms to much of the data used in businesses. l Standard query language (SQL)
111
Disadvantages of RDBMS l Until recently, no support for complex objects such as documents, video, images, spatial or time-series data. (ORDBMS are adding support for these.) l Often poor support for storage of complex objects. (Disassembling the car to park it in the garage) l Still no efficient and effective integrated support for things like text searching within fields.
112
Study hard, and good luck! Thank you for all the great work!