11/20/2001Information Organization and Retrieval Final Review University of California, Berkeley School of Information Management and Systems SIMS 202:


1 11/20/2001Information Organization and Retrieval Final Review University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval Slides by Ray Larson, Warren Sack and Marti Hearst

2 11/20/2001Information Organization and Retrieval Final Exam Monday Dec 10 –9:30-12:30 –Room 202 Bring –Pens/pencils –Calculator –Notes/Books (optional)

3 11/20/2001Information Organization and Retrieval Final Exam Topics –Comprehensive, but –Emphasis on materials since the midterm Types of questions –Similar to those on the midterm, but less time-consuming –See Final Study Guide for types of questions http://sims.berkeley.edu/courses/is202/f01/final-study-guide.html

4 11/20/2001Information Organization and Retrieval Relationships among Language, Concepts, and Categories Cognitive Science

5 11/20/2001Information Organization and Retrieval Knowledge Representation In AI, a representation of knowledge is a combination of data structures and interpretative procedures that, if used in the right way in a program, will lead to “knowledgeable” behavior. (Barr and Feigenbaum, 1981, p. 143)

6 11/20/2001Information Organization and Retrieval “Interpretative Procedures” aka Inference Deduction –Universal instantiation: If something is true of everything, then it is true for any particular thing. –Modus ponens: Known: (1) the rule if P then Q; and, (2) the fact, P is true; Infer: Q is true Abduction –Known: (1) the rule if P then Q; and, (2) the fact, Q is true; –Infer: P is true Induction: Machine Learning –Known: P(a) is true; P(b) is true; … –Infer: Forall X, P(X) is true
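The three inference patterns on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: representing a rule "if P then Q" as a pair of strings, the function names, and the example facts are all hypothetical and do not come from the slides.

```python
# A rule "if P then Q" is represented as a (premise, conclusion) pair.
# The rule and fact names below are hypothetical illustrations.
RULES = [("it_rained", "grass_is_wet")]


def deduce(facts, rules):
    """Modus ponens: from 'if P then Q' and P, infer Q (a sound inference)."""
    return {q for p, q in rules if p in facts}


def abduce(facts, rules):
    """Abduction: from 'if P then Q' and Q, guess P (plausible, not guaranteed)."""
    return {p for p, q in rules if q in facts}


def induce(observations):
    """Induction: if P held in every observed case, conjecture 'forall X, P(X)'."""
    return len(observations) > 0 and all(observations.values())


print(deduce({"it_rained"}, RULES))     # {'grass_is_wet'}
print(abduce({"grass_is_wet"}, RULES))  # {'it_rained'}
print(induce({"swan_a_is_white": True, "swan_b_is_white": True}))  # True
```

Only deduction is truth-preserving; the abductive and inductive results are hypotheses that later evidence can overturn.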

7 11/20/2001Information Organization and Retrieval Knowledge Representation and Programming Paradigms Applicative Functional Logical Rule-based Constraint-based Object-oriented Frame-based

8 11/20/2001Information Organization and Retrieval Relationships among Meanings Homonymy: same word, different meanings –bank (river bank) vs bank (financial institution) Polysemy: same word, different senses of meaning –slightly different concepts expressed similarly –bank (institution vs building) Synonyms: different words, related senses of meanings –different ways to express similar concepts –jail, prison, penitentiary

9 11/20/2001Information Organization and Retrieval Category Structure Defining Category Membership –Necessary and Sufficient Conditions –Properties of Categorization Characteristic Features Centrality/Typicality Basic Level Categories

10 11/20/2001Information Organization and Retrieval Defining Category Membership Necessary and Sufficient Conditions: –Every condition must be met. –No other conditions can be required. Example: A prime number: –An integer divisible only by itself and 1. Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc. Example: mother –A woman who has given birth to a child.
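A category with necessary and sufficient conditions can be written as a membership test that returns a plain yes or no. The sketch below does this for the prime-number example; it assumes the standard mathematical convention that primes are greater than 1, which the dictionary definition quoted on the slide leaves unstated.

```python
def is_prime(n: int) -> bool:
    """True exactly when n > 1 and n has no divisor other than 1 and itself."""
    if n < 2:
        return False  # assumption: 0, 1 and negatives are excluded by convention
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False  # found a divisor other than 1 and n
        d += 1
    return True


print([n for n in range(1, 20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Every integer either passes or fails this test, so there is no room for graded membership. That is what makes it a "classical" category.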

11 11/20/2001Information Organization and Retrieval Can category membership be defined? What are the necessary and sufficient conditions for something to be a game?

12 11/20/2001Information Organization and Retrieval Definition of Game Famous example by Wittgenstein –Classic categories assume clear boundaries defined by common properties (necessary and sufficient conditions) Counterexample: “Game” –No common properties shared by all games (card games, ball games, Olympic games, children’s games) Not all involve competition: ring-around-the-rosie Not all involve skill: dice games Not all involve luck: chess –No fixed boundary; can be extended to new games, e.g. video games Alternative: Concepts related by Family Resemblances

13 11/20/2001Information Organization and Retrieval Properties of Categorization Family Resemblance –Members of a category may be related to one another without all members having any property in common. Instead, they may share a large subset of traits. Some attributes are more likely given that others have been seen. –Example: feathers, wings, twittering,... Likely to be a bird, but not all features apply to “emu” Unlikely to see an association with “barks”

14 11/20/2001Information Organization and Retrieval Properties of Categorization Centrality –Example: Prime Numbers Definition: An integer greater than 1 divisible only by itself and 1 Examples: 2, 3, 5, 7, 11, 13, 17, … –A very clear-cut category. Or is it? Can one number be “more prime” than another? –Centrality: some members of a category may be “better examples” than others. Example: robins vs. chickens vs. emus

15 11/20/2001Information Organization and Retrieval Properties of Categorization Characteristic Features –Perceived degree of category membership has to do with which features define the category. –Members usually do not have ALL the necessary features, but have some subset. –Those members that have more of the central features are seen as more central members. –People have conceptions of typical members.

16 11/20/2001Information Organization and Retrieval Testing for Centrality/Typicality Ask a series of questions, compare how long it takes people to answer. –True or false: An apple is a fruit. A plum is a fruit. A coconut is a fruit. An olive is a fruit. A tomato is a fruit. –Rosch and Mervis: The more features a fruit shares with the other fruits, the more typical a member of the class it is.
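The Rosch and Mervis idea can be illustrated with a simple family-resemblance score: for each feature of a member, count how many other members share it, then sum the counts. The feature sets below are invented for illustration and are not data from the original study. The function name `typicality` is likewise an assumption.

```python
# Hypothetical feature sets, invented for illustration only.
FRUITS = {
    "apple": {"sweet", "grows_on_tree", "eaten_raw", "round"},
    "plum": {"sweet", "grows_on_tree", "eaten_raw", "round"},
    "coconut": {"grows_on_tree", "round", "hard_shell"},
    "olive": {"grows_on_tree", "salty", "small"},
    "tomato": {"eaten_raw", "round", "savory"},
}


def typicality(member, category):
    """Sum, over the member's features, the number of other members sharing each one."""
    others = [feats for name, feats in category.items() if name != member]
    return sum(1 for f in category[member] for feats in others if f in feats)


for name in FRUITS:
    print(name, typicality(name, FRUITS))
# apple 9, plum 9, coconut 6, olive 3, tomato 5
```

With these made-up features, apple and plum come out as the most typical fruits and olive as the least, which matches the intuition behind the reaction-time test.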

17 11/20/2001Information Organization and Retrieval Three Psychologically Primary Levels SUPERORDINATE: animal, furniture BASIC LEVEL: dog, chair SUBORDINATE: terrier, rocker Children take longer to learn superordinate Superordinate not associated with mental images or motor actions How related to –Hyponymy –Hyperonymy

18 11/20/2001Information Organization and Retrieval Characteristics of Basic-level Categories Language –People name things more readily at basic level. –Name learned earliest in childhood. –Languages have simpler names at basic level. –Sounds like the “real name”. –Name used more frequently. Strange to call a dime a coin, a metal object –Names used in neutral context. There’s a dog on the porch. There’s a terrier on the porch.

19 11/20/2001Information Organization and Retrieval Characteristics of Basic-level Categories Concepts –Things perceived more holistically at the basic level (rather than by parts). –People interact with basic and more specific levels similarly. –Things are remembered more readily at basic level. –Folk biological categories correspond accurately to scientific biological categories only at the basic level.

20 11/20/2001Information Organization and Retrieval Metadata

21 11/20/2001Information Organization and Retrieval Metadata Topics What is metadata? Controlled vocabularies / indexing languages Metadata standards –Dublin Core –XML –etc Thesaurus creation and use Classification structure –Descriptors vs subject headings –Hierarchies vs facets

22 11/20/2001Information Organization and Retrieval Metadata Metadata is: – “data about data” (term usage database systems) –Information about Information –Structures and Languages for the Description of Information Resources and their elements (components or features) –“Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

23 11/20/2001Information Organization and Retrieval Type of Metadata systems and standards Naming and ID systems – URLs, ISBNs Bibliographic description – MARC, Dublin Core, TEI, etc. Music -- SMDL Images and objects – CIMI, VRA Core Categories Numeric Data – DDI, SDSM Geospatial Data – FGDC Collections – EAD

24 11/20/2001Information Organization and Retrieval Types of Indexing Languages Uncontrolled Keyword Indexing Indexing Languages –Controlled, but not structured Thesauri –Controlled and Structured Classification Systems –Controlled, Structured, and Coded Faceted Classification Systems

25 11/20/2001Information Organization and Retrieval Controlled Vocabularies Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.

26 11/20/2001Information Organization and Retrieval What is a “Controlled Vocabulary” “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden) Similarly, there are too many ways of expressing or explaining the topic of a document. Controlled vocabularies are sets of Rules for topic identification and indexing, and a THESAURUS, which consists of “lead-in vocabulary” and a limited and selective “Indexing Language” sometimes with special coding or structures.

27 11/20/2001Information Organization and Retrieval Uses of Controlled Vocabularies Library Subject Headings, Classification and Authority Files. Commercial Journal Indexing Services and databases Yahoo, and other Web classification schemes Online and Manual Systems within organizations –SunSolve –MacArthur

28 11/20/2001Information Organization and Retrieval Indexing Languages An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents. An Indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

29 11/20/2001Information Organization and Retrieval The Indexing Process Concept identification term selection (via thesaurus) term assignment

30 11/20/2001Information Organization and Retrieval Application: The Indexing Process (Manual) [Flowchart, adapted from ISO 5963, p.5. Node labels: Start; Examine Document and Identify Significant Concepts; Consider First Concept; Establish Term Denoting Concept; Does Thesaurus contain term for Concept?; Preferred Term?; Select Preferred Term; Consider Preferred Term; Is Term suitable?; Consider any associated terms in Thesaurus (NT, BT); Would Concept be better represented by one of these terms?; Select Alternative term to represent Concept; Prefer Alternative Term(s); Can Concept be expressed combining terms?; Consider Each of These Terms; Admit New Term Into Thesaurus; Assign Terms to Document; Is There Another Concept?; End]

31 11/20/2001Information Organization and Retrieval Metadata Standards

32 11/20/2001Information Organization and Retrieval The problem Proliferation of the forms of names –Different names for the same person –Different people with the same names

33 11/20/2001Information Organization and Retrieval Bibliographic Description MARC (Machine Readable Cataloging) DUBLIN CORE –Warwick Framework for Dublin Core Metadata GILS (Government Information Locator Service) RFC 1807 (Format for Bibliographic Records) RDF (Resource Description Framework)

34 11/20/2001Information Organization and Retrieval Images and Objects Categories for the Description of Works of Art (Getty Art Institute) Consortium for the Computer Interchange of Museum Information (CIMI) RLG REACH Element Set (for Shared Description of Museum Objects) VRA Core Categories (Visual Resources Association)

35 11/20/2001Information Organization and Retrieval Collection Level Descriptors EAD (Encoded Archival Description) Z39.50 Profile for Access to Digital Collections RSLP Collection Description (Research Support Libraries Programme)

36 11/20/2001Information Organization and Retrieval Dublin Core Simple metadata for describing internet resources. For “Document-Like Objects” 15 Elements.

37 11/20/2001Information Organization and Retrieval Dublin Core Elements Title Creator Subject Description Publisher Other Contributors Date Resource Type Format Resource Identifier Source Language Relation Coverage Rights Management

38 11/20/2001Information Organization and Retrieval The Same Item in Different Metadata Systems ISBD Dublin Core RFC 1807 TEI Header MARC Record

39 11/20/2001Information Organization and Retrieval ISBD Punctuation Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers: terms of availability (qualifications).

40 11/20/2001Information Organization and Retrieval Bibliographic Record Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992. -- (Library science text series).
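The ISBD punctuation pattern can be sketched as a small formatting function that joins bibliographic areas with the prescribed separators. The function name `isbd` and its parameters are hypothetical; only a few of the areas from the ISBD slide are covered, enough to reproduce the Wynar record above.

```python
def isbd(title, responsibility, edition, edition_resp, place, publisher, date, series):
    """Join four bibliographic areas using ISBD punctuation (' / ', ' : ', '. -- ')."""
    areas = [
        f"{title} / {responsibility}",    # title and statement of responsibility
        f"{edition} / {edition_resp}",    # edition area
        f"{place} : {publisher}, {date}",  # publication area
        f"({series})",                    # series area
    ]
    return ". -- ".join(areas) + "."


record = isbd(
    "Introduction to cataloging and classification", "Bohdan S. Wynar",
    "8th ed", "Arlene G. Taylor",
    "Englewood, Colo.", "Libraries Unlimited", "1992",
    "Library science text series",
)
print(record)
```

The point of the fixed punctuation is that a reader (or a program) can tell which area a piece of text belongs to without field labels.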

41 11/20/2001Information Organization and Retrieval MARC Record (display) ID:DCLC9124851-B RTYP:c ST:p FRN: MS:c EL: AD:06-20-91 CC:9110 BLT:am DCF:a CSC: MOD: SNR: ATC: UD:04-11-92 CP:cou L:eng INT: GPC: BIO: FIC:0 CON:b PC:s PD:1992/ REP: CPI:0 FSI:0 ILC:a II:1 MMD: OR: POL: DM: RR: COL: EML: GEN: BSE: 010 9124851 020 0872878112 (cloth) 020 0872879674 (paper) 040 DLC$cDLC$dDLC 050 00 Z693$b.W94 1991 082 00 025.3$220 100 1 Wynar, Bohdan S. 245 10 Introduction to cataloging and classification /$cBohdan S. Wynar. 250 8th ed. /$bArlene G. Taylor. 260 Englewood, Colo. :$bLibraries Unlimited,$c1992. 300 xvii, 633 p. :$bill. ;$c24 cm. 440 0 Library science text series 504 Includes bibliographical references (p. 591-599) and index. 650 0 Cataloging. 650 0 Subject cataloging. 650 0 Classification$xBooks. 630 00 Anglo-American cataloguing rules. 700 10 Taylor, Arlene G.,$d1941-
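In the display above, `$` followed by a letter marks the start of a subfield, and the first subfield (`a`) is shown without its code. A minimal sketch of splitting one field into subfields, assuming exactly that display convention; `parse_subfields` is a hypothetical helper, not part of any MARC library.

```python
def parse_subfields(field_data, delimiter="$"):
    """Split displayed MARC field data into (code, value) pairs.

    Assumption: text before the first delimiter is the implicit subfield 'a'.
    """
    first, *rest = field_data.split(delimiter)
    pairs = [("a", first.strip())] if first.strip() else []
    for chunk in rest:
        if chunk:  # skip empty chunks produced by a trailing delimiter
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


print(parse_subfields("Englewood, Colo. :$bLibraries Unlimited,$c1992."))
# [('a', 'Englewood, Colo. :'), ('b', 'Libraries Unlimited,'), ('c', '1992.')]
```

Note that the ISBD punctuation (the trailing `:` and `,`) is stored inside the subfield values, which is why the MARC record and the ISBD display carry the same marks.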

42 11/20/2001Information Organization and Retrieval Conditions of Authorship? Single person or single corporate entity Unknown or anonymous authors –Fictitiously ascribed works Shared responsibility Collections or editorially assembled works Works of mixed responsibility (e.g. translations) Related Works

43 11/20/2001Information Organization and Retrieval Name Authority Files ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91 Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 053 PR6005.R517 100 10 Creasey, John 400 10 Cooke, M. E. 400 10 Cooke, Margaret,$d1908-1973 400 10 Cooper, Henry St. John,$d1908-1973 400 00 Credo,$d1908-1973 400 10 Fecamps, Elise 400 10 Gill, Patrick,$d1908-1973 400 10 Hope, Brian,$d1908-1973 400 10 Hughes, Colin,$d1908-1973 400 10 Marsden, James 400 10 Matheson, Rodney 400 10 Ranger, Ken 400 20 St. John, Henry,$d1908-1973 400 10 Wilde, Jimmy 500 10 $wnnnc$aAshe, Gordon,$d1908-1973 Different names for the same person

44 11/20/2001Information Organization and Retrieval Name Authority Files ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91 RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91 040 OCoLC$cOCoLC 100 10 Marric, J. J.,$d1908-1973 500 10 $wnnnc$aCreasey, John 663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John 670 OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric) 670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric) 670 Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)

45 11/20/2001Information Organization and Retrieval Name authority files ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91 Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 100 10 Butler, William Vivian,$d1927- 400 10 Butler, W. V.$q(William Vivian),$d1927- 400 10 Marric, J. J.,$d1927- 670 His The durable desperadoes, 1973. 670 His The young detective's handbook, c1981:$bt.p. (W.V. Butler) 670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric) Different people writing with the same name
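The three authority records above can be read as a lookup structure: 4xx fields lead from a variant name to the authorized heading, and 5xx fields link separately established headings. The sketch below is a simplified assumption of how such a lookup might work; dates are dropped, only a few variants are included, and `resolve` is a hypothetical helper.

```python
# "See from" references (4xx): authorized heading -> variant names.
# A small subset of the records shown above, with dates omitted.
SEE_FROM = {
    "Creasey, John": ["Cooke, M. E.", "Gill, Patrick", "Hope, Brian"],
    "Butler, William Vivian": ["Butler, W. V.", "Marric, J. J."],
}

# "See also" references (5xx): heading -> related authorized headings.
SEE_ALSO = {"Marric, J. J.": ["Creasey, John"]}


def resolve(name):
    """Return every authorized heading that a searched name could lead to."""
    hits = {h for h, variants in SEE_FROM.items() if name == h or name in variants}
    hits.update(SEE_ALSO.get(name, []))
    return sorted(hits)


print(resolve("Gill, Patrick"))  # ['Creasey, John']
print(resolve("Marric, J. J."))  # ['Butler, William Vivian', 'Creasey, John']
```

The first call shows different names for the same person; the second shows one name used by different people, which is why the real records carry birth and death dates to tell them apart.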

46 11/20/2001Information Organization and Retrieval Other Types of Controlled Vocabularies Gazetteers (Geographic Names) Code lists (e.g. LC Language Codes) Subject Heading Lists Classification Schemes Thesauri

47 11/20/2001Information Organization and Retrieval What is SGML/XML? A. SGML stands for Standard Generalized Markup Language –XML stands for eXtensible Markup Language B. What it is NOT: –Not a visual document description –Not an application specific markup –Not proprietary

48 11/20/2001Information Organization and Retrieval What is SGML/XML? What it is: –An international standard (SGML- ISO 8879:1986) –A generic language for describing the structure of documents, and markup that can be used for those documents –Intended for generating markup for content rather than form elements XML is a simplified subset of SGML (being established by W3C)

49 11/20/2001Information Organization and Retrieval XML Extensible Markup Language –a simplification of SGML, the Standard Generalized Markup Language –instead of a fixed set of format-oriented tags like HTML, XML allows you to create the schema -- whatever set of tags are needed --for your information type or application –this makes any XML instance “self-describing” and easily understood by computers and people Version 1.0 ratified by W3C in 2/98; backed by Microsoft, Sun, Netscape, many others Source Dr. Robert J Glushko

50 11/20/2001Information Organization and Retrieval HTML Airline Schedule Seen “By Computer” Airline Schedule Flight Information United Airlines #200 San Francisco 9:30 AM Honolulu 12:30 PM $368.50 Source Dr. Robert J Glushko

51 11/20/2001Information Organization and Retrieval Airline Schedule in XML San Francisco 9:30 AM Honolulu 12:30 PM 368.50 Source Dr. Robert J Glushko
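The markup on these two slides did not survive in the transcript, so only the values remain. As a rough sketch of the contrast being drawn, the snippet below wraps the same values in descriptive XML elements and reads them back by name. All element and attribute names, and the `USD` currency code, are hypothetical and are not taken from the original slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; only the values come from the slides.
SCHEDULE_XML = """
<flight>
  <airline>United Airlines</airline>
  <number>200</number>
  <departure city="San Francisco" time="9:30 AM"/>
  <arrival city="Honolulu" time="12:30 PM"/>
  <price currency="USD">368.50</price>
</flight>
"""

flight = ET.fromstring(SCHEDULE_XML)
origin = flight.find("departure").get("city")
price = float(flight.findtext("price"))
print(origin, price)  # San Francisco 368.5
```

With format-oriented HTML tags a program sees only headings and table cells; with content-oriented tags it can ask for the price or the departure city directly.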

52 11/20/2001Information Organization and Retrieval SGML/XML Structure An SGML document consists of three parts: –The SGML Declaration –The Document Type Definition (DTD) –The Document Instance An XML document requires only the document instance, but for effective processing a DTD is important.

53 11/20/2001Information Organization and Retrieval Document Type Definitions The DTD describes the structural elements and "shorthand" markup for a particular document type. It defines: –Names of "legal" elements –How many times elements can appear –The order of elements in a document –Whether markup can be omitted (SGML only) –Contents of elements (i.e., nested structures) –Attributes associated with elements –Names of "entities" –short-hand conventions for element tags. (SGML only)

54 11/20/2001Information Organization and Retrieval DTD Components The major components of a DTD are: –Entity Declarations –Element Declarations –Attribute Declarations

55 11/20/2001Information Organization and Retrieval Thesauri A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among Synonymous, Equivalent, Broader, Narrower and other Related Terms

56 11/20/2001Information Organization and Retrieval Thesauri (cont.) Examples: –The ERIC Thesaurus of Descriptors –The Art and Architecture Thesaurus –The Medical Subject Headings (MESH) of the National Library of Medicine

57 11/20/2001Information Organization and Retrieval Why develop a thesaurus? To provide a conceptual structure or “space” for a body of information –To make it possible to adequately describe the topical contents of informational objects at an appropriate level of generality or specificity –To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material).

58 11/20/2001Information Organization and Retrieval Why develop a thesaurus? To provide vocabulary (or terminological) control. –When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with.
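Vocabulary control can be sketched as a lookup from any entry term to its preferred term. The toy thesaurus below borrows the synonym example from earlier in this review (jail, prison, penitentiary); the choice of preferred term, the broader term, and the relation codes UF/BT/NT/RT as dictionary keys are assumptions made for illustration.

```python
# Toy thesaurus: preferred term -> its relationships.
# UF = used for (non-preferred synonyms), BT = broader, NT = narrower, RT = related.
THESAURUS = {
    "prisons": {
        "UF": ["jails", "penitentiaries"],
        "BT": ["correctional institutions"],  # hypothetical broader term
        "NT": [],
        "RT": [],
    },
}


def build_lead_in(thesaurus):
    """Map every entry term (preferred or not) to its preferred term."""
    lead_in = {}
    for preferred, links in thesaurus.items():
        lead_in[preferred] = preferred
        for variant in links.get("UF", []):
            lead_in[variant] = preferred
    return lead_in


LEAD_IN = build_lead_in(THESAURUS)
print(LEAD_IN["jails"])           # prisons
print(LEAD_IN["penitentiaries"])  # prisons
```

Indexers and searchers both pass through the same lookup, so whichever word they start with, they end up on the same descriptor.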

59 11/20/2001Information Organization and Retrieval Preliminary considerations What is used now? –Continue using an existing thesaurus? –Ad hoc modification of existing thesaurus? –Develop a new well-structured thesaurus? What is the scope and complexity of the subject field? What kind of retrieval objects or data will be dealt with? How exhaustive and specific is the desired description of objects?

60 11/20/2001Information Organization and Retrieval Preliminary Considerations The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus. –It is better to plan for a larger and more comprehensive system than a smaller system that rapidly will become inadequate as the database grows. Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists.

61 11/20/2001Information Organization and Retrieval Development of a Thesaurus Term Selection. Merging and Development of Concept Classes. Definition of Broad Subject Fields and Subfields. Development of Classificatory structure Review, Testing, Application, Revision.

62 11/20/2001Information Organization and Retrieval Flow of Work in Thesaurus Construction [Flowchart, based on Soergel, pp 327-333. Node labels: Select Sources; Assign codes; Select Terms; Record Selected Terms; Sort Terms; Merge identical Terms; Define Broad Subject Fields; Merge Terms in Same Concept class; Sort Terms into Broad Subject Fields; Define Subfields within one Subject Field; Work out detailed structure of the Subject Field; Select Preferred Terms; All Subfields of Broad Subject finished? (Yes/No); All Broad Subjects finished? (Yes/No); Improve Class Structure; Print Classified Index and review; Discuss with Experts and Users; Select descriptors and checklist items; Produce Full Thesaurus and Check references; Assign Notation; Review and Test; Many Modifications? (Yes/No); Revise as needed]

63 11/20/2001Information Organization and Retrieval 2. Merging and Development of Concept Classes Sort Term DB into alphabetical order. First Round: Merge information for Identical terms -- possibly pulling info from additional sources. Second Round: Merge synonyms or terms in the same concept class.

64 11/20/2001Information Organization and Retrieval 3. Definition of Broad Subject Fields and Subfields Define Broad Subject fields and sort terms into these broad fields Define subfields within each broad field and sort terms into these subfields. Work out the detailed structure –Select Preferred Terms –Merge information for terms in the same concept class Repeat these steps –for each subfield within a broad field –and for each broad field –Until all terms have been consolidated and preferred terms selected

65 11/20/2001Information Organization and Retrieval 4. Development of Classificatory Structure Produce preliminary version of classified index and update the working database. Improve classificatory structure Reality check: produce and distribute a version of the classified index. Distribute to users/experts.

66 11/20/2001Information Organization and Retrieval 5. Final Stages Review Testing Application Revision

67 11/20/2001Information Organization and Retrieval Thesaurus Revision and Updates There will always be new concepts, products, or expressions that need to be added to the thesaurus. –Set a regular schedule of reviews and revisions. –Collect complaints, problems, etc. and fold into revision of the thesaurus

68 11/20/2001Information Organization and Retrieval Hierarchical vs. Faceted (Subject Heading vs. Descriptor) Category Systems

69 11/20/2001Information Organization and Retrieval Assigning Headings vs. Descriptors Subject headings –assign one (or a few) complex heading(s) to the document Descriptors –Mix and match How would we describe recipes using each technique?

70 11/20/2001Information Organization and Retrieval Subject Heading vs. Descriptor WILSONLINE –Athletes –Athletes--Health & Hygiene –Athletes--Nutrition –Athletes--Physical Exams –… –Athletics –Athletics -- Administration –Athletics -- Equipment -- Catalogs –… –Sports -- Accidents and injuries –Sports -- Accidents and injuries -- prevention ERIC –Athletes –Athletic Coaches –Athletic Equipment –Athletic Fields –Athletics –… –Sports psychology –Sportsmanship

71 11/20/2001Information Organization and Retrieval Subject Headings vs. Descriptors Subject headings: Describe the contents of an entire document Designed to be looked up in an alphabetical index –Look up document under its heading Few (1-5) headings per document Descriptors: Describe one concept within a document Designed to be used in Boolean searching –Combine to describe the desired document Many (5-25) descriptors per document
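Boolean searching over descriptors can be sketched with set operations on posting lists. The document ids below are made up, and treating "Nutrition" as a descriptor is an assumption for the sake of the example.

```python
# Hypothetical postings: descriptor -> ids of documents indexed with it.
POSTINGS = {
    "Athletes": {1, 2, 5},
    "Nutrition": {2, 3, 5},
    "Athletic Equipment": {4, 5},
}

both = POSTINGS["Athletes"] & POSTINGS["Nutrition"]             # AND
either = POSTINGS["Athletes"] | POSTINGS["Athletic Equipment"]  # OR
without = POSTINGS["Athletes"] - POSTINGS["Nutrition"]          # AND NOT

print(sorted(both))     # [2, 5]
print(sorted(either))   # [1, 2, 4, 5]
print(sorted(without))  # [1]
```

A subject-heading system would instead pre-combine the concepts into one heading such as "Athletes--Nutrition" and file the document under it.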

72 11/20/2001Information Organization and Retrieval Hierarchical Classification –Each category is successively broken down into smaller and smaller subdivisions –No item occurs in more than one subdivision –Each level divided out by a “character of division”. Also known as a feature. Example: distinguish Literature based on: –Language –Genre –Time Period

73 11/20/2001Information Organization and Retrieval Hierarchical Classification [Tree diagram: Literature is divided by language (English, French, Spanish); each language by genre (Prose, Poetry, Drama); each genre by period (16th, 17th, 18th, 19th, ...)]

74 11/20/2001Information Organization and Retrieval Labeled Categories for Hierarchical Classification LITERATURE –100 English Literature 110 English Prose –English Prose 16th Century –English Prose 17th Century –English Prose 18th Century –... 111 English Poetry –121 English Poetry 16th Century –122 English Poetry 17th Century –... 112 English Drama –130 English Drama 16th Century –… –200 French Literature

75 11/20/2001Information Organization and Retrieval Faceted Classification Create a separate, free-standing list for each characteristic of division (feature). Combine features to create a classification.

76 11/20/2001Information Organization and Retrieval Faceted Classification along with Labeled Categories A Language –a English –b French –c Spanish B Genre –a Prose –b Poetry –c Drama C Period –a 16th Century –b 17th Century –c 18th Century –d 19th Century Aa English Literature AaBa English Prose AaBaCa English Prose 16th Century AbBbCd French Poetry 19th Century BcCd Drama 19th Century
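Because each facet is a free-standing list, a notation can be decoded mechanically by reading it two characters at a time: an upper-case facet letter followed by a lower-case value code. A minimal sketch, using the facet lists on this slide; the `decode` function is a hypothetical helper.

```python
# Facet letter -> (facet name, {value code: label}), as listed on the slide.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation):
    """Translate a notation such as 'AbBbCd' into its combined facet labels."""
    labels = []
    for i in range(0, len(notation), 2):
        facet, code = notation[i], notation[i + 1]
        labels.append(FACETS[facet][1][code])
    return " ".join(labels)


print(decode("AaBaCa"))  # English Prose 16th Century
print(decode("AbBbCd"))  # French Poetry 19th Century
print(decode("BcCd"))    # Drama 19th Century
```

Any subset of facets can be combined, so a class like "Drama 19th Century" exists without first choosing a language. A strict hierarchy cannot express that without repeating the subdivision under every language.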

77 11/20/2001Information Organization and Retrieval Important Question: How to use both types of classification structures? How to look through them? How to use them in search?

78 11/20/2001Information Organization and Retrieval Design of Information Architecture

79 11/20/2001Information Organization and Retrieval Web Site Design Issues

80 11/20/2001Information Organization and Retrieval Design Prototype Evaluate Iteration earlier in the design process is more cost-effective Iteration is the Key to UI Design

81 11/20/2001Information Organization and Retrieval Design Process: Discovery Implementation Design Preliminary Design Conceptualization Discovery Assess needs –understand client’s expectations –determine scope of project –characteristics of users

82 Information Organization and Retrieval Design Process: Conceptualization Implementation Design Preliminary Design Conceptualization Discovery Begin defining site –Take results from discovery and visualize solutions –Early information design

83 Information Organization and Retrieval Design Process: Preliminary Design Implementation Design Preliminary Design Conceptualization Discovery Generate multiple (3-5) designs –one will be selected for development –navigation design –early graphic design

84 Information Organization and Retrieval Design Process: Preliminary Design Activities –Sketching designs –Creating mock-ups –Quick and rough Deliverables –Schematics (a.k.a. templates) –Site maps –Mock-ups –Presentations

85 Information Organization and Retrieval Design Process: Design Implementation Design Preliminary Design Conceptualization Discovery Iteration Design Prototype Evaluate iteration at the level of development process And within design stage

86 Information Organization and Retrieval Design Process: Implementation Implementation Design Preliminary Design Conceptualization Discovery Prepare design for handoff –Create final deliverable –Specifications and prototypes –As much detail as possible

87 11/20/2001Information Organization and Retrieval Why Do We Prototype? Get feedback on our design faster –saves money Experiment with alternative designs Fix problems before code is written Keep the design centered on the user

88 Information Organization and Retrieval Fidelity in Prototyping Fidelity refers to the level of detail High fidelity –prototypes look like the final product Low fidelity –artists’ renditions with many details missing

89 Information Organization and Retrieval Low-fidelity Sketches

90 Information Organization and Retrieval Low-fidelity Sketches

91 11/20/2001Information Organization and Retrieval Database Systems

92 11/20/2001Information Organization and Retrieval Terms and Concepts Database: –A collection of similar records with relationships between the records. (Rowley) –A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)

93 11/20/2001Information Organization and Retrieval DBMS Benefits Minimal Data Redundancy Consistency of Data Integration of Data Sharing of Data Ease of Application Development Uniform Security, Privacy, and Integrity Controls Data Accessibility and Responsiveness Data Independence Reduced Program Maintenance

94 11/20/2001Information Organization and Retrieval Database Components DBMS =============== Design tools Table Creation Form Creation Query Creation Report Creation Procedural language compiler (4GL) ============= Run time Form processor Query processor Report Writer Language Run time User Interface Applications Application Programs Database Database contains: User’s Data Metadata Indexes Application Metadata Kroenke, Database Processing

95 11/20/2001Information Organization and Retrieval Terms and Concepts Records –The set of values for all attributes of a particular entity –AKA “tuples” or “rows” in relational DBMS File –Collection of records –Usually a physical file on OS –May also be a “logical file” like a “Relation” or “Table” in relational DBMS

96 11/20/2001Information Organization and Retrieval Terms and Concepts Key –an attribute or set of attributes used to identify or locate records in a file Primary Key –an attribute or set of attributes that uniquely identifies each record in a file

97 11/20/2001Information Organization and Retrieval Terms and Concepts Data Independence –Physical representation and location of data and the use of that data are separated The application doesn’t need to know how or where the database has stored the data, but just how to ask for it. Moving a database from one DBMS to another should not have a material effect on application program Recoding, adding fields, etc. in the database should not affect applications

98 11/20/2001Information Organization and Retrieval Terms and Concepts Metadata –Data about data In DBMS means all of the characteristics describing the attributes of an entity, E.G.: –name of attribute –data type of attribute –size of the attribute –format or special characteristics –Characteristics of files or relations name, content, notes, etc.

99 11/20/2001Information Organization and Retrieval Design Determination of the needs of the organization Development of the Conceptual Model of the database –Typically using Entity-Relationship diagramming techniques Construction of a Data Dictionary Development of the Logical Model

100 11/20/2001Information Organization and Retrieval Entity An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information –Persons (e.g.: customers in a business, employees, authors) –Things (e.g.: purchase orders, meetings, parts, companies) Employee

101 11/20/2001Information Organization and Retrieval Attributes Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it. (This is the Metadata for the entities.) Employee Last Middle First Name SSN Age Birthdate Projects

102 11/20/2001Information Organization and Retrieval Relationships Relationships are the associations between entities. They can involve one or more entities and belong to particular relationship types

103 11/20/2001Information Organization and Retrieval Relationships Class Attends Student Part Supplies project parts Supplier Project

104 11/20/2001Information Organization and Retrieval Mapping to a Relational Model Each entity in the ER Diagram becomes a relation. A properly normalized ER diagram will indicate where intersection relations for many-to-many mappings are needed. Relationships are indicated by common columns (or domains) in tables that are related. We will examine the tables for the Acme Widget Company derived from the ER diagram
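The mapping rules can be illustrated with the Class/Attends/Student relationship from the earlier Relationships slide. The sketch uses Python's built-in `sqlite3` module; the table names, column names, and sample rows are all assumptions made for the example, not the Acme Widget Company tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, title TEXT);
    -- Intersection relation for the many-to-many "Attends" relationship.
    CREATE TABLE attends (
        student_id INTEGER REFERENCES student(student_id),
        class_id INTEGER REFERENCES class(class_id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO class VALUES (10, 'IS 202'), (20, 'IS 206');
    INSERT INTO attends VALUES (1, 10), (1, 20), (2, 10);
""")

# The relationship is recovered by joining on the common columns.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM student s
    JOIN attends a ON a.student_id = s.student_id
    JOIN class c ON c.class_id = a.class_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)  # [('Ada', 'IS 202'), ('Ada', 'IS 206'), ('Ben', 'IS 202')]
```

Each entity became a table, and the many-to-many relationship became a third table whose primary key is the pair of foreign keys.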

105 11/20/2001Information Organization and Retrieval Normalization Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data Normalization is a multi-step process beginning with an “unnormalized” relation –Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.

106 11/20/2001Information Organization and Retrieval Normalization [Diagram of successive normal forms] Unnormalized relation → First normal form: functional dependency of nonkey attributes on the primary key, atomic values only → Second normal form: full functional dependency of nonkey attributes on the primary key → Third normal form: no transitive dependency between nonkey attributes → Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency

107 11/20/2001Information Organization and Retrieval Relational Algebra Operations Select Project Product Union Intersect Difference Join Divide
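Three of these operations (select, project, join) can be sketched over relations stored as lists of dictionaries. This is a teaching sketch under simplifying assumptions, not how a DBMS implements them; the supplier and part data are invented, and `join` here is a natural join on all shared attribute names.

```python
def select(relation, predicate):
    """Select: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Project: keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[k] == r_row[k] for k in shared):
                result.append({**l_row, **r_row})
    return result


# Hypothetical sample relations.
SUPPLIERS = [{"supplier": "S1", "city": "Oakland"}, {"supplier": "S2", "city": "Berkeley"}]
SUPPLIES = [{"supplier": "S1", "part": "bolt"}, {"supplier": "S1", "part": "nut"},
            {"supplier": "S2", "part": "bolt"}]

print(project(SUPPLIES, ["part"]))  # [{'part': 'bolt'}, {'part': 'nut'}]
bolt_cities = project(
    select(join(SUPPLIERS, SUPPLIES), lambda r: r["part"] == "bolt"), ["city"])
print(bolt_cities)  # [{'city': 'Oakland'}, {'city': 'Berkeley'}]
```

Because every operation takes relations and returns a relation, the operations compose freely, as the last query shows.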

108 11/20/2001Information Organization and Retrieval Effectiveness and Efficiency Issues for DBMS Focus on the relational model Any column in a relational database can be searched for values. To improve efficiency, indexes using storage structures such as BTrees and Hashing are used But many useful functions are not indexable and require complete scans of the database

109 11/20/2001Information Organization and Retrieval Advantages of RDBMS Possible to design complex data storage and retrieval systems with ease (and without conventional programming). Support for ACID transactions –Atomic –Consistent –Isolated –Durable
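The "Atomic" property can be demonstrated with Python's built-in `sqlite3` module: either every statement in a transaction takes effect or none does. The account table, the `CHECK` constraint, and the `transfer` helper below are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; on any error both updates are undone."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(conn, "a", "b", 150))  # False: the debit violates the CHECK constraint
print(conn.execute("SELECT balance FROM account ORDER BY name").fetchall())
# [(100,), (0,)]  (the credit to 'b' was rolled back as well)
```

Without the transaction, the failed debit would leave account `b` credited with money that never left account `a`.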

110 11/20/2001Information Organization and Retrieval Advantages of RDBMS Support for very large databases Automatic optimization of searching (when possible) RDBMS have a simple view of the database that conforms to much of the data used in businesses. Standard query language (SQL)

111 11/20/2001Information Organization and Retrieval Disadvantages of RDBMS Until recently, no support for complex objects such as documents, video, images, spatial or time-series data. (ORDBMS are adding support for these). Often poor support for storage of complex objects. (Disassembling the car to park it in the garage) Still no efficient and effective integrated support for things like text searching within fields.

112 11/20/2001Information Organization and Retrieval Study hard, and good luck! Thank you for all the great work!

