2007.02.08 - IS 240 – Spring 2007, Lecture 8: Clustering. University of California, Berkeley, School of Information. IS 245: Organization of Information.

Presentation transcript:

SLIDE 1 – IS 240 – Spring 2007, Lecture 8: Clustering. University of California, Berkeley, School of Information. IS 245: Organization of Information in Collections. Automatic Classification. Some slides in this lecture were originally created by Prof. Marti Hearst.

SLIDE 2 – Overview: introduction to automatic classification and clustering; classification of classification methods; classification clusters and information retrieval in Cheshire II; the 4W project revisited…

SLIDE 3 – Classification: the grouping together of items (including documents or their representations), which are then treated as a unit. The groupings may be predefined or generated algorithmically, and the process itself may be manual or automated. In document classification, items are grouped together because they are likely to be wanted together, for example, items about the same topic.

SLIDE 4 – Automatic Indexing and Classification: Automatic indexing typically consists of simply deriving keywords from a document and providing access via all of those words. More complex automatic indexing systems attempt to select controlled-vocabulary terms based on the terms in the document. Automatic classification attempts to group similar documents automatically, using either:
- a fully automatic clustering method, or
- an established classification scheme and a set of documents already indexed by that scheme.

SLIDE 5 – Background and Origins: early suggestion by Fairthorne, "The Mathematics of Classification"; early experiments by Maron (1961) and Borko and Bernick (1963); work in numerical taxonomy and its application to information retrieval by Jardine, Sibson, van Rijsbergen, and Salton (1970s). Early IR clustering work was more concerned with efficiency issues than with semantic issues.

SLIDE 6 – Cluster Hypothesis: the basic notion behind the use of classification and clustering methods is that "closely associated documents tend to be relevant to the same requests." (C.J. van Rijsbergen)

SLIDE 7 – Classification of Classification Methods, by class structure:
- Intellectually formulated: manual assignment (e.g. library classification); automatic assignment (e.g. Cheshire classification mapping)
- Automatically derived from a collection of items: hierarchic clustering methods (e.g. single link); agglomerative clustering methods (e.g. Dattola); hybrid methods (e.g. query clustering)

SLIDE 8 – Classification of Classification Methods (adapted from Sparck Jones):
- Relationship between properties and classes: monothetic or polythetic
- Relation between objects and classes: exclusive or overlapping
- Relation between classes and classes: ordered or unordered

SLIDE 9 – Properties and Classes:
- Monothetic: the class is defined by a set of properties that are both necessary and sufficient for membership in the class.
- Polythetic: the class is defined by a set of properties such that each member possesses some (usually large) number of those properties, each property is possessed by a large number of the members, and no single property is possessed by every member.
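The difference between the two membership rules can be sketched in a few lines, assuming each item is represented as a set of binary properties and that "some number" of shared properties is a tunable threshold, here a hypothetical parameter `min_shared`. The property labels A to D are placeholders.

```python
def is_monothetic_member(item: set[str], defining: set[str]) -> bool:
    """Monothetic: the item must have every defining property (necessary and sufficient)."""
    return defining <= item


def is_polythetic_member(item: set[str], defining: set[str], min_shared: int) -> bool:
    """Polythetic: the item needs at least `min_shared` of the defining properties,
    but no particular property is required."""
    return len(item & defining) >= min_shared


# Hypothetical properties A-D; each item lacks a different one.
defining = {"A", "B", "C", "D"}
items = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "D"}]

print([is_monothetic_member(i, defining) for i in items])     # [False, False, False, False]
print([is_polythetic_member(i, defining, 3) for i in items])  # [True, True, True, True]
```

None of the four items qualifies for the monothetic class, yet all four form a polythetic class in which every property is missing from at least one member.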

SLIDE 10 – Monothetic vs. Polythetic. [Figure: a polythetic and a monothetic grouping, with labels A–H. Adapted from van Rijsbergen, '79.]

SLIDE 11 – Exclusive vs. Overlapping: in exclusive classification an item belongs to a single class only; in overlapping classification items can belong to many classes, sometimes with a "membership weight".

SLIDE 12 – Ordered vs. Unordered: ordered classes have some sort of structure imposed on them; hierarchies are typical of ordered classes. Unordered classes have no imposed precedence or structure, and each class is considered to be on the same "level"; this is typical of agglomerative methods.

SLIDE 13 – Clustering Methods: hierarchical; agglomerative; hybrid; automatic class assignment.

SLIDE 14 – Coefficients of Association: simple; Dice's coefficient; Jaccard's coefficient; cosine coefficient; overlap coefficient.
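The formulas for these coefficients did not survive in the transcript. The sketch below uses the standard set-based forms for binary term vectors (as given, for example, in van Rijsbergen's treatment of association measures), treating each document as a set of index terms. It assumes non-empty sets; the two example documents are hypothetical.

```python
import math

# Each document is represented as a set of index terms (binary weighting).
def simple(x: set, y: set) -> float:
    return float(len(x & y))                          # |X ∩ Y|

def dice(x: set, y: set) -> float:
    return 2 * len(x & y) / (len(x) + len(y))         # 2|X ∩ Y| / (|X| + |Y|)

def jaccard(x: set, y: set) -> float:
    return len(x & y) / len(x | y)                    # |X ∩ Y| / |X ∪ Y|

def cosine(x: set, y: set) -> float:
    return len(x & y) / math.sqrt(len(x) * len(y))    # |X ∩ Y| / sqrt(|X| |Y|)

def overlap(x: set, y: set) -> float:
    return len(x & y) / min(len(x), len(y))           # |X ∩ Y| / min(|X|, |Y|)

doc1 = {"cluster", "hypothesis", "retrieval", "document"}
doc2 = {"document", "retrieval", "ranking"}
for f in (simple, dice, jaccard, cosine, overlap):
    print(f.__name__, round(f(doc1, doc2), 3))
```

All but the simple coefficient normalize the raw overlap count, so long documents do not automatically look more similar to everything else.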

SLIDE 15 – Hierarchical Methods: single link. [Figure: dissimilarity matrix.] Hierarchical methods are polythetic, usually exclusive, and ordered; clusters are order-independent.

SLIDE 16 – Single link, threshold = .1. [Figure: single-link dissimilarity matrix.]

SLIDE 17 – Threshold = (value not preserved)

SLIDE 18 – Threshold = (value not preserved)
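The threshold slides can be reproduced with a short sketch: at a given threshold, single link joins any two items whose dissimilarity is at or below it, and the clusters are the connected components of the resulting graph. The matrix values on the original slides are not preserved, so the 4×4 matrix below is hypothetical.

```python
def single_link_clusters(dissim: list[list[float]], threshold: float) -> list[list[int]]:
    """Single-link clusters at a threshold: items i and j are linked when
    dissim[i][j] <= threshold; clusters are the connected components."""
    n = len(dissim)
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dissim[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric dissimilarity matrix for four documents.
D = [
    [0.0, 0.1, 0.4, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.4, 0.3, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
for t in (0.1, 0.2, 0.3):
    print(t, single_link_clusters(D, t))
```

Raising the threshold only ever merges existing clusters, which is what produces the nested hierarchy; the result also does not depend on the order in which the documents are listed.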

SLIDE 19 – Clustering: agglomerative methods are polythetic, exclusive or overlapping, and unordered; clusters are order-dependent. [Figure: Doc.]
1. Select initial centers (i.e. seed the space).
2. Assign documents to the highest-matching centers and compute centroids.
3. Reassign all documents to the centroid(s).
This is Rocchio's method (similar to current k-means methods).
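The three steps can be sketched as a generic k-means-style loop. This is not a reconstruction of Rocchio's exact algorithm. It assumes documents are sparse term-weight dictionaries, cosine similarity as the matching function, and caller-supplied seed documents (the hypothetical `seeds` parameter). Because the result depends on the seeds, the sketch is order-dependent in the sense the slide describes.

```python
import math
from collections import Counter

Vector = dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: list[Vector]) -> Vector:
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)  # adds term weights
    return {t: w / len(vectors) for t, w in total.items()}


def seed_and_reassign(docs: list[Vector], seeds: list[int], max_iter: int = 10) -> list[int]:
    centers = [docs[i] for i in seeds]  # 1. seed the space
    assignment: list[int] = []
    for _ in range(max_iter):
        # 2./3. assign every document to its highest-matching center
        new = [max(range(len(centers)), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new == assignment:  # stable: no document changed cluster
            break
        assignment = new
        # recompute centroids; keep the old center if a cluster became empty
        centers = [centroid([d for d, a in zip(docs, assignment) if a == c]) or centers[c]
                   for c in range(len(centers))]
    return assignment


docs = [{"apple": 1, "fruit": 1}, {"apple": 1, "pie": 1},
        {"engine": 1, "car": 1}, {"car": 1, "wheel": 1}]
print(seed_and_reassign(docs, seeds=[0, 2]))
```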

SLIDE 20 – Automatic Class Assignment. [Figure: Doc, Search Engine.]
1. Create pseudo-documents representing intellectually derived classes.
2. Search using the document's contents.
3. Obtain a ranked list.
4. Assign the document to the N categories ranked over a threshold, OR assign it to the top-ranked category.
Automatic class assignment is polythetic, exclusive or overlapping, and usually ordered; clusters are order-independent and usually based on an intellectually derived scheme.
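A minimal sketch of the four steps, assuming each class pseudo-document is a list of terms and using cosine similarity on raw term counts as a stand-in for the search engine's ranking. The class names, `threshold`, and `max_classes` are hypothetical; `max_classes=1` gives the "top-ranked category" variant.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def assign_classes(doc_terms, pseudo_docs, threshold=0.0, max_classes=None):
    """Use the document as a query against class pseudo-documents (step 2),
    rank the classes (step 3), and keep those over the threshold (step 4)."""
    query = Counter(doc_terms)
    ranked = sorted(((cls, cosine(query, Counter(terms))) for cls, terms in pseudo_docs.items()),
                    key=lambda pair: pair[1], reverse=True)
    over = [cls for cls, score in ranked if score > threshold]
    return over if max_classes is None else over[:max_classes]


# Step 1: hypothetical pseudo-documents for intellectually derived classes.
pseudo_docs = {
    "chemistry": ["acid", "reaction", "molecule", "acid"],
    "motor_vehicles": ["engine", "car", "ignition", "car"],
    "transport_history": ["car", "history", "railway"],
}
doc = ["car", "engine", "fuel"]
print(assign_classes(doc, pseudo_docs))                 # N classes over threshold
print(assign_classes(doc, pseudo_docs, max_classes=1))  # top-ranked class only
```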

SLIDE 21 – Automatic Categorization in Cheshire II: The Cheshire II system is intended to provide a bridge between the purely bibliographic realm of previous generations of online catalogs and the rapidly expanding realm of full-text and multimedia information resources. It is currently used in the UC Berkeley Digital Library Project and for a number of other sites and projects.

SLIDE 22 – Overview of Cheshire II:
- Supports SGML as the primary database type.
- Is a client/server application.
- Uses the Z39.50 information retrieval protocol.
- Supports Boolean searching of all servers.
- Supports probabilistic ranked retrieval in the Cheshire search engine.
- Supports "nearest neighbor" searches, relevance feedback, and two-stage search.
- GUI interface on X Window displays (Tcl/Tk).
- HTTP/CGI interface for the Web (Tcl scripting).

SLIDE 23 – Cheshire II: Cluster Generation. Define the basis for clustering records:
- select the field that forms the basis of the cluster;
- select the evidence fields to use as the contents of the pseudo-documents.
During indexing, cluster keys are generated from the basis and evidence of each record. The cluster keys are sorted and merged on the basis, and a pseudo-document is created for each unique basis element, containing all of its evidence fields. The pseudo-documents (class clusters) are then indexed on the combined evidence fields.
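The generate/sort/merge sequence can be sketched as follows. This is an illustration, not Cheshire II's actual code: records are assumed to be dictionaries, and the field names `lc_class`, `title`, and `subjects` are hypothetical stand-ins for a basis field (e.g. a normalized LC class number) and evidence fields.

```python
from itertools import groupby


def build_pseudo_documents(records, basis_field, evidence_fields):
    """Generate one cluster key (basis, evidence) per record, sort the keys on
    the basis, and merge them into one pseudo-document per unique basis value."""
    keys = []
    for rec in records:
        basis = rec.get(basis_field)
        if not basis:
            continue  # records without a basis value contribute to no cluster
        evidence = " ".join(rec[f] for f in evidence_fields if rec.get(f))
        keys.append((basis, evidence))
    keys.sort()  # identical basis values become adjacent
    return {basis: " ".join(ev for _, ev in group)
            for basis, group in groupby(keys, key=lambda k: k[0])}


records = [
    {"lc_class": "QD", "title": "Organic acids", "subjects": "Chemistry"},
    {"lc_class": "TL", "title": "Spark ignition engines", "subjects": "Automobiles"},
    {"lc_class": "QD", "title": "Reaction kinetics", "subjects": "Chemistry"},
]
for basis, text in build_pseudo_documents(records, "lc_class", ["title", "subjects"]).items():
    print(basis, "->", text)
```

The returned pseudo-documents would then be indexed like ordinary documents on their combined evidence text.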

SLIDE 24 – Cheshire II: Two-Stage Retrieval Using the LC Classification System:
- A pseudo-document is created for each LC class, containing terms derived from "content-rich" portions of the documents in that class (subject headings, titles, etc.).
- This permits searching by any term in the class.
- Ranked probabilistic retrieval techniques attempt to present the "best matches" to a query first.
- The user selects classes to feed back for the "second stage" search of documents.
The approach can be used with any classified/indexed collection.
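The two stages can be sketched under simplifying assumptions: a term-overlap count stands in for probabilistic ranking, the user's class selection is passed in as a list, and the second stage is modeled as ranking only the documents in the selected classes. All identifiers, classes, and documents here are hypothetical.

```python
def score(query_terms: set[str], text: str) -> int:
    """Stand-in ranking function: number of query terms present in the text."""
    return len(query_terms & set(text.lower().split()))


def stage_one(query: str, pseudo_docs: dict[str, str]) -> list[str]:
    """First stage: rank class pseudo-documents; return matching classes."""
    q = set(query.lower().split())
    ranked = sorted(pseudo_docs, key=lambda cls: score(q, pseudo_docs[cls]), reverse=True)
    return [cls for cls in ranked if score(q, pseudo_docs[cls]) > 0]


def stage_two(query: str, selected: list[str], documents: list[tuple[str, str, str]]) -> list[str]:
    """Second stage: rank the (doc_id, class, text) records in the user-selected classes."""
    q = set(query.lower().split())
    candidates = [(doc_id, text) for doc_id, cls, text in documents if cls in selected]
    ranked = sorted(candidates, key=lambda c: score(q, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]


pseudo_docs = {"QD": "acids reaction chemistry", "TL": "engines automobiles ignition"}
documents = [("d1", "QD", "acid reactions"),
             ("d2", "TL", "spark ignition engines"),
             ("d3", "TL", "automobile history")]
classes = stage_one("automobiles engines", pseudo_docs)  # user would pick from this list
print(classes, stage_two("ignition engines", classes, documents))
```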

SLIDE 25 – Probabilistic Retrieval: Logistic Regression. Estimates of relevance are based on a log-linear model with various statistical measures of document content as independent variables. The log odds of relevance is a linear function of the attributes, in which the term contributions are summed; the probability of relevance is obtained by inverting the log odds (the logistic transformation).
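The equations on this slide did not survive in the transcript. In the standard logistic-regression form, assumed here, with attributes X_1, …, X_K (each typically computed by summing or averaging over the terms shared by query Q and document D) and fitted coefficients c_0, …, c_K, the log odds and the probability of relevance are:

```latex
\log O(R \mid Q, D) = c_0 + \sum_{i=1}^{K} c_i X_i
\qquad
P(R \mid Q, D) = \frac{e^{\log O(R \mid Q, D)}}{1 + e^{\log O(R \mid Q, D)}}
              = \frac{1}{1 + e^{-\log O(R \mid Q, D)}}
```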

SLIDE 26 – Probabilistic Retrieval: Logistic Regression. In Cheshire II the probability of relevance is based on logistic regression, with the values of the coefficients determined from a sample set of TREC documents. At retrieval time the probability estimate is computed from six attributes, or "clues", about term usage in the documents and the query.

SLIDE 27 – Probabilistic Retrieval: Logistic Regression Attributes:
- average absolute query frequency
- query length
- average absolute document frequency
- document length
- average inverse document frequency (IDF)
- number of terms in common between query and document (M), logged
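A sketch of how the six attributes might be computed and combined. The exact transforms and the fitted TREC coefficients are not preserved here, so the log transforms of frequencies, the IDF definition log(N/n), and the coefficient values are all assumptions; the coefficients in particular are placeholders, not Cheshire II's values.

```python
import math
from typing import Optional


def lr_attributes(query_tf: dict[str, int], doc_tf: dict[str, int],
                  doc_freq: dict[str, int], num_docs: int) -> Optional[list[float]]:
    """Six attributes following the slide's list; transforms are assumptions."""
    common = [t for t in query_tf if t in doc_tf]
    m = len(common)
    if m == 0:
        return None  # no shared terms: treated here as non-matching
    x1 = sum(math.log(query_tf[t]) for t in common) / m                 # avg query frequency (logged)
    x2 = sum(query_tf.values())                                         # query length
    x3 = sum(math.log(doc_tf[t]) for t in common) / m                   # avg document frequency (logged)
    x4 = sum(doc_tf.values())                                           # document length
    x5 = sum(math.log(num_docs / doc_freq[t]) for t in common) / m      # average IDF
    x6 = math.log(m)                                                    # terms in common, logged
    return [x1, x2, x3, x4, x5, x6]


def probability_of_relevance(attrs: Optional[list[float]], coeffs: list[float]) -> float:
    """coeffs[0] is the intercept c0; coeffs[1:] pair with the six attributes."""
    if attrs is None:
        return 0.0
    z = coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], attrs))  # log odds
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


query_tf = {"spark": 1, "ignition": 1, "engine": 2}
doc_tf = {"engine": 3, "ignition": 1, "car": 4}
doc_freq = {"engine": 10, "ignition": 5, "spark": 2, "car": 50}
attrs = lr_attributes(query_tf, doc_tf, doc_freq, num_docs=100)
placeholder_coeffs = [-3.0, 1.0, 0.1, 1.0, 0.01, 0.5, 1.0]  # hypothetical values
print(probability_of_relevance(attrs, placeholder_coeffs))
```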

SLIDE 28 – Cheshire II Demo. Examples from:
- the SciMentor (BioSearch) project: Journal of Biological Chemistry and MEDLINE data
- CHESTER (EconLit): Journal of Economic Literature subjects
- the Unfamiliar Metadata & TIDES projects: the basis for clusters is a normalized Library of Congress class number; evidence is provided by terms from record titles (and subject headings for the "all languages" training set); five different training sets (Russian, German, French, Spanish, and All Languages); testing cross-language retrieval and classification
- 4W project search

SLIDE 29 – References:
- Christian Plaunt & Barbara Norgard, "An Association Based Method for Automatic Indexing with a Controlled Vocabulary". JASIS 49(10). Preprint available on the class web site.
- Ray R. Larson, Jerome McDonough, Lucy Kuntz, Paul O'Leary & Ralph Moon, "Cheshire II: Designing a Next-Generation Online Catalog". JASIS 47(7), 1996.

SLIDE 30 – Developing a Metadata Infrastructure for Information Access: What, Where, When and Who? Prof. Ray R. Larson, University of California, Berkeley, School of Information.

SLIDE 31 – Overview:
- Metadata as infrastructure: What, Where, When and Who?
- What are Entry Vocabulary Indexes? The notion of an EVI; how EVIs are built.
- Time Period Directories: mining metadata for new metadata.

SLIDE 32 – Metadata as Infrastructure: The difference between memorization and understanding lies in knowing the context and relationships of whatever is of interest. When setting out to learn about a new topic, a well-tested practice is to follow the traditional "5Ws and the H": Who?, What?, When?, Where?, Why?, and How?

SLIDE 33 – Metadata as Infrastructure: The reference collections of paper-based libraries provide a structured environment for resources, with encyclopedias and subject catalogs, gazetteers, chronologies, and biographical dictionaries offering direct support for at least What, Where, When, and Who. The digital environment does not yet provide an effective and easily exploited infrastructure comparable to the traditional reference library.

SLIDE 34 – What? Searching texts by topic, e.g. Dewey, LCSH, any subject index, or a category scheme applied to documents. Every search involves two kinds of mapping:
- documents are assigned to topic categories, e.g. Dewey;
- queries have to be mapped to topic categories, e.g. Dewey's Relativ Index maps ordinary words/phrases to Decimal Classification numbers.
There is also mapping between topic systems, e.g. between the US Patent Classification and the International Patent Classification.

SLIDE 35 – ‘What’ searches involve mapping to controlled vocabularies. [Figure: Texts, Thesaurus/Ontology.]

SLIDE 36 – Start with a collection of documents.

SLIDE 37 – Classify and index with a controlled vocabulary, or use a pre-indexed collection. [Figure: Index.]

SLIDE 38 – Problem: controlled vocabularies can be difficult for people to use. [Figure: Index.] Example index term: "pass mtr veh spark ign eng". Example entry in Library of Congress subject headings: Use "Economic Policy" For "Wirtschaftspolitik" (German for "economic policy").

SLIDE 39 – Solution: Entry Vocabulary Indexes. [Figure: Index, EVI.] "pass mtr veh spark ign eng" = "Automobile"

SLIDE 40 – "What" and Entry Vocabulary Indexes: EVIs are a means of mapping from the user's vocabulary to the controlled vocabulary of a collection of documents…

SLIDE 41 – Building and Searching EVIs. [Flowchart:]
- The user has a question but is unfamiliar with the domain he wants to search, and selects a subject domain of interest (domains to select from: Engineering, Medicine, Biology, Social science, etc.).
- Has an Entry Vocabulary Module been built? YES: use an existing EVI. NO: build an Entry Vocabulary Module (EVI): download a set of training data from an Internet DB indexed with a controlled vocabulary; extract terms (words and noun phrases) from titles and abstracts, using part-of-speech tagging for noun phrases; build associations between the extracted terms and the controlled vocabularies.
- Searching: map the user's query to a ranked list of controlled vocabulary terms; the user selects search terms from the ranked list of terms returned by the EVI.

SLIDE 42 – Technical Details: building an Entry Vocabulary Module (EVI):
1. Download a set of training data from an Internet DB indexed with a controlled vocabulary.
2. Extract terms (words and noun phrases) from titles and abstracts, using part-of-speech tagging for noun phrases.
3. Build associations between the extracted terms and the controlled vocabularies.
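Steps 3 and the query-mapping step can be sketched with a deliberately simple association weight, the conditional probability P(class | term) estimated from co-occurrence counts. That weight is a stand-in chosen for brevity, not the likelihood-ratio measure the lecture uses. Term extraction and part-of-speech tagging are assumed to have happened already, and the training pairs are hypothetical.

```python
from collections import Counter


def build_evi(training):
    """training: iterable of (extracted_terms, controlled_vocabulary_terms) pairs,
    one per indexed record. Returns {term: {class: weight}}."""
    term_count: Counter = Counter()
    pair_count: Counter = Counter()
    for terms, classes in training:
        for t in set(terms):
            term_count[t] += 1
            for c in set(classes):
                pair_count[(t, c)] += 1
    evi: dict[str, dict[str, float]] = {}
    for (t, c), n in pair_count.items():
        evi.setdefault(t, {})[c] = n / term_count[t]  # P(class | term), a simple stand-in
    return evi


def map_query(evi, query_terms, top_n=5):
    """Map a user's query to a ranked list of controlled-vocabulary terms."""
    scores: Counter = Counter()
    for t in query_terms:
        for c, w in evi.get(t, {}).items():
            scores[c] += w
    return [c for c, _ in scores.most_common(top_n)]


training = [
    (["spark", "ignition", "engine"], ["Automobiles"]),
    (["car", "engine"], ["Automobiles"]),
    (["engine", "turbine"], ["Aircraft"]),
    (["acid"], ["Chemistry"]),
]
evi = build_evi(training)
print(map_query(evi, ["engine", "car"]))
```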

SLIDE 43 – Association Measure: contingency table for a term t and a class C in the training set:
     |  C  |  ¬C
  t  |  a  |  b
 ¬t  |  c  |  d
where t is the occurrence of a term and C is the occurrence of a class.

SLIDE 44 – Association Measure: maximum likelihood ratio (after Dunning):
$$W(C,t) = 2\left[\log L(p_1, a, a+b) + \log L(p_2, c, c+d) - \log L(p, a, a+b) - \log L(p, c, c+d)\right]$$
where
$$\log L(p, k, n) = k \log p + (n - k)\log(1 - p)$$
and
$$p_1 = \frac{a}{a+b}, \qquad p_2 = \frac{c}{c+d}, \qquad p = \frac{a+c}{a+b+c+d}$$
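The likelihood-ratio weight on this slide translates directly into code. The sketch takes the four cells a, b, c, d of the term/class contingency table, treats 0 · log 0 as 0, and assumes a + b > 0 and c + d > 0. The counts in the example are hypothetical.

```python
import math


def _log_l(p: float, k: int, n: int) -> float:
    """log L(p, k, n) = k log p + (n - k) log(1 - p), taking 0 * log 0 as 0."""
    result = 0.0
    if k > 0:
        result += k * math.log(p)
    if n - k > 0:
        result += (n - k) * math.log(1 - p)
    return result


def association(a: int, b: int, c: int, d: int) -> float:
    """Likelihood-ratio weight W(C, t) for term t and class C."""
    p1 = a / (a + b)              # P(C | t)
    p2 = c / (c + d)              # P(C | not t)
    p = (a + c) / (a + b + c + d)  # P(C)
    return 2 * (_log_l(p1, a, a + b) + _log_l(p2, c, c + d)
                - _log_l(p, a, a + b) - _log_l(p, c, c + d))


print(association(10, 10, 10, 10))  # term and class independent
print(association(20, 5, 5, 70))    # strong association
```

W is near zero when the term tells us nothing about the class, and grows as p1 and p2 diverge. It is also large when the term and class are negatively associated, so an EVI builder would typically keep only pairs with p1 > p2.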

SLIDE 45 – Alternatively: because the "evidence" terms in EVIs can be considered a document, you can also use IR techniques and use the top-ranked classes for classification or query expansion.

SLIDE 46 – Find "Plutonium" in Arabic, Chinese, Greek, Japanese, Korean, Russian, Tamil. [Figure: statistical association linking to digital library resources.]

SLIDE 47 – EVI example: user query "Automobile"; EVI 1 index term: "pass mtr veh spark ign eng"; EVI 2 index terms: "automobiles" OR "internal combustion engines".

SLIDE 48 – But why stop there? [Figure: Index, EVI.]

SLIDE 49 – "Which EVI do I use?" [Figure: several Index/EVI pairs.]

SLIDE 50 – EVI to EVIs. [Figure: several Index/EVI pairs; EVI 2.]

SLIDE 51 – Find "Plutonium" in Arabic, Chinese, Greek, Japanese, Korean, Russian, Tamil: why not treat language the same way?

SLIDE 52 – It is also difficult to move between different media forms. [Figure: Texts, Numeric datasets, Thesaurus/Ontology, EVI.]

SLIDE 53 – Searching across data types: different media can be linked indirectly via metadata, but often (e.g. for socio-economic numeric data series) you also need to specify WHERE to get correct results.

SLIDE 54 – But texts associated with numeric data can be mapped as well… [Figure: Texts, Numeric datasets, Thesaurus/Ontology, captions, EVI.]

SLIDE 55 – EVI to Numeric Data example. [Figure labels: EVI, LCSH, marcnew, query, search results, captions, numeric table, numeric database, online catalog, search interface.]

SLIDE 56 – But there are also geographic dependencies… [Figure: Texts, Numeric datasets, Thesaurus/Ontology, captions, Maps/Geo Data, EVI.]

SLIDE 57 – WHERE: place names are problematic…
- Variant forms: St. Petersburg, Санкт Петербург, Saint-Pétersbourg, ...
- Multiple names: Cluj, in Romania / Roumania / Rumania, is also called Klausenburg and Kolozsvar.
- Name changes: Bombay → Mumbai.
- Homographs: Vienna, VA, and Vienna, Austria; 50 Springfields.
- Anachronisms: no Germany before 1871.
- Vague: e.g. Midwest, Silicon Valley.
- Unstable boundaries: 19th-century Poland; Balkans; USSR.
Use a gazetteer!
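How a gazetteer absorbs these problems can be sketched with a toy name index: variant forms and former names point to the same entry, while a homograph returns several candidates that still need context to disambiguate. The entries, field names, and coordinates below are illustrative only (coordinates are approximate); a real system would query a full gazetteer instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    canonical_name: str
    place_type: str
    lat: float  # approximate, for illustration only
    lon: float


ST_PETERSBURG = Place("Saint Petersburg, Russia", "city", 59.94, 30.31)
VIENNA_AT = Place("Vienna, Austria", "city", 48.21, 16.37)
VIENNA_VA = Place("Vienna, Virginia, USA", "town", 38.90, -77.27)
MUMBAI = Place("Mumbai, India", "city", 19.08, 72.88)

# Name index: many names -> one place (variants, former names);
# one name -> many places (homographs).
NAME_INDEX = {
    "st. petersburg": [ST_PETERSBURG],
    "санкт петербург": [ST_PETERSBURG],
    "saint-pétersbourg": [ST_PETERSBURG],
    "vienna": [VIENNA_AT, VIENNA_VA],
    "bombay": [MUMBAI],
    "mumbai": [MUMBAI],
}


def lookup(name: str) -> list[Place]:
    """Return every gazetteer entry that a place name may refer to."""
    return NAME_INDEX.get(name.strip().lower(), [])


for name in ("Санкт Петербург", "Bombay", "Vienna"):
    print(name, "->", [p.canonical_name for p in lookup(name)])
```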

SLIDE 58 – WHERE: geo-temporal search interface. Place names are found in documents; the gazetteer provides latitude and longitude; places are displayed on a map; timebar.

SLIDE 59 – Zoom on the map. Click on a place for a list of records. Click on a record to display its text.

SLIDE 60 – Catalogs and gazetteers should talk to each other! Geographic sort/display of a catalog search result. [Figure: catalog search; gazetteer search.]

SLIDE 61 – So geographic search becomes part of the infrastructure. [Figure: Texts, Numeric datasets, Thesaurus/Ontology, Gazetteers, captions, Maps/Geo Data, EVI.]

SLIDE 62 – WHEN: search by time is also weakly supported… Calendars are the standard for time, but people use the names of events to refer to time periods. Named time periods resemble place names in being:
- Unstable: European War, Great War, First World War
- Multiple: Second World War, Great Patriotic War
- Ambiguous: "Civil War" in different centuries in England, the USA, Spain, etc.
Places have temporal aspects and periods have geographical aspects: when the Stone Age was varies by region.

SLIDE 63 – This suggests a similar solution: a gazetteer-like Time Period Directory.
- Gazetteer: place name, type, spatial markers (lat. & long.), when
- Time Period Directory: period name, type, time markers (calendar), where
Note the symmetry in the connections between Where and When, and the similarity between place names and period names.

SLIDE 64 – Solution: Time Period Directories. Initial development involved mining the Library of Congress Subject Authority file for named time periods…

SLIDE 65 – LC MARC Authorities Records (example):
sh Magdeburg (Germany)--History--Siege,
g Sieges--Germany
Work cat.: Besselmeier, S. Warhafftige history vnd beschreibung des Magdeburgischen Kriegs,
Cath. encyc. (Magdeburg: besieged ( ) by the Margrave Maurice of Saxony)
Ox. encyc. reformation (Magdeburg: ... during the siege of Magdeburg...)

SLIDE 66 – timePeriodEntry: Time Period Directory instance; contains the components described below.
- periodID: unique identifier
- periodName: period name; can be repeated for alternative names
  - information about language, script, transliteration scheme
  - source information and notes (where the period name was mentioned)
- descriptiveNotes: description of the time period
- dates
  - calendar and date format
  - begin & end date (exact, earliest, latest, most-likely, advocated-by-source, ongoing)
  - notes, sources
- periodClassification
  - period type, e.g. Period of Conflict, Art movement
  - can plug in different classification schemes
  - can be repeated for several classifications
- location
  - places associated with the time period
  - contains both the place name and an entry to a gazetteer providing more specific place information, such as latitude/longitude coordinates
  - can plug in different location indicators (e.g. ADL gazetteer, Getty Thesaurus of Geographic Names)
  - recently added coordinates for direct use
- relatedPeriod
  - related time periods
  - periodID of related periods
  - information about relationship type (part-of, successor, etc.)
  - can plug in different relationship type schemes
- entryMetadata
  - notes about the creator/creation of the instance
  - entry date
  - modification date
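The timePeriodEntry structure maps naturally onto a few record types. The sketch below is one possible in-memory model, not the directory's actual schema: field names are snake_case versions of the slide's components, a single date qualifier per date range is a simplification, and the example entry and its identifier are hypothetical. The `find_by_name` helper shows why repeatable period names matter: any alternative name resolves to the same entry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PeriodName:
    name: str
    language: Optional[str] = None
    script: Optional[str] = None
    transliteration: Optional[str] = None
    source: Optional[str] = None  # where the name was mentioned


@dataclass
class PeriodDates:
    calendar: str
    begin: Optional[str] = None
    end: Optional[str] = None
    qualifier: str = "exact"  # e.g. exact, earliest, latest, most-likely, ongoing
    notes: Optional[str] = None


@dataclass
class Location:
    place_name: str
    gazetteer_ref: Optional[str] = None  # pointer into a gazetteer entry
    lat: Optional[float] = None
    lon: Optional[float] = None


@dataclass
class RelatedPeriod:
    period_id: str
    relationship: str  # e.g. "part-of", "successor"


@dataclass
class TimePeriodEntry:
    period_id: str
    period_names: list[PeriodName]
    descriptive_notes: str = ""
    dates: list[PeriodDates] = field(default_factory=list)
    classifications: list[str] = field(default_factory=list)
    locations: list[Location] = field(default_factory=list)
    related_periods: list[RelatedPeriod] = field(default_factory=list)
    entry_metadata: dict[str, str] = field(default_factory=dict)

    def all_names(self) -> list[str]:
        return [n.name for n in self.period_names]


def find_by_name(directory: list[TimePeriodEntry], name: str) -> list[TimePeriodEntry]:
    """Resolve any (alternative) period name to its directory entries."""
    wanted = name.casefold()
    return [e for e in directory if any(n.casefold() == wanted for n in e.all_names())]


ww1 = TimePeriodEntry(
    period_id="tp-0001",  # hypothetical identifier
    period_names=[PeriodName("First World War"), PeriodName("Great War"),
                  PeriodName("European War")],
    dates=[PeriodDates(calendar="Gregorian", begin="1914", end="1918")],
    classifications=["Period of Conflict"],
    locations=[Location("Europe")],
)
print([e.period_id for e in find_by_name([ww1], "great war")])
```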

SLIDE 67 – [Figure only; no text preserved.]

SLIDE 68 – Time periods by named location

SLIDE 69 – Catalog Search Result

SLIDE 70 – Web Interface: Access by map

SLIDE 71 – A zoomable interface gives access to geographically focused information…

SLIDE 72 – Web Interface: Access by timeline. A link initiates a search of the Library of Congress catalog for all records relating to this time period.

SLIDE 73 – WHEN and WHAT: these named time periods are derived from Library of Congress catalog subject headings, so they can be used for catalog searches that find books on topics important for that time period.

SLIDE 74 – Time period directories link via the place (or time). [Figure: Texts, Numeric datasets, Thesaurus/Ontology, Gazetteers, captions, Maps/Geo Data, EVI, Time Period Directory, Time lines, Chronologies.]

SLIDE 75 – WHEN, WHERE and WHO: catalog records found from a time period search commonly include the names of persons important at that time. Their names can be forwarded to, e.g., biographies in the Wikipedia encyclopedia.

SLIDE 76 – Place and time are broadly important across numerous tools and genres, including language atlases, library catalogs, biographical dictionaries, bibliographies, archival finding aids, museum records, etc. Biographical dictionaries are heavy on place and time. Emanuel Goldberg: born Moscow; PhD under Wilhelm Ostwald, Univ. of Leipzig; director, Zeiss Ikon, Dresden; moved to Palestine; died Tel Aviv. A life is a series of episodes involving activity (WHAT), WHERE, WHEN, and WHO else.

SLIDE 77 – A new form of biographical dictionary would link to all of these. [Figure: Texts, Numeric datasets, Thesaurus/Ontology, Gazetteers, captions, Maps/Geo Data, EVI, Time Period Directory, Time lines, Chronologies, Biographical Dictionary.]

SLIDE 78 – A Metadata Infrastructure. [Diagram:]
- CATALOGS: Archives, Historical Societies, Libraries, Museums, Public Television, Publishers, Booksellers
- RESOURCES: Audio, Images, Numeric Data, Objects, Texts, Virtual Reality, Webpages
- INTERMEDIA INFRASTRUCTURE: WHO: Biographical Dictionary (text and images); WHEN: Time Period Directory (timelines); WHERE: Gazetteer (maps); WHAT: Thesaurus (syndetic structure)
- Also: special display tools, authority control, facet learners, dossiers

SLIDE 79 – Acknowledgements: Electronic Cultural Atlas Initiative project. This work was partially supported by the Institute of Museum and Library Services through a National Leadership Grant for Libraries, award number LG , Oct Sept 2006, entitled "Supporting the Learner: What, Where, When and Who". See: (link not preserved). Michael Buckland, Fred Gey, Vivien Petras, Matt Meiske, Kim Carl, Anya Kartavenko, Minakshi Mukherjee. Contact: (not preserved).