
1 Midterm Examination

2 General Observations
The examination was too long! Most people submitted by

3 Code of Conduct
- Computing is a collaborative activity. You are encouraged to work together, but...
- Some tasks may require individual work.
- Always give credit to your sources and collaborators.
Good professional practice: to make use of the expertise of others and to build on previous work, with proper attribution.
Unethical (and, in an academic setting, plagiarism): to use the efforts of others without attribution.

4 Question 1
Consider the following scenario: [Maple Leaf Rag example]
(a) Using the IFLA definitions, what are the work(s), expression(s), manifestation(s) and item(s)? What are the genre(s)? Explain your choices.
(b) If you were including this material in a digital library, what digital objects would you use? What would be their structural types? Explain the reasons behind your design.

5 IFLA Model: Work
A work is the underlying abstraction, e.g.,
- The Iliad
- The Computer Science departmental web site
- Beethoven's Fifth Symphony
- Unix operating system
- The 1996 U.S. census
This is roughly equivalent to the concept of "literary work" used in copyright law.

6 IFLA Model: Expression
A work is realized through an expression, e.g.,
- The Iliad has oral expressions and written expressions.
- A musical work has a score and one or more performances.
- Software has source code and machine code.
Many works have only a single expression, e.g., a web page or a book.

7 IFLA Model: Manifestation
An expression is given form in one or more manifestations, e.g.,
- The text of The Iliad has been manifested in numerous manuscripts and printed books.
- A musical performance can be distributed on CD or broadcast on television.
- Software is manifested as files, which may be stored or transmitted in any digital medium.

8 IFLA Model: Item
When many copies are made of a manifestation, each is a separate item, e.g.,
- a specific copy of a book
- a specific copy of a computer file
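The four IFLA levels nest inside one another: a work has expressions, an expression has manifestations, and a manifestation has items. A minimal sketch of that containment in Python, using the Iliad examples from these slides; the class names, fields, and the particular copies are hypothetical illustrations, not part of the IFLA definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    description: str                      # one specific copy


@dataclass
class Manifestation:
    form: str                             # e.g. manuscript, printed book, CD
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    kind: str                             # e.g. oral, written, score, performance
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str                            # the underlying abstraction
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every item reachable from a work."""
    return sum(len(m.items) for e in work.expressions for m in e.manifestations)


# The Iliad, from the slides; the individual copies are invented for illustration.
iliad = Work("The Iliad", [
    Expression("oral"),                   # oral expression, no manifestation recorded here
    Expression("written", [
        Manifestation("manuscript", [Item("manuscript copy A")]),
        Manifestation("printed book", [Item("library copy 1"), Item("library copy 2")]),
    ]),
])
print(count_items(iliad))  # 3
```

Keeping each level as a separate class mirrors the point of the model: two library copies of the same printed edition are distinct items but share one manifestation.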

9 Structural Types
Genre: describes the category of content, e.g.,
- jazz, blues, rap, rock, ...
- painting, fresco, mural, ...
- operating system, compiler, interpreter, ...
Structural type: describes the structure of the computer representation, e.g.,
- scanned image
- web page
- marked-up text
- digitized audio

10 Object Models
Digital object: an item as stored in a digital library, consisting of data, metadata, and an identifier.
Object model: the relationship between digital objects and the content that they represent.

11 Data Structure
[Diagram: a digital object made up of an Identifier, Data, and Metadata. Labels: loc.ndlp/amrlp, doc1, page map, object-md, page 1 gif, page 2 gif, page 3 gif.]
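A digital object bundles data, metadata, and an identifier. A minimal Python sketch under stated assumptions: it takes the diagram label loc.ndlp/amrlp as the identifier, treats the three GIF page images as the data, stores object-md as metadata, and models the page map as an ordered list of component names. Where exactly the original diagram places the page map is not recoverable from the text, so that field and the placeholder bytes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DigitalObject:
    identifier: str            # e.g. "loc.ndlp/amrlp" from the diagram labels
    data: Dict[str, bytes]     # component name -> stored bytes
    metadata: Dict[str, str]   # object-level metadata ("object-md")
    page_map: List[str]        # hypothetical: reading order of the page images

    def pages_in_order(self) -> List[bytes]:
        """Return the page images in the order given by the page map."""
        return [self.data[name] for name in self.page_map]


doc1 = DigitalObject(
    identifier="loc.ndlp/amrlp",
    data={"page1.gif": b"GIF-1", "page2.gif": b"GIF-2", "page3.gif": b"GIF-3"},  # placeholder bytes
    metadata={"object-md": "placeholder descriptive metadata"},
    page_map=["page1.gif", "page2.gif", "page3.gif"],
)
print(len(doc1.pages_in_order()))  # 3
```

Separating the page map from the images means the reading order lives in one place; a viewer can reassemble the document without relying on file names.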

12 Question 2
A printed text document can be converted to digital formats by a choice of methods:
(i) digitization by scanning
(ii) digitization plus optical character recognition
(iii) retyping with SGML markup
(a) What are the advantages and disadvantages of each of these three methods?
(b) Under what circumstances would a user be unsatisfied with all three digital manifestations and want to use the original printed copy?

13 Part (a)
What impact does the method of conversion have on each of the following?
- Retaining the appearance of the document
- Manipulation and searching of the converted object
- Cost

14 Question 3
The diagram shows a system that can be used for reference linking between journal articles. [Diagram]
(a) In this system, describe the execution steps that occur at run time to resolve a reference link and obtain the required material.
(b) What is the problem of selective resolution? Suggest one way that this system might be enhanced to support selective resolution.

15 The General Model
[Diagram: Reference database, Location database, Content, Publisher, Client.] Publisher places information in databases.

16 The General Model
[Diagram, next step. Labels: Citation, Identifiers.]

17 The General Model
[Diagram, next step. Labels: Identifier, URLs.]

18 The General Model
[Diagram, next step. Labels: URL, Content.]

19 The General Model
[Diagram, complete. Labels: Citation, Identifiers, URLs, Identifier, URL, Content.]
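Read from the diagram labels, one plausible run-time chain for part (a) is: the client sends a citation to the reference database and gets back an identifier, looks the identifier up in the location database to get one or more URLs, and fetches the content from the publisher at a URL. The sketch below models that chain with in-memory dictionaries; the database contents, the citation string, and the example.org URLs are all hypothetical, and a real system would make a network request at each step.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the three services in the diagram.
REFERENCE_DB: Dict[str, str] = {          # citation -> identifier
    "Author, Title, Journal 12 (2004) 34-56": "id:0001",
}
LOCATION_DB: Dict[str, List[str]] = {     # identifier -> URL(s) of copies
    "id:0001": ["https://publisher.example.org/articles/0001"],
}
PUBLISHER: Dict[str, str] = {             # URL -> content served by the publisher
    "https://publisher.example.org/articles/0001": "<full text of article 0001>",
}


def resolve_reference(citation: str) -> str:
    """Follow a reference link from a citation to the cited content."""
    identifier = REFERENCE_DB[citation]   # step 1: citation -> identifier
    urls = LOCATION_DB[identifier]        # step 2: identifier -> URLs
    url = urls[0]                         # step 3: no selection, just take the first URL
    return PUBLISHER[url]                 # step 4: URL -> content


print(resolve_reference("Author, Title, Journal 12 (2004) 34-56"))
```

Taking `urls[0]` blindly is exactly the weakness part (b) asks about: when several copies exist, nothing here decides which one this user should get.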

20 Resolution of Identifier
Choice of resolver (distributed resolution)
- Simple model: the identifier determines the resolver
Selection from multiple copies (selective resolution)
- Performance criteria
- Economic and related criteria
- User requirements
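When the location database returns several URLs for the same identifier, something has to choose among them. A minimal sketch of one possible enhancement, assuming each copy is tagged with a measured latency (a performance criterion) and the set of institutions licensed to use it (an economic criterion). The `Copy` fields, the institution names, and all the values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Copy:
    url: str
    latency_ms: float                                    # performance criterion (hypothetical)
    licensed_for: Set[str] = field(default_factory=set)  # economic criterion (hypothetical)


def select_copy(copies: List[Copy], institution: str) -> Optional[Copy]:
    """Keep the copies this user's institution may access, then pick the fastest."""
    allowed = [c for c in copies if institution in c.licensed_for]
    if not allowed:
        return None                                      # no accessible copy
    return min(allowed, key=lambda c: c.latency_ms)


copies = [
    Copy("https://publisher.example.org/a/0001", 300.0, {"univ-a"}),
    Copy("https://mirror.example.org/a/0001", 80.0, {"univ-b"}),
    Copy("https://aggregator.example.org/a/0001", 150.0, {"univ-a", "univ-b"}),
]
print(select_copy(copies, "univ-a").url)  # https://aggregator.example.org/a/0001
```

Filtering on access rights before ranking on speed means a user is never sent to a fast copy they cannot open; other user requirements could be added as further filters.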

21 Question 4
(a) With MathML, what is the distinction between presentation markup and content markup?
(b) You are asked to represent the following expression in MathML:
$\int_{2}^{5} \frac{1 + x}{x^{2}} \, dx$
Give the content markup for this expression.
(c) Suppose that you have the representation of this expression in TeX. How would you link this to the MathML markup?

22 Part (b)
<cn> is a number
<ci> is an identifier
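Content markup records the mathematical meaning (an integral, a quotient, a power) rather than the layout. A minimal sketch that builds content MathML for part (b) with Python's standard `xml.etree.ElementTree`, assuming the Question 4 expression is the integral from 2 to 5 of (1 + x)/x² with respect to x (the exam text is garbled at that point, so this reading is an assumption). The helpers `el`, `cn`, and `ci` are hypothetical conveniences, not part of any MathML library.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def el(tag: str, *children: ET.Element, text: Optional[str] = None) -> ET.Element:
    """Build an element with optional text and child elements."""
    node = ET.Element(tag)
    if text is not None:
        node.text = text
    node.extend(children)
    return node


def cn(value: str) -> ET.Element:
    return el("cn", text=value)   # <cn> is a number


def ci(name: str) -> ET.Element:
    return el("ci", text=name)    # <ci> is an identifier


# Content markup for the integral from 2 to 5 of (1 + x) / x^2 with respect to x.
integral = el(
    "apply",
    el("int"),
    el("bvar", ci("x")),                  # variable of integration
    el("lowlimit", cn("2")),
    el("uplimit", cn("5")),
    el("apply", el("divide"),             # the integrand (1 + x) / x^2
       el("apply", el("plus"), cn("1"), ci("x")),
       el("apply", el("power"), ci("x"), cn("2"))),
)
math_el = el("math", integral)
math_el.set("xmlns", "http://www.w3.org/1998/Math/MathML")
print(ET.tostring(math_el, encoding="unicode"))
```

Every operator is an empty element (`<int/>`, `<divide/>`, `<plus/>`, `<power/>`) applied to its arguments, which is what lets software evaluate or transform the expression rather than just display it.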

23 Annotations
[Diagram labels: Content encoding, Presentation encoding.]

24 Annotations
[Diagram labels: Content encoding, TeX encoding.]
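MathML links alternative encodings to one expression with a `<semantics>` element: the first child is the primary encoding, `<annotation-xml>` carries XML alternatives such as presentation markup, and `<annotation>` carries non-XML alternatives such as TeX. A minimal sketch for part (c), again assuming the Question 4 expression is the integral from 2 to 5 of (1 + x)/x² dx (an assumption, since the exam text is garbled). The encoding value "TeX" follows MathML 2 usage; later versions of the MathML specification favor media-type values such as application/x-tex. The helper `el` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def el(tag: str, *children: ET.Element, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Build an element with optional text, attributes, and child elements."""
    node = ET.Element(tag, attrs)
    if text is not None:
        node.text = text
    node.extend(children)
    return node


# Primary encoding: content markup.
content = el("apply", el("int"), el("bvar", el("ci", text="x")),
             el("lowlimit", el("cn", text="2")), el("uplimit", el("cn", text="5")),
             el("apply", el("divide"),
                el("apply", el("plus"), el("cn", text="1"), el("ci", text="x")),
                el("apply", el("power"), el("ci", text="x"), el("cn", text="2"))))

# Alternative encoding 1: presentation markup (layout of the same expression).
presentation = el("mrow",
                  el("msubsup", el("mo", text="\u222b"), el("mn", text="2"), el("mn", text="5")),
                  el("mfrac",
                     el("mrow", el("mn", text="1"), el("mo", text="+"), el("mi", text="x")),
                     el("msup", el("mi", text="x"), el("mn", text="2"))),
                  el("mi", text="d"), el("mi", text="x"))

# Alternative encoding 2: the TeX source, kept as plain text.
tex = r"\int_{2}^{5} \frac{1+x}{x^{2}} \, dx"

semantics = el("semantics",
               content,
               el("annotation-xml", presentation, encoding="MathML-Presentation"),
               el("annotation", text=tex, encoding="TeX"))
math_el = el("math", semantics, xmlns="http://www.w3.org/1998/Math/MathML")
print(ET.tostring(math_el, encoding="unicode"))
```

Because all three encodings sit inside one `<semantics>` element, a renderer can pick whichever it understands while the TeX stays attached to the exact expression it describes.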

25 Question 5
(a) You have a query, Q, that you wish to search against a set of documents D1, D2, ..., Dn. Explain how vector space methods can be used to rank how closely the documents match the query.
(b) The query is: [query]
The set of documents is: [documents]
Calculate the ranking of these four documents against this query.

26 Vector Space Methods: Concept
- n-dimensional space, where n is the total number of different words in the set of documents.
- Each document is represented by a vector, with magnitude in each dimension equal to the number of times that the corresponding word appears in the document.
- Similarity between two documents is measured by the angle between their vectors: the smaller the angle, the more similar the documents (in practice, the cosine of the angle is used as the score).
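A minimal sketch of the ranking in part (a), assuming raw term counts with no weighting and using the cosine of the angle between query and document vectors as the score. The documents are the ones from the slide 27 example; the query "ant dog" is hypothetical, since the exam's actual query is not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector: word -> number of occurrences."""
    return Counter(text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term vectors (1.0 = same direction)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                        # empty text: no direction, treat as unrelated
    return dot / (norm_a * norm_b)


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (document name, similarity) pairs, most similar first."""
    q = term_vector(query)
    scores = [(name, cosine(q, term_vector(text))) for name, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {"D1": "ant ant bee", "D2": "bee hog ant dog", "D3": "cat gnu dog eel fox"}
for name, score in rank("ant dog", docs):   # hypothetical query
    print(name, round(score, 3))
# D2 0.707
# D1 0.632
# D3 0.316
```

Dividing by both vector lengths is what makes this an angle rather than a raw overlap count: D1 mentions "ant" twice, but its extra "ant" also makes it longer, so D2 still ranks first.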

27 Example
D1 -> ant ant bee
D2 -> bee hog ant dog
D3 -> cat gnu dog eel fox

    ant  bee  cat  dog  eel  fox  gnu  hog  length
D1  2    1                                  √5
D2  1    1         1                   1    √4
D3            1    1    1    1    1         √5
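The table can be reproduced mechanically: one dimension per distinct word, one count per cell, and length as the square root of the sum of squared counts. A minimal sketch that rebuilds the vectors and lengths for D1 to D3 and then computes one pairwise cosine, which the slide itself does not show; the `cosine` helper is a hypothetical illustration.

```python
import math
from collections import Counter

docs = {
    "D1": "ant ant bee",
    "D2": "bee hog ant dog",
    "D3": "cat gnu dog eel fox",
}

# One dimension per distinct word, in alphabetical order as in the table.
terms = sorted({word for text in docs.values() for word in text.split()})
counts = {name: Counter(text.split()) for name, text in docs.items()}
vectors = {name: [c[t] for t in terms] for name, c in counts.items()}
lengths = {name: math.sqrt(sum(x * x for x in v)) for name, v in vectors.items()}


def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(terms)
for name in docs:
    print(name, vectors[name], f"length = {lengths[name]:.3f}")
print(f"cos(D1, D2) = {cosine(vectors['D1'], vectors['D2']):.3f}")
```

D1 and D2 share "ant" and "bee", giving a dot product of 2·1 + 1·1 = 3 and a cosine of 3/(√5·√4) ≈ 0.671; D1 and D3 share no words, so their vectors are orthogonal and the cosine is 0.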