Information retrieval mon jan 26 2015 data…. framework for today’s lecture…

Presentation transcript:

information retrieval mon jan data…

framework for today’s lecture…

STRUCTURED vs unstructured data
It is easy to envision structured data in terms of “tables”:

Employee   Manager   Salary
Smith      Jones     50000
Chang      Smith
Ivy        Smith

Structured data typically allows numerical range and exact match (for text) queries, e.g., Salary < … AND Manager = Smith.
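
A range-plus-exact-match query of the kind described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, the salaries for Chang and Ivy, and the 60000 threshold are assumptions, since the slide text preserves only Smith's salary.

```python
import sqlite3

# Only Smith's salary (50000) survives in the slide text.
# The other two salaries and the threshold used below are assumed for illustration.
ROWS = [
    ("Smith", "Jones", 50000),
    ("Chang", "Smith", 60000),  # assumed salary
    ("Ivy", "Smith", 50000),    # assumed salary
]


def query_employees(max_salary, manager):
    """Return names of employees earning less than max_salary who report to manager."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee TEXT, manager TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", ROWS)
    cursor = conn.execute(
        "SELECT employee FROM employees "
        "WHERE salary < ? AND manager = ? ORDER BY employee",
        (max_salary, manager),
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names


# Numerical range (salary < 60000) combined with exact text match (manager = Smith).
print(query_employees(60000, "Smith"))
```

Because every value sits in a named, typed column, the database can answer this kind of query exactly; nothing has to be interpreted.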

tables in an MS Access relational database – defines each defining a social networking site

Data entry form in an MS Access relational database – create each record

structured vs UNSTRUCTURED data
Unstructured data typically refers to free text. Email is a good example of unstructured data: it's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured. Other examples of unstructured data include books, documents, medical records, and social media posts.
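
The split between indexed fields and a free-text body can be seen by parsing a message with Python's standard email module. The message below is made up for illustration; the addresses and wording are not from the lecture.

```python
from email import message_from_string

# A made-up message: the header lines are structured fields,
# everything after the blank line is free text.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Lunch on Friday
Date: Mon, 26 Jan 2015 09:30:00 -0500

Hi Bob, are you free for lunch on Friday? I found a great Thai place.
"""


def split_message(raw):
    """Separate the structured header fields from the unstructured body."""
    msg = message_from_string(raw)
    structured = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    body = msg.get_payload()  # plain string for a simple, non-multipart message
    return structured, body


structured, body = split_message(RAW)
print(structured["Subject"])  # exact-match lookups work on header fields
print(body)                   # the body is just a run of text
```

Finding all mail from one sender is an exact-match query on a field, while finding mail that is "about lunch" requires searching inside the free text.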

A magazine article is an example of unstructured data

[Diagram labels: Document collection (corpus), Index, Query, Representation function, Matching function, Results; CATEGORIES, SUBJECT HEADINGS]
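
One way the components named on this slide might fit together is sketched below. It is a minimal sketch, not the lecture's own system: it assumes the representation function is simple lowercase word extraction, the index is an inverted index (term to document ids), and the matching function is Boolean AND. The three-document corpus is invented.

```python
import re
from collections import defaultdict


def represent(text):
    """Representation function: lowercase the text and return its set of word terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(corpus):
    """Index: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in represent(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Matching function: ids of documents containing every query term (Boolean AND)."""
    terms = represent(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Invented document collection (corpus).
CORPUS = {
    1: "Metadata is data about data",
    2: "Facets are a kind of metadata",
    3: "Email bodies are unstructured data",
}
INDEX = build_index(CORPUS)

# The query passes through the same representation function as the documents.
print(match(INDEX, "metadata data"))
```

Applying the same representation function to documents and queries is what makes the two comparable at matching time.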

KWIC Key word in context
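
A keyword-in-context listing shows each occurrence of a term with a few words on either side. The sketch below assumes a window measured in words and marks the keyword with square brackets; both choices, and the sample text, are illustrative.

```python
def kwic(text, keyword, width=3):
    """Return one line per occurrence of keyword, with `width` words of context each side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation so "retrieval," still matches.
        if word.strip('.,;:!?"').lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


TEXT = (
    "We organize to enable retrieval. The more effort we put into "
    "retrieval, the less we organize."
)

for line in kwic(TEXT, "retrieval", width=2):
    print(line)
```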

metadata

What is Metadata?
Classic definition: data about data
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. (NISO)
3 primary “types”:
– Descriptive
– Structural
– Administrative (rights management, preservation)

digital forensics

The article was about a court case in which a judge ruled that the NSA's collection of metadata related to Americans' phone calls (their length, who they were to/from, how often they occurred) could very well be unconstitutional, despite the argument of the defendants of the NSA program--that collecting metadata was not akin to recording phone calls. But the judge's ruling clearly demonstrates that, today, metadata can tell us worlds of information. What's even more interesting to think about, and what the article also addresses, is that the power of metadata has increased partly because of the increasing extent to which technology is incorporated into our lives. Our use of technology leaves a sort of digital footprint, and as technology has become even more prevalent for us, there is more and more metadata about how we are using technology. That metadata, in turn, can tell others a great deal about ourselves. -Emma

Google has millions and millions of web crawlers, robots that “crawl” around the “web” and gather new sites and archive them in the Google sphere. The way they do this is by collecting and creating metadata about the sites they visit. Google proceeds to use this data in its ranking algorithms, defining what gets precedence even in the most basic searches. We know Google accepts money from companies to place their site in the advertised content above your relevance search results. Even slightly divided from each other, they’re still coming up in the same search. Google changes its algorithms relatively regularly, and who’s to say that they’re not weighting money givers higher in the rankings? -Ryan

More Metadata: A Cataloging Record

The Idea of Facets
Facets are a way of labeling data
– A kind of metadata (data about data)
– Can be thought of as properties of items
Facets vs. Categories
– Items are placed INTO a category system
– Multiple facet labels are ASSIGNED TO items

Facets (Epicurious example)
Create INDEPENDENT categories (facets)
– Each facet has labels (sometimes arranged in a hierarchy)
Assign labels from the facets to every item
– Example: recipe collection
Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Bell Pepper, Curry, Chicken

The Idea of Facets
Break out all the important concepts into their own facets
Sometimes the facets are hierarchical
– Assign labels to items from any level of the hierarchy
Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

Using Facets
Now there are multiple ways to get to each item
Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple
Paths:
– Fruit > Pineapple
– Dessert > Cake
– Preparation > Bake
– Dessert > Dairy > Sherbet
– Fruit > Berries > Strawberries
– Preparation > Freeze
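
The idea that several facet labels are assigned to each item, so that each item can be reached by several routes, can be sketched as follows. The two item names and the grouping of the six paths into two items are assumptions made for illustration; only the paths themselves come from the slide.

```python
# Each item carries several facet labels; a label is a path through one facet hierarchy.
# Item names and the grouping of paths per item are hypothetical.
ITEMS = {
    "pineapple upside-down cake": [
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "strawberry sherbet": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}


def browse(items, *path):
    """Return the items that have a label starting with the given facet path."""
    return sorted(
        name
        for name, labels in items.items()
        if any(label[:len(path)] == path for label in labels)
    )


# The same item is reachable from different facets and from different levels.
print(browse(ITEMS, "Fruit", "Berries"))
print(browse(ITEMS, "Preparation", "Freeze"))
print(browse(ITEMS, "Dessert"))
```

In a single category system each item would live in exactly one place; here any label, at any level of its hierarchy, is an entry point.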

labor intensive? expensive?

UNC Libraries Online Catalog e.g. personal crisis

caveat: semi-structured data
In fact almost no data is absolutely “unstructured”, e.g., this slide has distinctly identified zones such as the title and bullets.
This facilitates “semi-structured” search such as
– title contains data and bullets contain structure
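
A zone-restricted query of this kind can be sketched by storing each slide as a record with a title zone and a bullets zone and testing each zone separately. The slide records below are paraphrased stand-ins, and the zone names are assumptions.

```python
import re

# Stand-in slide records: each has two zones, "title" and "bullets".
SLIDES = [
    {
        "title": "caveat: semi-structured data",
        "bullets": "almost no data is absolutely unstructured; zones give it structure",
    },
    {"title": "metadata", "bullets": "data about data"},
    {"title": "Using Facets", "bullets": "multiple ways to get to each item"},
]


def zone_contains(zone_text, term):
    """True if term appears as a whole word in the zone's text (case-insensitive)."""
    return term.lower() in re.findall(r"[a-z]+", zone_text.lower())


def zone_search(slides, title_term, bullet_term):
    """Titles of slides whose title zone has title_term and bullets zone has bullet_term."""
    return [
        slide["title"]
        for slide in slides
        if zone_contains(slide["title"], title_term)
        and zone_contains(slide["bullets"], bullet_term)
    ]


# "title contains data and bullets contain structure"
print(zone_search(SLIDES, "data", "structure"))
```

The second slide mentions "data" in its bullets but not as a word in its title, so the zone restriction excludes it even though a plain full-text search for "data" would return it.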

Let’s look at a database of magazine & journal articles: Academic Search Complete
>> UNC Libraries Homepage: >> E-Research Tools >> Frequently Used >> Academic Search Complete
[off-campus: log in with onyen/password]

Organization / Search
We organize to enable retrieval.
The more effort we put into organizing information, the more effectively it can be retrieved.
The more effort we put into retrieving information, the less it needs to be organized first.
We need to think in terms of investment: the allocation of costs and benefits between the organizer and the retriever.
The allocation differs according to the relationship between them: who does the work and who gets the benefit?