Advanced Information Retrieval


1 Advanced Information Retrieval
Meeting #1

2 Revision
Data, Information, Knowledge

3 Data: comes in the form of measurements.
e.g. 50mps, 50GB, 50kg
Information: a statement of fact about the measurements.
e.g. The hard-disk size is 50GB. Her weight drops to 50kg after taking diet pills.

4 What’s information/knowledge?
There is no correct/definite definition.
“Information” means different things to different people, depending on an individual’s world of experience, knowledge, environment, profession and situation (or IQ).

5 Information is whatever contributes to a reduction in the uncertainty of the state of a system. (Dictionary of Computing)
Data available to individuals, firms, or governments at the time when economic decisions have to be taken. (Dictionary of Economics)
Informing, telling; thing told, knowledge, items of knowledge, news. (Concise Oxford Dictionary)

6 Knowledge: the ability to turn information and data into effective action.
Conceptually easy, but it can be very difficult, e.g.
• How to get a distinction in an exam?
• How to get rid of a disgusting person?

7 How?
• Case studies – case-based reasoning
• General principles – rules of thumb
• Mining by using computer
• … … …

8 Characteristics of information
To be useful, information requires that:
• it contributes to a larger understanding
• it is something communicated
• it is received by a human

9 For the individual, information must:
• be something new
• be true
• be about something relevant

10 How can information be measured?
In 1948, Claude Shannon published “A Mathematical Theory of Communication”
• Spawned the “Digital Age”
• Changed our conception of information

11 Shannon’s Information theory
Provided the framework to encode, compress, transmit, and decode information without error
Information becomes objective and measurable
Applications: CD-ROM, DVD, Internet, communication…
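To make "objective and measurable" concrete, here is a minimal sketch of Shannon's entropy formula, H = -Σ p·log2(p), applied to the characters of a string. The function name `shannon_entropy` and the choice of single characters as symbols are assumptions made for illustration; they do not come from the slides.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits: H = -sum(p * log2(p)).

    Assumption: each character is one symbol, and its probability is
    estimated from its relative frequency in the message.
    """
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A message with a single repeated symbol carries no uncertainty.
h_constant = shannon_entropy("aaaa")
# Two equally likely symbols need one bit per symbol.
h_two = shannon_entropy("abab")
# Four equally likely symbols need two bits per symbol.
h_four = shannon_entropy("abcd")
```

The more evenly the symbols are spread, the higher the entropy, which matches the "reduction in uncertainty" view of information.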

12 Digital Age: what does info. mean today?
Information today is referred to in measurements of quantity, speed, and storage:
• Estimated size of the Internet is 532,897 terabytes (2002)
• Web consists of approximately 3 to 6 billion pages with a growth rate of 7.3 million pages per day [which is] 0.1 terabytes of new information per day
• Internet traffic is doubling every 100 days
• Lucent’s latest fibre-optic cable can ship “90,000 volumes of encyclopedias in a second”

13 Creation of Information
• More new information has been produced in the last 30 years than in the previous 5,000.
• A tree can produce about 80,500 sheets of paper, thus it requires about 786 million trees to produce the world's annual paper supply.
• The world's total yearly production of print, film, optical, and magnetic content would require roughly 5 exabytes of storage. This is the equivalent of 800 megabytes for each man, woman, and child on earth.
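The figures on this slide can be checked with a few lines of arithmetic. This is a rough sketch only: the world population of about 6.3 billion and the decimal units (1 exabyte = 10^18 bytes, 1 megabyte = 10^6 bytes) are assumptions, since the slide states neither.

```python
# Assumption: decimal (SI) units; the slide does not say which it uses.
EXABYTE = 10**18
MEGABYTE = 10**6

# Assumption: approximate world population in the early 2000s.
world_population = 6.3e9

# 5 exabytes of new content per year, shared out per person.
yearly_bytes = 5 * EXABYTE
per_person_mb = yearly_bytes / world_population / MEGABYTE

# Paper supply implied by the tree figures on the slide.
sheets_per_tree = 80_500
trees_per_year = 786e6
sheets_per_year = sheets_per_tree * trees_per_year

print(f"{per_person_mb:.0f} MB per person per year")
print(f"{sheets_per_year:.2e} sheets of paper per year")
```

Under these assumptions the per-person figure comes out a little under 800 MB, consistent with the rounded number on the slide.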

14 Human factor
Human capacity to process information remains unchanged
Human need for information remains unchanged .. Nosy parker
Information Anxiety
• Frustration
• Pressure
• Competitiveness
• Guilt

15 Digitization …
• “Real” information is still out there, valuable, and wanted
• Digital Age has shifted emphasis away from the value of information to the access to information
• Our professional challenge lies somewhere in between

16 Digital age – the antidote
Richard Saul Wurman, Information Anxiety guru:
• “access is the antidote to anxiety”
• “understanding the structure of how something is organized is the first huge step in effective retrieval and controlling the information anxiety beast.”

17 Retrieval
Text retrieval, document retrieval, multimedia retrieval, information retrieval: the same?
We can’t retrieve information! We can only retrieve documents that contain text which carries information.
Information can be anywhere in the text, in the links, in the process of text. Use your brain. Grasp

18 Emergence of online information retrieval
• The volume of digitized information necessitated methods to store and retrieve this data
• Databases were needed to store text-based, natural language documents
• The various ways databases are structured are central to the study of Information Retrieval

19 Information Retrieval
• IR focuses on the acquisition, organization, storage, retrieval, and distribution of information.
• IR involves helping users find information that matches their information needs.
• IR has become a center of focus in the web era.

20 Information Retrieval
• Conceptually, information retrieval covers all related problems in finding needed information.
• Historically, information retrieval is about document retrieval, emphasizing the document as the basic unit.
• Technically, information retrieval refers to (text) string manipulation, indexing, matching, querying, etc.

21 Components of IR Systems
Human
• Users -- who create the needs of the system (the user)
• Organization -- who makes it possible to have the system (librarian)
• Information professionals -- who operate the system and provide the services (systems people, catalogers …)
System
• Data -- the content of the system
• Device & media -- hardware of the system
• Algorithms & procedures -- software of the system

22 Information Retrieval systems
Reading , p.5 Exhibit 3
Subject indexing, why?
• Translation
• Surrogates

23 Abstraction Principles
First Abstraction Principle: abstract data from the “real world” and make them available to the system. (Indexing)
Second Abstraction Principle: abstract the user’s information needs into a form the system understands. (From the user’s thinking to search terms)
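The two abstraction principles can be sketched in a few lines: indexing turns documents into terms the system can look up, and query formulation turns a user's stated need into the same kind of terms. Everything named here (`to_terms`, `build_index`, `search`, the stopword list, the sample documents) is hypothetical and deliberately crude; real systems use far richer indexing.

```python
from collections import defaultdict

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "how", "do", "i"}


def to_terms(text):
    """Reduce free text to lowercase terms, dropping punctuation and stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_index(docs):
    """First abstraction: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in to_terms(text):
            index[term].add(doc_id)
    return index


def search(index, need):
    """Second abstraction: turn the stated need into terms, then require all of them."""
    terms = to_terms(need)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    "d1": "Indexing of library catalogues",
    "d2": "Shannon and the measurement of information",
    "d3": "Subject indexing in information retrieval",
}
index = build_index(docs)
result = search(index, "How do I do subject indexing?")
print(result)
```

Both sides pass through the same `to_terms` function, which is the point of the sketch: documents and needs only meet once they have been abstracted into a shared vocabulary.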

24 Generic Document Retrieval Model
[Diagram with the elements: Information Need, Query Language, Representation of Information Need, Documents, Representation of Document Content, Prior Knowledge & Assumptions, Ranking Algorithm, Best Matching Documents]
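A minimal sketch of this model follows: both the information need and the documents are turned into a representation, and a ranking algorithm orders documents by how well the two match. The slide does not name a particular ranking algorithm; cosine similarity over raw word counts is swapped in here as one common choice, and the names `represent`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def represent(text):
    """Representation step: a bag of lowercase words with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank(need, documents):
    """Ranking step: score every document against the need, best first."""
    query = represent(need)
    scored = [(doc_id, cosine(query, represent(text)))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


documents = {
    "d1": "shannon information theory",
    "d2": "library catalogue searching",
    "d3": "information retrieval and document retrieval",
}
ranking = rank("document retrieval", documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike a yes/no lookup, every document receives a score, so the output is an ordered list of best matches rather than a set of exact hits.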

25 Problem of IR
[Diagram with the elements: User, Information, Search/select, Info. Needs, Queries, Stored Information]
• Translating info. needs to queries
• Matching queries to stored information
• Query result evaluation: does the information found match the user’s information needs?

26 Information retrieval solutions may incorporate data retrieval
• Data retrieval as a subset of information retrieval
• Data retrieval alone is not interesting
• Question-answering systems, data retrieval systems, text retrieval systems are all legitimate forms of IR

27 Comparison of data retrieval and information retrieval
                     Data retrieval       Information retrieval
Content              Data                 Information
Data object          Table                Document
Matching             Exact match          Partial match, best match
Items wanted                              Relevant
Query language       SQL                  Natural
Query specification  Complete             Incomplete
Model                Deterministic,       Probabilistic,
                     highly structured    less structured
Table by Xin Xao, Drexel University
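The "Matching" row of the comparison can be illustrated with a small sketch. `exact_match` behaves like an equality condition in a structured query: a record either satisfies it or does not. `best_match` scores records by word overlap with a loosely worded query and returns them best first. The records, field names, and both function names are hypothetical, and word overlap is only one simple stand-in for partial matching.

```python
records = [
    {"id": 1, "title": "Information Retrieval"},
    {"id": 2, "title": "Text Retrieval Systems"},
    {"id": 3, "title": "Library Catalogues"},
]


def exact_match(rows, field, value):
    """Data retrieval: keep only rows whose field equals the value exactly."""
    return [row["id"] for row in rows if row[field] == value]


def best_match(rows, query, field="title"):
    """Information retrieval: rank rows by how many query words they share."""
    query_words = set(query.lower().split())
    scored = []
    for row in rows:
        overlap = len(query_words & set(row[field].lower().split()))
        if overlap:
            scored.append((overlap, row["id"]))
    # Highest overlap first; rows with no shared words are left out.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row_id for _, row_id in scored]


query = "text retrieval systems design"
exact_hits = exact_match(records, "title", query)
ranked_hits = best_match(records, query)
print(exact_hits, ranked_hits)
```

The same incompletely specified query finds nothing under exact matching but still yields a useful ordering under best matching, which is the contrast the comparison draws.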

28 Assignment 1
Examines the nature and function of library catalogues; demonstrates library catalogue searching using a number of access points.
• Organization of info., how?
• Interface
• Functions and features: searching, browsing
• Data representation
• Type of IR?
• Technology
• Coverage, completeness and accuracy
• Indexing method and quality: indexing type; currency of indexing; indexing size
• User satisfaction

