University of California, Berkeley


1 What is Information? The Nature, Growth and Characteristics of Information
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Lecture authors: Marti Hearst & Ray Larson
8/30/2001 Information Organization and Retrieval

2 What is Information?
There is no “correct” definition
Can involve philosophy, psychology, signal processing, physics
Cookie Monster’s definition: “news or facts about something”
Oxford English Dictionary
  information: informing, telling; thing told, knowledge, items of knowledge, news
  knowledge: knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known

3 Assignment 1 Discussion
What is information, according to your background or area of expertise?

4 Types of Information
Differentiation by form.
Differentiation by content.
Differentiation by quality.
Differentiation by associated information.

5 Information Properties
Information can be communicated electronically
  Broadcasting
  Networking
Information can be easily duplicated and shared
  Problems of Ownership
  Problems of Control
Adapted from ‘Silicon Dreams’ by Robert W. Lucky

6 Intuitive Notion (Losee 97)
Information must
  Be something, although the exact nature (substance, energy, or abstract concept) is not clear
  Be “new”: repetition of previously received messages is not informative
  Be “true”: false or counterfactual information is “mis-information”
  Be “about” something
This human-centered approach emphasizes meaning and use of message

7 Information from the Human Perspective
Levels in cognitive processing
  perception
  observation/attention
  reasoning, assimilating, forming inferences
Knowledge: “justified true belief”
Belief: an idea held based on some support; an internally accepted statement, result of inductive processes combining observed facts with a reasoning process
Does information require a human mind?
  Communication and information transfer among ants
  A tree falls in the forest … is there information there?
  Existence of quarks

8 Meaning vs. Form
Form of information as the information itself
Meaning of a signal vs. the signal itself
What aspects of a document are information?
Representation (Norman 93)
  Why do we write things down?
  Socrates thought writing would obliterate serious thought
  Sounds and gestures fade away
  Artifacts help us to reason
  Anything not present in the representation can be ignored
  Things left out of the representation are often what we don’t know how to represent

9 Information Organization and Retrieval
Consider Borges’ infinite Library of Babel…
It has all possible data combinations of letters
Does it therefore contain all possible information?
What about all possible knowledge?
What about Wisdom?
Is the Internet a prototype Library of Babel?

10 Information Hierarchy
Wisdom
Knowledge
Information
Data

11 Information Hierarchy
Data: The raw material of information
Information: Data organized and presented by someone
Knowledge: Information read, heard or seen and understood
Wisdom: Distilled and integrated knowledge and understanding

12 Information Organization and Retrieval
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
-- T.S. Eliot, “The Rock”
Where is the information we have lost in data?

13 Information Theory
Claude Shannon, 1940s, studying communication
Ways to measure information
Communication: producing the same message at its destination as that seen at its source
Problem: a “noisy channel” can distort the message
Between transmitter and receiver, the message must be encoded
Semantic aspects are irrelevant
[Diagram: Message source -> Transmitter -> Channel (with Noise) -> Receiver -> Destination]
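One of the "ways to measure information" that came out of Shannon's work is entropy, H = -sum(p * log2(p)), the average number of bits per symbol. The sketch below is an illustration rather than anything from the lecture: the function name `shannon_entropy` is hypothetical, and estimating probabilities from character counts in a single message is a simplifying assumption.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Average bits per symbol, estimated from symbol frequencies in `message`."""
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # Each symbol with relative frequency p contributes -p * log2(p) bits.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# A message that only repeats one symbol carries no information per symbol.
repeated = shannon_entropy("aaaa")
# Two equally likely symbols need one bit each.
two_symbols = shannon_entropy("abab")
# Same symbols, different meaning: the measure ignores semantics.
first = shannon_entropy("dog bites man")
second = shannon_entropy("man bites dog")
```

The last two values agree because the measure depends only on symbol frequencies, which is one way to read "semantic aspects are irrelevant".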

14 Information Theory
Better called “Communication Theory”
Communication may be over time and space
[Diagram labels, communication over space: Source, Encoding, Message, Channel, Noise, Decoding, Destination]
[Diagram labels, communication over time: Source, Encoding (writing/indexing), Message, Storage, Decoding (Retrieval/Reading), Destination]

15 What kinds of information are there?
Text
  books, periodicals, WWW, memos, ads
  published/refereed
Film
Photos, other Images
Broadcast TV, Radio
Telephone Conversations
Databases

16 How much information is there?
(Estimates courtesy Hal Varian and Peter Lyman)

17 How Much Information?
Stored Information
  Print
  Film
  Optical
  Magnetic
Communicated
  Internet
  Broadcast
  Phone
  Mail

18 Print
Annual Production
  Books ,735 = Terabytes (compressed image)
  Newspapers = Terabytes
  Journals = Terabytes
  Magazines = Terabytes
  Office Documents 7.5x10^9 pages (i.e. 7,500,000,000) = 195 Terabytes
TOTAL 240 Terabytes (1200 scanned, 24 text)

19 Print
Library of Congress
  Printed book collection
    About 18 Million books
    About 130 Terabytes (compressed image)
  For all of LC we should also assume
    13M photographs, 5MB each = 65 TB
    4M maps, say 200 TB
    500K files, 1GB each = 500 TB
    3.5M sound recordings, ~2000 TB
  Grand total: 3 petabytes (~3000 terabytes)
Books in Print
  3.2 Million titles
  About 26 Terabytes
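The Library of Congress estimate is simple arithmetic, and it can be checked with a short sketch. The only assumption added here is the decimal convention (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the slide does not state; the variable names are illustrative.

```python
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units (an assumption)

# Figures as given on the slide, converted to terabytes.
holdings_tb = {
    "printed books (compressed image)": 130,
    "photographs": 13_000_000 * 5 * MB / TB,  # 13M items at 5 MB each
    "maps": 200,
    "files": 500_000 * 1 * GB / TB,           # 500K items at 1 GB each
    "sound recordings": 2000,
}

total_tb = sum(holdings_tb.values())
total_pb = total_tb / 1000

# Implied average size of one scanned book, in megabytes.
mb_per_book = 130 * TB / 18_000_000 / MB

print(f"total: {total_tb:.0f} TB, about {total_pb:.1f} PB")
print(f"average scanned book: {mb_per_book:.1f} MB")
```

The components sum to a little under 3000 TB, which the slide rounds to 3 petabytes.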

20 Film and Image
Film
  Photographs = 410 Petabytes per year
  Movies = 16 Terabytes (Commercial Production of about 4000 films)
  X-Rays = 17.2 Petabytes

21 Optical Media
CD-Music 90,000 items = 58 TB
CD-ROM 1,000 items = 3 TB
DVD-Video 5,000 items = 22 TB
Total TB
Total compressed TB

22 Magnetic Media
Audio Tape 184,200,000 = Petabytes
Video Tape 355,000,000 = 1420
Floppy disks = 0.07
Removable disks = 1.69
Hard Disks = 500

23 Totals Stored Per Year
Columns: Medium | Type of content | Terabytes/Year Upper Bound | Terabytes/Year Lower Bound
Paper
  Books
  Newspapers
  Periodicals
  Office documents
  SUBTOTAL
Film
  Photographs , ,000
  Cinema
  X-Rays , ,200
  SUBTOTAL , ,216
Optical
  Music CDs
  Data CDs
  DVDs
  SUBTOTAL
Magnetic
  Camcorder , ,000
  Disk drives ,393, ,210
  SUBTOTAL ,693, ,210
TOTAL ,120, ,480

24 Internet Traffic -- Historical
Dec 1996 = 1500Tb
Dec 1997 = 3000Tb
[Chart: traffic in Tb; axis labels: Nov ‘92, Apr ‘95]

25 Internet Traffic Percentage
[Chart; axis labels: Nov ‘92, Apr ‘95]

26 Currently...
There are an estimated 2.5 Billion pages on the Web
  About Terabytes (surface web)
  About 7500 further Terabytes in web-accessed DBs.
610 Billion messages per year = TB
Internet Traffic is doubling every 100 days
An estimated 62 Million Americans now use the internet (US Commerce Dept 1998)
Radio took 38 years to get 50 M listeners, TV took 13 years, the Net took 4 years...
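To get a feel for what "doubling every 100 days" implies, the claim can be turned into a yearly growth factor. This is a sketch under the assumption of steady exponential growth; the function name `growth_factor` is hypothetical.

```python
def growth_factor(days: float, doubling_period_days: float = 100.0) -> float:
    """Multiplier after `days`, assuming steady exponential doubling."""
    return 2.0 ** (days / doubling_period_days)


after_one_period = growth_factor(100)  # one doubling period
after_one_year = growth_factor(365)    # 3.65 doubling periods

print(f"growth over one year: about {after_one_year:.1f}x")
```

A 100-day doubling time compounds to more than a tenfold increase per year, which is why the claim sounds so dramatic.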

27 Internet - Recent Statistics
5 M Level 2 Domains (NW June 1999)
43.2 Million Hosts (NW January 1999)
206/246 IP countries (NW July 1998)
300 Million Users (Newsbytes, Mar 2000)
(830 Million Telephone Terminations)
Source: Vint Cerf

28 Internet Hosts (000s)
[Chart]
Source: Vint Cerf

29 Projected Voice and Data Traffic
[Chart: traffic in Gb/s]
Source: America's Network, May 15, 1998

30 Users on the Internet - May 1999
CAN/US M
Europe M
Asia/Pac M
Latin Am M
Africa M
Mid-east M
Total - 165M
Source: Vint Cerf

31 Language Distribution of Web Content
[Chart]
Source: Jack Xu: Excite

32 Language Distribution on a 634 Million Web Pages Corpus

33 Sources on Information, Computer, and Network Use
Statistical snippets extracted from the news
Vint Cerf’s pages
The size and growth rate of the Internet by K.G. Coffman and Andrew Odlyzko

34 Human Memory
Landauer 86: Human brain holds 200MB
  looked at rate of information intake and rate of forgetting, and amount of information adults need for normal tasks
6B people on earth implies total memory of all people alive about 1,200 petabytes
Another way: estimate that people take in a byte/sec
  lifetime 25,000 days or 2B sec
  result is 2 GB (doesn’t count synthesizing new info)
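Both memory estimates are back-of-the-envelope multiplications, sketched below using only the slide's own figures. Decimal units (1 MB = 10^6 bytes) are an assumption, and the variable names are illustrative.

```python
MB, GB, PB = 10**6, 10**9, 10**15  # decimal units (an assumption)

# Estimate 1: Landauer's per-person capacity times world population.
per_person_bytes = 200 * MB
population = 6 * 10**9
total_memory_pb = per_person_bytes * population / PB

# Estimate 2: one byte per second over a lifetime of about 2 billion seconds.
lifetime_seconds = 2 * 10**9
lifetime_intake_gb = lifetime_seconds * 1 / GB

# Sanity check on the lifetime: how many days and years is 2B seconds?
lifetime_days = lifetime_seconds / 86_400
lifetime_years = lifetime_days / 365.25

print(f"all people alive: {total_memory_pb:.0f} PB")
print(f"one lifetime of intake: {lifetime_intake_gb:.0f} GB")
print(f"2B seconds is about {lifetime_days:,.0f} days ({lifetime_years:.0f} years)")
```

The two estimates differ by a factor of ten (200 MB versus 2 GB per person), which shows how rough such figures are.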

35 Information Overload
“The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth.” (Varian & Lyman)
“The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)
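The per-person figure in the Varian & Lyman quote follows from dividing the yearly total by world population. The sketch below assumes a population of 6 billion (the figure used on the Human Memory slide) and decimal units; neither is stated in the quote itself.

```python
GB, MB = 10**9, 10**6  # decimal units (an assumption)

yearly_production_bytes = 1_500_000_000 * GB  # "1.5 billion gigabytes"
world_population = 6_000_000_000              # assumed, from the Human Memory slide

per_person_mb = yearly_production_bytes / world_population / MB
print(f"{per_person_mb:.0f} MB per person per year")
```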

36 Information Organization and Retrieval
To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for.
Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known.
To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right.
Information is (1) informing, telling; thing told, knowledge, items of knowledge, news.
The Oxford English Dictionary, cf. Rowley

37 Information Life Cycle
[Diagram labels: Creation, Utilization, Searching; Active, Semi-Active, Inactive; Retention/Mining, Disposition, Discard]
[Cycle stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating]

38 Authoring/Modifying
Converting Data+Information+Knowledge to New Information.
Creating information from observation, thought.
Editing and Publication.
Gatekeeping

39 Organizing/Indexing
Collecting and Integrating information.
Affects Data, Information and Metadata.
“Metadata” Describes data and information.
  More on this later.
Organizing Information.
  Types of organization?
Indexing

40 Storing/Retrieving
Information Storage
  How and Where is Information stored?
Retrieving Information.
  How is information recovered from storage?
  How to find needed information
Linked with Accessing/Filtering stage

41 Distribution/Networking
Transmission of information
How is information transmitted?
Networks vs Broadcast.

42 Accessing/Filtering
Using the organization created in the O/I stage to:
  Select desired (or relevant) information
  Locate that information
  Retrieve the information from its storage location (often via a network)
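The select, locate and retrieve steps can be illustrated with a toy inverted index, one common way of organizing text for retrieval. This is a minimal sketch and not the course's own method: the function names, the sample documents and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Organizing/Indexing: map each term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def select(index: dict[str, set[str]], query: str) -> set[str]:
    """Select and locate: ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def retrieve(docs: dict[str, str], doc_ids: set[str]) -> list[str]:
    """Retrieve: fetch the stored text for the selected ids, in id order."""
    return [docs[doc_id] for doc_id in sorted(doc_ids)]


# Hypothetical sample collection.
docs = {
    "d1": "information retrieval basics",
    "d2": "information theory",
    "d3": "library catalogs",
}
index = build_index(docs)
hits = select(index, "information retrieval")
results = retrieve(docs, hits)
```

The index is built once in the organizing stage; every later query reuses it instead of scanning all the documents.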

43 Using/Creating
Using Information.
Transformation of Information to Knowledge.
Knowledge to New Data and New Information.

44 Key issues in this course
Retrieving: How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs.
Organizing: How to describe information resources or information-bearing objects so that they may be used effectively by those who need them.

45 Key Issues
[Diagram labels: Creation, Utilization, Searching; Active, Semi-Active, Inactive; Retention/Mining, Disposition, Discard]
[Cycle stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating]

46 Next Week
Introduction to IR
The search process

