University of California, Berkeley

Presentation transcript:

What is Information? The Nature, Growth and Characteristics of Information
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
Lecture authors: Marti Hearst & Ray Larson
8/30/2001 Information Organization and Retrieval

What is Information?
- There is no “correct” definition
- Can involve philosophy, psychology, signal processing, physics
- Cookie Monster’s definition: “news or facts about something”
- Oxford English Dictionary:
  - information: informing, telling; thing told, knowledge, items of knowledge, news
  - knowledge: knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known

Assignment 1 Discussion
- What is information, according to your background or area of expertise?

Types of Information
- Differentiation by form
- Differentiation by content
- Differentiation by quality
- Differentiation by associated information

Information Properties
- Information can be communicated electronically
  - Broadcasting
  - Networking
- Information can be easily duplicated and shared
  - Problems of ownership
  - Problems of control
Adapted from ‘Silicon Dreams’ by Robert W. Lucky

Intuitive Notion (Losee 97)
Information must:
- Be something, although the exact nature (substance, energy, or abstract concept) is not clear
- Be “new”: repetition of previously received messages is not informative
- Be “true”: false or counterfactual information is “mis-information”
- Be “about” something
This human-centered approach emphasizes the meaning and use of the message.

Information from the Human Perspective
- Levels in cognitive processing
  - perception
  - observation/attention
  - reasoning, assimilating, forming inferences
- Knowledge: “justified true belief”
- Belief: an idea held based on some support; an internally accepted statement, result of inductive processes combining observed facts with a reasoning process
- Does information require a human mind?
  - Communication and information transfer among ants
  - A tree falls in the forest … is there information there?
  - Existence of quarks

Meaning vs. Form
- Form of information as the information itself
- Meaning of a signal vs. the signal itself
- What aspects of a document are information?
- Representation (Norman 93)
  - Why do we write things down?
  - Socrates thought writing would obliterate serious thought
  - Sounds and gestures fade away
  - Artifacts help us to reason
  - Anything not present in the representation can be ignored
  - Things left out of the representation are often what we don’t know how to represent

Consider Borges’s infinite Library of Babel…
- It has all possible data combinations of letters
- Does it therefore contain all possible information?
- What about all possible knowledge?
- What about wisdom?
- Is the Internet a prototype Library of Babel?

Information Hierarchy (top to bottom)
- Wisdom
- Knowledge
- Information
- Data

Information Hierarchy
- Data: the raw material of information
- Information: data organized and presented by someone
- Knowledge: information read, heard or seen and understood
- Wisdom: distilled and integrated knowledge and understanding

Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
  -- T.S. Eliot, “The Rock”
Where is the information we have lost in data?

Information Theory
- Claude Shannon, 1940s, studying communication
- Ways to measure information
- Communication: producing the same message at its destination as that seen at its source
- Problem: a “noisy channel” can distort the message
- Between transmitter and receiver, the message must be encoded
- Semantic aspects are irrelevant
[Diagram: Message source -> Transmitter -> Channel (with Noise) -> Receiver -> Destination]
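
One of the "ways to measure information" in Shannon's theory is entropy, the average number of bits needed per symbol of a source. The sketch below is a minimal illustration, assuming symbol probabilities are estimated from character frequencies in a sample message; the function name `entropy_bits` is hypothetical and not from the lecture.

```python
import math
from collections import Counter

def entropy_bits(message: str) -> float:
    """Shannon entropy in bits per symbol: H = sum(p * log2(1 / p))."""
    counts = Counter(message)
    total = len(message)
    if total == 0:
        return 0.0
    # p = n / total, so log2(1 / p) = log2(total / n)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(entropy_bits("aaaa"))  # a single repeated symbol carries no uncertainty
print(entropy_bits("abab"))  # two equally likely symbols
```

The measure depends only on how often each symbol occurs, not on what the message means, which is one reading of "semantic aspects are irrelevant".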

Information Theory
- Better called “Communication Theory”
- Communication may be over time and space
[Diagram 1: Source -> Encoding -> Message over Channel (with Noise) -> Decoding -> Destination]
[Diagram 2: Source -> Encoding (writing/indexing) -> Message in Storage -> Decoding (retrieval/reading) -> Destination]
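
The encode, transmit, decode path can be sketched with a toy example. The repetition code and majority-vote decoder below are an illustrative choice, not something the lecture specifies, and the function names and the 5% flip probability are assumptions.

```python
import random

def encode(bits, repeat=3):
    """Repetition code: send each bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability `flip_prob`."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(bits, repeat=3):
    """Majority vote over each group of `repeat` received bits."""
    groups = [bits[i:i + repeat] for i in range(0, len(bits), repeat)]
    return [1 if sum(g) * 2 > len(g) else 0 for g in groups]

message = [1, 0, 1, 1, 0]
received = noisy_channel(encode(message), flip_prob=0.05, rng=random.Random(0))
print(decode(received) == message)
```

With three copies per bit, the decoder recovers the message whenever at most one copy in each group is flipped; the price is sending three times as many bits.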

What kinds of information are there?
- Text
  - books, periodicals, WWW, memos, ads
  - published/refereed
- Film
- Photos, other images
- Broadcast TV, radio
- Telephone conversations
- Databases

How much information is there?
(Estimates courtesy Hal Varian and Peter Lyman: http://www.sims.berkeley.edu/emc)

How Much Information?
- Stored information
  - Print
  - Film
  - Optical
  - Magnetic
- Communicated
  - Internet
  - Broadcast
  - Phone
  - Mail

Print: Annual Production
- Books: 968,735 = 8 Terabytes (compressed image)
- Newspapers: 22,643 = 25 Terabytes
- Journals: 40,000 = 2 Terabytes
- Magazines: 80,000 = 10 Terabytes
- Office documents: 7.5x10^9 pages (i.e. 7,500,000,000) = 195 Terabytes
- TOTAL: 240 Terabytes (1200 scanned, 24 text)

Print: Library of Congress
- Printed book collection
  - About 18 million books
  - About 130 Terabytes (compressed image)
- For all of LC we should also assume
  - 13M photographs, 5MB each = 65 TB
  - 4M maps, say 200 TB
  - 500K files, 1GB each = 500 TB
  - 3.5M sound recordings, ~2000 TB
  - Grand total: 3 petabytes (~3000 terabytes)
- Books in Print
  - 3.2 million titles
  - About 26 Terabytes

Film and Image
- Film
  - Photographs = 410 Petabytes per year
  - Movies = 16 Terabytes (commercial production of about 4000 films)
  - X-Rays = 17.2 Petabytes

Optical Media
- CD-Music: 90,000 items = 58 TB
- CD-ROM: 1,000 items = 3 TB
- DVD-Video: 5,000 items = 22 TB
- Total: 83 TB
- Total compressed: 29 TB

Magnetic Media
- Audio Tape: 184,200,000 = 184.2 Petabytes
- Video Tape: 355,000,000 = 1420
- Floppy disks = 0.07
- Removable disks = 1.69
- Hard Disks = 500

Totals Stored Per Year
Medium     Type of content     Terabytes/Year   Terabytes/Year
                               (Upper Bound)    (Lower Bound)
Paper      Books                           8                1
           Newspapers                     25                2
           Periodicals                    12                1
           Office documents              195               19
           SUBTOTAL                      240               23
Film       Photographs               410,000           41,000
           Cinema                         16               16
           X-Rays                     17,200           17,200
           SUBTOTAL                  427,216           58,216
Optical    Music CDs                      58                6
           Data CDs                        3                3
           DVDs                           22               22
           SUBTOTAL                       83               31
Magnetic   Camcorder                 300,000          300,000
           Disk drives             1,393,000          277,210
           SUBTOTAL                1,693,000          577,210
TOTAL                              2,120,539          635,480
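
The subtotals and grand totals on this slide can be checked by summing the individual rows. The sketch below simply copies the slide's figures into a dictionary (the variable names are illustrative) and recomputes them.

```python
# Terabytes/year as (upper bound, lower bound), copied from the slide above.
table = {
    "Paper": {"Books": (8, 1), "Newspapers": (25, 2),
              "Periodicals": (12, 1), "Office documents": (195, 19)},
    "Film": {"Photographs": (410_000, 41_000), "Cinema": (16, 16),
             "X-Rays": (17_200, 17_200)},
    "Optical": {"Music CDs": (58, 6), "Data CDs": (3, 3), "DVDs": (22, 22)},
    "Magnetic": {"Camcorder": (300_000, 300_000),
                 "Disk drives": (1_393_000, 277_210)},
}

def subtotal(rows):
    """Sum the upper-bound and lower-bound columns of one medium."""
    upper = sum(u for u, _ in rows.values())
    lower = sum(l for _, l in rows.values())
    return upper, lower

subtotals = {medium: subtotal(rows) for medium, rows in table.items()}
total = tuple(sum(column) for column in zip(*subtotals.values()))

for medium, (upper, lower) in subtotals.items():
    print(f"{medium:<9}{upper:>12,}{lower:>12,}")
print(f"{'TOTAL':<9}{total[0]:>12,}{total[1]:>12,}")
```

The recomputed sums agree with the slide, and they make the imbalance plain: magnetic media account for roughly four fifths of the upper-bound total, while paper is a tiny fraction.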

Internet Traffic -- Historical
- Dec 1996 = 1500 Tb
- Dec 1997 = 3000 Tb
[Chart: traffic in Tb, Nov ‘92 to Apr ‘95]

Internet Traffic Percentage
[Chart: Nov ‘92 to Apr ‘95]

Currently...
- There are an estimated 2.5 billion pages on the Web
  - About 25-50 Terabytes (surface web)
  - About 7500 further Terabytes in web-accessed DBs
- 610 billion email messages per year = 11285 TB
- Internet traffic is doubling every 100 days
- An estimated 62 million Americans now use the internet (US Commerce Dept 1998)
- Radio took 38 years to get 50 M listeners, TV took 13 years, the Net took 4 years...
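
To get a feel for what "doubling every 100 days" means, the doubling time can be converted into a yearly growth factor, and a pair of measurements can be converted back into a doubling time. This is a rough sketch that assumes steady exponential growth and a 365-day year; the function names are illustrative.

```python
import math

def annual_growth_factor(doubling_days: float) -> float:
    """Factor by which a quantity grows in 365 days, given its doubling time."""
    return 2 ** (365 / doubling_days)

def doubling_time_days(start: float, end: float, elapsed_days: float) -> float:
    """Doubling time implied by growth from `start` to `end` over `elapsed_days`."""
    return elapsed_days * math.log(2) / math.log(end / start)

# Growth per year if traffic doubles every 100 days.
print(round(annual_growth_factor(100), 2))

# Doubling time implied by the historical figures quoted earlier
# (1500 Tb in Dec 1996, 3000 Tb in Dec 1997).
print(round(doubling_time_days(1500, 3000, 365)))
```

Under these assumptions a 100-day doubling time is more than a twelvefold increase per year, whereas the Dec 1996 to Dec 1997 figures correspond to a doubling time of about one year.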

Internet - Recent Statistics
- 5 M Level 2 Domains (NW June 1999)
- 43.2 Million Hosts (NW January 1999)
- 206/246 IP countries (NW July 1998)
- 300 Million Users (Newsbytes, Mar 2000)
- (830 Million Telephone Terminations)
Source: Vint Cerf

Internet Hosts (000s) 1989-2006
[Chart]
Source: Vint Cerf

Projected Voice and Data Traffic
[Chart: traffic in Gb/s]
Source: America's Network, May 15, 1998

Users on the Internet - May 1999
- CAN/US: 90.65M
- Europe: 40.09M
- Asia/Pac: 26.97M
- Latin Am: 5.29M
- Africa: 1.14M
- Mid-east: 0.88M
- Total: 165M
Source: Vint Cerf

Language Distribution of Web Content
[Chart]
Source: Jack Xu, Excite

Language Distribution on a 634 Million Web Pages Corpus
[Chart]

Sources on Information, Computer, and Network Use
- http://www.sims.berkeley.edu/emc/
- http://www.cs.cmu.edu/afs/cs.cmu.edu/user/bam/www/numbers.html
  Statistical snippets extracted from the news
- http://www.wcom.com/about_the_company/cerfs_up/
  Vint Cerf’s pages
- http://www.firstmonday.dk/issues/issue3_10/coffman/index.html
  The size and growth rate of the Internet, by K.G. Coffman and Andrew Odlyzko

Human Memory
- Landauer 86: Human brain holds 200MB
  - looked at rate of information intake and rate of forgetting, and amount of information adults need for normal tasks
- 6B people on earth implies total memory of all people alive about 1,200 petabytes
- Another way: estimate that people take in a byte/sec
  - lifetime 25,000 days or 2B sec
  - result is 2 GB (doesn’t count synthesizing new info)
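
The two estimates on this slide are simple multiplications. The sketch below reproduces them, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the variable names are illustrative.

```python
MB = 10**6
GB = 10**9
PB = 10**15

# Estimate 1: Landauer's 200 MB per person, times the world population.
brain_bytes = 200 * MB
population = 6 * 10**9
total_memory_pb = brain_bytes * population // PB
print(total_memory_pb, "petabytes across all people alive")

# Estimate 2: one byte per second over a lifetime of about 2 billion seconds.
lifetime_seconds = 2 * 10**9
intake_bytes_per_second = 1
lifetime_gb = lifetime_seconds * intake_bytes_per_second // GB
print(lifetime_gb, "GB taken in over a lifetime")
```

The second estimate comes out ten times larger than Landauer's 200 MB, which is consistent with his model, where intake is offset by forgetting.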

Information Overload
- “The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth.” (Varian & Lyman)
- “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)

Definitions
- To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for.
- Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known.
- To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right.
- Information is (1) informing, telling; thing told, knowledge, items of knowledge, news.
The Oxford English Dictionary, cf. Rowley

Information Life Cycle
[Diagram: cycle of stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating. Other labels: Creation, Utilization, Searching, Active, Semi-Active, Inactive, Retention/Mining, Disposition, Discard]

Authoring/Modifying
- Converting Data+Information+Knowledge to New Information
- Creating information from observation, thought
- Editing and publication
- Gatekeeping

Organizing/Indexing
- Collecting and integrating information
- Affects data, information and metadata
- “Metadata” describes data and information (more on this later)
- Organizing information
  - Types of organization?
- Indexing
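
One common form of indexing is an inverted index, which maps each term to the documents that contain it. The lecture does not name a specific structure at this point, so the sketch below is only one possible illustration; the sample documents and the function name `build_index` are made up.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

docs = {
    1: "information organization and retrieval",
    2: "metadata describes data and information",
}
index = build_index(docs)
print(index["information"])  # [1, 2]
print(index["metadata"])     # [2]
```

The index is itself a kind of metadata: it describes where terms occur without containing the documents themselves.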

Storing/Retrieving
- Information storage
  - How and where is information stored?
- Retrieving information
  - How is information recovered from storage?
  - How to find needed information
  - Linked with Accessing/Filtering stage

Distribution/Networking
- Transmission of information
- How is information transmitted?
- Networks vs Broadcast

Accessing/Filtering
- Using the organization created in the O/I stage to:
  - Select desired (or relevant) information
  - Locate that information
  - Retrieve the information from its storage location (often via a network)
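
The three steps of select, locate and retrieve can be sketched as three small functions over an index built in the Organizing/Indexing stage. Everything here is hypothetical: the index contents, the storage paths, and the in-memory dictionary standing in for a storage system or network.

```python
# Hypothetical organization created earlier: term -> document ids.
index = {"information": [1, 2], "shannon": [2], "metadata": [3]}
# Hypothetical catalogue: document id -> storage location.
locations = {1: "shelf-A/doc1.txt", 2: "shelf-A/doc2.txt", 3: "shelf-B/doc3.txt"}
# Hypothetical storage: location -> content.
storage = {
    "shelf-A/doc1.txt": "What is information?",
    "shelf-A/doc2.txt": "Shannon and information theory",
    "shelf-B/doc3.txt": "Metadata describes data",
}

def select(index, terms):
    """Select ids of documents that contain every query term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

def locate(locations, doc_ids):
    """Look up where each selected document is stored."""
    return [locations[doc_id] for doc_id in doc_ids]

def retrieve(storage, paths):
    """Fetch the content found at each location."""
    return [storage[path] for path in paths]

ids = select(index, ["information", "Shannon"])
print(retrieve(storage, locate(locations, ids)))
```

Keeping the steps separate mirrors the slide: selection works on the index alone, and only the final step touches the stored documents.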

Using/Creating
- Using information
- Transformation of information to knowledge
- Knowledge to new data and new information

Key issues in this course
- Retrieving: how to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs
- Organizing: how to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them

Key Issues
[Diagram: the Information Life Cycle again: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating. Other labels: Creation, Utilization, Searching, Active, Semi-Active, Inactive, Retention/Mining, Disposition, Discard]

Next Week
- Introduction to IR
- The search process