CS The Age of Infinite Storage


CS 765 1. The Age of Infinite Storage Section 1 # 1

1. The Age of Infinite Storage has begun
Many of us have enough money in our pockets right now to buy all the storage we will be able to fill for the next 5 years. So having the storage capacity is no longer a problem. Managing it is a problem (especially when the volume gets large).
How much data is there?
Section 1 # 2

Googi 10^100
. . .
Yotta 10^24
Zetta 10^21
Exa 10^18
Peta 10^15
Tera 10^12
Giga 10^9
Mega 10^6
Kilo 10^3
Tera Bytes (TBs) are Here
1 TB costs < 1k$ to buy
1 TB may cost ~ 300k$/year to own
Management and curation are the expensive part
Searching 1 TB takes hours
I’m Terrified by TeraBytes
I’m Petrified by PetaBytes
We are here
I’m completely Exafied by ExaBytes
I’m too old to ever be Zettafied by ZettaBytes, but you may be in your lifetime.
You may be Yottafied by YottaBytes.
You may not be Googified by GoogiBytes, but the next generation may be?
Section 1 # 3
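The claim that searching 1 TB takes hours can be checked with simple arithmetic. The sketch below is illustrative only: the sequential read rate of 100 MB/s is an assumption, not a figure from the slides, and the function name is made up for this example.

```python
def scan_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read size_bytes sequentially at the given rate."""
    return size_bytes / throughput_bytes_per_s / 3600

TB = 10**12                       # 1 terabyte (decimal)
ASSUMED_THROUGHPUT = 100 * 10**6  # assumption: 100 MB/s sequential read

print(f"Full scan of 1 TB: {scan_hours(TB, ASSUMED_THROUGHPUT):.1f} hours")
```

Under that assumption a single full scan takes a little under three hours, which is why indexing matters at this scale.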

How much information is there?
Soon everything can be recorded and indexed.
Most of it will never be seen by humans.
Data summarization, trend detection, anomaly detection, data mining, are key technologies
Scale (from figure): Yotta, Zetta, Exa, Peta, Tera, Giga, Mega, Kilo
Labels (from figure): Everything! Recorded; All Books MultiMedia; All books (words); .Movie; A Photo; A Book
10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli
Section 1 # 4

First Disk, in 1956
IBM 305 RAMAC
4 MB
50 24” disks
1200 rpm (revolutions per minute)
100 milli-seconds (ms) access time
35k$/year to rent
Included computer & accounting software (tubes not transistors)
7th Grade C.S. lab Tech.
Section 1 # 5

10 years later 30 MB 1.6 meters Section 1 # 6

Disk Evolution Kilo Mega Giga Tera Peta Exa Zetta Yotta Section 1 # 8

Memex As We May Think, Vannevar Bush, 1945 “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” “yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely” Section 1 # 9

Can you fill a terabyte in a year?
Item                                 Items/TB   Items/day
a 300 KB JPEG image                  3 M        9,800
a 1 MB Document                      1 M        2,900
a 1 hour, 256 kb/s MP3 audio file    9 K        26
a 1 hour 1 MPEG video                290        0.8
Bottom line: we will be able to keep LOTS of video, and vast amounts of smaller data types (audio, photos, documents).
Note: probably not worth the time to delete an object
Section 1 # 10
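The first three rows of the table can be reproduced with a short calculation. This is a sketch, not the authors' method: it assumes binary units (1 TB = 2^40 bytes, 1 KB = 1024 bytes) and a 365-day year, which appear to match the slide's rounded figures. The helper names are hypothetical.

```python
TB = 2**40          # assumption: binary terabyte
DAYS_PER_YEAR = 365

def items_per_tb(item_bytes: float) -> float:
    """How many items of the given size fit in one terabyte."""
    return TB / item_bytes

def items_per_day(item_bytes: float) -> float:
    """Items that must be stored each day to fill one terabyte in a year."""
    return items_per_tb(item_bytes) / DAYS_PER_YEAR

jpeg = 300 * 1024              # a 300 KB JPEG image
document = 2**20               # a 1 MB document
mp3_hour = 256_000 * 3600 / 8  # one hour of 256 kb/s audio, in bytes

for name, size in [("JPEG", jpeg), ("Document", document), ("MP3 hour", mp3_hour)]:
    print(f"{name}: {items_per_tb(size):,.0f} per TB, {items_per_day(size):,.0f} per day")
```

The results land close to the slide's 9,800, 2,900 and 26 items per day; the video row is left out because its bit rate is not legible in the transcript.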

On a Personal Terabyte, How Will We Find Anything?
Need Queries, Indexing, Data Mining, Scalability, Replication…
If you don’t use a DBMS, you will implement one of your own!
The need for Data Mining and Machine Learning is more important than ever!
Of the digital data in existence today, 80% is personal/individual and 20% is Corporate/Governmental
DBMS
Section 1 # 11

We’re awash with data!
Network data: 100 terabytes ~ 10^14 Bytes
US EROS Data Center archives, Earth Observing System (near Sioux Falls SD), remotely sensed satellite and aerial imagery data: 15 petabytes ~ 10^16 Bytes
National Virtual Observatory (aggregated astronomical data): 10 exabytes ~ 10^19 Bytes
Sensor data from sensors (including Micro & Nano-sensor networks): 10 zettabytes ~ 10^22 Bytes
WWW (and other text collections): 10 yottabytes by 2020 ~ 10^25 Bytes
Genomic/Proteomic/Metabolomic data (microarrays, genechips, genome sequences): 10 gazillabytes by 2030 ~ 10^28 Bytes?
Stock Market prediction data (prices + all the above?): 10 supragazillabytes by 2040 ~ 10^31 Bytes?
Useful information must be teased out of these large volumes of raw data.
AND these are some of the 1/5th of data collections that are Corporate or Governmental. The other 4/5ths of data sets are personal!
I made up these names! Projected data sizes are overrunning our ability to name their orders of magnitude!
Section 1 # 12
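The naming problem can be made concrete with a small formatter. The sketch below is illustrative: it uses decimal prefixes up to yotta, the largest prefix named on these slides, and the function name is made up for this example.

```python
PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def describe_bytes(n_bytes: float) -> str:
    """Express a byte count using the largest prefix in PREFIXES that fits."""
    value = float(n_bytes)
    idx = 0
    # Divide by 1000 until the value is small or the named prefixes run out.
    while value >= 1000 and idx < len(PREFIXES) - 1:
        value /= 1000
        idx += 1
    return f"{value:g} {PREFIXES[idx]}bytes"

print(describe_bytes(10**14))  # 100 terabytes
print(describe_bytes(10**28))  # past yotta, the number itself has to grow
```

Beyond 10^27 bytes the formatter can only report thousands of yottabytes, which is the gap the made-up names on the slide are filling.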

Parkinson’s Law (for data)
Data expands to fill available storage
Disk-storage version of Moore’s Law
Available storage doubles every 9 months!
How do we get the information we need from the massive volumes of data we will have?
Querying (for the information we know is there)
Data mining (for answers to questions we don't know to ask precisely)
Moore’s Law with respect to processor performance seems to be over (processor performance doubles every x months…). Note that the processors we find in our computers today are the same as (or less powerful than) the ones we found a few years ago. That’s because that technology (miniaturization) seems to have reached a limit. Now the direction is to put multiple processors on the same chip or die (e.g. Intel Nehalem has 16 or more) and to use other types of processors (such as General Purpose Graphics Processors, GP-GPUs) to increase performance.
Main memory sizes are shooting up. What does that mean for database systems?
Section 3 # 13
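A 9-month doubling period compounds quickly. The sketch below works out the growth factor implied by the slide's figure; the function name and the 5-year horizon are chosen only for illustration.

```python
def growth_factor(months: float, doubling_months: float = 9.0) -> float:
    """Multiplicative growth after `months` given a fixed doubling period."""
    return 2 ** (months / doubling_months)

# Capacity growth over 5 years if storage doubles every 9 months.
print(f"After 60 months: about {growth_factor(60):.0f}x")
```

At that rate available storage grows roughly a hundredfold in five years, so if data expands to fill it, query and mining techniques have to scale by the same factor.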

Thank you. Section 3 # 1