Section 1 # 1 CS 765 1. The Age of Infinite Storage.

3 1. The Age of Infinite Storage has begun
Many of us have enough money in our pockets right now to buy all the storage we will be able to fill in the next 5 years. Having the storage capacity is no longer the problem; managing it is (especially as the volume grows large). How much data is there?

4 Terabytes (TBs) are here
- 1 TB costs about 1k$ to buy.
- 1 TB costs ~300k$/year to own.
- Management and curation are the expensive part.
- Searching 1 TB takes hours.
- I'm Terrified by TeraBytes.
- I'm Petrified by PetaBytes.
- I'm completely Exafied by ExaBytes.
- I'm too old to ever be Zettafied by ZettaBytes, but you may be in your lifetime.
- You may be Yottafied by YottaBytes.
- You may not be Googified by GoogiBytes, but the next generation may be?
[Figure: ladder of byte prefixes with a "We are here" marker: Kilo 10^3, Mega 10^6, Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21, Yotta 10^24, ... Googi 10^100]
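The prefix ladder and the claim "Searching 1 TB takes hours" can be checked with a little arithmetic. The Python below is a minimal sketch. It assumes decimal prefixes, where each step is a factor of 10^3. It also assumes a hypothetical sequential read rate of 100 MB/s, which is not a figure from the slides. The function names `human_bytes` and `scan_hours` are illustrative only.

```python
# Decimal prefixes from the slide, in increasing order (each step is 10**3).
PREFIXES = ["", "Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]


def human_bytes(n: float) -> str:
    """Format a byte count with the largest decimal prefix that keeps n >= 1."""
    exponent = 0
    while n >= 1000 and exponent < len(PREFIXES) - 1:
        n /= 1000
        exponent += 1
    return f"{n:.1f} {PREFIXES[exponent]}bytes"


def scan_hours(size_bytes: float, mb_per_sec: float) -> float:
    """Hours to read size_bytes sequentially at mb_per_sec (1 MB = 10**6 bytes)."""
    return size_bytes / (mb_per_sec * 10**6) / 3600


if __name__ == "__main__":
    one_tb = 10**12
    print(human_bytes(one_tb))                # 1.0 Terabytes
    # 100 MB/s is an assumed, illustrative disk throughput.
    print(round(scan_hours(one_tb, 100), 1))  # 2.8
```

At the assumed 100 MB/s, one full pass over 1 TB takes a little under three hours. A slower device or a random-access workload would stretch that further.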

5 How much information is there?
- Soon everything can be recorded and indexed.
- Most of it will never be seen by humans.
- Data summarization, trend detection, anomaly detection, and data mining are key technologies.
[Figure: Kilo-to-Yotta scale placing a photo, a book, a movie, all books (words), all books (multimedia), and everything recorded]
Small prefixes: yocto 10^-24, zepto 10^-21, atto 10^-18, femto 10^-15, pico 10^-12, nano 10^-9, micro 10^-6, milli 10^-3.

6 First disk, in 1956
- IBM 305 RAMAC
- 4 MB
- 50 24-inch disks
- 1200 rpm (revolutions per minute)
- 100 milliseconds (ms) access time
- 35k$/year to rent
- Included computer and accounting software (tubes, not transistors)
7th Grade C.S. lab Tech.
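The RAMAC figures can be set against the "Tera Bytes" slide with a quick calculation. This is a rough sketch under two assumptions. First, it uses decimal units (1 TB = 10^6 MB). Second, it treats the RAMAC's yearly rent and the ~300k$/year cost of owning a terabyte as comparable yearly costs, which is a simplification. The constant and function names are hypothetical.

```python
# Figures taken from the slides.
RAMAC_MB = 4
RAMAC_RENT_PER_YEAR = 35_000      # dollars per year
RAMAC_RPM = 1200
TB_OWN_PER_YEAR = 300_000         # dollars per TB per year ("Tera Bytes" slide)

MB_PER_TB = 10**6                 # assumption: decimal units


def rotation_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def cost_per_mb_year(dollars_per_year: float, megabytes: float) -> float:
    """Yearly cost divided evenly over each megabyte of capacity."""
    return dollars_per_year / megabytes


ramac = cost_per_mb_year(RAMAC_RENT_PER_YEAR, RAMAC_MB)
terabyte = cost_per_mb_year(TB_OWN_PER_YEAR, MB_PER_TB)

if __name__ == "__main__":
    print(rotation_ms(RAMAC_RPM))   # 50.0 ms per revolution
    print(ramac)                    # 8750.0 dollars per MB per year
    print(round(ramac / terabyte))  # 29167, i.e. roughly 29,000x
```

Even with management costs included, the yearly cost per megabyte is about four orders of magnitude lower than in 1956.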

7 Ten years later: 30 MB
[Photo of the drive, annotated 1.6 meters]

8 In 2003, the cost of storage was about 1k$/TB, and it has gone steadily down since then.
[Chart: storage cost over time, 12/1/1999 through 11/4/2003]

9 Disk Evolution
[Figure: capacity scale from Kilo through Mega, Giga, Tera, Peta, Exa, Zetta to Yotta]

10 Memex
As We May Think, Vannevar Bush, 1945
"A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility."
"Yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely."
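Bush's "hundreds of years" can be restated in terabyte terms. This is only an order-of-magnitude illustration. It assumes a hypothetical 2,000 bytes of plain text per page, since Bush's memex stored microfilm rather than bytes. The function name is illustrative.

```python
def years_to_fill(capacity_bytes: float, pages_per_day: float, bytes_per_page: float) -> float:
    """Years needed to fill capacity_bytes at a steady daily page rate."""
    bytes_per_day = pages_per_day * bytes_per_page
    return capacity_bytes / bytes_per_day / 365


if __name__ == "__main__":
    # 5000 pages/day from the quote; 2,000 bytes/page is an assumption.
    print(round(years_to_fill(10**12, 5000, 2000)))  # 274
```

Under that assumption, a text-only memex user would need roughly 274 years to fill a single terabyte.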

11 Can you fill a terabyte in a year?
Item | Items/TB | Items/day
---|---|---
a 300 KB JPEG image | 3 M | 9,800
a 1 MB document | 1 M | 2,900
a 1-hour, 256 kb/s MP3 audio file | 9 K | 26
a 1-hour, 1 MPEG video | 290 | 0.8
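The table's columns follow from one division each. In this sketch the numbers match the slide only under a particular mix of units. The terabyte and the KB/MB sizes must be binary (1 TB = 2^40 bytes, 1 KB = 1024 bytes). The MP3 bitrate is taken in decimal kilobits. These unit choices are an inference, not something the slide states. The video row is omitted because its bitrate is not given.

```python
TB = 2**40  # binary terabyte; this reproduces the slide's figures (assumption)

ITEM_BYTES = {
    "300 KB JPEG image": 300 * 2**10,
    "1 MB document": 2**20,
    # One hour at 256 kb/s, with 1 kb = 1000 bits, converted to bytes.
    "1-hour 256 kb/s MP3": 256_000 * 3600 // 8,
}


def items_per_tb(item_bytes: int) -> float:
    """How many items of this size fit in one terabyte."""
    return TB / item_bytes


def items_per_day(item_bytes: int, days: int = 365) -> float:
    """Items to add each day to fill one terabyte in `days` days."""
    return items_per_tb(item_bytes) / days


if __name__ == "__main__":
    for name, size in ITEM_BYTES.items():
        print(f"{name}: {items_per_tb(size):,.0f} per TB, {items_per_day(size):,.1f} per day")
```

For the JPEG row this gives about 3.58 million images per terabyte and about 9,806 per day. The slide rounds these to 3 M and 9,800.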

12 On a personal terabyte, how will we find anything?
- We need queries, indexing, data mining, scalability, replication…
- If you don't use a DBMS, you will implement one of your own!
- Data mining and machine learning are more important than ever!
Of the digital data in existence today:
- 80% is personal/individual
- 20% is corporate/governmental
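Indexing is what replaces "scan the whole terabyte" with a lookup. Below is a minimal inverted index in Python that maps each word to the set of documents containing it. It illustrates the general idea, not how any particular DBMS works. The document names and contents are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query word (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical personal files.
    docs = {
        "a.txt": "vacation photos 2003",
        "b.txt": "tax records 2003",
        "c.txt": "vacation budget",
    }
    idx = build_index(docs)
    print(search(idx, "vacation 2003"))  # {'a.txt'}
```

The index is built once. Each query then touches only the posting sets for its words instead of rereading every file.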

13 We're awash with data!
- Network data: 10 terabytes by 2004 (~10^13 bytes)
- US EROS Data Center (near Sioux Falls, SD), which archives Earth Observing System remotely sensed satellite and aerial imagery data: 15 petabytes by 2007 (~10^16 bytes)
- National Virtual Observatory (aggregated astronomical data): 10 exabytes by 2010 (~10^19 bytes)
- Sensor data (including micro- and nano-sensor networks): 10 zettabytes by 2015 (~10^22 bytes)
- WWW (and other text collections): 10 yottabytes by 2020 (~10^25 bytes)
- Genomic/proteomic/metabolomic data (microarrays, gene chips, genome sequences): 10 gazillabytes by 2030 (~10^28 bytes?)
- Stock market prediction data (prices + all of the above?): 10 supragazillabytes by 2040 (~10^31 bytes?)
Useful information must be teased out of these large volumes of raw data. And these are only some of the corporate and governmental data collections, which make up 1/5th of all data; the other 4/5ths of data sets are personal! I made up these names (gazillabytes, supragazillabytes)! Projected data sizes are outrunning our ability to name their orders of magnitude!

14 Parkinson's Law (for data): data expands to fill the available storage.
Disk-storage version of Moore's Law: available storage doubles every 9 months!
How do we get the information we need from the massive volumes of data we will have?
- Querying (for the information we know is there)
- Data mining (for answers to questions we don't know how to ask precisely)
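This sketch shows what "doubles every 9 months" compounds to over longer spans, assuming the rate holds constant. After t years there have been 12t/9 doublings, so capacity grows by a factor of 2^(12t/9). The function name is illustrative.

```python
def growth_factor(years: float, doubling_months: float = 9) -> float:
    """Capacity multiplier after `years` if capacity doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


if __name__ == "__main__":
    print(growth_factor(1.5))           # 4.0 (two doublings)
    print(round(growth_factor(5), 1))   # 101.6
```

Over five years that compounds to roughly a hundredfold increase in available storage.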

15 Thank you.