The Dawning of the Age of Infinite Storage. William Perrizo, Dept. of Computer Science, North Dakota State Univ.
Tera Bytes are Here: 1 TB costs 1k$ to buy; 1 TB costs 300k$/y to own. Management & curation are expensive. Searching 1 TB takes hours. I’m Terrified by TeraBytes. I’m Petrified by PetaBytes. I’ll soon be Exafied by ExaBytes. I’m too old to ever be Zettafied by ZettaBytes, but you may be in your lifetime. You may even be Yottafied by YottaBytes. You probably won’t ever be Googified by GoogiBytes, but one should “never say never”. Scale (largest to smallest): Google, Yotta, Zetta, Exa, Peta, Tera, Giga (10^9), Mega (10^6), Kilo (10^3), with a “We are here” marker on the scale.
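The claim that searching 1 TB takes hours can be checked with back-of-envelope arithmetic. The following is a minimal sketch, not a measurement: it assumes a decimal terabyte and a sequential read rate of 50 MB/s per disk. That rate, and the function name `scan_hours`, are illustrative assumptions that do not come from the slides.

```python
TB = 10**12  # decimal terabyte in bytes (assumption)


def scan_hours(size_bytes: float, mb_per_second: float, disks: int = 1) -> float:
    """Hours to read size_bytes sequentially, split evenly across `disks`."""
    bytes_per_second = mb_per_second * 10**6 * disks
    return size_bytes / bytes_per_second / 3600


# 50 MB/s per disk is a hypothetical rate chosen only for illustration.
one_disk = scan_hours(TB, mb_per_second=50)
four_disks = scan_hours(TB, mb_per_second=50, disks=4)

print(f"1 disk:  {one_disk:.1f} h")
print(f"4 disks: {four_disks:.1f} h")
```

Under these assumptions a full scan stays in the range of hours even when the data is striped over several disks, which is why indexing matters more than raw read speed.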
How much information is there? Soon everything can be recorded and indexed. Most bytes will never be seen by humans. Data summarization, trend detection, anomaly detection, and data mining are key technologies. Scale: Yotta, Zetta, Exa, Peta, Tera, Giga, Mega, Kilo. Labels on the scale: A Photo; A Book; A Movie; All Books (words); All Books (MultiMedia); Everything Recorded! Small prefixes: yocto, zepto, atto, femto, pico, nano, micro, milli.
First Disk, 1956: IBM 305 RAMAC. 4 MB; 50 x 24” disks; 1200 rpm; 100 ms access; 35k$/y rent. Included computer & accounting software (tubes, not transistors). Me, at 13.
10 years later: 1.6 meters, 30 MB.
The Cost of Storage: about 1K$/TB. (Chart snapshots dated 12/1/1999, 9/1/2000, 9/1/2001, 4/1/…, …/4/2003.)
E.g., A recent Purchase Order
Company: NDSU
Date: 8/7/03
System Board: Intel D865 GBFL system board w/LAN, 800 MHz FSB
Processor: Intel Pentium GHz
Hard Drives: 4 x 250 GB IDE (total = 1 TB)
Controller: Onboard IDE Controller
2nd IDE Controller:
Video: Integrated
Diskette Drive: 1.44 MB
Memory: 4 GB 400 MHz memory
CD/DVD Drive: DVD/CDRW
Sound: Integrated AC97 Audio w/Soundmax
Case: Performance Minitower ATX w/300 Watt PS
Keyboard: Microsoft 104 Internet keyboard
Mouse: Microsoft Intellimouse Optical
Operating System: none
Network Cards: Integrated Intel 10/100 Ethernet w/D845GEBV2L board
Price: $2,
(Slide callout: “Main expense is here”)
Disk Evolution: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta.
Memex: As We May Think, Vannevar Bush, 1945. “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” “Yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely.”
Trying to fill a terabyte in a year
Item                          Items/TB    Items/day
300 KB JPEG                   3 M         9,800
1 MB Doc                      1 M         2,900
1 hour 256 kb/s MP3 audio     9 K         26
1 hour 1.5 Mb/s MPEG video
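The arithmetic behind this table can be sketched as follows. The sketch assumes binary units (1 TB = 2^40 bytes, 1 KB = 2^10, 1 MB = 2^20), decimal bit rates, and a 365-day year; these assumptions are not stated on the slide, but they reproduce its rounded figures for the first three rows. The names `fill_rate` and `stream_bytes` are illustrative. The video row's values are missing from the slide, so whatever the formula prints for that row is only the result of these assumptions, not a recovery of the original figure.

```python
KB = 2**10
MB = 2**20
TB = 2**40  # binary terabyte (assumption that matches the slide's rounding)


def stream_bytes(bits_per_second: float, seconds: int = 3600) -> float:
    """Size in bytes of a stream recorded at a constant bit rate."""
    return bits_per_second * seconds / 8


def fill_rate(item_bytes: float, capacity: float = TB, days: int = 365):
    """Return (items per terabyte, items per day needed to fill it in `days`)."""
    per_tb = capacity / item_bytes
    return per_tb, per_tb / days


ITEMS = {
    "300 KB JPEG": 300 * KB,
    "1 MB Doc": 1 * MB,
    "1 hour 256 kb/s MP3 audio": stream_bytes(256_000),
    "1 hour 1.5 Mb/s MPEG video": stream_bytes(1_500_000),
}

for name, size in ITEMS.items():
    per_tb, per_day = fill_rate(size)
    print(f"{name:28s} {per_tb:14,.0f} {per_day:10,.1f}")
```

The point of the exercise is that a person has to capture thousands of photos or documents every day to fill a single terabyte in a year; only continuous audio or video recording gets there with a modest number of items.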
The Personal Terabyte: How Will We Find Anything? Need Queries, Indexing, Data Mining, Pivoting, Scalability, Backup, Replication, Online update, Set-oriented access. If you don’t use a DBMS, you will implement one! Need Data Mining, Machine Learning! 80% of data is personal/individual; 20% is Corporate, Governmental. SQL ++ DBMS.
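To make the "queries, indexing, set-oriented access" point concrete, here is a minimal sketch of a personal catalog kept in a DBMS rather than in ad hoc files. It uses Python's standard `sqlite3` module with an in-memory database; the `items` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database; a real catalog would use a file path instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items (
           id      INTEGER PRIMARY KEY,
           path    TEXT    NOT NULL,
           kind    TEXT    NOT NULL,
           bytes   INTEGER NOT NULL,
           created TEXT    NOT NULL  -- ISO date, so string order is date order
       )"""
)
# Index supporting lookups by kind and date range.
conn.execute("CREATE INDEX idx_items_kind_created ON items (kind, created)")

rows = [  # hypothetical sample data
    ("photos/a.jpg", "photo", 300_000, "2003-08-01"),
    ("photos/b.jpg", "photo", 320_000, "2003-08-05"),
    ("docs/notes.doc", "doc", 1_000_000, "2003-07-15"),
    ("audio/talk.mp3", "audio", 115_200_000, "2003-08-07"),
]
conn.executemany(
    "INSERT INTO items (path, kind, bytes, created) VALUES (?, ?, ?, ?)", rows
)

# Set-oriented access: one statement summarizes the whole collection.
summary = conn.execute(
    "SELECT kind, COUNT(*), SUM(bytes) FROM items GROUP BY kind ORDER BY kind"
).fetchall()

# Indexed query: photos created on or after a given date.
recent_photos = conn.execute(
    "SELECT path FROM items WHERE kind = ? AND created >= ? ORDER BY created",
    ("photo", "2003-08-01"),
).fetchall()

print(summary)
print(recent_photos)
```

Grouping, filtering, and indexing come for free here; doing the same over a terabyte of loose files would mean reimplementing those DBMS features by hand.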