Published by Gabriella Higgins. Modified over 9 years ago.
1
Long tails and Archive systems
Elliot Jaffe
FDIS 2005
2
Archive Metrics
What
– Distribution of file sizes
– Distribution of occupied storage
– How are files accessed
Why
– System architecture
– Scaling for access
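The first two metrics on this slide can be sketched as a pair of histograms over the same buckets: one counting files, one summing the bytes they occupy. This is a minimal illustration, not the method used in the talk. The function name `size_histograms` and the power-of-two bucketing are assumptions made for the example.

```python
from collections import Counter


def size_histograms(sizes):
    """Bucket file sizes (in bytes) by power of two.

    Returns (file_count, bytes_stored). Both are keyed by bucket index b,
    where bucket 0 holds empty files and bucket b >= 1 holds sizes in
    the range [2**(b-1), 2**b).
    """
    file_count = Counter()
    bytes_stored = Counter()
    for size in sizes:
        # int.bit_length maps 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ...
        bucket = size.bit_length()
        file_count[bucket] += 1
        bytes_stored[bucket] += size
    return file_count, bytes_stored
```

Comparing the two histograms shows how far the distribution of occupied storage is shifted toward the large buckets relative to the distribution of file counts.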
3
File size studies
UFS93 (1993)
– 12 million files
– UNIX only
– Avg. file size is 2k
– 90% of storage in 11% of files
HUJI (2005)
– 4 million files
– UNIX + Windows
– Avg. file size is 8k
– 90% of storage in 5.5% of files
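A figure such as "90% of storage in 11% of files" can be computed from a list of file sizes by sorting from largest to smallest and counting how many files are needed to reach the target share. The sketch below assumes sizes are given in bytes. The function name `fraction_of_files_holding` is hypothetical and is not taken from either study.

```python
def fraction_of_files_holding(sizes, storage_share=0.9):
    """Return the smallest fraction of files (largest first) whose
    combined size reaches `storage_share` of the total bytes."""
    total = sum(sizes)
    if total == 0:
        raise ValueError("no occupied storage to measure")
    target = storage_share * total
    running = 0
    # Walk the files from largest to smallest, accumulating bytes.
    for count, size in enumerate(sorted(sizes, reverse=True), start=1):
        running += size
        if running >= target:
            return count / len(sizes)
    return 1.0
```

Running this on two snapshots of the same system would show whether storage is concentrating into fewer files, as the move from 11% to 5.5% on this slide suggests.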
4
What’s Changed
Then
– JAWS, NOW
– Online was expensive
– Offline tape storage
Now
– Central File Servers
– Digital Libraries
– Online is cheap
– No offline storage
– XML
– Multimedia
5
Empirical Data
6
Questions
What is the future of these distributions?
Are the changes extensions of the tails with power laws, so that the 10/90 and 20/80 rules no longer work and are the wrong way to think about them?
Are the changes based on external factors that are unpredictable?
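One way to probe the power-law question is to estimate the tail index of the file-size distribution. The sketch below uses the Hill estimator, a standard statistical tool that the slides do not name, so treat it as an illustrative substitute rather than the author's approach. The function name `hill_tail_index` and the parameter `k` (how many of the largest files count as "the tail") are assumptions.

```python
import math


def hill_tail_index(sizes, k):
    """Hill estimate of the power-law tail index alpha.

    Uses the k largest positive sizes, with the (k+1)-th largest
    as the threshold: alpha = k / sum(ln(x_i / threshold)).
    """
    positive = sorted((s for s in sizes if s > 0), reverse=True)
    if not 0 < k < len(positive):
        raise ValueError("k must satisfy 0 < k < number of positive sizes")
    threshold = positive[k]
    log_excess = sum(math.log(x / threshold) for x in positive[:k])
    if log_excess == 0:
        raise ValueError("the k largest sizes all equal the threshold")
    return k / log_excess
```

The estimate depends on the choice of `k`, so it is usually examined over a range of values. An estimate near or below 1 would indicate a tail heavy enough that the largest files dominate total storage, which is the regime where fixed 10/90 or 20/80 rules stop being informative.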
7
The Long Tail
Chris Anderson (2004)
– http://www.wired.com/wired/archive/12.10/tail.html
The long tail of a distribution has tremendous mass and creates new market opportunities
Amazon, Netflix, Wikipedia
8
Today’s landscape
[Figure. Axes: Storage Capacity, Access Frequency. Labels: NOW, File Servers, Sarbanes Oxley, Digital Libraries]
9
Next Steps
Collecting data from large storage systems
– File Sizes, Created, Last Modified, Last Access, Frequency of Reads
Goal: New architectures for Digital libraries
– Focus on Operations
– Store large and small files differently
– Store very-low access files in slow-access storage
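The data collection and the placement idea on this slide can be sketched together: walk a directory tree, record per-file metadata, then pick a storage class from size and last access. This is a rough illustration under stated assumptions. The names `collect_metadata` and `choose_tier`, the tier labels, and both thresholds are hypothetical. Read frequency is not available from `os.stat` and would need access logs, so it is left out.

```python
import os
import stat

SMALL_FILE_BYTES = 64 * 1024   # hypothetical cutoff between small and large
COLD_AFTER_DAYS = 365          # hypothetical "very-low access" threshold
SECONDS_PER_DAY = 86400


def collect_metadata(root):
    """Yield one record per regular file found under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices
            yield {
                "path": path,
                "size": st.st_size,
                # On Unix, st_ctime is the last metadata change, not
                # the creation time; treat it as an approximation.
                "created": st.st_ctime,
                "modified": st.st_mtime,
                # May be stale on filesystems mounted noatime/relatime.
                "accessed": st.st_atime,
            }


def choose_tier(record, now):
    """Pick an illustrative storage class for one metadata record."""
    idle_days = (now - record["accessed"]) / SECONDS_PER_DAY
    if idle_days > COLD_AFTER_DAYS:
        return "slow-access"
    if record["size"] < SMALL_FILE_BYTES:
        return "small-file store"
    return "large-file store"
```

A possible use is `tiers = [choose_tier(r, time.time()) for r in collect_metadata(root)]` after importing `time`, which gives a first view of how much data each class would hold.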