MyLifeBits Jim Gemmell February, 2005
Conclusion
- We have entered an era of virtually unlimited storage, enabling the lifetime store
- To make the store useful we need annotation, typed links, and database features
- More capture, more correlation – less work by the user
Collaborators
- Chief inspiration & guinea pig: Gordon Bell
- Software development lead: Roger Lueder
- MSR Collaborators: Lyndsay Williams, Ken Wood, Kentaro Toyama, Ron Logan, Steve Drucker, Curtis Wong, Mary Czerwinski, Brian Meyers
- Interns: Josh Blumenstock, Evan Salomon, Aleks Aris
Outline
- What is MyLifeBits
- History/Motivation
- MyLifeBits system outline
- Demo
- Future work
MyLifeBits is:
- An experiment in lifetime storage
  - Digitizing Gordon Bell’s past
  - Capturing more of his future
- A software system
  - Capture
  - Storage & retrieval
  - Organization & annotation
- Minimum requirement: fulfill Vannevar Bush’s 1945 “Memex” vision
Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
- Full-text search, text & audio annotations, and hyperlinks
I am data
The guinea pig
- Has now scanned virtually all:
  - Books written (and read when possible)
  - Personal documents (correspondence including memos and , bills, legal documents, papers written, …)
  - Photos
  - Posters, paintings, photos of things (artifacts, … medals, plaques)
  - Home movies and videos
  - CD collection
  - And, of course, all PC files
- Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come
- Paperless throughout ” scanned, 12’ discarded.
- Only 44 GB, incl. 10 wma, 14 SQL!!! Video: o(100) mov
The 1 TB Life
- 1 TB gives you 65+ years of:
  - 100 messages a day (5 KB each)
  - 100 web pages a day (50 KB each)
  - 5 scanned pages a day (100 KB each)
  - 1 book every 10 days (1 MB each)
  - 10 photos per day (400 KB JPEG each)
  - 8 hours per day of sound, e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
  - 1 new music CD every 10 days (45 min each at 128 Kb/s)
- It will take you 5 years to fill up your 80 GB drive
- Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)
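The arithmetic behind this slide can be checked with a short sketch. It assumes decimal units (1 KB = 1000 bytes, 1 Kb/s = 1000 bits/s) and a 365-day year; the slide does not say which convention it uses, and the helper names are illustrative.

```python
# Daily byte budget for the "1 TB Life" slide.
# Assumption: decimal units (1 KB = 1000 bytes, 1 Kb/s = 1000 bits/s).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def stream_bytes(hours: float, kilobits_per_second: float) -> float:
    """Bytes produced by a constant-bit-rate stream of the given duration."""
    return hours * 3600 * kilobits_per_second * 1000 / 8


# (label, bytes per day), taken item by item from the slide.
daily_items = [
    ("messages", 100 * 5 * KB),
    ("web pages", 100 * 50 * KB),
    ("scanned pages", 5 * 100 * KB),
    ("books", 1 * MB / 10),                       # one 1 MB book every 10 days
    ("photos", 10 * 400 * KB),
    ("sound", stream_bytes(8, 8)),                # 8 hours/day at 8 Kb/s
    ("music CDs", stream_bytes(0.75, 128) / 10),  # one 45-minute CD every 10 days
]

bytes_per_day = sum(size for _, size in daily_items)
bytes_per_year = bytes_per_day * 365

# Video case from the last bullet: 4 hours/day at 1.5 Mb/s, in TB per year.
video_tb_per_year = stream_bytes(4, 1500) * 365 / TB


def years_to_fill(capacity_bytes: float) -> float:
    """Years needed to fill a drive at the non-video daily rate above."""
    return capacity_bytes / bytes_per_year


if __name__ == "__main__":
    for label, size in daily_items:
        print(f"{label:14s} {size / MB:8.2f} MB/day")
    print(f"{'total':14s} {bytes_per_day / MB:8.2f} MB/day")
    print(f"1 TB lasts about {years_to_fill(TB):.0f} years")
    print(f"80 GB lasts about {years_to_fill(80 * GB):.1f} years")
    print(f"video needs about {video_tb_per_year:.2f} TB/year")
```

Under these assumptions the non-video total is about 43 MB/day, which fills 1 TB in roughly 63 years and 80 GB in about 5 years, and the video case needs just under 1 TB per year. These are close to the slide's rounded figures; a binary-unit convention would shift them slightly.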
Trying to fill a terabyte in a year
- Gordon’s lifetime collection < 30 GB (12 GB is music CDs)

Item                  Per TB        Per day
Photo (400 KB JPEG)   2.7M photos   7.3K photos
1 MB document         1.0M docs     2.9K docs
128 kb/s audio        18.6K hours   51 hours
256 kb/s video        9.3K hours    26 hours
1.5 Mb/s video        1.6K hours    4 hours
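The "Per TB" and "Per day" columns can be reproduced with the sketch below. It assumes binary units (1 TB = 2**40 bytes, 1 KB = 1024 bytes, 1 kb/s = 1024 bits/s), chosen only because that convention appears to match the slide's rounded values; the function names are illustrative.

```python
# Capacity of one terabyte for each item type in the table.
# Assumption: binary units, which seem to reproduce the slide's numbers.
TB = 2**40


def items_per_tb(item_bytes: float) -> float:
    """How many fixed-size items fit in one terabyte."""
    return TB / item_bytes


def hours_per_tb(bits_per_second: float) -> float:
    """How many hours of a constant-bit-rate stream fit in one terabyte."""
    bytes_per_second = bits_per_second / 8
    return TB / bytes_per_second / 3600


rows = {
    "Photo (400 KB JPEG)": items_per_tb(400 * 1024),
    "1 MB document": items_per_tb(2**20),
    "128 kb/s audio": hours_per_tb(128 * 1024),
    "256 kb/s video": hours_per_tb(256 * 1024),
    "1.5 Mb/s video": hours_per_tb(1.5 * 2**20),
}

# "Per day" is the per-TB figure spread over a 365-day year.
for name, per_tb in rows.items():
    print(f"{name:20s} {per_tb:12,.0f} per TB {per_tb / 365:10,.1f} per day")
```

Dividing by 365 makes the point of the slide concrete: filling a terabyte in a year with photos or documents means thousands of items every day, and with low-bit-rate audio or video it needs more than 24 hours of recording per day.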
“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely” -Vannevar Bush, 1945
So you’ve got it – now what do you do with it?
- Can you find anything?
- Can you organize that many objects?
- Once you find it will you know what it is?
- Once you’ve found it once, could you find it again?
“A record if it is to be useful … must be continuously extended, it must be stored, and above all it must be consulted” “The difficulty seems to be, not so much that we publish unduly … but rather that publication has been extended far beyond our present ability to make real use of the record” - Vannevar Bush
MyLifeBits Software MyLifeBits store database Voice annotation tool Telephone capture tool TV capture tool TV EPG download tool Radio capture & EPG PocketPC transfer tool PocketRadio player Import files MyLifeBits Shell Browser tool Internet IM capture GPS import & Map display SenseCam Screen saver Text annotation tool MAPI interface Legacy client Outlook interface files Legacy applications VIBE logging
Entities & Links Annotates Caller in Phone Call Photo of Event Transcludes
MyLifeBits Schema (simplified): Images, Music, Phone calls, Resources, Relationships, Relationship types, Entity types, Resource entities, Event types, Event log, Events, Tasks, People, Notes, Messages, Saved searches
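To make the idea of entities joined by typed links more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are illustrative assumptions, not the actual MyLifeBits schema, and SQLite is only a stand-in database.

```python
import sqlite3

# Hypothetical, heavily reduced schema: all names are assumptions
# made for illustration, not the real MyLifeBits tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE entities (
    id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES entity_types(id),
    title TEXT NOT NULL
);
CREATE TABLE relationship_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE relationships (
    source_id INTEGER NOT NULL REFERENCES entities(id),
    target_id INTEGER NOT NULL REFERENCES entities(id),
    type_id INTEGER NOT NULL REFERENCES relationship_types(id),
    PRIMARY KEY (source_id, target_id, type_id)
);
""")


def add_entity(type_name, title):
    """Insert an entity, creating its type on first use; return the new id."""
    conn.execute("INSERT OR IGNORE INTO entity_types (name) VALUES (?)", (type_name,))
    type_id = conn.execute(
        "SELECT id FROM entity_types WHERE name = ?", (type_name,)
    ).fetchone()[0]
    cursor = conn.execute(
        "INSERT INTO entities (type_id, title) VALUES (?, ?)", (type_id, title)
    )
    return cursor.lastrowid


def link(source_id, link_type, target_id):
    """Record a typed link from one entity to another."""
    conn.execute("INSERT OR IGNORE INTO relationship_types (name) VALUES (?)", (link_type,))
    type_id = conn.execute(
        "SELECT id FROM relationship_types WHERE name = ?", (link_type,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO relationships (source_id, target_id, type_id) VALUES (?, ?, ?)",
        (source_id, target_id, type_id),
    )


def linked_to(target_id, link_type):
    """Titles of all entities that point at target_id with the given link type."""
    rows = conn.execute(
        """
        SELECT e.title FROM relationships r
        JOIN relationship_types t ON t.id = r.type_id
        JOIN entities e ON e.id = r.source_id
        WHERE r.target_id = ? AND t.name = ?
        ORDER BY e.title
        """,
        (target_id, link_type),
    )
    return [title for (title,) in rows]


# Made-up sample data using link names shown on the "Entities & Links" slide.
call = add_entity("Phone call", "Call about the workshop")
person = add_entity("Person", "Alice")
note = add_entity("Note", "Follow-up note")
link(person, "Caller in", call)
link(note, "Annotates", call)

print(linked_to(call, "Annotates"))
print(linked_to(call, "Caller in"))
```

Keeping link types in their own table means a new kind of link is a new row rather than a schema change, which is one plausible reading of why the simplified schema lists relationship types and entity types separately.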
DEMO
Future work: new capture modes/devices
- SenseCam
- Deja View
- Body Media
- Quindi
Future work: Visualizations
Don't give me a little card image and say, "That's all you've got, because that's what I thought you should want for your virtual shoebox." There have got to be multiple modalities and the designers have to be able to deal with that. … don't metaphor me in, don't give me only one way of looking at things.
-Andy van Dam, Hypertext '87 Keynote Address
Next Media Web Scout U. Maryland IN-SPIRE
Future work: UI
- UI Improvements
- User studies
Future work: Content analysis & Data Mining
“Creative thought and essentially repetitive thought are very different things. For the latter there are, and may be, powerful mechanical aids” – Vannevar Bush
- Is MyLifeBits just enough rope to hang yourself with?
- MyLifeBits must become MyPersonalAssistant
- Content analysis and data mining
- Doc similarity & “clean living”
- Document meta-data extraction
Future work: scaling
- Just starting to hit performance problems
- Stress tests & design modifications
BONUS SLIDES
Everything goes in a database
- You need all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication)
- If you don’t use one, you will find yourself creating one!
- Files as blobs, also sync with file system for legacy apps
SQL
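As a rough illustration of "files as blobs" together with indexing and pivoting, the sketch below stores file contents in a single table and summarizes them by kind and year. It uses Python's sqlite3 module as a stand-in; the table layout, column names, and sample files are assumptions made for the example, not details of the MyLifeBits store.

```python
import sqlite3

# Hypothetical single-table store; all names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL,
        created TEXT NOT NULL,   -- ISO 8601 date, e.g. 2005-02-01
        content BLOB NOT NULL
    )
    """
)
conn.execute("CREATE INDEX files_by_created ON files (created)")


def store(name, kind, created, content):
    """Insert one file as a blob; the transaction commits or rolls back as a unit."""
    with conn:
        conn.execute(
            "INSERT INTO files (name, kind, created, content) VALUES (?, ?, ?, ?)",
            (name, kind, created, content),
        )


def load(name):
    """Return the stored bytes, e.g. to write back out for a legacy application."""
    row = conn.execute("SELECT content FROM files WHERE name = ?", (name,)).fetchone()
    return row[0]


def pivot_by_kind_and_year():
    """Count of files and total bytes for each (kind, year) pair."""
    return conn.execute(
        """
        SELECT kind, substr(created, 1, 4) AS year, COUNT(*), SUM(length(content))
        FROM files
        GROUP BY kind, year
        ORDER BY kind, year
        """
    ).fetchall()


# Made-up sample data.
store("memo.txt", "document", "2004-03-01", b"hello")
store("scan.tif", "document", "2005-01-10", b"\x00" * 10)
store("a.jpg", "photo", "2005-02-02", b"\xff\xd8\xff")

for row in pivot_by_kind_and_year():
    print(row)
```

The point of the sketch is that consistency (the transaction in `store`), indexing, and pivot-style queries come for free once the files live in a database, whereas a plain directory tree would need each of these rebuilt by hand.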
CARPE ’04
The First ACM Workshop on Continuous Archival & Retrieval of Personal Experiences
October 15th, 2004
Columbia University, New York, NY, USA
Dear Appy, How committed are you? Signed, Lost and Forgotten Data
Dear Appy, I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that as our creators, these apps should be responsible for eternal support. But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? If things continue on their current path, it seems I will be completely un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric...
By Gordon Bell
A Storocratic Oath
1. Do no harm to dates (File creation, Photo taken)
2. Do no harm to device created & other meta-data.
   - Camera data & location data are sacred.
3. Support & aid the creation of critical meta-data.
   - When/how the user feels like it
   - Auto-magically!
4. Maintain user confidentiality
Classification wish list
- Download classifications rather than build them
- Definitions & synonyms should help find what I want
- Today it is too expensive to manually classify my scanned paper. E.g. “right time” meta-data is critical!
- Next year I hope “the system” can classify papers and other documents e.g. bills
- In 10 years I expect all documents to appear electronically & classified with a little help from me
Personal Search is not Professional or Web search
- System sees every entry & access
- Everything, not just a professional life
- Limited to SIS, not an infinite amount, covers a profession & personal life
Diagram labels: Web as seen by search engines; MyLifeBits; Professional user. Axes: Knowledge breadth (e.g. Dewey classification); Depth (e.g. information item types & coverage)