A Personal Database for Everything Inspired by Memex Gordon Bell, Jim Gemmell, Roger Lueder Original slides:
Outline How has the project evolved? How do we use MyLifeBits? How is it built? How large is the database? What is the vision? What is left and how can you help?
I am data
Ambience and Presence: Being there while being here Dining at home on the “Orient Express”
History: The remote worker re-discovers the PERSONAL computer
Oct 1998 Can we scan your books and put them online? Raj Reddy Sure! Don’t worry about copyright stuff. Microsoft has lots of lawyers
1999 – Scanning starts in earnest “we” start to scan
My docs and archive Self.. Biographical X- Employer Employer X-Employer Project Employer Library/file cab Active Employer Library/file cab <1980s Library/file cab Library/file cab Project Business Invests, family $s, & Legal Personal, including Medical Library/file cab
Jim, I don’t need no stinkin’ database! Gordon, You should be using a database.
Now that it’s in Cyberspace How do you remember the 20,000+ file names? Or in which of 1500 folders they live? What about a tool for finding stuff?
Jan 2001 CACM “A Personal Digital Store” 16 GB; +2/yr A good place to stop Began search for search engines, especially for . Jim suggests that we build a system that would be easier to use and have many more capabilities.
Re-discovery of Memex As We May Think, Vannevar Bush, 1945 “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” Full-text search, text & audio annotations, and hyperlinks
2001 Capture goes beyond paper
Even more capture Telephone calls, more video, all web pages visited, usage logging, radio, TV…
SenseCam
Steve Mann timeline
“I sensed” Clarkson and Pentland MIT 2001 Visually impaired UW 2004
MyLifeBits Software
Everything goes in a database MyLifeBits needs all the features of a database (Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, Replication) If we didn’t use one, we would eventually create one! Files as blobs; sync with file system for legacy apps We are part of Jim Gray’s Bay Area Research Lab SQL
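A minimal sketch of the "files as blobs" idea, with time as an indexed column. SQLite (via Python's standard sqlite3 module) stands in here for whatever SQL database the project used; the table name `items`, its columns, and the helpers `open_store`, `store_file`, and `items_between` are illustrative assumptions, not the MyLifeBits design.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create a tiny item store; table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               created TEXT NOT NULL,   -- ISO 8601 timestamp
               content BLOB NOT NULL    -- raw file bytes
           )"""
    )
    # Index on time so date-range queries stay fast as the store grows.
    conn.execute("CREATE INDEX IF NOT EXISTS items_created ON items(created)")
    return conn

def store_file(conn, name, created, data):
    """Insert one file's bytes as a blob and return its row id."""
    cur = conn.execute(
        "INSERT INTO items (name, created, content) VALUES (?, ?, ?)",
        (name, created, data),
    )
    conn.commit()
    return cur.lastrowid

def items_between(conn, start, end):
    """Names of items created in [start, end), oldest first."""
    rows = conn.execute(
        "SELECT name FROM items WHERE created >= ? AND created < ? "
        "ORDER BY created",
        (start, end),
    )
    return [r[0] for r in rows]
```

Storing timestamps as ISO 8601 strings keeps them sortable as plain text, which is why the range query can compare them directly.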
MyLifeBits Software MyLifeBits store database Voice annotation tool Telephone capture tool TV capture tool TV EPG download tool Radio capture & EPG PocketPC transfer tool PocketRadio player Import files MyLifeBits Shell Browser tool Internet IM capture GPS import & Map display SenseCam Screen saver Text annotation tool MAPI interface Legacy client Outlook interface files Legacy applications VIBE logging RoomCapture
MyLifeBits Schema (simplified) Images Music Phone calls Items Links Link types Entity types Resource entities Event types Event log Events Tasks People Notes Saved searches SenseCamData GPS data Window, key, mouse log Web pages
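The items, links, and link types named in the schema suggest a graph of typed links between items. The sketch below is one way such a structure could be laid out and followed from a contact to a call to a web page. All table names, column names, link type names, and sample rows are hypothetical, and SQLite is used only for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT, title TEXT);
CREATE TABLE link_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE links (
    src INTEGER REFERENCES items(id),
    dst INTEGER REFERENCES items(id),
    type_id INTEGER REFERENCES link_types(id)
);
"""

def build_demo():
    """Build an in-memory store holding three linked sample items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
        (1, "person", "Dentist"),
        (2, "phone_call", "Call, Tuesday 9:05"),
        (3, "web_page", "Clinic directions"),
    ])
    conn.executemany("INSERT INTO link_types VALUES (?, ?)", [
        (1, "caller_in"),
        (2, "viewed_during"),
    ])
    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
        (1, 2, 1),  # person -> phone call
        (2, 3, 2),  # phone call -> web page
    ])
    return conn

def pivot(conn, item_id):
    """Return (link type, kind, title) for every item linked from item_id."""
    rows = conn.execute(
        """SELECT lt.name, i.kind, i.title
           FROM links l
           JOIN link_types lt ON lt.id = l.type_id
           JOIN items i ON i.id = l.dst
           WHERE l.src = ?""",
        (item_id,),
    )
    return rows.fetchall()
```

Calling `pivot` on the person and then on the returned call walks the chain one hop at a time, which is the navigation style the schema appears designed for.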
Demo Clips & Screens
747 Screen…
Vue de jour
Reports
Add item to collection(s)
Refine shell
Refine shell2
Pivoting: contact> call> t> web page
Refine by classification--dentist
GPS Photo location
SenseCam
Timeline
Google??
The Shape & Size of Gordon’s LifeBits
MyLifeBits 3/26/ 206 K items, 101 GB, by number of items.
MyLifeBits 3/26/ 101 GB, 206 K items, by size (GB). Bell growth: 1 GB/month = 1 TB/lifetime. Size (MB) by type.
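The growth rule of thumb can be checked with simple arithmetic. The sketch below assumes a constant rate and decimal units (1 TB = 1000 GB); the 80-year span is an assumption chosen for illustration, and the function names are hypothetical.

```python
def lifetime_storage_gb(gb_per_month, years):
    """Total storage accumulated at a constant monthly rate."""
    return gb_per_month * 12 * years

def years_to_reach(target_gb, gb_per_month):
    """Years needed to accumulate target_gb at a constant monthly rate."""
    return target_gb / (gb_per_month * 12)

# At 1 GB/month, 80 years gives 960 GB, and 1000 GB takes about 83 years,
# so "1 TB per lifetime" holds as an order-of-magnitude estimate.
print(lifetime_storage_gb(1, 80))
print(round(years_to_reach(1000, 1), 1))
```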
of (incl. attachments)
15,000 photos
Year  Mpix  Manufacturer
            Ricoh
1999  1     Kodak
2001  2     Canon
2002  3     Sony
2003  4     Sony
2005  5     Panasonic
Monthly & Lifetime Storage Use ItemDaily numberTotal* MB|GB Month|Life 1 MB Books|reports0.13 5KB s KB Image scans MB Photos KB Web pages|docs MB Music KB/s Listened audio, speech40,0001, KB Daily photos1,0001,250 2 GB/hr TV4200,000
Observations about use(rs)
1. On apps:
 – Search is the “killer app”, pretty much as Bush described.
 – Screen savers as “memory refreshers” also provide ambience.
 – Where did my day go?
2. Users are unwilling to spend time managing their computers or data.
 – User-input meta-data, e.g. Dublin Core, is a naïve librarian’s dream.
 – Meta-data, classification, etc. must be automatic.
 – Classification using facets is a great scheme, but it requires work.
3. Time is the most important meta-data. For photos: place (GPS) and subject.
4. Folders are both a good and a bad idea.
 – Most users don’t know what they are or how they work.
 – If used, they become useless over time: too many, misfiled items, etc.
5. Users should put “every” information fragment into the system, e.g. to-dos, call-backs, business card numbers, attention events. It pays.
6. The same information kept in multiple places always becomes obsolete.
Evolution: Silo Apps on isolated DB islands vs. Cut & Paste across apps Contacts: email, instant messages, phone, correspondence Family and organizational relationships Location of people, organizations, etc. Photo database: who, where, when, what… Money payees, phone, etc. Health providers and caregivers User-written apps in Excel or Access
Common ground with WinFS: Items, Links & Meta-data Annotates Caller in Phone Call Photo of Event
PhotoFinder - Shneiderman and Kang
Challenges
The “dear appy” problem Dear Appy, How committed are you? Please come back to me. Forever yours truly, Lost and forgotten data Who’s responsible? – Media or 8 track cassette, 8” floppy – Evolving platform, file, and database – Evolving, incompatible standards & formats for legacy data that disregard ancestors – Evolving and/or disappearing apps
Automatic classification problem XML on bills and imported content… transactions We need to download classifications rather than build them – Definitions & synonyms should help find what I want Today it is too expensive to manually classify scanned paper. E.g. “right time” meta-data is critical! We hope “the system” can classify papers and other documents e.g. bills. Ideally, build Dublin Core In 10 years we need all documents to appear electronically & classified with a little help from me
More challenges Dear Appy: Monitoring and automatic migration of files that are unlikely to be understood on future platforms, as well as platform migration. Get What I Need: Endless but evolutionary improvements in search: misspellings, stemming, synonyms. Endless frontier of schemas and extensions to them for new applications, e.g. making org charts, family relationships. Capture, Archival and Retrieval of Personal Experiences (CARPE)… a whole new game! Versioning is essential. Scaling… We don’t know what happens at a Terabyte. What can, should, or will be in the cloud? Books… videos. Will we be allowed to use such systems? Copyright laws vary, e.g. ripping CDs, copy of anything, photos, conversations.
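The three search improvements named above (misspellings, stemming, synonyms) can be illustrated with a small query-expansion sketch. It uses only Python's standard difflib module. The synonym table, the crude suffix stripper, and the function names are hypothetical stand-ins, not part of MyLifeBits; a real system would use a proper stemmer and thesaurus.

```python
import difflib

# Hypothetical synonym table, keyed by stemmed word.
SYNONYMS = {"photo": {"picture", "image"}}

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_query(term, vocabulary):
    """Return the set of terms to search for, given one user-typed term."""
    term = term.lower()
    # Misspellings: snap to the closest known word, if one is close enough.
    close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
    if close:
        term = close[0]
    # Stemming: also search for the base form.
    base = stem(term)
    terms = {term, base}
    # Synonyms: add any words listed for the base form.
    terms |= SYNONYMS.get(base, set())
    return terms
```

With a vocabulary drawn from indexed items, a transposed-letter typo such as "dentsit" resolves to "dentist", while "Photos" expands to its singular form plus the listed synonyms.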
Challenges Data-types – Quantity expanding i.e. info explosion – New capabilities e.g. real time create new data-types – Meta-data to increase value & provide pivots Going beyond a PC to a distributed environment – Network environment, including media center – Into the cloud – Periphery… smart buildings, objects, – Backup, migration, and caching for beyond a Terabyte – Expanding network: PC > LANs > web > P2P Schema sharing among disparate systems CARPE (real time data capture) – Rooms, phone calls, SenseCam, Health transducers, etc. Security, privacy, forgetfulness, deniability, etc.