Research on Personal Dataspace Management
Yukun Li
liyukun@ruc.edu.cn
Renmin University of China
Outline
- Introduction
- Related work
- Research work
- OrientSpace: A prototype system
- Ongoing work
- Conclusions
Introduction
Information explosion
Information islands
In 1945, Vannevar Bush predicted that personal information management would become a serious problem. Today that prediction has come true…
Introduction (Example) Where is it? My God, I forgot it! Distributed Storage Information island
Related work
Concepts [PIM workshop 2005 report]
Personal dataspace
- From databases to dataspaces [Franklin M, et al. SIGMOD Record, 2005]
- Principles of dataspace systems [Halevy A, et al. In PODS 2006]
- Data model: iDM [Dittrich J-P and Salles MAV…, VLDB 2006]
Systems of personal data management
- iMemex [L. Blunschi, J.-P. Dittrich, et al. In CIDR, 2007]
- Semex [X. Dong and A. Halevy. In CIDR 2005]
- Others
Systems for special data source management
- Email data management
- Desktop search engine
Related work
Personal data operations are still slow.
The characteristics of a personal dataspace are not modeled well.
Components: owner entity, data set, service
Attributes of a personal dataspace: correlated, controllable
Characteristics:
- Versatile data sources
- From data to schema
- Pay-as-you-go
- Others
The characteristics of the user may be the key factor in improving the performance of data operations.
Research work User-centered framework for PDS CoreSpace of personal dataspace CoreSpace Query Strategy
Research Work A User-Centered Framework for PDS
The characteristics of the user may be the key factor in improving the performance of data operations.
Research Work Observation
Personal data is always distributed, disordered, personalized, heterogeneous, and evolving. But are there rules or patterns in a PDS? If so, what are they?
Observations:
- Objects differ in importance.
- The importance of a given object changes over time.
- People tend to visit only a small set of data within a given period.
Research Work CoreSpace
Two concepts: Object Weight (OW) and Personal CoreSpace (PCS)
Object Weight: Describes the relation between an object and its owner. It can be defined as the probability that the object will be accessed in the future.
Personal CoreSpace: Consists of the objects whose OW is greater than a given threshold. In contrast, the full space of a person is made up of all objects related to the owner.
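The two definitions can be illustrated with a minimal sketch. The function names, the sample object ids, the weights, and the threshold value are all hypothetical; the slide only states that the CoreSpace holds the objects whose OW exceeds a threshold.

```python
from typing import Dict, Set


def core_space(object_weights: Dict[str, float], threshold: float) -> Set[str]:
    """Return the ids of objects whose Object Weight exceeds the threshold."""
    return {oid for oid, ow in object_weights.items() if ow > threshold}


def full_space(object_weights: Dict[str, float]) -> Set[str]:
    """Every object related to the owner belongs to the full space."""
    return set(object_weights)


# Hypothetical weights, read as access probabilities in [0, 1].
weights = {"paper.pdf": 0.82, "draft.doc": 0.41, "old_mail.eml": 0.03}
pcs = core_space(weights, threshold=0.4)
```

By construction the CoreSpace is always a subset of the full space, and raising the threshold can only shrink it.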
Research Work Preliminary experiment
Real personal data covering three months
Visited object number vs. total object number
Visit-time-based object number
Research work ObjectWeight Computing (1)
Features that affect OW:
- FileType
- FileModifyTime
- FileAccessFrequency
- FileOwner
- Personal task
- Association between objects
Research Work ObjectWeight Computing (2)
VF: visit frequency, measured as the number of visits in a day.
S: an attenuation factor.
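The slide names VF and S but the formula itself is not in the text. One plausible way to combine them, shown here purely as an assumption and not as the author's model, is a geometrically decayed sum of daily visit counts, where a visit made `age` days ago contributes `S ** age`. The function name and the sample numbers are hypothetical, and the result is an unnormalized score rather than a probability.

```python
from typing import Sequence


def object_weight(daily_visits: Sequence[int], s: float) -> float:
    """Decayed visit score (assumed form, not taken from the slides).

    daily_visits[0] is today's visit count, daily_visits[i] is the
    count from i days ago; s is the attenuation factor in (0, 1].
    """
    if not 0.0 < s <= 1.0:
        raise ValueError("attenuation factor s must be in (0, 1]")
    return sum(vf * s ** age for age, vf in enumerate(daily_visits))


recent = object_weight([4, 0, 0], s=0.5)  # four visits today
stale = object_weight([0, 0, 4], s=0.5)   # four visits two days ago
```

With the same number of visits, the recently visited object scores higher, which matches the observation that the importance of an object changes over time.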
Research work More advantages of the concepts
- Data integration (ObjectWeight > 0)
- Data query (scanning the CoreSpace is enough in most cases)
- Data indexing (different strategies for indexing the CoreSpace and the FullSpace)
- Data backup (CoreSpace-based backup strategy)
Research work CoreSpace-based Query Strategy
Query interface: { [attribute\\[keyword]*]*, K }
E.g. "Title\\integration, uncertain" means "Please tell me the objects whose title contains the words integration and uncertain".
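A minimal sketch of how such a query could be evaluated follows. It assumes a single attribute clause, treats K as the maximum number of results, and falls back to the rest of the full space only when the CoreSpace yields fewer than K hits. All three choices, the function names, and the sample objects are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

SEP = "\\\\"  # two backslashes, as the separator is printed on the slide

Obj = Dict[str, str]


def parse_query(text: str) -> Tuple[str, List[str]]:
    """Split one clause into (attribute, lower-cased keywords)."""
    attribute, _, rest = text.partition(SEP)
    keywords = [kw.strip().lower() for kw in rest.split(",") if kw.strip()]
    return attribute.strip(), keywords


def matches(obj: Obj, attribute: str, keywords: List[str]) -> bool:
    """True if the attribute value contains every keyword."""
    value = obj.get(attribute, "").lower()
    return all(kw in value for kw in keywords)


def query(text: str, k: int, core: List[Obj], rest: List[Obj]) -> List[Obj]:
    """Scan the CoreSpace first; touch the remaining objects only if needed."""
    attribute, keywords = parse_query(text)
    hits = [o for o in core if matches(o, attribute, keywords)]
    if len(hits) < k:  # assumed fallback to the rest of the full space
        hits += [o for o in rest if matches(o, attribute, keywords)]
    return hits[:k]


# Hypothetical objects.
core = [{"Title": "Uncertain data integration"}]
rest = [{"Title": "Integration of uncertain schemas"}, {"Title": "Data streams"}]
top1 = query(r"Title\\integration, uncertain", 1, core, rest)
top2 = query(r"Title\\integration, uncertain", 2, core, rest)
```

When K is satisfied from the CoreSpace alone, as in `top1`, the larger full space is never scanned.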
OrientSpace Functions
Integration
- Manual integration
- Automatic integration
Query
- Extended keyword query
- Results-based navigation
- CoreSpace explorer
OrientSpace Data Storage (vertical model)
Oid Attribute Value A1 Name Mike A2 Jone P1 Class paper Title ‘Index Database’ Author P2 ‘Data stream…’ reference P3 ‘Mining …’ class E1 Email attachment
Advantages: A universal model that can describe any object.
Question: The large number of join operations leads to low performance.
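The join cost of the vertical model can be seen in a small sketch using Python's built-in `sqlite3` module. The table name `triples` and the P2 rows are hypothetical; the A1 and P1 rows follow the slide. This is not OrientSpace's actual storage layer, only an illustration of the (Oid, Attribute, Value) layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (oid TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("A1", "Name", "Mike"),
        ("P1", "Class", "paper"),
        ("P1", "Title", "Index Database"),
        ("P2", "Class", "paper"),              # hypothetical row
        ("P2", "Title", "Stream Processing"),  # hypothetical row
    ],
)

# A predicate over two attributes of the same object needs a self-join;
# every further attribute in the predicate adds one more join.
rows = conn.execute(
    """
    SELECT c.oid, t.value
    FROM triples AS c
    JOIN triples AS t ON t.oid = c.oid
    WHERE c.attribute = 'Class' AND c.value = 'paper'
      AND t.attribute = 'Title' AND t.value LIKE '%Index%'
    """
).fetchall()
```

In a conventional one-row-per-object table the same lookup would be a single-table filter, which is the performance trade-off the slide raises.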
Ongoing work
ObjectWeight computing
- Computing model of OW
- Data set
ObjectWeight-based data operation strategy
- Integration, backup, query, consistency, etc.
OrientSpace system
Conclusions
We propose a new concept, CoreSpace, for PDS. It raises many research issues, including indexing, integration, storage, backup, and query.
My PhD project will focus on the following topics:
- User-centered data model (CoreSpace)
- CoreSpace-based data operation (query)
- Implementation of a prototype system
Thanks. Questions?
A Framework for Integration of PDS
Main Interface of OrientSpace
Wrapper-based Integration
From Data to Schema Integration
Personal CoreSpace Explorer