Research on Personal Dataspace Management


Research on Personal Dataspace Management Yukun Li liyukun@ruc.edu.cn Renmin University of China

Outline Introduction Related work Research work OrientSpace: A prototype system Ongoing work Conclusions

Introduction In 1945, Vannevar Bush predicted that personal information management would become a serious problem. Today that problem is here: information explosion and information islands.

Introduction (Example) Where is it? My God, I forgot it! Distributed Storage Information island

Outline Introduction Related work CoreSpace based Framework for PDS OrientSpace: A prototype system Ongoing work Conclusions

Related work
Concepts [PIM workshop 2005 report]
Personal dataspace
- From databases to dataspaces [Franklin M, et al., SIGMOD Record, 2005]
- Principles of dataspace systems [Halevy A, et al., PODS 2006]
- Data model: iDM [Dittrich J-P and Salles MAV…, VLDB 2006]
Systems for personal data management
- iMeMex [L. Blunschi, J.-P. et al., CIDR 2007]
- Semex [X. Dong and A. Halevy, CIDR 2005]
- Others
Systems for special data source management
- Email data management
- Desktop search engines

Related work The performance of personal data operations is still slow. The characteristics of personal dataspaces are not modeled well. Components: owner entity, data set, service. Attributes of a personal dataspace: correlation, controllable. Characteristics: versatile data sources; from data to schema; pay-as-you-go; others. The characteristics of the user may be the key factor in improving the performance of data operations.

Outline Introduction Related work Research work OrientSpace: A prototype system Ongoing work Conclusions

Research work User-centered framework for PDS CoreSpace of personal dataspace CoreSpace Query Strategy

Research Work A User-Centered Framework for PDS The characteristics of the user may be the key factor in improving the performance of data operations.

Research Work Observation Personal data is always distributed, disorderly, personalized, heterogeneous, and evolving. But are there rules or patterns in the PDS? If so, what are they? Observations: - Objects differ in importance. - The importance of a given object changes over time. - People tend to visit a small set of data within a given period.

Research Work CoreSpace Two concepts: Object Weight (OW) and Personal CoreSpace (PCS). Object Weight: describes the relation between an object and its owner; it can be defined as the probability that the object will be accessed in the future. Personal CoreSpace: consists of the objects whose OW exceeds a given threshold. In contrast, the full space of a person is made up of all objects related to the owner.
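The two definitions can be illustrated with a minimal sketch. It assumes each object already carries an OW value in [0, 1]; the class name `PersonalObject`, the function `core_space`, the sample file names, and the threshold 0.3 are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalObject:
    oid: str       # hypothetical object identifier
    weight: float  # Object Weight (OW): estimated probability of future access


def core_space(full_space, threshold):
    """Return the objects whose OW exceeds the given threshold."""
    return [obj for obj in full_space if obj.weight > threshold]


# The full space holds every object related to the owner.
full_space = [
    PersonalObject("paper.pdf", 0.82),
    PersonalObject("old_notes.txt", 0.05),
    PersonalObject("budget.xls", 0.40),
]

# The Personal CoreSpace is the subset above the (assumed) threshold.
pcs = core_space(full_space, threshold=0.3)
core_ids = [obj.oid for obj in pcs]
```

With these sample weights, only `paper.pdf` and `budget.xls` fall inside the CoreSpace; raising the threshold shrinks it further.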

Research Work Preliminary experiment Real personal data covering three months Visited object number vs. total object number Visit-time-based object number

Research work ObjectWeight Computing (1)
The features that affect OW are:
- FileType
- FileModifyTime
- FileAccessFrequency
- FileOwner
- Personal task
- Association between objects

Research Work ObjectWeight Computing (2) VF: visit frequency, measured as the number of visits in a day. S: an attenuation factor.
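The transcript names VF and S but does not give the formula that combines them. The sketch below is therefore only one plausible reading, not the author's model: it assumes that each day's visit count is discounted by S raised to the age of that day, so that recent visits count more than old ones. The function name `object_weight` and the sample values are hypothetical, and the result is an unnormalized score rather than a probability.

```python
def object_weight(daily_visits, s):
    """Score an object from its per-day visit counts.

    daily_visits[0] is today's visit frequency (VF), daily_visits[1]
    is yesterday's, and so on. Assumed model (not from the slides):
    score = sum over days of VF_day * s ** age_in_days.
    """
    if not 0.0 < s <= 1.0:
        raise ValueError("attenuation factor s must be in (0, 1]")
    return sum(vf * s ** age for age, vf in enumerate(daily_visits))


# Four visits today outweigh four visits two days ago.
recent = object_weight([4, 0, 0], s=0.5)
stale = object_weight([0, 0, 4], s=0.5)
```

Under this assumption a smaller S makes the weight forget old visits faster, which matches the observation that the importance of an object is dynamic.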

Research work More advantages of the concepts Data integration (ObjectWeight > 0) Data query (scanning CoreSpace is enough in most cases) Data indexing (different strategies for indexing CoreSpace and FullSpace) Data backup (CoreSpace-based backup strategy)

Research work CoreSpace-based Query Strategy Query Interface { [attribute\\[keyword]*]*, K } e.g. "Title\\integration, uncertain". It means "Please tell me the objects whose title contains the words integration and uncertain".
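A minimal sketch of how such a query could be evaluated, scanning the CoreSpace first and touching the full space only when needed. Several points are assumptions rather than statements from the slides: the separator is taken literally as two backslash characters, K is read as a limit on the number of results, matching is case-insensitive substring matching, and the names `parse_query`, `matches`, and `search` are hypothetical. Only a single attribute clause is handled.

```python
SEPARATOR = "\\\\"  # two backslash characters, as the query is written


def parse_query(text):
    """Split one 'attribute<SEPARATOR>kw1, kw2' clause into its parts."""
    attribute, _, keyword_part = text.partition(SEPARATOR)
    keywords = [kw.strip().lower() for kw in keyword_part.split(",") if kw.strip()]
    return attribute.strip().lower(), keywords


def matches(obj, attribute, keywords):
    """True if the object's attribute value contains every keyword."""
    value = str(obj.get(attribute, "")).lower()
    return all(kw in value for kw in keywords)


def search(query, core_space, full_space, k):
    """Scan the CoreSpace first; fall back to the full space if needed."""
    attribute, keywords = parse_query(query)
    hits = [o for o in core_space if matches(o, attribute, keywords)]
    if len(hits) < k:  # assumed fallback rule: CoreSpace did not yield K results
        found = {o["oid"] for o in hits}
        hits += [o for o in full_space
                 if o["oid"] not in found and matches(o, attribute, keywords)]
    return hits[:k]


core = [{"oid": "P1", "title": "Uncertain data integration"}]
full = core + [
    {"oid": "P2", "title": "Integration of uncertain schemas"},
    {"oid": "P3", "title": "Index Database"},
]
query = r"Title\\integration, uncertain"
top1 = [o["oid"] for o in search(query, core, full, k=1)]
top5 = [o["oid"] for o in search(query, core, full, k=5)]
```

With `k=1` the answer comes from the CoreSpace alone; with `k=5` the full space is scanned as well and contributes `P2`.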

Outline Introduction Related work CoreSpace based Framework for PDSMS OrientSpace: A prototype system Ongoing work Conclusions

OrientSpace Functions
Integration
- Manual integration
- Automatic integration
Query
- Extended keyword query
- Results-based navigation
- CoreSpace explorer

OrientSpace Data Storage (vertical model) Oid Attribute Value A1 Name Mike A2 Jone P1 Class paper Title ‘Index Database’ Author P2 ‘Data stream…’ reference P3 ‘Mining …’ class E1 Email attachment Advantage: a universal model that can describe any object. Problem: the large number of join operations leads to low performance.
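The join cost of the vertical model can be seen in a small sketch using Python's built-in `sqlite3` module. The table name `triples` and the assignment of sample values to rows are assumptions made for illustration; this is not OrientSpace's actual schema. Because each attribute of an object sits in its own row, even a two-attribute question needs a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Vertical model: one row per (object id, attribute, value).
conn.execute("CREATE TABLE triples (oid TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("A1", "Name", "Mike"),
        ("P1", "Class", "paper"),
        ("P1", "Title", "Index Database"),
        ("E1", "Class", "Email"),
    ],
)

# "Titles of all papers" touches two attributes, so the table is
# joined with itself once; every further attribute adds another join.
rows = conn.execute(
    """
    SELECT c.oid, t.value
    FROM triples AS c
    JOIN triples AS t ON t.oid = c.oid
    WHERE c.attribute = 'Class' AND c.value = 'paper'
      AND t.attribute = 'Title'
    """
).fetchall()
conn.close()
```

Any new kind of object fits into the same three columns without a schema change, which is the advantage noted above; the price is one extra join per attribute in each query.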

Outline Introduction Related work CoreSpace based Framework for PDSMS OrientSpace: A prototype system Ongoing work Conclusions

Ongoing work ObjectWeight Computing - Computing Model of OW - Data set ObjectWeight based Data Operation Strategy - Integration, Backup, Query, Consistency, etc. OrientSpace Systems

Outline Introduction Related work CoreSpace based Framework for PDSMS OrientSpace: A prototype system Ongoing work Conclusions

Conclusions We propose a new concept, CoreSpace, for PDS. It raises many research issues, including indexing, integration, storage, backup, and query. My PhD project will focus on the following topics: User-centered data model (CoreSpace) CoreSpace-based data operation (query) Implementing a prototype system

Thanks. Questions?

A Framework for Integration of PDS

Main Interface of OrientSpace

Wrapper-based Integration

From Data to Schema Integration

Personal CoreSpace Explorer