Data Engineering Research Group
4 faculty members: Reynold Cheng, David Cheung, Ben Kao, Nikos Mamoulis
20 research students (10 PhD, 10 MPhil)

Success Stories
Papers in the past 5 years:
- 58 in top DB and DM conferences, including 9 SIGMOD, 15 VLDB, 17 ICDE, 3 EDBT, 4 CIKM, 3 SIGKDD, and 6 ICDM
- 28 in top DB and DM journals: 5 TODS, 7 VLDBJ, 15 TKDE, 1 TKDD
PhD alumni with faculty positions: Rutgers, HKPolyU, Mexico State U, Aalborg U, Macau U, Renmin U

Reynold Cheng
Background: HKU (BSc, MPhil, 95-00), Purdue (PhD, 00-05), HKPolyU (Asst. Prof., 05-08)
Research: database management, uncertainty management, data mining, spatial databases

Uncertainty Management: Data Uncertainty
[Figure: sensor network; GPS]
Data is often imprecise and erroneous. Data uncertainty must be handled; otherwise, service quality can be degraded.

Uncertainty Management: The ORION Database
Treat data uncertainty as a first-class citizen.
A probabilistic query provides answers with probabilities (e.g., Mary has an 80% chance of being at HKU).
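For a uniform location pdf, the probability in the Mary example reduces to an area ratio. The sketch below assumes that Mary's uncertain location is uniformly distributed over an axis-aligned rectangle and that the query region (e.g., the HKU campus) is also a rectangle. The `Rect` type, `prob_in_range`, and the coordinates in the usage line are hypothetical illustrations, not ORION's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle [x1, x2] x [y1, y2] (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp negative extents to zero so an empty intersection has area 0.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(max(self.x1, other.x1), max(self.y1, other.y1),
                    min(self.x2, other.x2), min(self.y2, other.y2))


def prob_in_range(region: Rect, query: Rect) -> float:
    """P(object lies inside `query`), ASSUMING its location is uniform over `region`."""
    total = region.area()
    if total == 0.0:
        raise ValueError("uncertainty region must have positive area")
    return region.intersect(query).area() / total
```

Take a 10 × 10 uncertainty region in which 80 units of area fall inside the campus rectangle. Here `prob_in_range(Rect(0, 0, 10, 10), Rect(2, 0, 10, 10))` gives 0.8, which matches the 80% in the example. Non-uniform pdfs would need integration instead of an area ratio.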

Uncertainty Management: Queries in ORION
Example table T:
k | a
1 | U[5,10]
2 | G(2, 0.1)

Create a table with an UNCERTAIN attribute:
  CREATE TABLE T (k INTEGER PRIMARY KEY, a UNCERTAIN);
Insert a Gaussian pdf (μ, σ):
  INSERT INTO T VALUES (2, '(g, μ, σ)');
Display the uncertain information of a for tuples where a > 5:
  SELECT a FROM T WHERE a > 5;
Equality join of uncertain attributes (=% returns the probability of equality):
  SELECT R.k, S.k, R.a =% S.a FROM R, S WHERE R.a = S.a;
Entities, each with the probability that it gives the minimum value of a (e.g., {(3, 0.5), (5, 0.3), (11, 0.2)}):
  SELECT Emin(T.a) FROM T;
Minimum value of a for table T (returned as an UNCERTAIN value):
  SELECT Vmin(T.a) FROM T;
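The probabilities behind a threshold query such as `a > 5` and behind `Emin` can be computed directly from the pdfs. The sketch below makes three assumptions: the tuples are independent, `G(2, 0.1)` means mean 2 and standard deviation 0.1, and plain midpoint-rule integration is accurate enough. `uniform`, `gaussian`, `prob_greater`, and `emin` are hypothetical helpers, not ORION functions, and ORION's own evaluation strategy may differ.

```python
import math

# Each uncertain value is a (pdf, cdf) pair; tuples are ASSUMED independent.


def uniform(lo, hi):
    """U[lo, hi], as in row 1 of the example table."""
    pdf = lambda x: 1.0 / (hi - lo) if lo <= x <= hi else 0.0
    cdf = lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return pdf, cdf


def gaussian(mu, sigma):
    """G(mu, sigma), reading the second parameter as the standard deviation."""
    pdf = lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    cdf = lambda x: 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))
    return pdf, cdf


def prob_greater(value, t):
    """P(a > t) for one uncertain value, as needed by a threshold query like a > 5."""
    _, cdf = value
    return 1.0 - cdf(t)


def emin(table, lo, hi, steps=20_000):
    """P(tuple k holds the minimum of a), integrating over [lo, hi]:
    P_k = integral of pdf_k(x) * prod_{j != k} (1 - cdf_j(x)) dx (midpoint rule)."""
    h = (hi - lo) / steps
    result = {}
    for k, (pdf_k, _) in table.items():
        total = 0.0
        for s in range(steps):
            x = lo + (s + 0.5) * h
            p = pdf_k(x)
            if p == 0.0:
                continue  # outside this tuple's support
            for j, (_, cdf_j) in table.items():
                if j != k:
                    p *= 1.0 - cdf_j(x)  # every other tuple must exceed x
            total += p * h
        result[k] = total
    return result


T = {1: uniform(5, 10), 2: gaussian(2, 0.1)}
```

For the example table, `prob_greater` gives 1.0 for tuple 1 and essentially 0 for tuple 2. `emin(T, 0.0, 20.0)` assigns tuple 2 a probability of about 1 of holding the minimum. The integration range [lo, hi] must cover the non-negligible mass of every pdf.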

David Cheung
Background: CUHK (BSc), Simon Fraser (MSc, PhD, 83-88)
Research: security and authentication in outsourced databases; data interoperability theory; queries on community networks

Outsourcing Data Mining Tasks
[Figure: the data owner outsources its database DB to a data miner (service provider), which performs frequent itemset mining and returns the frequent itemsets.]
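The outsourced task is frequent itemset mining. A frequent itemset is a set of items that appears together in at least a minimum number of transactions. The level-wise (Apriori-style) miner below is a minimal sketch for small in-memory databases and uses absolute support counts. `frequent_itemsets` is a hypothetical name; a real service provider would use a far more scalable implementation.

```python
from itertools import combinations


def frequent_itemsets(db, min_support):
    """Return {itemset: support count} for every itemset contained in at least
    `min_support` transactions of `db` (a list of sets of items)."""
    transactions = [frozenset(t) for t in db]
    items = sorted({i for t in transactions for i in t})
    result = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        # Count the support of each size-k candidate with one scan of the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    return result
```

For the five transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} with `min_support=3`, it returns the three single items (support 4) and the three pairs (support 3). {a,b,c} appears only twice, so it is excluded.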

Integrity Concern
Is the result correct?
[Figure: the data owner outsources DB to the data miner (service provider), which returns the frequent itemsets.]
- Scenario 1: Honest but careless service provider. Example: an incorrect implementation of the mining algorithm, or mistakes in the settings.
- Scenario 2: Lazy service provider. Example: it just executes on a sampled database to save cost.
- Scenario 3: Malicious service provider. Example: it is paid by a competitor of the data owner to return a wrong result, or the provider or network falls victim to a malicious attack.

Solution 1: Artificial Itemset Planting
[Figure: data owner; data miner; outsourced DB; audit database DB′ whose frequent itemsets are L′; returned frequent itemsets L]
1. Generate an artificial (audit) database DB′ so that its frequent itemsets L′ are controlled by, and known to, the data owner.
2. The service provider works on the combined database.
3. Verify L′.
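A simplified sketch of the audit check follows. It makes one illustrative assumption, which is not necessarily the scheme presented: the planted itemsets use fake items that never occur in DB and are pairwise disjoint. Under that assumption, each planted itemset's support in the combined database equals the number of audit transactions that contain it. `build_audit_db` and `verify_planted` are hypothetical names, and the miner's result is modeled as a dict from itemset to support count.

```python
def build_audit_db(planted):
    """planted: {frozenset of FAKE items: number of copies}.
    Writes each planted itemset into `copies` audit transactions (DB')."""
    seen = set()
    audit_db = []
    for itemset, copies in planted.items():
        if seen & itemset:
            raise ValueError("planted itemsets must be pairwise disjoint")
        seen |= itemset
        audit_db.extend(set(itemset) for _ in range(copies))
    return audit_db


def verify_planted(planted, min_support, returned):
    """Check the miner's result on DB + DB' against the known planted supports.
    returned: {frozenset: support count} reported by the service provider."""
    for itemset, copies in planted.items():
        if copies >= min_support:
            # Known to be frequent: must be reported with exactly this support.
            if returned.get(itemset) != copies:
                return False
        elif itemset in returned:
            # Known to be infrequent, yet reported as frequent.
            return False
    return True
```

A lazy provider that mines only a sample (Scenario 2) would typically miss a planted itemset or report a wrong support for it, and the check then fails. The check audits only the planted part of the result, not the itemsets of DB itself.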

Ben Kao
Background: HKU (BSc, 86-89), Princeton-Stanford (PhD, 89-95)
Research: database systems, information retrieval, data mining

Finding Key Moments in Social Networks
[Figure: “distance” between two Facebook users over a 1-year period. The users are disconnected before Day 178 and finally become friends on Day 365.]

How did they (u and v) become friends? To understand how friendships are established, we need to study the events that happened at certain “key moments”. For example, which of Events (a), (b), or (c) in the figure led to the shortening of the two users’ distance from each other? But first, we need to discover those “key moments” so that we know which “snapshots” of the Facebook graph to look at.

Evolving Graph Sequence (EGS) Processing
We model the dynamics of a social network as a (big) sequence of (big) evolving graph snapshots. We study efficient graph algorithms for identifying key moments, i.e., snapshots at which sharp changes in certain key measures are observed. Such key moments help social network analysts investigate the various properties of gigantic social networks.
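One key measure from the Facebook example is the distance between two users. The naive sketch below recomputes a BFS hop distance in every snapshot. It flags snapshot i as a key moment when the distance drops by at least `min_drop` from snapshot i-1, and it assumes undirected graphs stored as adjacency dicts. The EGS work studies efficient algorithms that avoid this per-snapshot recomputation. Treat `hop_distance` and `key_moments` as hypothetical illustrations of the problem, not of those algorithms.

```python
import math
from collections import deque


def hop_distance(adj, u, v):
    """Unweighted shortest-path length from u to v by BFS; math.inf if disconnected."""
    if u == v:
        return 0
    seen = {u}
    frontier = deque([(u, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nb in adj.get(node, ()):
            if nb == v:
                return d + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return math.inf


def key_moments(snapshots, u, v, min_drop=1):
    """Indices i at which the u-v distance falls by at least min_drop versus
    snapshot i-1. Going from disconnected (inf) to connected counts as a drop."""
    dists = [hop_distance(g, u, v) for g in snapshots]
    return [i for i in range(1, len(dists)) if dists[i - 1] - dists[i] >= min_drop]
```

For snapshots in which the u-v distance evolves as inf, 3, 2, 1, `key_moments` returns [1, 2, 3] with the default threshold. With `min_drop=2` it returns only [1], the moment the users first become connected.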

Nikos Mamoulis
Background: UPatras (BSc-MSc, 90-95), HKUST (PhD, 97-00), CWI (00-01)
Research: spatial databases; managing and mining complex data types; privacy and security; information retrieval

Snippets of Data Subjects in Databases
[Figure: Web results of “Faloutsos”; DBLP database]

Data Subject (DS) Schema Graph
Based on the database schema.

Object Summary of a Given Entity
Based on the DS schema graph and the actual data.

Software Engineering Group
Prof. T.H. Tse (PhD, LSE)
Research: software engineering, covering program testing, debugging, and analysis, with applications to object-oriented software, concurrent systems, pervasive computing, service-oriented applications, graphic applications, and numerical programs

Architecture of ORION
[Figure: ORION architecture, built on PostgreSQL 8.0]

Solution 2: Verification of Checksums
[Figure: the data owner outsources DB to the data miner (service provider), which returns frequent itemsets L′.]
After we get the mining result, we scan DB and collect information for verification.

Size-l Object Summary of a Given Entity
Problem: an object summary (OS) could be too large and overwhelming.
Solution: keep only the l most important tuples (or attribute-value pairs).
Optimization problem: select an l-sized subtree of the OS with maximum total importance.
Methods:
- Dynamic programming: exact but expensive
- Greedy heuristics: fast and near-optimal
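The two kinds of methods can be contrasted on a toy OS tree. The sketch assumes the OS is a rooted tree whose root (the entity's own tuple) must be kept, that each node carries an importance score, and that a valid size-l summary is a connected subtree containing the root. `exact_best_score` is a standard tree-knapsack dynamic program. `greedy_summary` uses one simple greedy rule: always add the most important node adjacent to the current selection. This rule is not necessarily the heuristic used in this work, and both names are hypothetical.

```python
import heapq


def greedy_summary(root, children, importance, l):
    """Grow from the root, always adding the most important frontier node."""
    selected = [root]
    frontier = [(-importance[c], c) for c in children.get(root, [])]
    heapq.heapify(frontier)
    while frontier and len(selected) < l:
        _, node = heapq.heappop(frontier)
        selected.append(node)
        for c in children.get(node, []):
            heapq.heappush(frontier, (-importance[c], c))
    return selected


def exact_best_score(root, children, importance, l):
    """Max total importance of an l-node subtree containing the root
    (tree-knapsack DP, O(n * l^2)); -inf if the tree has fewer than l nodes."""
    if l < 1:
        raise ValueError("l must be at least 1")
    NEG = float("-inf")

    def solve(node):
        # best[k] = best score of a k-node subtree rooted at `node`.
        best = [NEG] * (l + 1)
        best[1] = importance[node]
        for c in children.get(node, []):
            child = solve(c)
            merged = best[:]
            for k in range(1, l + 1):
                if best[k] == NEG:
                    continue
                for j in range(1, l - k + 1):  # take j nodes from child's subtree
                    if child[j] != NEG:
                        merged[k + j] = max(merged[k + j], best[k] + child[j])
            best = merged
        return best

    return solve(root)[l]
```

Consider a root r (importance 1) with children a (5) and b (1), where b has a child c (10). With l = 3, greedy keeps r, a, b for a total of 7. The DP finds r, b, c for a total of 12, because the highly important c is reachable only through the low-importance b. This shows the trade-off: the greedy rule is cheap but can miss the optimum.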