Data Engineering Research Group

Slides:



Advertisements
Similar presentations
Computer Science and Engineering Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search Chengyuan Zhang 1,Ying Zhang 1,Wenjie Zhang 1, Xuemin.
Advertisements

Efficient Processing of Top- k Queries in Uncertain Databases Ke Yi, AT&T Labs Feifei Li, Boston University Divesh Srivastava, AT&T Labs George Kollios,
指導教授:陳良弼 老師 報告者:鄧雅文  Introduction  Related Work  Problem Formulation  Future Work.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Cleaning Uncertain Data with Quality Guarantees Reynold Cheng, Jinchuan Chen, Xike Xie 2008 VLDB Presented by SHAO Yufeng.
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
Online Filtering, Smoothing & Probabilistic Modeling of Streaming Data In short, Applying probabilistic models to Streams Bhargav Kanagal & Amol Deshpande.
Data Engineering Research Group 4 faculty members Reynold Cheng David Cheung Ben Kao Nikos Mamoulis 20 research students (10 PhD, 10 MPhil)
School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.
Indexing the imprecise positions of moving objects Xiaofeng Ding and Yansheng Lu Department of Computer Science Huazhong University of Science & Technology.
LUDWIG- MAXIMILIANS- UNIVERSITY MUNICH DATABASE SYSTEMS GROUP DEPARTMENT INSTITUTE FOR INFORMATICS Probabilistic Similarity Queries in Uncertain Databases.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
U-DBMS: A Database System for Managing Constantly-Evolving Data (VLDB 2005) Reynold Cheng Hong Kong Polytechnic University.
Data Engineering Research Group 4 faculty members Reynold Cheng David Cheung Ben Kao Nikos Mamoulis 20 research students (10 PhD, 10 MPhil)
A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.
ON LINK-BASED SIMILARITY JOIN A joint work with: Liwen Sun, Xiang Li, David Cheung (University of Hong Kong) Jiawei Han (University of Illinois Urbana.
Cheng, Xie, Yiu, Chen, Sun UV-diagram: a Voronoi Diagram for uncertain data 26th IEEE International Conference on Data Engineering Reynold Cheng (University.
Efficient Join Processing over Uncertain Data - By Reynold Cheng, et all. Presented By Lydia & Usha.
An Overview of Our Course:
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
Wei Cheng 1, Xiaoming Jin 1, and Jian-Tao Sun 2 Intelligent Data Engineering Group, School of Software, Tsinghua University 1 Microsoft Research Asia 2.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Research Overview Kyriakos Mouratidis Assistant Professor School of Information Systems Singapore Management University
Search Engines and Information Retrieval Chapter 1.
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
Da Yan and Wilfred Ng The Hong Kong University of Science and Technology.
CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.
Mehdi Kargar Aijun An York University, Toronto, Canada Discovering Top-k Teams of Experts with/without a Leader in Social Networks.
1 On Querying Historical Evolving Graph Sequences Chenghui Ren $, Eric Lo *, Ben Kao $, Xinjie Zhu $, Reynold Cheng $ $ The University of Hong Kong $ {chren,
黃福銘 (Angus F.M. Huang) ANTS Lab, IIS, Academia Sinica TrajPattern: Mining Sequential Patterns from Imprecise Trajectories.
A Survey Based Seminar: Data Cleaning & Uncertain Data Management Speaker: Shawn Yang Supervisor: Dr. Reynold Cheng Prof. David Cheung
Top-k Similarity Join over Multi- valued Objects Wenjie Zhang Jing Xu, Xin Liang, Ying Zhang, Xuemin Lin The University of New South Wales, Australia.
Overview of CS Class Jiawei Han Department of Computer Science
ICS280 Presentation by Suraj Nagasrinivasa (1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R Cheng, D Kalashnikov, S Prabhakar.
K-Hit Query: Top-k Query Processing with Probabilistic Utility Function SIGMOD2015 Peng Peng, Raymond C.-W. Wong CSE, HKUST 1.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
Information Technology Selecting Representative Objects Considering Coverage and Diversity Shenlu Wang 1, Muhammad Aamir Cheema 2, Ying Zhang 3, Xuemin.
Information Technology (Some) Research Trends in Location-based Services Muhammad Aamir Cheema Faculty of Information Technology Monash University, Australia.
Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.
Spatial Range Querying for Gaussian-Based Imprecise Query Objects Yoshiharu Ishikawa, Yuichi Iijima Nagoya University Jeffrey Xu Yu The Chinese University.
CSCE 5073 Section 001: Data Mining Spring Overview Class hour 12:30 – 1:45pm, Tuesday & Thur, JBHT 239 Office hour 2:00 – 4:00pm, Tuesday & Thur,
A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB
Presented by: Dardan Xhymshiti Fall  Type: Research paper  Authors:  International conference on Very Large Data Bases. Yoonjar Park Seoul National.
Data Engineering Research Group 4 faculty members David Cheung Ben Kao Nikos Mamoulis Reynold Cheng About 15 research students (12 PhD, 3 MPhil)
CS & CS ST: Probabilistic Data Management Fall 2016 Xiang Lian Kent State University Kent, OH
Term Project Proposal By J. H. Wang Apr. 7, 2017.
Introduction to IR Research
Probabilistic Data Management
中国计算机学会学科前沿讲习班:信息检索 Course Overview
Ishan Sharma Abhishek Mittal Vivek Raj
Clustering Uncertain Taxi data
Jiawei Han Computer Science University of Illinois at Urbana-Champaign
Preference Query Evaluation Over Expensive Attributes
CS & CS Probabilistic Data Management
Probabilistic Data Management
Data Science Research in Big Data Era
机器感知与智能教育部重点实验室学术报告 Key Laboratory of Machine Perception (Minister of Education) Peking University Scalable, Robust and Integrative Algorithms for Analyzing.
Lecture 16: Probabilistic Databases
Lecture Set 14 B new Introduction to Databases - Database Processing: The Connected Model (Using DataReaders)
Efficient Evaluation of k-NN Queries Using Spatial Mashups
Probabilistic Data Management
CS & CS ST: Probabilistic Data Management
Efficient Cache-Supported Path Planning on Roads
Publishing in Top Venues
Probabilistic Databases
Uncertain Data Mobile Group 报告人:郝兴.
Case Mining From Large Databases
CSCE 4143 Section 001: Data Mining Spring 2019.
Efficient Processing of Top-k Spatial Preference Queries
Promising “Newer” Technologies to Cope with the
Presentation transcript:

Data Engineering Research Group 2 faculty members Reynold Cheng Ben Kao 20 research students (10 PhD, 10 MPhil)

Success Stories Papers in the past 5 years - 58 in top DB and DM conferences (9 SIGMOD, 15 VLDB, 17 ICDE, 3 EDBT, 4 CIKM, 3 SIGKDD, 6 ICDM) - 28 in top DB and DM journals (5 TODS, 7 VLDBJ, 15 TKDE, 1 TKDD) PhD alumni with faculty positions (Rutgers, HKPolyU, Mexico State U, Aalborg U, Macau U, Renmin U)

Reynold Cheng Background Research HKU (BSc, MPhil 95-00), Purdue (PhD, 00-05), HKPolyU (Asst. Prof, 05-08) Research Database management, uncertainty management, data mining, spatial databases

Data Uncertainty sensor network GPS Handle data uncertainty, or service quality can be degraded! Data is often imprecise and erroneous Reynold Cheng

The ORION Database Treat data uncertainty as a first-class citizen A probabilistic query provides answers with probabilities (e.g., Mary has a 80% chance to be in HKU) Different types of queries have different answers in probabilistic form, and evaluation techniques Reynold Cheng

Queries in ORION k a 1 U[5,10] 2 G(2, 0.1) Create a table with UNCERTAIN type CREATE table T( k INTEGER primary key, a UNCERTAIN); Insert Gaussian pdf (μ,σ) Insert into T values (2,‘(g,μ,σ)’); Display uncertain info. of a if a > 5 SELECT a FROM T where a > 5; Equality join of uncertain attributes (=% returns probability of equality) SELECT R.k, S.k, R.a =% S.a FROM R,S WHERE R.a = S.a; Entities with prob. giving min value of a (e.g., {(3,0.5), (5,0.3), (11,0.2)}) SELECT Emin(T.a) from T; Min value of a for table T (UNCERTAIN) SELECT Vmin(T.a) from T; Reynold Cheng

Ben Kao Background Research HKU (BSc 86-89), Princeton-Stanford (PhD 89-95) Research Database Systems, Information Retrieval, Data Mining

Finding Key Moments in Social Networks the users are disconnected finally friends A research topic is on social network analysis. For example, the figure shows how the distance (the number of hops on the facebook graph) between two real facebook users changed over a one-year period. The two users were not connected on the facebook graph before Day 178. Their distance changed at several “key moments” : Days 178, 186, 304, 365. “Distance” between two Facebook users over a 1-year period. (They are disconnected before Day 178 and finally became friends on Day 365.)

How did they (u and v) become friends? To understand how friendships are established, we need to study the events that happened at certain “key moments”. For example, what happened (Events (a), (b) or (c) above) that led to the shortening of two users’ distance from each other? But first, we need to discover those “key moments” so we know which “snapshots” of the Facebook graph we should look at.

Evolving Graph Sequence (EGS) Processing We model the dynamics of a social network as a (big) sequence of (big) evolving graph snapshots. We study efficient graph algorithms for identifying key moments (snapshots at which sharp changes in certain key measures are observed). Such key moments help social network analysts investigate the various properties of gigantic social networks.