Research of Database UNSW Some slides are taken from Wenjie Zhang.

Research of Database Group @ UNSW
Some slides are taken from members @DBG
Wenjie Zhang

Group Overview
- Research fields: core topics in DB, DM, IR, MM
- Group size: 8 staff members; 20+ PhD students
- Research support: consistent success in government research grant applications

Some recent research projects
- Xuemin Lin and Wenjie Zhang, Efficiently Processing Pattern-based Structure Queries over Large Graphs, ARC Discovery Grant (2015-2017), $397,500
- Wenjie Zhang and Lei Chen, Continuous Loyalty-based Similarity Queries over Moving Objects, ARC Discovery Project (2015-2017), $266,300
- Lijun Chang, Efficient Cohesive-Subgraph Search over Large Graphs, ARC Early Career Research Award (2015-2017), $372,000
- Xuemin Lin, Probabilistic Search Over Large-Scale Uncertain Graphs, ARC Discovery Project (2014-2016), $413,000
- Xuemin Lin and Wenjie Zhang, Ranking Complex Objects in a Multi-dimensional Space, ARC Discovery Project (2012-2014), $350,000
- Wenjie Zhang, Continuously Monitoring Uncertain Objects in a Multi-dimensional Space, ARC Early Career Research Award (2012-2014), $375,000

What is research?
Research comprises "creative work undertaken on a systematic basis in order to increase the stock of knowledge, including knowledge of humans, culture and society, and the use of this stock of knowledge to devise new applications". (Source: Wikipedia)

Research degrees & projects
- Master by Research
- PhD
- Research projects: 18 UoC / 24 UoC

Some research topics
- Location based services
- Preference queries on multi-dimensional data

Location based services
Services that integrate a user's location with other information to provide added value to the user.

Examples
- Navigation and travel
- Geo-social networking
- Gaming
- Retail
- Advertisement
and many more…

Location-based services have a bright future
- Number of mobiles > world's population
- 24% use LBS, and 94% of these find LBS valuable
- LBS are a bonanza for start-ups (est. market $13B in 2014, $21B in 2015)

Past Research
- Shortest Path Query
- Range Query
- k-Nearest Neighbors Query
- Reverse Nearest Neighbors Query
- k-Closest Pairs Query
and other similar queries…

Shortest path query
What is the shortest path from here to the airport?
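A shortest path query on a road network can be sketched with Dijkstra's algorithm. This is only an illustrative sketch: the slides do not name an algorithm, and the `roads` graph, its node names, and its edge weights are made-up assumptions.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []  # target unreachable
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical road network (weights are travel costs).
roads = {
    "here": {"a": 2.0, "b": 5.0},
    "a": {"b": 1.0, "airport": 7.0},
    "b": {"airport": 3.0},
    "airport": {},
}
distance, route = shortest_path(roads, "here", "airport")
```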

Range Query
Return the coffee shops within 300 meters.
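As a minimal sketch, a Euclidean range query can be answered by a linear scan. The shop names and coordinates (in meters) below are hypothetical; a real system would typically use a spatial index rather than scanning every object.

```python
import math

def range_query(points, center, radius):
    """Return names of all points within `radius` of `center` (Euclidean)."""
    return [name for name, loc in points.items()
            if math.dist(loc, center) <= radius]

# Hypothetical coffee shops, coordinates in meters relative to the user.
coffee_shops = {"c1": (100, 0), "c2": (300, 0), "c3": (250, 250), "c4": (0, -299)}
nearby = range_query(coffee_shops, (0, 0), 300)
```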

k-Nearest Neighbor Queries
Return the k closest fuel stations.
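A k-nearest neighbor query can be sketched as selecting the k smallest distances. The station names and coordinates are assumptions made for illustration, and the linear scan stands in for whatever index a production system would use.

```python
import heapq
import math

def k_nearest(points, query, k):
    """Return the names of the k points closest to `query`, nearest first."""
    return heapq.nsmallest(k, points, key=lambda name: math.dist(points[name], query))

# Hypothetical fuel stations.
stations = {"s1": (1, 1), "s2": (5, 5), "s3": (2, 0), "s4": (-4, 0)}
closest_two = k_nearest(stations, (0, 0), 2)
```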

Reverse Nearest Neighbor Query
Return the cars for which my fuel station is the nearest fuel station.
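The reverse nearest neighbor definition can be illustrated with a brute-force sketch: for every car, find its nearest station and keep the car if that station is the query station. All names and coordinates are hypothetical, and ties are broken by dictionary order here, which is an arbitrary choice.

```python
import math

def reverse_nearest_neighbors(cars, stations, my_station):
    """Return the cars whose nearest station is `my_station`."""
    result = []
    for car, loc in cars.items():
        nearest = min(stations, key=lambda s: math.dist(stations[s], loc))
        if nearest == my_station:
            result.append(car)
    return result

# Hypothetical stations and cars.
stations = {"mine": (0, 0), "other": (10, 0)}
cars = {"c1": (1, 0), "c2": (6, 0), "c3": (4, 3), "c4": (9, 1)}
my_customers = reverse_nearest_neighbors(cars, stations, "mine")
```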

k-Closest Pairs
Return the closest pair of McDonald's restaurants.
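A k-closest pairs query can be sketched by enumerating all pairs and keeping the k with the smallest distance. This quadratic brute force is only for illustration, and the store names and coordinates are assumptions.

```python
import heapq
import math
from itertools import combinations

def k_closest_pairs(points, k):
    """Return the k pairs of point names with the smallest Euclidean distance."""
    pairs = combinations(sorted(points), 2)
    return heapq.nsmallest(
        k, pairs, key=lambda p: math.dist(points[p[0]], points[p[1]])
    )

# Hypothetical restaurant locations.
stores = {"m1": (0, 0), "m2": (3, 4), "m3": (3, 5), "m4": (10, 10)}
closest_pair = k_closest_pairs(stores, 1)
```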

Variations
- Static queries vs. continuous queries
- Euclidean distance vs. network distance

Some research topics
- Location based services
- Preference queries on multi-dimensional data

Preference queries on massive multi-dimensional data
Massive multi-dimensional data are collected every day.
Location data from various observational mechanisms:
- Smart phones: 0.36 billion this year in China, the largest smart phone market, with 0.45 billion expected next year. Baidu's location-based service receives 3.5 billion location requests on average each day.
- Sensors
- Radio Frequency Identification (RFID)
- Global Positioning System (GPS)

Background
Other multi-dimensional data from various applications:
- Environment monitoring: measure light, temperature, humidity, …
- Finance and economic data: purchase transactions, stock transactions, …
- User behavior data: click streams, shopping records, …
- Network data: network monitoring data
- etc.

Problems Investigated
Given a large number of multi-dimensional objects, we investigate the following representative and fundamental queries.
- Rank-based queries: top k query, quantile query, influence maximization
- Dominance-based queries: skyline query, representative skyline query, dominating queries
- Spatial keyword queries

Rank-based queries
1. Top k query
[Figure: points p1 to p8 plotted with X: academic score and Y: research score; scoring function f(p) = x + y]
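A top k query under the scoring function f(p) = x + y can be sketched as follows. The coordinates of p1 to p8 are not recoverable from the slide, so the values below are invented for illustration only.

```python
import heapq

def top_k(points, k, score=lambda p: p[0] + p[1]):
    """Return the names of the k points with the highest score, best first."""
    return heapq.nlargest(k, points, key=lambda name: score(points[name]))

# Hypothetical (academic score, research score) pairs.
students = {
    "p1": (3, 9), "p2": (1, 8), "p3": (5, 8), "p4": (4, 4),
    "p5": (7, 3), "p6": (6, 5), "p7": (8, 6), "p8": (2, 5),
}
best_three = top_k(students, 3)
```

Passing a different `score` function, for example a weighted sum, changes the ranking without changing the query logic.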

Rank-based queries (cont.)
2. Φ-quantile: summarize score distribution
The first element in a sorted list with the cumulative weight not smaller than Φ, where Φ is a number in (0, 1].
Sorted elements: 3 3 6 7 8 9 12 13 15 20
[Figure: the 0.5 quantile (median) and the 0.8 quantile are marked on the sorted list]
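The quantile definition above can be checked with a short sketch. It assumes every element carries the same weight 1/n, which the slide does not state explicitly; the function name `phi_quantile` is illustrative.

```python
def phi_quantile(values, phi):
    """First element of the sorted list whose cumulative weight is >= phi.

    Assumes each of the n elements has equal weight 1/n.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    ordered = sorted(values)
    n = len(ordered)
    for i, v in enumerate(ordered, start=1):
        if i / n >= phi:  # cumulative weight of the first i elements
            return v

scores = [3, 3, 6, 7, 8, 9, 12, 13, 15, 20]
median = phi_quantile(scores, 0.5)   # 5th element of the sorted list
q80 = phi_quantile(scores, 0.8)      # 8th element of the sorted list
```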

Rank-based queries (cont.)
Other statistics
[Figure: elements 1 to 20]
- Find all elements with frequency > 0.1%
- Top-k most frequent elements
- What is the frequency of element 3?
- What is the total frequency of elements between 8 and 14?
- How many elements have non-zero frequency?

Rank-based queries (cont.)
Reverse rank-based queries (ongoing)
- How can an object be the top-1 result?
- For most users?
- With minimum cost?

Dominance-based queries
- n-dimensional numeric space D = (D_1, …, D_n)
- On each dimension, a user preference ≺ is defined
- Given two points u and v, u dominates v (u ≺ v) if
  - ∀ D_i (1 ≤ i ≤ n): u.D_i ⪯ v.D_i
  - ∃ D_j (1 ≤ j ≤ n): u.D_j ≺ v.D_j
[Figure: points p1 to p8 plotted with X: academic score and Y: research score]

Dominance-based queries (cont.)
Skyline: points not dominated by other points.
- Candidates of best options in multi-criteria decision applications.
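The dominance test and the skyline can be sketched with a brute-force comparison of all pairs. The sketch assumes that larger values are preferred on every dimension, and the point coordinates are invented because the figure's values are not recoverable.

```python
def dominates(u, v):
    """u dominates v: u is at least as good everywhere and strictly better somewhere.

    Assumes larger values are preferred on every dimension.
    """
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def skyline(points):
    """Return the names of the points not dominated by any other point."""
    return [
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]

# Hypothetical (academic score, research score) pairs.
students = {
    "p1": (1, 9), "p2": (3, 7), "p3": (2, 6), "p4": (6, 5),
    "p5": (5, 4), "p6": (8, 2), "p7": (7, 1), "p8": (4, 3),
}
best_options = skyline(students)
```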

Dominance-based queries (cont.)
Top-k dominating queries: objects with the highest dominating ability
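One common reading of "dominating ability" is the number of other objects a point dominates. The sketch below ranks points by that count; this scoring choice, the preference for larger values, and the sample coordinates are all assumptions.

```python
import heapq

def dominates(u, v):
    """u dominates v, assuming larger values are preferred on every dimension."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def top_k_dominating(points, k):
    """Return (name, count) for the k points that dominate the most other points."""
    counts = {
        name: sum(dominates(p, q) for q in points.values())
        for name, p in points.items()
    }
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])

# Hypothetical (academic score, research score) pairs.
students = {
    "p1": (1, 9), "p2": (3, 7), "p3": (2, 6), "p4": (6, 5),
    "p5": (5, 4), "p6": (8, 2), "p7": (7, 1), "p8": (4, 3),
}
strongest = top_k_dominating(students, 1)
```

Unlike the skyline, the result size is controlled by k, and a point outside the skyline can still rank highly if it dominates many others.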

New challenges (1)
Massive streaming data
Data arrive at high speed, and the volume of the data is extremely large.
- Twitter: 140 million users and over 340 million tweets per day
- 200 Mb/sec from a single sensor node for weather data readings
- AT&T collects 600-800 gigabytes of NetFlow data each day
- Square Kilometre Array (SKA) project: a few exabytes (10^18 bytes) of data per day for a single beam per square kilometer

Streaming Algorithm
[Figure: data streams feed a stream processing engine, which maintains synopses in memory and returns an (approximate) answer]
- One scan only
- Processing time (fast)
- Synopsis size (small)
- Accuracy (a good tradeoff with synopsis size)
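As one concrete instance of a one-scan, small-synopsis algorithm, the Misra-Gries frequent-items summary is sketched below. The slides do not name this algorithm; it is chosen here only to illustrate the tradeoff between synopsis size and accuracy. With `num_counters` counters over a stream of n items, each reported count underestimates the true count by at most n / (num_counters + 1).

```python
def misra_gries(stream, num_counters):
    """One-pass frequent-items summary using at most `num_counters` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # Synopsis is full: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical stream in which "a" occurs 6 times out of 10.
stream = ["a", "b", "a", "c", "a", "d", "a", "e", "a", "a"]
summary = misra_gries(stream, num_counters=2)
```

Here the summary keeps only "a" with an estimated count of 4, which is within the error bound of 10 / 3 of the true count 6.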

New Challenges (2)
The data may be uncertain for various reasons.
- Limits of the measuring devices
- Noise
- Delay or loss in data transfer
- Privacy
- Data integration
The uncertainty of the data may be described continuously or discretely.

New Challenges (3)
Enriched spatial data
- Textual data: Twitter, Weibo, Foursquare
- The user profile: age, gender, preference, etc.
- Multimedia data: photos, videos

Spatial keyword search
An enormous amount of spatio-textual objects is available in many applications.
- Online local search, e.g., online yellow pages
- Social network services, e.g., Facebook, Flickr, Twitter

Top k spatial keyword search
[Figure: objects p1 (pizza, coffee, sushi), p2 (pizza, coffee, steak), p3 (pizza, sushi), p4 (coffee, sushi), p5 (pizza, steak, seafood); query keywords: pizza, coffee]
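A top k spatial keyword query over the five objects in the figure can be sketched as follows. The slide does not spell out the ranking semantics, so the sketch assumes that an object must contain all query keywords and that matches are ranked by Euclidean distance to the query location. The keyword sets come from the slide; the coordinates are invented.

```python
import heapq
import math

def top_k_spatial_keyword(objects, location, keywords, k):
    """Return the k nearest objects whose terms contain every query keyword."""
    required = set(keywords)
    matches = [name for name, (loc, terms) in objects.items() if required <= set(terms)]
    return heapq.nsmallest(
        k, matches, key=lambda name: math.dist(objects[name][0], location)
    )

# Keyword sets from the slide; coordinates are hypothetical.
restaurants = {
    "p1": ((2, 2), {"pizza", "coffee", "sushi"}),
    "p2": ((5, 1), {"pizza", "coffee", "steak"}),
    "p3": ((1, 1), {"pizza", "sushi"}),
    "p4": ((0, 1), {"coffee", "sushi"}),
    "p5": ((3, 3), {"pizza", "steak", "seafood"}),
}
result = top_k_spatial_keyword(restaurants, (0, 0), ["pizza", "coffee"], 2)
```

Only p1 and p2 contain both keywords, so closer objects such as p3 and p4 are filtered out before the distance ranking.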

A little bit about BIG Data
- What is big data? Four Vs: Value, Velocity, Variety, Veracity
- How big? Even scanning (a linear algorithm) is not applicable
- How to handle it? New computational paradigms

A little bit about BIG Data
- A recent McKinsey Global Institute report forecasts a serious shortage of data science and engineering professionals in 2018.
- Data scientist: the sexiest job of the 21st century

Thank you! Questions?
