Research of Database UNSW Some slides are taken from Wenjie Zhang.


1 Research of Database Group @ UNSW Some slides are taken from members @DBG Wenjie Zhang

2 Group Overview
- Research field: core topics in DB, DM, IR, MM
- Group size: 8 staff members; 20+ PhD students
- Research support: consistent success in government research grant applications

3 Some recent research projects
- Xuemin Lin and Wenjie Zhang, Efficiently Processing Pattern-based Structure Queries over Large Graphs, ARC Discovery Grant (2015-2017), $397,500
- Wenjie Zhang and Lei Chen, Continuous Loyalty-based Similarity Queries over Moving Objects, ARC Discovery Project (2015-2017), $266,300
- Lijun Chang, Efficient Cohesive-Subgraph Search over Large Graphs, ARC Early Career Research Award (2015-2017), $372,000
- Xuemin Lin, Probabilistic Search Over Large-Scale Uncertain Graphs, ARC Discovery Project (2014-2016), $413,000
- Xuemin Lin and Wenjie Zhang, Ranking Complex Objects in a Multi-dimensional Space, ARC Discovery Project (2012-2014), $350,000
- Wenjie Zhang, Continuously Monitoring Uncertain Objects in a Multi-dimensional Space, ARC Early Career Research Award (2012-2014), $375,000

4 What is research? Research comprises "creative work undertaken on a systematic basis in order to increase the stock of knowledge, including knowledge of humans, culture and society, and the use of this stock of knowledge to devise new applications". (Wikipedia)

5 Research degrees & projects
- Master by Research
- PhD
- Research projects: 18 UoC / 24 UoC

6 Some research topics
- Location-based services
- Preference queries on multi-dimensional data

7 Location-based services: services that integrate a user's location with other information to provide added value to the user.

8 Examples
- Navigation and travel
- Geo-social networking
- Gaming
- Retail
- Advertisement
and many more…

9 Location-based services (LBS) have a bright future
- Number of mobiles > world's population
- 24% use LBS, and 94% of these find LBS valuable
- LBS are a bonanza for start-ups (est. market $13B in 2014, $21B in 2015)

10 Past Research
- Shortest Path Query
- Range Query
- k-Nearest Neighbors Query
- Reverse Nearest Neighbors Query
- k-Closest Pairs Query
and other similar queries…

11 Shortest path query: what is the shortest path from here to the airport?
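The slide does not name an algorithm, so the following is only a minimal sketch of one standard way to answer a shortest path query, Dijkstra's algorithm. The road network `roads`, its node names, and its edge weights are hypothetical illustration data.

```python
import heapq

# Hypothetical directed road network: node -> {neighbor: travel cost}.
roads = {
    "here": {"a": 4, "b": 1},
    "b": {"a": 2, "airport": 7},
    "a": {"airport": 3},
    "airport": {},
}

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) or None if unreachable."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

print(shortest_path(roads, "here", "airport"))
# (6, ['here', 'b', 'a', 'airport'])
```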

12 Range Query Return the coffee shops within 300 meters.

13 k-Nearest Neighbor Query: return the k closest fuel stations.
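The range query and k-nearest-neighbor query from slides 12 and 13 can be sketched with a brute-force scan under Euclidean distance. This is only an illustration: the place names and coordinates (in meters, relative to the user at the origin) are hypothetical, and a real system would normally use a spatial index rather than scanning every object.

```python
import heapq
import math

# Hypothetical locations in meters, relative to the query point (0, 0).
places = {"a": (100, 0), "b": (0, 250), "c": (300, 300), "d": (-400, 0)}

def range_query(points, q, radius):
    """Names of all points within `radius` of q (Euclidean distance)."""
    return [name for name, p in points.items() if math.dist(p, q) <= radius]

def knn_query(points, q, k):
    """Names of the k points closest to q, nearest first."""
    return heapq.nsmallest(k, points, key=lambda name: math.dist(points[name], q))

print(range_query(places, (0, 0), 300))  # ['a', 'b']
print(knn_query(places, (0, 0), 3))      # ['a', 'b', 'd']
```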

14 Reverse Nearest Neighbor Query Return the cars for which my fuel station is the nearest fuel station.
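A brute-force sketch of this reverse nearest neighbor query: for every car, find its nearest fuel station and keep the cars whose nearest station is the query station. The station names, car names, and coordinates are hypothetical, and Euclidean distance is assumed.

```python
import math

# Hypothetical positions of fuel stations and cars.
stations = {"mine": (0, 0), "other": (10, 0)}
cars = {"c1": (2, 0), "c2": (9, 1), "c3": (4, 3)}

def reverse_nn(cars, stations, query_station):
    """Cars whose nearest station (Euclidean distance) is `query_station`."""
    result = []
    for car, pos in cars.items():
        nearest = min(stations, key=lambda s: math.dist(stations[s], pos))
        if nearest == query_station:
            result.append(car)
    return result

print(reverse_nn(cars, stations, "mine"))  # ['c1', 'c3']
```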

15 k-Closest Pairs Query: return the closest pair of McDonald's.
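A minimal sketch of a k-closest pairs query that enumerates every pair and keeps the k with the smallest Euclidean distance. The store names and coordinates are hypothetical, and the quadratic enumeration is for illustration only.

```python
import heapq
import math
from itertools import combinations

# Hypothetical store locations.
stores = {"m1": (0, 0), "m2": (5, 5), "m3": (6, 5), "m4": (0, 9)}

def closest_pairs(points, k=1):
    """The k pairs of distinct points with the smallest Euclidean distance."""
    return heapq.nsmallest(
        k,
        combinations(points, 2),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )

print(closest_pairs(stores, 1))  # [('m2', 'm3')]
```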

16 Variations
- Static queries vs. continuous queries
- Euclidean distance vs. network distance

17 Some research topics
- Location-based services
- Preference queries on multi-dimensional data

18 Preference queries on massive multi-dimensional data
Massive multi-dimensional data are collected every day: location data from various observational mechanisms.
- Smartphones: 0.36 billion this year in China, the largest smartphone market, with 0.45 billion expected next year. Baidu's location-based service receives 3.5 billion location requests on average each day.
- Sensors
- Radio Frequency Identification (RFID)
- Global Positioning System (GPS)

19 Background: other multi-dimensional data from various applications
- Environment monitoring: measure light, temperature, humidity…
- Finance and economic data: purchase transactions, stock transactions…
- User behavior data: click streams, shopping records…
- Network data: network monitoring data
- etc.

20 Problems Investigated
Given a large number of multi-dimensional objects, we investigate the following representative and fundamental queries.
- Rank-based queries: top-k query, quantile query, influence maximization
- Dominance-based queries: skyline query, representative skyline query, dominating queries
- Spatial keyword queries

21 Rank-based queries 1. Top-k query [Figure: points p1 to p8 plotted with X: academic score and Y: research score; scoring function f(p) = x + y]
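A small sketch of the top-k query with the slide's scoring function f(p) = x + y. The coordinates of p1 to p8 are hypothetical, since the figure's actual values do not survive in the transcript.

```python
import heapq

# Hypothetical (academic score, research score) pairs for p1..p8.
points = {
    "p1": (1, 9), "p2": (3, 8), "p3": (2, 6), "p4": (6, 6),
    "p5": (5, 3), "p6": (8, 5), "p7": (7, 2), "p8": (9, 1),
}

def f(p):
    """Scoring function from the slide: f(p) = x + y."""
    x, y = p
    return x + y

def top_k(points, k):
    """Names of the k points with the highest score, best first."""
    return heapq.nlargest(k, points, key=lambda name: f(points[name]))

print(top_k(points, 3))  # ['p6', 'p4', 'p2']
```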

22 Rank-based queries (cont.) 2. Φ-quantile: summarizes the score distribution. It is the first element in a sorted list with cumulative weight not smaller than Φ, where Φ is a number in (0, 1]. Sorted elements: 3 3 6 7 8 9 12 13 15 20 [Figure: the 0.5-quantile (median) and the 0.8-quantile marked on the sorted list]
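The quantile definition on this slide can be sketched directly. The sketch assumes every element carries the same weight (1 each, normalized by the total) and uses exact fractions so that float rounding does not shift the cut-off; `phi` should therefore be passed as a string or a `Fraction`. The function name and the `(value, weight)` representation are choices made here for illustration.

```python
from fractions import Fraction

# Sorted elements from the slide, each with an assumed weight of 1.
items = [(v, 1) for v in [3, 3, 6, 7, 8, 9, 12, 13, 15, 20]]

def quantile(sorted_items, phi):
    """First value whose normalized cumulative weight is >= phi.

    sorted_items: (value, weight) pairs sorted by value.
    phi: a number in (0, 1], given as a str or Fraction for exactness.
    """
    phi = Fraction(phi)
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    total = sum(Fraction(w) for _, w in sorted_items)
    running = Fraction(0)
    for value, weight in sorted_items:
        running += Fraction(weight)
        if running / total >= phi:
            return value
    raise ValueError("empty input")

print(quantile(items, "0.5"))  # 8
print(quantile(items, "0.8"))  # 13
```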

23 Rank-based queries (cont.) Other statistics [Figure: elements 1 to 20]
- Find all elements with frequency > 0.1%
- Top-k most frequent elements
- What is the frequency of element 3?
- What is the total frequency of elements between 8 and 14?
- How many elements have non-zero frequency?

24 Rank-based queries (cont.) Reverse rank-based queries (ongoing)
- How can an object be the top-1 result?
- For most users?
- With minimum cost?

25 Dominance-based queries
- n-dimensional numeric space $D = (D_1, \ldots, D_n)$
- on each dimension, a user preference $\prec$ is defined
- for two points $u$ and $v$, $u$ dominates $v$ ($u \prec v$) if
  - $\forall D_i\ (1 \le i \le n):\ u.D_i \preceq v.D_i$
  - $\exists D_j\ (1 \le j \le n):\ u.D_j \prec v.D_j$
[Figure: points p1 to p8 plotted with X: academic score and Y: research score]

26 Dominance-based queries (cont.) Skyline: points not dominated by any other point. - candidates for the best options in multi-criteria decision applications.
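A naive quadratic sketch of dominance and the skyline. It assumes that a larger value is preferred on every dimension (natural for scores) and uses hypothetical coordinates for p1 to p8; the function names are chosen here for illustration.

```python
# Hypothetical (academic score, research score) pairs for p1..p8.
points = {
    "p1": (1, 9), "p2": (3, 8), "p3": (2, 6), "p4": (6, 6),
    "p5": (5, 3), "p6": (8, 5), "p7": (7, 2), "p8": (9, 1),
}

def dominates(u, v):
    """u dominates v: no worse on every dimension, strictly better on one.

    Assumes larger values are preferred on each dimension.
    """
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def skyline(points):
    """Names of the points that no other point dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]

print(skyline(points))  # ['p1', 'p2', 'p4', 'p6', 'p8']
```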

27 Dominance-based queries (cont.) Top-k dominating queries: the k objects with the highest dominating ability, i.e., those that dominate the most other objects
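A brute-force sketch of a top-k dominating query: count how many other objects each object dominates and return the k with the highest counts. It assumes larger values are preferred on every dimension; the coordinates are hypothetical, and breaking ties by name is an arbitrary choice made here.

```python
# Hypothetical (academic score, research score) pairs for p1..p8.
points = {
    "p1": (1, 9), "p2": (3, 8), "p3": (2, 6), "p4": (6, 6),
    "p5": (5, 3), "p6": (8, 5), "p7": (7, 2), "p8": (9, 1),
}

def dominates(u, v):
    """u is no worse than v everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def dominating_count(points, name):
    """Number of other points dominated by the point called `name`."""
    p = points[name]
    return sum(dominates(p, q) for other, q in points.items() if other != name)

def top_k_dominating(points, k):
    """The k names with the highest dominating count (ties broken by name)."""
    ranked = sorted(points, key=lambda n: (-dominating_count(points, n), n))
    return ranked[:k]

print(top_k_dominating(points, 3))  # ['p4', 'p6', 'p2']
```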

28 New challenges (1) Massive streaming data: arrive at high speed, and the volume of the data is extremely large.
- Twitter: 140 million users and over 340 million tweets per day
- 200 Mb/sec from a single sensor node for reading of the weather data
- AT&T collects 600-800 gigabytes of NetFlow data each day
- Square Kilometre Array (SKA) project: a few exabytes (10^18 bytes) of data per day for a single beam per square kilometer

29 Streaming Algorithm [Figure: data streams flow into a stream processing engine, which keeps synopses in memory and returns an (approximate) answer]
- One scan only
- Processing time (fast)
- Synopsis size (small)
- Accuracy (a good tradeoff with synopsis size)
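The slide does not name a particular synopsis. As one well-known example of the idea (one scan, small memory, approximate answer), here is a sketch of the Misra-Gries frequent-items summary, which keeps at most k - 1 counters. The example stream is hypothetical.

```python
def misra_gries(stream, k):
    """One-pass frequent-items synopsis with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to be in the result; each count underestimates the true
    frequency by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical stream of 12 items in which "a" occurs 6 times.
stream = list("abacadaeabab")
print(misra_gries(stream, 3))  # {'a': 4, 'b': 2}
```

Here "a" occurs more than 12 / 3 = 4 times, so it must survive in the summary; its estimate of 4 undercounts the true frequency of 6 by 2, within the n / k bound.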

30 New Challenges (2) The data may be uncertain for various reasons.
- Limits of the measuring devices
- Noise
- Delay or loss in data transfer
- Privacy
- Data integration
The uncertainty of the data may be described continuously or discretely.

31 New Challenges (3) Enriched spatial data
- Textual data: Twitter, Weibo, Foursquare
- The user profile: age, gender, preference, etc.
- Multimedia data: photos, videos

32 Spatial keyword search: spatial-textual objects. An enormous amount of spatio-textual objects is available in many applications.
- Online local search, e.g., online yellow pages
- Social network services, e.g., Facebook, Flickr, Twitter

33 Top-k spatial keyword search [Figure: objects on a map, each with a keyword set] p1 (pizza, coffee, sushi), p2 (pizza, coffee, steak), p3 (pizza, sushi), p4 (coffee, sushi), p5 (pizza, steak, seafood). Query keywords: pizza, coffee
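A brute-force sketch of one common reading of top-k spatial keyword search: return the k objects nearest to the query location among those whose keyword set contains every query keyword. The keyword sets come from the slide; the coordinates, the query location, and this particular semantics are assumptions made for illustration.

```python
import heapq
import math

# Keyword sets are from the slide; the coordinates are hypothetical.
objects = {
    "p1": ((3, 4), {"pizza", "coffee", "sushi"}),
    "p2": ((1, 1), {"pizza", "coffee", "steak"}),
    "p3": ((0.5, 0.5), {"pizza", "sushi"}),
    "p4": ((1, 0), {"coffee", "sushi"}),
    "p5": ((2, 2), {"pizza", "steak", "seafood"}),
}

def top_k_spatial_keyword(objects, location, keywords, k):
    """k nearest objects (Euclidean) whose text contains all query keywords."""
    keywords = set(keywords)
    matching = [name for name, (_, text) in objects.items() if keywords <= text]
    return heapq.nsmallest(
        k, matching, key=lambda name: math.dist(objects[name][0], location)
    )

print(top_k_spatial_keyword(objects, (0, 0), ["pizza", "coffee"], 2))
# ['p2', 'p1']
```

Only p1 and p2 contain both query keywords, so nearer objects such as p3 and p4 are filtered out before ranking by distance.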

34 A little bit about BIG Data
- What is big data? Four Vs: Value, Velocity, Variety, Veracity
- How big? Even scanning (a linear algorithm) is not applicable
- How to handle it? New computational paradigms

35 A little bit about BIG Data A recent McKinsey Global Institute report forecasts a serious shortage of data science and engineering professionals in 2018. Data scientist: the sexiest job of the 21st century

36 Thank you! Questions?

