Prominent Streak Discovery in Sequence Data Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, Yong Yu ( 1 Shanghai Jiao Tong University; 2 The University of.

Slides:



Advertisements
Similar presentations
EMGT 501 HW #1 Solutions Chapter 2 - SELF TEST 18
Advertisements

Longest Common Subsequence
Finding, Monitoring, and Checking Claims Computationally Based on Structured Data Brett Walenz, You (Will) Wu, Seokhyun (Alex) Song, Emre Sonmez, Eric.
Academic Content Standards Patterns, Functions, and Algebra Standard 8 th Grade 1. Relate the various representations of a relationship; i.e., relate.
Wei Shen †, Jianyong Wang †, Ping Luo ‡, Min Wang ‡ † Tsinghua University, Beijing, China ‡ HP Labs China, Beijing, China WWW 2012 Presented by Tom Chao.
Kobe Bryant The Best. He Is The Next Jordan! NBA Scoring Leaders NBA Scoring Leaders Pts Allen Iverson 34.2 Kobe Bryant 32.2 LeBron James 28.9.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter Two Treatment of Data.
MATH 109 Exam 1 Review. Jeopardy DataFunctions Picturing Functions Straight As An Arrow Potpourri
CONFIDENTIAL 1 Grade 8 Algebra1 Frequency and Histograms.
Robert Bornstein Jamie Favors, James Thomas, Allison Charland, Shawn Padrick Department of Meteorology and Climate Science San Jose State University 4.
 Compound Inequality › Two inequalities that are joined by the word and or the word or.
Computational aspects of stability in weighted voting games Edith Elkind (NTU, Singapore) Based on joint work with Leslie Ann Goldberg, Paul W. Goldberg,
Amy LeHew Elementary Math Facilitator Meeting October 2012.
Frequency Polygon.  A frequency polygon is a line graph in which points are joined by straight lines and closed to make a polygon  A frequency Polygon.
Cumulative Frequency, Box Plots, Percentiles and quartiles.
Get up, get moving is a new campaign that helps people notice the importance of fitness. We help them get the sense that healthy living is vital for us;
Traffic modeling and Prediction ----Linear Models
HAWKES LEARNING SYSTEMS math courseware specialists Copyright © 2010 Hawkes Learning Systems. All rights reserved. Hawkes Learning Systems: College Algebra.
Warm Up Problem of the Day Problem of the Day Lesson Presentation
1 DATA DESCRIPTION. 2 Units l Unit: entity we are studying, subject if human being l Each unit/subject has certain parameters, e.g., a student (subject)
Longest Increasing Subsequences in Windows Based on Canonical Antichain Partition Erdong Chen (Joint work with Linji Yang & Hao Yuan) Shanghai Jiao Tong.
Lecture 12 Statistical Inference (Estimation) Point and Interval estimation By Aziza Munir.
Advanced Software Engineering PROJECT. 1. MapReduce Join (2 students)  Focused on performance analysis on different implementation of join processors.
Energy-Aware Scheduling with Quality of Surveillance Guarantee in Wireless Sensor Networks Jaehoon Jeong, Sarah Sharafkandi and David H.C. Du Dept. of.
Objectives Describe the central tendency of a data set.
Additional Example 1: Organizing and Interpreting Data in a Frequency Table
Frequency Tables, Stem-and-Leaf Plots, and Line Plots
Skewness & Kurtosis: Reference
Computer Science and Engineering Efficiently Monitoring Top-k Pairs over Sliding Windows Presented By: Zhitao Shen 1 Joint work with Muhammad Aamir Cheema.
 Methods to take large amounts of data and present it in a concise form › Want to present height of females and males in STA 220 › Could measure everyone.
Data Management+ Laboratory Dynamic Skylines Considering Range Queries Speaker: Adam Adviser: Yuling Hsueh 16th International Conference, DASFAA 2011 Wen-Chi.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Winter-Quiz- Intermezzo On each slide you will find a screenshot from R-studio / or some previous theoretical statistical class material. You can find.
Warmup What is the graph telling you?. Lesson 9 Data Display.
Archived Data Management System Study Advisory Committee Meeting May 14, 2003.
Bin Jiang, Jian Pei ICDE 2009 Online Interval Skyline Queries on Time Series 1.
CONFIDENTIAL 1 Grade 8 Algebra1 Organizing and Displaying Data.
Day 15: Data Goal: To organize data in various graphs. AND To describe the central tendency of data. Standard: – Describe a data set using data.
Slide 3-1 Copyright © 2008 Pearson Education, Inc. Chapter 3 Descriptive Measures.
The Number System Ordering Integers 1 © 2013 Meredith S. Moody.
Chapter 3: Section 2 Measures of Variance. Paint Comparison: How many months will they last??? Brand ABrand B Average for Brand.
On the R ange M aximum-Sum S egment Q uery Problem Kuan-Yu Chen and Kun-Mao Chao Department of Computer Science and Information Engineering, National Taiwan.
7-1 Frequency Tables, Stem-and-Leaf Plots, and Line Plots Course Frequency Tables, Stem-and- Leaf Plots, and Line Plots Course 2 Warm Up Warm Up.
Do Now from 1.2a Find the domain of the function algebraically and support your answer graphically. Find the range of the function.
International Conference on Data Engineering (ICDE 2016)
Inna Khomenko, Oleksandr Dereviaha
By: Sibo Wang, Xiaokui Xiao, Yin Yang, Wenqing Lin
Calculating Median and Quartiles
Warm Up What do you know about the graph of f(x) = (x – 2)(x – 4)2 ?
5-4 Compound Inequalities Again!
COORDINATES, GRAPHS AND LINES
Warm Up Problem of the Day Problem of the Day Lesson Presentation
Graphs of Sine and Cosine Functions
Querying, Exploration, and Analytics of Entity Data Graphs
Climate Graphs What do they tell us?.
Climate Graphs What do they tell us?.
Numerical Measures: Skewness and Location
Molecular Therapy - Nucleic Acids
Chapter 5 Continuous Random Variables and Probability Distributions
7.2 Graphing Polynomial Functions
Constructing Box Plots
Displaying Numerical Data Using Box Plots
Honors Statistics Chapter 4 Part 3
X y y = x2 - 3x Solutions of y = x2 - 3x y x –1 5 –2 –3 6 y = x2-3x.
Please copy your homework into your assignment book
Divide & Conquer Sorting
EQ: How do we construct dot plots and histograms
Which is equivalent - 2x = -30 2x = 13 -2x = 30
5-4 Compound Inequalities Again!
Survey on Coverage Problems in Wireless Sensor Networks - 2
Presentation transcript:

Prominent Streak Discovery in Sequence Data Xiao Jiang, Chengkai Li, Ping Luo, Min Wang, Yong Yu ( 1 Shanghai Jiao Tong University; 2 The University of Texas at Arlington; 3 HP Labs China) 1. Motivation Prominent streaks stated in news articles: This month the Chinese capital has experienced 10 days with a maximum temperature in around 35 degrees Celsius – the most for the month of July in a decade. The Nikkei 225 closed below for the 12th consecutive week, the longest such streak since June He (LeBron James) scored 35 or more points in nine consecutive games and joined Michael Jordan and Kobe Bryant as the only players since 1970 to accomplish the feat. 2. Problem Formulation In a sequence of values, a streak is the triple of left-end, right-end, and the minimum value in the interval. E.g We define the Dominance Relationship: Streak dominates streak iff or Prominent streaks are streaks that are not dominated by others. Task 1: given a data sequence, compute the prominent streaks. E.g. 6. Experiments Skyline Operation 4. Local Prominent Streak Streak dominates streak locally iff s1 dominates s2 and Local prominent streaks (LPS) are streaks that are not locally dominated by others. Property 1: prominent streaks are also LPS Property 2: the number of LPS is less than or equal to sequence length Conclusion: LPS is a good set of candidate streaks Monitoring: Real-world data sequence often grows, with newly appended values. Task 2: keep the prominent streaks up-to-date 1. Maintain a list of streaks when scanning the sequence rightward. 2. After scanning the kth value, right-ends of streaks in the list are all k. 3. When scanning (k+1)-th value, try to extend the streaks rightward. 4. After scanning the last value, all the streaks in the list are LPS. 3.1 Streaks whose v is less than (k+1)-th value should be extended. 3.2 Only the longest streak of the rest should be extended. 3.3 The streaks whose v is greater than (k+1)-th value are LPS. 7. Interesting Prominent Streaks Traffic count of Wikipedia page of Lady Gaga (Wiki2) more than half of the prominent streaks are around Sep. 12 th (VMA 2010) at least 2000 traffic hourly lasting for almost 4 days 3. Solution Framework Data Value Sequence Candidate Streaks Prominent Streaks 5. Linear LPS Method As the streaks share the same right-end, the minimum values are increasing if the streaks are listed in the increasing order of left-ends. Figure below: l-v plot for the above example. Real-world datasets, sequence length from thousands to millions. A variety of application scenarios, including meteorology, finance, network traffic. Data setsLength#PSBaseline(ms)LLPS(ms) Gold River Melb Melb Wiki Wiki Wiki SP HPQ IBM AOL WC >1 hour3404 A closer look at SP500 (S&P 500 index, 06/ /2000) Monitoring prominent streaks of AOL and WC98: Melbourne daily min/max temperature between 1981 and 1990 (Melb1 & Melb2) more than 2000 days with min temperature above zero 6 days: the longest streak above 35 degrees Celsius Cumulative Execution Time at Various Positions, for Different Reporting frequencies Total Execution Time by Reporting Frequencies