LAHAR: Extracting Events from Probabilistic Streams Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington.


Presentation on theme: "LAHAR: Extracting Events from Probabilistic Streams Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington."— Presentation transcript:

1 LAHAR: Extracting Events from Probabilistic Streams Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington

2 What is a Lahar? Lahar -- SIGMOD 2008 -- Christopher Re. This is a lahar: May 18, 1980, ~8:27am… a few minutes later. It’s a massive, fast stream of dirt(y data). Our system, Lahar, processes queries on massive, dirty streams of data.

3 Event Queries. Motivating app: RFID. Event queries as in Cayuga, SASE and Snoop: complex sequences using projections, predicates, … Example: Joe entered office 422 at t=8. Query: “Alert when Joe enters 422”, i.e. Joe outside 422, then inside 422. [Figure: diagram with labels A, B, C, D, E.]

4 Challenges: Tracking Joe’s Location. [Figure: 6th floor of the CS building with antennas; the blue ring is Joe’s location.]

5 Challenges: Tracking Joe’s Location. [Figure: 6th floor of the CS building with antennas; the blue ring is Joe’s location.] Two problems: (1) missed readings; (2) granularity mismatch. Proposal: infer the location, keep the probabilities, and query with Lahar, using a model-based view [Deshpande et al] of an HMM. Lahar retains probabilities, achieves higher quality (precision/recall, P/R) and is still efficient.

6 Outline. RFID streams to probabilistic streams; Lahar queries on probabilistic streams; query algorithms: Regular and Extended Regular; experiments.

7 Tracking Joe’s Location. [Figure: 6th floor of the CS building with antennas; the blue ring is ground truth.]

8 Probabilities via particle filter. [Figure: 6th floor of the CS building with antennas; the blue ring is ground truth.] Each orange particle is a guess of Joe’s location. Particles guess many locations per timestep, so the data are uncertain.

9 From particles to a probabilistic stream. Query the particle filter output via At(tag, loc), a model-based view.
At(tag, loc)
Tag  t  Loc    P
Joe  7  422    0.4
        Hall3  0.4
        Hall4  0.2
Joe  8  422    0.6
        Hall3  0.2
        Hall4  0.2
Sue  7  …      …

10 Semantics of the Model. A query q returns the probability that q is true at each time t.
At(tag, loc)
Tag  t  Loc    P
Joe  7  422    0.4
        Hall3  0.4
        Hall4  0.2
Joe  8  422    0.6
        Hall3  0.2
        Hall4  0.2
Sue  7  …      …
One possible stream (world):
Tag  t  Loc
Joe  7  Hall4
Joe  8  422
Sue  7  …
Prob = 0.2 * 0.6 * …
“Joe enters 422” @ t=8: probability outside 422 at t=7 (in Hall3 or Hall4) times probability in 422 at t=8, i.e. (0.4+0.2) * 0.6 = 0.36.
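The arithmetic on this slide can be checked by brute force. The sketch below is only an illustration, not Lahar's implementation: it assumes, as the slide's product does, that Joe's locations at t=7 and t=8 are independent, and the names `AT`, `enters_422_by_worlds` and `enters_422_closed_form` are invented for this example.

```python
from itertools import product

# Marginals for Joe, copied from the At(tag, loc) table on the slide.
AT = {
    7: {"422": 0.4, "Hall3": 0.4, "Hall4": 0.2},
    8: {"422": 0.6, "Hall3": 0.2, "Hall4": 0.2},
}


def enters_422_by_worlds(at, t):
    """Sum the probability of every possible world in which Joe is
    outside 422 at t-1 and inside 422 at t (timesteps assumed independent)."""
    total = 0.0
    for (prev_loc, p_prev), (cur_loc, p_cur) in product(
        at[t - 1].items(), at[t].items()
    ):
        if prev_loc != "422" and cur_loc == "422":
            total += p_prev * p_cur
    return total


def enters_422_closed_form(at, t):
    """Same quantity without enumerating worlds: P(outside at t-1) * P(422 at t)."""
    outside_before = sum(p for loc, p in at[t - 1].items() if loc != "422")
    return outside_before * at[t]["422"]


print(round(enters_422_by_worlds(AT, 8), 6))
print(round(enters_422_closed_form(AT, 8), 6))
```

Both functions agree with the slide's (0.4+0.2) * 0.6. The enumeration grows exponentially with the number of timesteps a query spans, which is the cost the later slides set out to avoid.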

11 Outline. RFID streams to probabilistic streams; Lahar queries on probabilistic streams; query algorithms: Regular and Extended Regular; experiments.

12 Lahar Queries by Example. Alert when Joe is in hallway 4 and later in office 422. Inspired by Cayuga [Demers et al 2006, White et al 2007].

13 Lahar Queries by Example. Alert when Joe is in hallway 4 and later in office 422. [Query diagram: “Joe in Hall4”, “Joe in 422”.] Inspired by Cayuga [Demers et al 2006, White et al 2007].

14 Lahar Queries by Example. Alert when Joe is in hallway 4 and later in office 422. [Query diagram: “Joe in Hall4”, “Joe in 422”.] Alert when Joe is in hallway 4, and immediately in office 422. Inspired by Cayuga [Demers et al 2006, White et al 2007].

15 Lahar Queries by Example. Alert when Joe is in hallway 4 and later in office 422. [Query diagram: “Joe in Hall4”, “Joe in 422”.] Alert when Joe is in hallway 4, and immediately in office 422. [Query diagram: “Joe in Hall4”, “Joe in 422”.] Inspired by Cayuga [Demers et al 2006, White et al 2007]. Challenge with probabilities: the naïve approach is exponential; this is unavoidable (#P).

16 A hierarchy of Lahar queries. Regular queries (efficient, streamable): “Alert when Joe enters 422”. Extended regular (efficient, streamable): “Alert when anyone enters 422”.

17 A hierarchy of Lahar queries. Regular queries (efficient, streamable): “Alert when Joe enters 422”. Extended regular (efficient, streamable): “Alert when anyone enters 422”. Safe (efficient, but not streamable). Unsafe (inefficient).

18 Outline. RFID streams to probabilistic streams; Lahar queries on probabilistic streams; query algorithms: Regular and Extended Regular; experiments.

19 Review: A non-probabilistic example. Alert me when Joe enters 422. [Automaton diagram: state 1 “Joe in Hall4”, state 2 (final) “Joe in 422”.]
Stream 1: (Joe, 7, Hall 4), (Joe, 8, 422). State sets: {} {1} {2}. Accept at t = 8.
Stream 2: (Joe, 7, Hall 4), (Joe, 8, 423). State sets: {} {1} {}.
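The two state traces on this slide can be reproduced with a few lines of code. This is a hand-written sketch of the two-state automaton in the diagram, not Lahar's compiled automaton; the function name `run_enters_422` and the rule that any other location resets the state are assumptions read off the slide's traces.

```python
def run_enters_422(stream, tag="Joe"):
    """Return the automaton's state set before any reading and after each one.

    State 1 means "Joe in Hall4" was just seen; state 2 (final) means
    "Joe in 422" was seen right after. Any other location resets to {}.
    """
    states = set()
    trace = [set()]  # state set before the first reading
    for event_tag, _t, loc in stream:
        if event_tag != tag:
            continue  # readings for other tags do not affect this query
        if 1 in states and loc == "422":
            states = {2}
        elif loc == "Hall4":
            states = {1}
        else:
            states = set()
        trace.append(set(states))
    return trace


stream_1 = [("Joe", 7, "Hall4"), ("Joe", 8, "422")]
stream_2 = [("Joe", 7, "Hall4"), ("Joe", 8, "423")]
print(run_enters_422(stream_1))  # ends in the final state {2}
print(run_enters_422(stream_2))  # ends in the empty state set
```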

20 … now with probabilities. Alert me when Joe enters 422. [Automaton diagram: state 1 “Joe in Hall4”, state 2 (final) “Joe in 422”.]
Tag  T  Loc    P
Joe  7  Hall4  0.5
Joe  8  423    0.3
        422    0.6
Distribution on states: {} 1.0; then {} 0.5, {1} 0.5; then {} 0.65, {1} 0.05, {2} 0.3. Accept at t=8 with p = 0.3.
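The distributions on this slide can be recomputed by pushing probability mass through the automaton one timestep at a time. The sketch below is an illustration under stated assumptions, not the paper's algorithm: timesteps are treated as independent, leftover probability mass at a timestep is treated as "no reading", and the transition rules (423 resets state 1, a missing reading keeps it, state 2 is left again on the next step) are reverse-engineered from the slide's numbers. States `{}`, `{1}`, `{2}` are encoded as 0, 1, 2, and the function names are invented.

```python
def transition(state, loc):
    """Next state of the two-state automaton; loc is None for 'no reading'."""
    if loc == "Hall4":
        return 1
    if state == 1:
        if loc == "422":
            return 2  # final state: Joe entered 422
        if loc is None:
            return 1  # assumption: a missing reading keeps the partial match
    return 0


def step(dist, events):
    """Advance a distribution over states by one timestep.

    dist:   {state: probability}
    events: {location: probability} for Joe at this timestep
    """
    missing = 1.0 - sum(events.values())  # mass with no reading at all
    outcomes = list(events.items()) + [(None, missing)]
    new_dist = {0: 0.0, 1: 0.0, 2: 0.0}
    for state, p_state in dist.items():
        for loc, p_loc in outcomes:
            new_dist[transition(state, loc)] += p_state * p_loc
    return new_dist


dist = {0: 1.0}  # {} with probability 1.0
dist_t7 = step(dist, {"Hall4": 0.5})
dist_t8 = step(dist_t7, {"423": 0.3, "422": 0.6})
print({s: round(p, 6) for s, p in dist_t7.items()})
print({s: round(p, 6) for s, p in dist_t8.items()})
```

The distribution always has three entries regardless of how long the stream is, which is consistent with the talk's claim that regular queries need constant space in the data.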

21 Lies in the preceding slides… (technical details). Richer predication: “Alert when Joe enters any office”. Translate query and input into an alphabet. [Automaton diagram: state 1 “Joe in Hall4”, state 2 (final) “Joe in 422”.] Key technical detail: alphabet is small in data; streamable. See paper for compilation.

22 Extension to Extended regular. “Alert when anyone enters 422”

23 Extension to Extended regular. “Alert when anyone enters 422”. (Obs 1) Each query is regular. (Obs 2) The queries are over disjoint sets of events, hence probabilistically independent. Algorithm: (Obs 1) suggests running an automaton for each person; (Obs 2) suggests multiplying to get the probability that any is true. Space = O(# persons), not # timesteps: can stream.
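The "multiply" step can be sketched as follows. This is a minimal illustration that assumes the per-person probabilities have already been computed (for example by one automaton per person) and that they are independent, as Obs 2 states. The function name and the value used for Sue are hypothetical.

```python
def prob_anyone(per_person_probs):
    """P(at least one person satisfies the query), given independent
    per-person probabilities: 1 - product over persons of (1 - p)."""
    prob_none = 1.0
    for p in per_person_probs.values():
        prob_none *= 1.0 - p
    return 1.0 - prob_none


# Joe's 0.36 is the "enters 422" probability from the semantics slide;
# Sue's 0.5 is a made-up value for illustration only.
example = {"Joe": 0.36, "Sue": 0.5}
print(round(prob_anyone(example), 6))
```

Only one number per person is kept between timesteps, which matches the O(# persons) space bound stated on the slide.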

24 Summary of Contributions. Regular queries (efficient, streamable): compiled to an automaton, streaming, O(1) space. Extended regular (efficient, streamable): streaming with O(m) space, where m is the number of persons. See paper for Markovian correlations, more sophisticated predication, complete compilation and static analysis algorithms. Safe (efficient, but not streamable). Unsafe (inefficient, most #P-hard).

25 Outline. RFID streams to probabilistic streams; Lahar queries on probabilistic streams; query algorithms: Regular and Extended Regular; experiments.

26 Experimental Setup. Quality: how is precision/recall (P/R) affected by keeping probabilities? 52 objects, 352 locations, 10k sq. ft. Two 30-minute traces with a 10-minute break in between. Participants marked down their true locations.

27 Experimental Setup. Quality: how is precision/recall (P/R) affected by keeping probabilities? 52 objects, 352 locations, 10k sq. ft. Two 30-minute traces with a 10-minute break in between. Participants marked down their true locations. Query: “Alert when anyone enters a coffee room”. Baseline: Most Likely Estimate (MLE), i.e. at each timestep, for each person, take the most likely location.

28 Quality: Realtime – Improve over MLE? Declare an event “true” if its Pr > threshold; vary the threshold. [Plots: precision, recall, F1.] 10% improvement in F1.
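The thresholding rule on this slide, together with the precision, recall and F1 metrics it is scored by, can be sketched as below. The probabilities and ground-truth set are invented purely to exercise the function; they are not results from the talk, and the function name is hypothetical.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """Declare an event true if its probability exceeds the threshold,
    then score the declared events against the ground-truth set."""
    predicted = {event for event, p in event_probs.items() if p > threshold}
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): keys are (person, timestep) candidate events.
toy_probs = {("Joe", 8): 0.36, ("Sue", 12): 0.80, ("Joe", 15): 0.10}
toy_truth = {("Joe", 8), ("Sue", 12)}

for threshold in (0.05, 0.3, 0.5):
    print(threshold, precision_recall_f1(toy_probs, toy_truth, threshold))
```

Raising the threshold trades recall for precision, which is why the slide varies it rather than fixing one value.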

29 Performance: Is the cost too high? Synthetic data, same query.

30 Related Work. Event queries (deterministic): Cayuga, SASE, SnoopIB. Model-based views: BBQ; recently, Kanagal et al, ICDE 08. Probabilistic databases: Mystiq, Trio, MayBMS, Maryland, Purdue, MCDB. Particle filters on HMMs: Doucet, Godsill.

31 Conclusion. Showed Lahar. Processed output of several inference tasks (HMMs). Applies more generally than just RFID. Quality (F1) gains by keeping probability. Performance usable in real-time. Lots of concurrent tags. No indexing!

33 Overview of Regular Query Algorithm. 1. Compile an event query q into (a) an automaton (A) over a language L and (b) a mapping (M) from events to subsets of L. 2. Runtime, where the input is a set of events E: (a) map E into subsets of L via M; (b) maintain the set of possible states of A. Deterministic to probabilistic: the algorithm stays the same, but maintains a distribution over states instead of a set. The size of the distribution depends only on the query q. NB: example to follow. For details, see paper.

34 Why are ER queries hard? Regular queries ~ regular expressions. Mapping is non-trivial; inspired by Cayuga [Demers et al. 06]. Queries have #P combined complexity: encode mDNF as a regular expression. Intuition: n-sized automaton leads to … Extended regular ~ 1 NFA per person; k persons implies O(k)-size automaton; exponential cost. When ER, can avoid blowup.

35 Regular and Extended Regular. A query is regular if no variable is shared between subgoals. A query is extended regular if any variable shared by two subgoals is shared by all subgoals. [Example query: p is shared between subgoals.]
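The two definitions on this slide translate directly into a small check. The sketch below assumes a query is represented simply as a list of variable-name sets, one per subgoal, with constants such as 'Joe' or '422' omitted. That representation and the function name `classify` are illustrative choices, not Lahar's query syntax.

```python
def classify(subgoal_vars):
    """Classify a query given the set of variables used by each subgoal."""
    all_vars = set().union(*subgoal_vars)
    # A variable is shared if it occurs in at least two subgoals.
    shared = {v for v in all_vars if sum(v in s for s in subgoal_vars) >= 2}
    if not shared:
        return "regular"
    if all(v in s for v in shared for s in subgoal_vars):
        return "extended regular"
    return "other"  # outside both classes; see the query hierarchy slide


# "Alert when Joe enters 422": both subgoals use only constants.
print(classify([set(), set()]))
# "Alert when anyone enters 422": person variable p appears in every subgoal.
print(classify([{"p", "loc"}, {"p"}]))
# q is shared by two subgoals but missing from the first one.
print(classify([{"p"}, {"p", "q"}, {"p", "q"}]))
```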

36 Correlations

37 Sequencing by example. Sequencing is parameterized [Cayuga]. Semicolon means “the next event among those that match next goal”. Semicolon is not “after”. [Figure: timeline.]

38 Compilation by example. Each goal “corresponds” to two letters: move (m), the query should advance; accept (a), the next subgoal accepts. Any other maps to the empty set. [Automaton diagram with a final state and edge labels “Does not contain” and “Does contain”.]

39 Subtle example… What about: … Any other maps to the empty set. [Automaton diagram with a final state and edge labels “Does not contain” and “Does contain”.]

40 CUT II

41 Motivating Apps. RFID apps: Diary and Active Calendar application (alert if I go to a database meeting); supply chain (alert if Mach 3 razors are being stolen). Many independent HMMs: elder care [Intel/UW] (alert if elder takes their medicine with water); activity recognition. Financial applications on predictive HMM: alert if head-and-shoulders market.

42 Compile Select and Filter. Intuition: each goal maps to two letters: match (m), matches the filter; accept (a), accepted by the select. The language and automaton are the same for both queries. [Automaton diagram with a final state and edge labels “Does not contain” and “Does contain”.]

43 Wrinkle in the language: Filter v. Selection. “Alert next time Joe is in 502 after he is in 501” versus “Alert if the next place Joe is in after 501 is 502”. [Figure: timeline of At events, with one query marked Yes and the other No.]

44 Recap of Algorithms. Regular queries: compiled them to an NFA, then used image; data complexity O(1). Extended regular: several regular queries multiplied together; depends on the number of distinct people in the data, not the number of time steps.

46 Lahar Queries by Example. Alert when Joe is in hallway 4 and later in office 422. [Query diagram: “Joe in Hall4”, “Joe in 422”.] Alert when Joe is in hallway 4, and immediately in office 422. [Query diagram: “Joe in Hall4”, “Joe in 422”.] Inspired by Cayuga [Demers et al 2006, White et al 2007]. Challenge with probabilities: the naïve approach is exponential; this is unavoidable (#P).

47 Quality: Archived – Improve over Viterbi? Smoothing v. Viterbi (MAP). Lahar keeps track of Markovian correlations. Viterbi leverages correlations for a MAP estimate. [Plots: precision, recall, F1.] Approx. 30% gain in F1.

