The Stream Star Schema. Stephen A. Broeker.

Presentation transcript:

1 The Stream Star Schema. Stephen A. Broeker.

2 Conclusion: The Stream Star Schema processes data streams in real time, at up to gigabits per second. Stream Star performance is O(1).

3 Large Fast Dynamic Data Streams: phone calls, road traffic, network traffic, website traffic, power supplies, credit card transactions, sensor arrays, and financial markets are data rich. But real-time analysis is poor.

4 Large Fast Dynamic Data Streams: phone calls, road traffic, network traffic, website traffic, power supplies, credit card transactions, sensor arrays, financial markets. Data rich. But poor in real-time analysis.

5 Large Fast Dynamic Data Streams: What are the consequences?

6 Large Fast Dynamic Data Streams: Hard to see patterns. Therefore difficult to detect problems.

7 Challenge of Network Monitoring: Network monitoring at high speed is difficult. Bits arrive every nanosecond on a 1 Gbps NIC, so minimum-size packets can arrive less than a microsecond apart. Must use SRAM for per-packet processing. The traditional solution of sampling is inherently inaccurate because it loses data.
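
The per-packet time budget behind this slide can be estimated with simple arithmetic. The sketch below assumes standard Ethernet framing overheads (64-byte minimum frame, 8-byte preamble, 12-byte inter-frame gap); those figures and the function names are assumptions for illustration and do not come from the slides.

```python
# Time budget per packet on a saturated link (illustrative arithmetic only).
LINK_BPS = 1_000_000_000      # 1 Gbps, as on the slide

# Assumed Ethernet framing overheads (not from the slides).
PREAMBLE_BYTES = 8
INTERFRAME_GAP_BYTES = 12
MIN_FRAME_BYTES = 64


def wire_bits(frame_bytes: int) -> int:
    """Bits a frame occupies on the wire, including preamble and gap."""
    return (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8


def packet_interval_ns(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Nanoseconds between back-to-back frames of the given size."""
    return wire_bits(frame_bytes) * 1e9 / link_bps


def packets_per_second(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Maximum frame rate for the given frame size."""
    return link_bps / wire_bits(frame_bytes)


print(f"{packet_interval_ns(MIN_FRAME_BYTES):.0f} ns between minimum-size frames")
print(f"{packets_per_second(MIN_FRAME_BYTES):,.0f} frames per second at most")
```

Under these assumptions, any per-packet work has to finish in well under a microsecond, which is why slow memory and disk lookups are a problem at line rate.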

8 Vision: Achieve real-time OLAP for massive data streams. Achieve cybernetic control for systems that depend on rapid data analysis.

9 Detection

10 Forensics

11 Data Rates versus Data Storage: Data RATES are measured in bits per second (lowercase b). So, Gigabits (Gb) ≠ Gigabytes (GB).

12 Data Rates versus Data Storage: Data RATES are measured in bits per second (lowercase b). Data STORAGE is measured in Bytes (uppercase B). So, Gigabits (Gb) ≠ Gigabytes (GB).

13 Data Storage based on Data Rate: An Ethernet Network Interface Card transfers data at 1 Gbps. Data accumulates at 450 GB per hour. That's about 10.5 TB per day, 73.8 TB per week!
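
The conversion from a data rate in bits per second to accumulated storage in bytes can be checked directly. The sketch below uses decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and assumes the link is saturated the whole time; the function name is illustrative. The slide's daily and weekly figures are slightly lower, which would be consistent with converting gigabytes to terabytes by dividing by 1,024.

```python
# Storage that accumulates when a link is captured at full line rate.
BITS_PER_BYTE = 8
HOUR = 3600            # seconds
DAY = 24 * HOUR
WEEK = 7 * DAY


def bytes_accumulated(rate_bps: float, seconds: float) -> float:
    """Bytes produced by a stream running at rate_bps for the given time."""
    return rate_bps / BITS_PER_BYTE * seconds


rate = 1e9  # 1 Gbps (lowercase b: bits)

per_hour_gb = bytes_accumulated(rate, HOUR) / 1e9    # uppercase B: bytes
per_day_tb = bytes_accumulated(rate, DAY) / 1e12
per_week_tb = bytes_accumulated(rate, WEEK) / 1e12

print(f"{per_hour_gb:.0f} GB per hour")
print(f"{per_day_tb:.1f} TB per day")
print(f"{per_week_tb:.1f} TB per week")
```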

14 Picturing Orders of Magnitude: What if BYTES were pennies? [Figure: penny stacks illustrating orders of magnitude, starting at 10^6.] Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project – KOKOGIAK MEDIA

17 Picturing Orders of Magnitude: What if BYTES were pennies? At 1 Gbps, roughly 0.3 PB accumulate per month. [Figure: penny stacks illustrating orders of magnitude, starting at 10^6.] Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project – KOKOGIAK MEDIA

18 Picturing Orders of Magnitude: What if BYTES were pennies? [Figure: penny stacks illustrating orders of magnitude.] Used with permission: © Copyright 2001 Alan Taylor – The Mega Penney Project – KOKOGIAK MEDIA

19 From Streaming Data to Database: The network stream is segmented into flows, which are inserted into a database. Observed database input rate for a 1 Gb Ethernet NIC: 700,000 flows per hour. Existing databases can't keep up!

20 Consider 2 Database Schemas: Disk Star Schema and STREAM Star Schema.

21 Disk Star Schema: From Fact Table to Dimension Tables. So where's the star? The fact table columns (Content, Destination IP, Sender, Recipient, Subject) each point to a dimension table: Content Table, Destination IP Table, Sender Table, Recipient Table, Subject Table. Here's the star. That's all there is to the star concept.

22 Value of the Disk Star Schema: Conserve Disk Space

23 Dimensions: Each Dimension gets a key.

24 1NF: No Repeating Groups. Resulting in a Dimension Table.

25 Substitute Keys for Facts. Thus deriving a Fact Table.
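
The steps on the preceding slides (give each dimension a key, build dimension tables without repeats, substitute keys for facts) can be sketched in a few lines. This is a minimal in-memory illustration, not the author's implementation: the column names follow the star slide, while the class, the function, and the sample flows are invented here. The in-memory dictionary stands in for an on-disk index, so it does not reproduce the disk insertion cost discussed later in the talk.

```python
# Illustrative disk-style star schema: every distinct string is stored once
# in a dimension table, and the fact table holds only integer keys.
COLUMNS = ("sender", "recipient", "subject", "content", "destination_ip")


class DimensionTable:
    """Assigns one integer key per distinct value (no string duplication)."""

    def __init__(self):
        self.values = []      # key -> value
        self._key_of = {}     # value -> key

    def key_for(self, value):
        # Look the value up first; only unseen values get a new key.
        if value not in self._key_of:
            self._key_of[value] = len(self.values)
            self.values.append(value)
        return self._key_of[value]


dimensions = {column: DimensionTable() for column in COLUMNS}
fact_table = []


def insert(record):
    """Substitute a key for each fact, then store the row of keys."""
    fact_table.append(tuple(dimensions[c].key_for(record[c]) for c in COLUMNS))


# Hypothetical sample flows, invented for this example.
insert({"sender": "alice@example.com", "recipient": "bob@example.com",
        "subject": "status", "content": "all good",
        "destination_ip": "192.0.2.10"})
insert({"sender": "alice@example.com", "recipient": "carol@example.com",
        "subject": "status", "content": "see attached",
        "destination_ip": "192.0.2.11"})

print(fact_table)                    # [(0, 0, 0, 0, 0), (0, 1, 0, 1, 1)]
print(dimensions["sender"].values)   # ['alice@example.com']
```

The repeated sender and subject are stored once, which is the space saving the schema is designed for; the price is a lookup in every dimension table on every insert.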

26 Bottleneck: Disk Star Schema = slow data insertion time. Relational databases are normalized to conserve space. Speed is sacrificed. So real-time analysis is compromised.

27 Disk Star Schema

28 Disk Star Schema

29 Disk Star Schema

30 Disk Star Schema

31 Bottleneck: Dimension table insertion time depends on the table size: it is $O(\log n)$, where $n$ is the number of records in the table. Disk Star Schema insertion time is the sum of all dimension table insert times, $O\left(\sum_{i=1}^{l} \log n_i\right)$, where $l$ is the number of attributes in the database and $n_i$ is the number of values for attribute $i$. Can't fill dimension tables fast enough!
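
To see how the summed logarithmic cost behaves, the sketch below treats each dimension lookup as costing about log2 of the table size, as it would with a balanced index. That cost model, the function name, and the table sizes are assumptions chosen for illustration; the slides give only the asymptotic form.

```python
import math


def disk_star_insert_cost(dimension_sizes):
    """Approximate comparisons per insert: the sum of log2(n_i) over all
    dimension tables (an assumed balanced-index cost model)."""
    return sum(math.log2(max(n, 1)) for n in dimension_sizes)


# Five string attributes, as in the star on the earlier slide.
small = disk_star_insert_cost([1024] * 5)          # about a thousand values each
large = disk_star_insert_cost([1024 ** 2] * 5)     # about a million values each

print(small)   # 50.0
print(large)   # 100.0
```

The per-insert cost keeps growing as the dimension tables fill, and it is multiplied by the number of attributes, so a stream that never stops makes every later insert more expensive than the last.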

32 Short Pause to Review Numbers: 1,000,000,000 bits per second Ethernet NIC (1 Gbps). 700,000 observed flows per hour. 450 GB per hour, about 10.5 TB a day. All we can get is a snapshot analysis!

33 Consider 2 Database Schemas: Disk Star Schema and STREAM Star Schema.

34 Stream Star Schema

35 Stream Star Schema

36 Stream Star Schema

37 Disk Star Schema: Nearly 1:1 correspondence between string attributes and dimension tables.

38 Disk Star Schema: Two kinds of tables (fact, dimension). All string dimensions have dimension tables. Minimizes disk space. Dimension tables can be large. Long insert time = $O\left(\sum_{i=1}^{l} \log n_i\right)$. No string duplication.

39 Stream Star Schema: Many:1

40 Stream Star Schema: Three kinds of tables (fact, dimension, string). Few dimension tables. Dimension tables are small. Minimizes insertion time. Insert time is constant. Allows string duplication.
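
One way to read "allow string duplication" and "insert time is constant" is an append-only string table that is never searched on insert. The sketch below follows that reading; it is a guess at the idea, not the author's design, and the class, its methods, and the sample data are all hypothetical. The few small dimension tables mentioned on the slide are left out to keep the example short.

```python
class StreamStar:
    """Hypothetical sketch: fact rows point into an append-only string table.

    Strings are appended without checking for an existing copy, so the work
    per insert does not depend on how much data is already stored.
    """

    def __init__(self, string_columns):
        self.string_columns = tuple(string_columns)
        self.string_table = []   # append-only; duplicates allowed
        self.fact_table = []     # rows of positions into string_table

    def insert(self, record):
        row = []
        for column in self.string_columns:
            row.append(len(self.string_table))        # position of the new string
            self.string_table.append(record[column])  # no lookup, just append
        self.fact_table.append(tuple(row))

    def lookup(self, row_index, column):
        position = self.fact_table[row_index][self.string_columns.index(column)]
        return self.string_table[position]


# Hypothetical sample flows, invented for this example.
stream = StreamStar(("sender", "subject"))
stream.insert({"sender": "alice@example.com", "subject": "status"})
stream.insert({"sender": "alice@example.com", "subject": "re: status"})

print(stream.fact_table)           # [(0, 1), (2, 3)]
print(len(stream.string_table))    # 4: the repeated sender is stored twice
print(stream.lookup(1, "sender"))  # alice@example.com
```

The trade is the reverse of the Disk Star Schema: storage grows with every repeated string, but an insert is a fixed number of appends regardless of table size.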

41 Side x Side Comparison: Disk Star Schema (old, slow) versus Stream Star Schema (new, fast).

42 Test Results

43 Test Results: The magnified area is different because I measured the insert time for (1, 10, 100) streams as opposed to (1000, 2000, 3000) streams.

44 Test Results: The magnified area is different because of how MySQL works. I can only present a hypothesis since I don't have the MySQL source code. But I suspect that MySQL is optimized for fewer than 100 streams for this problem.

45 Conclusion

46 Conclusion: The Stream Star Schema processes data streams in real time, at up to gigabits per second. Stream Star performance is O(1).

47 Hope: Detection, Forensics, RFID

48 There's data flow

49 And then there's DATA FLOW!

50 Disk Star Schema handles 3 million flows per hour, about this much.

51 The Stream Star Schema handles 113 million flows per hour! Disk Star Schema handles 3 million flows per hour, about this much.

52 Nearly 40x Faster!
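
The "nearly 40x" figure follows from the two throughput numbers on the previous slides. The short calculation below restates them and converts to flows per second; the variable names are illustrative.

```python
# Throughput figures quoted on the slides, in flows per hour.
STREAM_STAR_FLOWS_PER_HOUR = 113_000_000
DISK_STAR_FLOWS_PER_HOUR = 3_000_000
OBSERVED_FLOWS_PER_HOUR = 700_000      # 1 Gb Ethernet NIC, from an earlier slide

SECONDS_PER_HOUR = 3600

speedup = STREAM_STAR_FLOWS_PER_HOUR / DISK_STAR_FLOWS_PER_HOUR
stream_per_second = STREAM_STAR_FLOWS_PER_HOUR / SECONDS_PER_HOUR
disk_per_second = DISK_STAR_FLOWS_PER_HOUR / SECONDS_PER_HOUR
observed_per_second = OBSERVED_FLOWS_PER_HOUR / SECONDS_PER_HOUR

print(f"speedup: {speedup:.1f}x")
print(f"Stream Star: {stream_per_second:,.0f} flows per second")
print(f"Disk Star:   {disk_per_second:,.0f} flows per second")
print(f"Observed:    {observed_per_second:,.0f} flows per second")
```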

53 For The Future: Implement the Stream Star Schema in the Cloud. Use multiple Stream Star Schema compute nodes to handle an infinite stream. Storage could be handled similarly to S3.
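
The slides do not say how flows would be divided among nodes. One common approach, shown here purely as an assumption, is to hash a flow key so that each flow always lands on the same node. The function names and the sample keys are hypothetical; `zlib.crc32` is used only because it gives the same result on every run.

```python
import zlib


def node_for(flow_key: str, node_count: int) -> int:
    """Pick a node for a flow by hashing its key (assumed scheme)."""
    return zlib.crc32(flow_key.encode("utf-8")) % node_count


def partition(flow_keys, node_count):
    """Group flow keys into one list per node."""
    shards = [[] for _ in range(node_count)]
    for key in flow_keys:
        shards[node_for(key, node_count)].append(key)
    return shards


# Hypothetical flow keys (source and destination address pairs).
flows = [f"192.0.2.{i}->198.51.100.{i % 7}" for i in range(20)]
shards = partition(flows, node_count=4)

for node, keys in enumerate(shards):
    print(f"node {node}: {len(keys)} flows")
```

Because each node only appends its own share of the stream, adding nodes would raise total throughput without any node needing to see the whole stream.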

54 For The Future: The Stream Star Schema fully supports the analysis of high-speed data streams, thus enabling security applications and forensic processing.

55 END