Data Mining on Streams

We should use runlists for stream data mining (unless the data has some spatial structure, in which case we need spatially oriented techniques), since a runlist can easily be truncated at one end and appended to at the other. A Ptree, even a 1-D Ptree, cannot accommodate such activity gracefully. However, if the data is spatial and the continuity advantage of 2-D Ptrees is needed, then Ptrees should be used.

We begin with some slides reviewing Ptrees, RunLists, etc., and then move to stream data mining.
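The argument for runlists on streams is that new bits arrive at one end while old bits are dropped from the other. A minimal Python sketch of that behaviour follows. It assumes runs are stored as (bit, length) pairs rather than the slides' truth:start form, so that dropping old bits does not require renumbering start offsets. The class and method names are hypothetical.

```python
from collections import deque


class StreamRunList:
    """Run-length list of one vertical bit stream: [bit, length] runs, oldest first."""

    def __init__(self):
        self.runs = deque()

    def append(self, bit):
        """A new bit arrives at the tail end: extend the last run or start a new one."""
        if self.runs and self.runs[-1][0] == bit:
            self.runs[-1][1] += 1
        else:
            self.runs.append([bit, 1])

    def truncate(self, k):
        """Drop the k oldest bits from the head end."""
        while k > 0 and self.runs:
            run = self.runs[0]
            if run[1] <= k:          # the whole oldest run falls out of the window
                k -= run[1]
                self.runs.popleft()
            else:                    # only part of it does
                run[1] -= k
                k = 0

    def ones(self):
        """1-count of the current window, read directly from the runs."""
        return sum(length for bit, length in self.runs if bit == 1)

    def as_list(self):
        return [tuple(run) for run in self.runs]
```

Appending touches only the last run and truncating touches only the oldest runs, so neither operation rebuilds the list.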

1-D Ptrees

A table R(A1..An) is a horizontal structure (a set of horizontal records) processed vertically (vertical scans). Ptrees turn this around: vertically partition R(A1 A2 A3 A4) into its bit slices R11 R12 R13, R21 R22 R23, R31 R32 R33, R41 R42 R43 (three bits each for R[A1] through R[A4]); compress each vertical bit slice into a basic Ptree, P11 through P43; then horizontally process these Ptrees using one multi-operand logical AND operation.

1-Dimensional Ptrees are built by recording the truth of the predicate "pure1" recursively on halves, until there is purity. For P11:
1. Whole file is not pure1 → 0
2. 1st half is not pure1 → 0
3. 2nd half is not pure1 → 0
4. 1st half of 2nd half is not pure1 → 0
5. 2nd half of 2nd half is pure1 → 1
6. 1st half of 1st half of 2nd half is pure1 → 1
7. 2nd half of 1st half of 2nd half is not pure1 → 0

E.g., to count the occurrences of 111 000 001 100, use "pure111000001100", i.e., count the 1s in
P11 ^ P12 ^ P13 ^ P'21 ^ P'22 ^ P'23 ^ P'31 ^ P'32 ^ P33 ^ P41 ^ P'42 ^ P'43
where P' denotes the complement of a basic Ptree.
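To make the recursive-halving rule concrete, here is a Python sketch that evaluates "pure1" on the whole bit slice and then on halves, stopping at any pure segment (all 1s or all 0s). It assumes R11 = 0000 1011, the 8-row slice spelled out by RL11 = 0:000 1:100 0:101 1:110, and lists nodes breadth-first so the truth values follow the slide's numbering for P11. The function names are hypothetical.

```python
from collections import deque


def ptree_nodes(bits):
    """Record the truth of "pure1" on the whole slice, then recursively on
    halves, stopping at pure segments. Returns (depth, start, length, pure1)
    tuples in breadth-first order."""
    nodes = []
    queue = deque([(0, 0, len(bits))])
    while queue:
        depth, start, length = queue.popleft()
        segment = bits[start:start + length]
        pure1 = all(segment)
        pure0 = not any(segment)
        nodes.append((depth, start, length, pure1))
        if not (pure1 or pure0):          # mixed segment: split into halves
            half = length // 2
            queue.append((depth + 1, start, half))
            queue.append((depth + 1, start + half, length - half))
    return nodes


def ones_count(nodes):
    """Pure1 nodes are never split, so the 1-count is the sum of their lengths."""
    return sum(length for _, _, length, pure1 in nodes if pure1)
```

For R11 the pure1 truth values come out as 0, 0, 0, 0, 1, 1, 0, and the 1-count (3) is read from the two pure1 leaves without decompressing the slice.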

Run Lists: another way to handle vertical data

Generalized Ptrees using standard run-length compression of the vertical bit files (alternatively, using Lempel-Ziv? Golomb? other?). Run Lists record the type and start offset of the pure runs (truth:start). E.g., for R11:
1. 1st run is Pure0 → 0:000
2. 2nd run is Pure1 → 1:100
3. 3rd run is Pure0 → 0:101
4. 4th run is Pure1 → 1:110

so RL11 = 0:000 1:100 0:101 1:110. To complement, flip the purity bits.

E.g., to count the occurrences of 111 000 001 100, use "pure111000001100":
RL11 ^ RL12 ^ RL13 ^ RL'21 ^ RL'22 ^ RL'23 ^ RL'31 ^ RL'32 ^ RL33 ^ RL41 ^ RL'42 ^ RL'43

The run lists of R(A1 A2 A3 A4):
RL11: 0:000 1:100 0:101 1:110
RL12: 1:000 0:100 1:101
RL13: 0:000 1:001 0:010 1:100 0:101 1:110
RL21: 1:000 0:100
RL22: 1:000 0:110
RL23: 1:000 0:010 1:011 0:100
RL31: 1:000 0:100
RL32: 1:000 0:010
RL33: 0:000 1:010
RL41: 0:000 1:010
RL42: 0:000 1:010 0:101
RL43: 1:000 0:001 1:010 0:100 1:101 0:…
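A Python sketch of the run-list encoding, assuming a bit slice is a list of 0/1 ints and that offsets are printed as fixed-width binary, as on the slide. The function names are hypothetical. Complementing only flips the purity bits, because the run boundaries do not move.

```python
def to_runlist(bits):
    """Record (purity bit, start offset) for each maximal pure run."""
    rl = []
    for i, b in enumerate(bits):
        if not rl or rl[-1][0] != b:
            rl.append((b, i))
    return rl


def complement(rl):
    """Flip the purity bits; start offsets are unchanged."""
    return [(1 - b, start) for b, start in rl]


def from_runlist(rl, n):
    """Expand a run list back into n bits (each run ends where the next starts)."""
    bits = []
    for k, (b, start) in enumerate(rl):
        end = rl[k + 1][1] if k + 1 < len(rl) else n
        bits.extend([b] * (end - start))
    return bits


def fmt(rl, width=3):
    """Print in the slide's truth:start notation with binary offsets."""
    return " ".join(f"{b}:{start:0{width}b}" for b, start in rl)
```

Encoding R11 = 0000 1011 reproduces RL11 = 0:000 1:100 0:101 1:110, and R12 = 1111 0111 reproduces RL12 = 1:000 0:100 1:101.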

Other Indexes on RunLists

We could put Pure0-Run, Pure1-Run and Mixed-Run indexes on RLs. For the 16-bit R11, with type codes 00 = pure0, 11 = pure1, 01 = mixed:
RL11: 00:0000 11:0100 00:0101 11:0110 01:1000
1RL (start:length): 0100:1 0110:2
0RL (start:length): 0000:4 0101:1
MRL (start:length): 1000:8

Or, since we would not traverse the RL very often, make it a linked list and just concatenate the indexes: START → 0RL 0000:4 0101:1 → MRL 1000:8 → 1RL 0100:1 0110:2.
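A Python sketch of splitting a typed run list into the three indexes, assuming the two-bit type codes used in RL11 (00 pure0, 11 pure1, 01 mixed) and a 16-bit file. Each run's length is the distance to the next start offset, or to the end of the file. The function name is hypothetical.

```python
def build_indexes(typed_rl, n):
    """typed_rl: (type_code, start) pairs in start order for an n-bit file.
    Returns the 0RL, 1RL and MRL indexes as lists of (start, length)."""
    names = {"00": "0RL", "11": "1RL", "01": "MRL"}
    index = {"0RL": [], "1RL": [], "MRL": []}
    for k, (code, start) in enumerate(typed_rl):
        end = typed_rl[k + 1][1] if k + 1 < len(typed_rl) else n
        index[names[code]].append((start, end - start))
    return index
```

With RL11 = 00:0000 11:0100 00:0101 11:0110 01:1000 this yields 0RL 0000:4 0101:1, 1RL 0100:1 0110:2 and MRL 1000:8.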

Indexed RunLists ANDing
R11: 1RL 0100:1 0110:2; 0RL 0000:4 0101:1; MRL 1000:8
R34: 1RL 1011:5; 0RL 0000:2 0101:4; MRL 0010:3:101 1001:2:10
R11^34: 1RL 0100:1; 0RL 0000:4 0101:4; MRL 1001:7:1010101
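One way to AND two indexed RunLists, sketched in Python. The sketch walks both operands' runs in start order, and for each stretch where neither operand changes run it applies 0 ^ x = 0 and 1 ^ x = x, comparing bits only where both runs are mixed. The run tuples (start, length, kind, bits) and the kind names P0/P1/M are assumptions of this sketch. R34's mixed contents 101 and 10 appear in the deck's lists, and R11's mixed second half, 01010101, is read off its 0RL on the pure-RunList slide. A mixed stretch that turns out to be all 1s or all 0s is demoted to a pure run, which is how bit 4 ends up in the result's 1RL.

```python
def _piece(run, lo, hi):
    """Kind and (for mixed runs) bits of `run` restricted to positions lo..hi-1."""
    start, _, kind, bits = run
    if kind == "M":
        return kind, bits[lo - start:hi - start]
    return kind, ""


def _normalize(kind, bits):
    """A 'mixed' stretch that is all 1s or all 0s is really a pure run."""
    if kind == "M":
        if "0" not in bits:
            return "P1", ""
        if "1" not in bits:
            return "P0", ""
    return kind, bits


def _append(out, start, length, kind, bits):
    """Append a run, merging it with the previous run if the kinds match."""
    if out and out[-1][2] == kind:
        s, n, k, b = out[-1]
        out[-1] = (s, n + length, k, b + bits)
    else:
        out.append((start, length, kind, bits))


def and_indexed(a, b):
    """AND two run lists of (start, length, kind, bits) covering the same file."""
    out, i, j, pos = [], 0, 0, 0
    end = a[-1][0] + a[-1][1]
    while pos < end:
        ra, rb = a[i], b[j]
        hi = min(ra[0] + ra[1], rb[0] + rb[1])   # next run boundary in either operand
        ka, ba = _piece(ra, pos, hi)
        kb, bb = _piece(rb, pos, hi)
        if ka == "P0" or kb == "P0":              # 0 AND x = 0
            kind, bits = "P0", ""
        elif ka == "P1":                          # 1 AND x = x
            kind, bits = kb, bb
        elif kb == "P1":
            kind, bits = ka, ba
        else:                                     # both mixed: compare bit by bit
            kind = "M"
            bits = "".join("1" if x == y == "1" else "0" for x, y in zip(ba, bb))
        kind, bits = _normalize(kind, bits)
        _append(out, pos, hi - pos, kind, bits)
        pos = hi
        if pos == ra[0] + ra[1]:
            i += 1
        if pos == rb[0] + rb[1]:
            j += 1
    return out


def ones(runs):
    """1-count: full length of pure1 runs plus the 1s inside mixed runs."""
    return sum(n if k == "P1" else b.count("1") for _, n, k, b in runs)
```

The result has 1RL 0100:1, 0RL 0000:4 0101:4 and MRL 1001:7:1010101, and its 1-count (5) agrees with the pure-RunList computation.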

Indexed RunLists ANDing: AND the 0RLs first, then?
R11: 1RL 0100:1 0110:2; 0RL 0000:4 0101:1; MRL 1000:8
R34: 1RL 1011:5; 0RL 0000:2 0101:4; MRL 0010:3:101 1001:2:10
R11^34: 0RL 0000:4 0101:4 …

Indexed Pure RunLists (no mixed) ANDing

Only 0RLs are needed! Of course, you need the 1RLs to use as 0RL complements (maintain the 1RLs, or construct the 0RL complements on the fly?). To get 1-counts, count the 0s and subtract from the total.
R11: 0RL 0000:4 0101:1 1000:1 1010:1 1100:1 1110:1
R34: 0RL 0000:2 0011:1 0101:4 1010:1
R11^34: 0RL 0000:4 0101:4 1010:1 1100:1 1110:1
0-count = sum of lengths = 11; 1-count = 16 - 11 = 5
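Because a 0 in either operand forces a 0 in the AND, the 0RL of the result is the union of the operands' 0RLs. A Python sketch, assuming (start, length) pairs and a file size of 16 bits. The function names are hypothetical. It merges overlapping or touching zero runs in one pass over the sorted runs.

```python
def and_0rls(zr_a, zr_b):
    """0RL of A AND B: a position is 0 if it is 0 in either operand, so take
    the union of both zero-run lists, merging overlapping or adjacent runs."""
    merged = []
    for start, length in sorted(zr_a + zr_b):
        end = start + length
        if merged and start <= merged[-1][0] + merged[-1][1]:
            s, n = merged[-1]
            merged[-1] = (s, max(s + n, end) - s)
        else:
            merged.append((start, length))
    return merged


def zero_count(zr):
    return sum(length for _, length in zr)
```

For R11 and R34 this gives 0RL 0000:4 0101:4 1010:1 1100:1 1110:1, so the 0-count is 11 and the 1-count is 16 - 11 = 5.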

Zero RunLists ANDing of 34 and 11' (with pure1 gaps)

Here a 0RL is a list of run lengths alternating zero runs and pure1 gaps, starting with a zero run (0RL11' is 0RL11 with a prefixed 0).
0RL11': 0,4,1,1,2,1,1,1,1,1,1,1,1
0RL34: 2,1,1,1,4,1,1,5
0RL11'^34: 2,1,9,1,1,1,1

The 1-count of the result is the total minus the 0-count: 16 - 13 = 3.

So coding this AND program seems straightforward, following the animation: an intra-run cursor and a list cursor for each operand, plus one cursor for the result. We, of course, need the 1RLs too (e.g., for the 0RL of a complement). Next, let's allow the (red) gaps to be mixed and insist that the gaps in a 0RL and its corresponding 1RL be compatible.
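The cursor scheme for this slide can be sketched in Python as follows. It assumes each 0RL is a list of run lengths with zero runs at even positions and pure1 gaps at odd positions, starting with a (possibly empty) zero run, as in 0RL11'. 0RL34's final pure1 gap of 5 is taken from the 0rl34 list on the next slide. The function name is hypothetical.

```python
def and_zero_runlists(a, b):
    """AND two alternating zero-run / pure1-gap length lists covering the same bits."""
    out = []

    def emit(is_zero, n):
        # Add n result bits of one purity, merging with the run being built.
        if not out and not is_zero:
            out.append(0)                     # result must start with a zero run
        if out and ((len(out) - 1) % 2 == 0) == is_zero:
            out[-1] += n
        else:
            out.append(n)

    ka = kb = 0                               # list cursors: current run of each operand
    ra, rb = a[0], b[0]                       # intra-run cursors: bits left in that run
    while True:
        while ka < len(a) and ra == 0:        # step past exhausted (or empty) runs
            ka += 1
            ra = a[ka] if ka < len(a) else 0
        while kb < len(b) and rb == 0:
            kb += 1
            rb = b[kb] if kb < len(b) else 0
        if ka == len(a) or kb == len(b):
            break
        step = min(ra, rb)
        emit(ka % 2 == 0 or kb % 2 == 0, step)  # 0 AND x = 0; only 1 AND 1 = 1
        ra -= step
        rb -= step
    return out
```

ANDing 0RL11' with 0RL34 gives 2,1,9,1,1,1,1; the zero runs sum to 13, so the 1-count is 16 - 13 = 3.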

0rl ANDing of 34 and 11' with selected mixed gaps

Pure and mixed gaps are differentiated by a prefix bit (pure gap = 1, mixed gap = …); on the slides we use colors. A mixed gap is written length:bits.
0rl34: 2,3:101,4,1,1,5
1rl34: 0,11:…,5
0rl11: 4,12:…
1rl11: 0,6:000010,2,8:…
0rl11': 0,6:111101,2,8:…1010

Note that we have to flip the mixeds: 0rl11' is 1rl11 with the bits of each mixed gap complemented.
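A Python sketch of "flipping the mixeds", assuming pure runs and gaps are stored as ints and mixed gaps as (length, bits) tuples. 1rl11's 8-bit mixed gap is not legible on the slide; the value 01010101 used here is R11's second half as implied by 0RL11' on the zero-RunList slide. The function names are hypothetical.

```python
def flip_mixeds(one_rl):
    """Turn the 1rl of X into the 0rl of X' (complement of X). Pure lengths are
    kept, because X's one runs are X''s zero runs; only mixed bits are flipped."""
    out = []
    for item in one_rl:
        if isinstance(item, tuple):
            length, bits = item
            out.append((length, bits.translate(str.maketrans("01", "10"))))
        else:
            out.append(item)
    return out


def expand(rl, run_bit):
    """Decompress: even positions are runs of run_bit, odd int positions are pure
    gaps of the opposite bit, and tuples are mixed gaps given bit by bit."""
    bits = []
    for k, item in enumerate(rl):
        if isinstance(item, tuple):
            bits.extend(int(c) for c in item[1])
        elif k % 2 == 0:
            bits.extend([run_bit] * item)
        else:
            bits.extend([1 - run_bit] * item)
    return bits
```

Flipping 1rl11 = 0,6:000010,2,8:01010101 gives 0rl11' = 0,6:111101,2,8:10101010, and expanding both confirms they describe R11 and its complement.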

zmrl: follow mixed runs only when necessary

Take the philosophy that we will follow a pointer to long mixed runs only when necessary; otherwise we will sequence straight across.
zmrl11': 0,4,1,1,2,0,8
zmrl11: 4,1,1,2,0,…
zmrl34: 2,3,4,1,1,5 (the mixed run of length 3 is 101)
zmrl11'^34: 2,…

When the 16-bit window moves left (e.g., add 100 to 0rl11):
zmrl11: 0,1,5,1,1,2,0,…
rl11': 4,1,1,2,1,1,1,1,1,1,1,…
0rl11': 0,1,6,1,1,2,1,1,1,1,1,…
zmrl11: 4,1,1,2,0,…

Network Security Application (network security through vertically structured data)

Network layers do their own partitioning into packets, frames, etc. (usually independent of any intrinsic data structuring, e.g., record structure), with fragmentation/reassembly and segmentation/reassembly.

Data privacy is compromised when the horizontal (stream) message content is eavesdropped upon at the reassembled level (in the network). A standard solution is to host-encrypt the horizontal structure so that any message reassembled in the network is meaningless.

Alternative: vertically structure (decompose, partition) the data, e.g., into basic Ptrees.
- Send one Ptree per packet.
- Send the packets of a message separately, tricking flow classifiers into thinking that the multiple packets associated with a particular message are unrelated.
- The message is only meaningful after demultiplexing at the destination.
- Note: the only basic Ptree that holds actual information is the high-order bit Ptree. Therefore, encrypt it!

It seems like there ought to be a whole range of killer ideas associated with the concept of using vertically structured data within network transmission units. Active networking? (AND basic Ptrees, or just certain levels of them, at active network nodes?)
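A toy Python sketch of the vertical idea. It assumes each message byte is one record and uses plain bit lists in place of compressed basic Ptrees. No network I/O or encryption is shown, and the function names are hypothetical. Slice 0 is the high-order bit slice, the one the slide suggests encrypting.

```python
def vertical_slices(message: bytes):
    """Split a message into 8 vertical bit slices, high-order slice first.
    Each slice would travel in its own packet."""
    return [[(byte >> (7 - j)) & 1 for byte in message] for j in range(8)]


def reassemble(slices):
    """Destination side: recombine the 8 slices into the original bytes."""
    return bytes(
        sum(slices[j][i] << (7 - j) for j in range(8))
        for i in range(len(slices[0]))
    )
```

An eavesdropper holding any single slice sees one bit position of every byte rather than the reassembled message; only the destination, holding all eight, can rebuild it.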