Published by Leslie Poole. Modified over 9 years ago.
1
VoIP Data
IIIT Allahabad
Margaret H. Dunham
Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275, USA
mhd@lyle.smu.edu
Support provided by Fulbright Grant and IIIT Allahabad
2
VoIP Data Outline
- VoIP overview
- CDR
- CDR Example using EMM
3
VoIP Overview
http://www.voipmechanic.com/what-is-voip.htm
4
VoIP Advantages
- Travel
- Cost reduction
- Additional features: voice messages, call forwarding, logs, caller ID, …
- Integration of business tools
- Common network infrastructure
5
VoIP Disadvantages
- Need reliable broadband internet connection
- Voice quality
6
Telephone-VoIP Steps
- Analog Telephone Adapter (ATA) converts the analog phone call to a digital signal.
- The digital signal is sent over the internet as data packets.
- At the receiving end it is converted back from digital to analog.
7
VoIP Codec
- Software on server or ATA that converts voice signal into digital data.
- COmpressor - DECompressor
- COder - DECoder
- Steps:
  - Sample (8000, 24000, 32000 times per second)
  - Sort
  - Compress
  - Packetize
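The sample and packetize steps can be sketched in a few lines of Python. This is only an illustration, not a real codec. The 20 ms packet duration, the 8-bit quantization, and the names `sample_tone` and `packetize` are assumptions that do not come from the slides, and the sort and compress steps are left out.

```python
import math

SAMPLE_RATE_HZ = 8000   # one of the sampling rates listed on the slide
FRAME_MS = 20           # assumed packet duration, not stated in the slides


def sample_tone(freq_hz, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Sample a sine tone and quantize every sample to one unsigned byte."""
    n = round(duration_s * rate_hz)
    return bytes(
        round(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        for i in range(n)
    )


def packetize(samples, rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Split the sample stream into fixed-duration packet payloads."""
    frame_len = rate_hz * frame_ms // 1000   # samples per packet
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


pcm = sample_tone(440, 0.1)   # 0.1 s of a 440 Hz tone
packets = packetize(pcm)
print(len(pcm), "samples in", len(packets), "packets")
```

At 8000 samples per second, a 20 ms packet carries 160 one-byte samples before any compression is applied.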
8
Protocols
- SIP (Session Initiation Protocol): Signaling to set up and tear down sessions.
- SDP (Session Description Protocol): Describe call
- RTP (Realtime Transport Protocol): Exchange data/voice packets
- Media Transport to transmit packets
9
SIP
- Setup, connect, disconnect
- Syntax similar to HTTP
- Bind to IP address using SIP registration
- URLs for address format: mhd@lyle.smu.edu
- Independent of application or data types
- Uses RTP and SDP
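To show what "syntax similar to HTTP" looks like, the sketch below builds and parses a minimal SIP INVITE request as text. The header names follow the general SIP request layout. The caller address, Call-ID, branch, and tag values are made-up placeholders, and `build_invite` and `parse_message` are hypothetical helpers, not part of any SIP library.

```python
def build_invite(caller, callee, call_id, branch, tag):
    """Assemble a minimal INVITE: a start line, headers, then a blank line."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller.split('@')[1]};branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(text):
    """Split a request into method, URI, version, and a header dictionary."""
    head = text.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    method, uri, version = start_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, uri, version, headers


# All argument values except the callee address are illustrative placeholders.
message = build_invite(
    caller="caller@example.org",
    callee="mhd@lyle.smu.edu",
    call_id="demo-call-1@example.org",
    branch="z9hG4bK-demo",
    tag="12345",
)
method, uri, version, headers = parse_message(message)
print(method, uri, version)
```

As in HTTP, the request is a start line followed by `Name: value` headers and an empty line. In a real call the body would usually carry an SDP description of the media.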
10
SIP Overview
http://www.voipmechanic.com/sip-basics.htm
11
VoIP Data Packet [4]
12
VoIP Data
- Any of this digital data could be saved and analyzed.
- Typically only statistical/summary information about the calls is saved.
- These Call Detail Records (CDR) are used for billing and analysis.
13
Call Detail Record
- Log of VoIP usage
- May be by account
- Typical attributes:
  - Source
  - Destination
  - Duration of call
  - Amount billed
  - Total usage time in billing period
  - Remaining time in billing period
  - Total charge in billing period
- The format of the CDR varies among VoIP providers or programs.
- Some programs allow CDRs to be configured by the user.
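A CDR with a few of the typical attributes can be sketched as a small record type. Since the format varies by provider, everything concrete here is an assumption made for illustration: the CSV layout, the column names, the phone numbers, and the amounts.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CallDetailRecord:
    source: str
    destination: str
    duration_s: int        # duration of call, in seconds
    amount_billed: float


# Hypothetical CSV layout; real CDR formats differ between providers.
SAMPLE_CSV = """source,destination,duration_s,amount_billed
2145550100,9725550111,125,0.42
2145550100,442075550123,610,3.10
"""


def read_cdrs(text):
    """Parse CSV text into a list of CallDetailRecord objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CallDetailRecord(
            source=row["source"],
            destination=row["destination"],
            duration_s=int(row["duration_s"]),
            amount_billed=float(row["amount_billed"]),
        )
        for row in reader
    ]


records = read_cdrs(SAMPLE_CSV)
# Per-period totals are derived from the individual call records.
total_usage_s = sum(r.duration_s for r in records)
total_charge = sum(r.amount_billed for r in records)
print(total_usage_s, round(total_charge, 2))
```

The billing-period attributes (total usage time, total charge) are computed here from per-call records rather than stored, which is one possible design among several.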
14
CDR Generation [3]
- Usually created through a special Authentication, Authorization, and Accounting (AAA) server.
- May also be created by logging capabilities at a gateway or router using syslog server software.
- Normally in simple csv format.
- Normally uses UDP, so underlying data packets are not sequenced and may be lost. (Redundancy of servers can help.)
- Timestamps between routers can be synchronized using the Network Time Protocol (NTP).
- CDR generated for both forward and return leg of call.
- http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
15
Example: CISCO CDR Data
- VoIP traffic at Cisco's Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.
- Over 1.5 million call trials were logged
- 272,646 connected calls
- 66 attributes including source, destination, starting time, duration, routing/switching, device, etc.
- Application: Anomaly Detection (Classification)
- Goal: Find unusual call patterns based on type and time of call
- Technique: New data structure, new classification algorithm, new visualization technique
- Sample of raw csv data: http://lyle.smu.edu/~mhd/iiit/start.csv
16
CISCO Preprocessing
- Remove all attributes other than source, destination, starting time, and duration from the logs.
- Count the connected calls and discard unconnected calls. The total number of connected calls was 272,646.
- 5 phone classes: internal, local, national, international, unknown.
- 25 link classes (each source class paired with each destination class)
- Data is aggregated into 15 minute time intervals. The total number of time points is 5422 and the total number of attributes is 26.
- Add two attributes, namely type of day (workday or weekend) and time of the day, to the processed data.
- This step gives a spatio-temporal cube in the model space.
- http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
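The aggregation step can be sketched as follows. It assumes every call has already been assigned a source and destination phone class, since the slides do not describe how that classification is done. The function names and the three sample calls are illustrative only.

```python
from collections import Counter
from datetime import datetime

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]
# Every (source class, destination class) pair: 5 x 5 = 25 link classes.
LINK_CLASSES = [(s, d) for s in PHONE_CLASSES for d in PHONE_CLASSES]


def interval_start(ts, minutes=15):
    """Round a timestamp down to the start of its 15 minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def aggregate(calls):
    """Count connected calls per link class within each time interval."""
    cube = {}
    for start, src_class, dst_class in calls:
        bucket = interval_start(start)
        cube.setdefault(bucket, Counter())[(src_class, dst_class)] += 1
    return cube


def to_row(bucket, counts):
    """One row per interval: 25 link counts, then type of day and time of day."""
    day_type = "weekend" if bucket.weekday() >= 5 else "workday"
    link_counts = [counts.get(link, 0) for link in LINK_CLASSES]
    return link_counts + [day_type, bucket.strftime("%H:%M")]


# Made-up calls; only the first timestamp appears in the slides.
calls = [
    (datetime(2003, 9, 22, 12, 17, 32), "internal", "local"),
    (datetime(2003, 9, 22, 12, 20, 0), "internal", "local"),
    (datetime(2003, 9, 22, 12, 31, 5), "local", "national"),
]
cube = aggregate(calls)
rows = [to_row(bucket, counts) for bucket, counts in sorted(cube.items())]
```

Each row holds the counts for one time point, so stacking the rows in time order gives the kind of table the last preprocessing steps describe.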
17
CISCO Data Visualization
http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png
18
Spatiotemporal Stream Data
- Records may arrive at a rapid rate
- High volume (possibly infinite) of continuous data
- Concept drifts: Data distribution changes on the fly
- Data does not necessarily fit any distribution pattern
- Multidimensional
- Temporal
- Spatial
- Data are collected in discrete time intervals
- Data are in structured format
- Data hold an approximation of the Markov property
19
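Spatiotemporal Environment
- Events arriving in a stream
- At any time, t, we can view the state of the problem as represented by a vector of n numeric values:

\[
V_t = \langle S_{1t}, S_{2t}, \ldots, S_{nt} \rangle
\]

- Over time (left to right), the vectors form the columns of a table:

\[
\begin{array}{c|cccc}
       & V_1    & V_2    & \cdots & V_q    \\ \hline
S_1    & S_{11} & S_{12} & \cdots & S_{1q} \\
S_2    & S_{21} & S_{22} & \cdots & S_{2q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_n    & S_{n1} & S_{n2} & \cdots & S_{nq}
\end{array}
\]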
20
Data Stream Modeling
- Single pass: Each record is examined at most once
- Bounded storage: Limited memory for storing synopsis
- Real-time: Per record processing time must be low
- Summarization (synopsis) of data
- Use data NOT SAMPLE
- Temporal and Spatial
- Dynamic
- Continuous (infinite stream)
- Learn
- Forget
- Sublinear growth rate - Clustering
21
MM
- A first order Markov Chain is a finite or countably infinite sequence of events $\{E_1, E_2, \ldots\}$ over discrete time points, where $P_{ij} = P(E_j \mid E_i)$, and at any time the future behavior of the process is based solely on the current state.
- A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that:
  - $S = \{N_1, N_2, \ldots, N_m\}$, and
  - $A = \{L_{ij} \mid i \in \{1, 2, \ldots, m\},\ j \in \{1, 2, \ldots, m\}\}$, and
  - Each arc, $L_{ij} = \langle N_i, N_j \rangle$, is labeled with a transition probability $P_{ij} = P(N_j \mid N_i)$.
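One common way to obtain the transition probabilities is to count observed transitions and normalize by the number of departures from each state. The sketch below does that for a short, made-up state sequence. The function name and the sequence are illustrative and are not taken from the slides.

```python
from collections import defaultdict


def transition_probabilities(sequence):
    """Estimate P(N_j | N_i) as count(N_i -> N_j) / count(N_i -> anything)."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[current] = {nxt: c / total for nxt, c in outgoing.items()}
    return probs


# Hypothetical sequence of visited states.
states = ["N1", "N1", "N2", "N3", "N3", "N1"]
P = transition_probabilities(states)
print(P["N1"])   # probabilities on the arcs leaving N1
```

Because the model is first order, only consecutive pairs of states are counted. Nothing earlier than the current state affects the estimate.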
22
Extensible Markov Model (EMM)
- Time Varying Discrete First Order Markov Model
- Nodes are clusters of real world states.
- Learning continues during application phase.
- Learning:
  - Transition probabilities between nodes
  - Node labels (centroid/medoid of cluster)
  - Nodes are added and removed as data arrives
23
EMM Creation
Input vectors, in arrival order:
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>
[Figure: the EMM graph after each input, with nodes N1, N2, N3 and transition probabilities on the arcs]
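The incremental construction can be sketched as nearest-node clustering combined with transition counting. This is a simplified illustration under stated assumptions, not the published EMM algorithm. It uses Euclidean distance with an arbitrary threshold of 2.0, keeps the first vector of each cluster as a fixed node label instead of updating a centroid or medoid, and never removes nodes. It is not claimed to reproduce the graphs in the figure.

```python
import math


class SimpleEMM:
    """Simplified EMM-style learner: nearest-node clustering plus arc counts."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed distance cutoff for a new node
        self.labels = []             # one representative vector per node
        self.node_counts = []        # how many inputs each node absorbed
        self.transitions = {}        # (from_node, to_node) -> count
        self.current = None          # node of the previous input

    def learn(self, vector):
        """Assign the vector to a node, adding one if needed; update counts."""
        node, best = None, math.inf
        for i, label in enumerate(self.labels):
            d = math.dist(vector, label)
            if d < best:
                node, best = i, d
        if node is None or best > self.threshold:
            self.labels.append(tuple(vector))
            self.node_counts.append(0)
            node = len(self.labels) - 1
        self.node_counts[node] += 1
        if self.current is not None:
            arc = (self.current, node)
            self.transitions[arc] = self.transitions.get(arc, 0) + 1
        self.current = node
        return node


inputs = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
emm = SimpleEMM(threshold=2.0)
visited = [emm.learn(v) for v in inputs]
print(visited, emm.transitions)
```

Transition probabilities follow from the counts by dividing each arc count by the total count of arcs leaving the same node. A different distance measure or threshold would produce a different set of nodes.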
24
EMMRare
- The EMMRare algorithm indicates if the current input event is rare.
- Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs:
  - The frequency of the node at time t+1 is below this threshold
  - The updated transition probability of the MC transition from the node at time t to the node at time t+1 is below the threshold
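The two rarity conditions can be written as a single check. The sketch assumes that node frequency means the node's share of all inputs seen so far, and that the counts passed in already include the current event. Both are interpretations, since the slides do not spell them out. The function name and the sample counts are hypothetical.

```python
def is_rare(node_counts, transition_counts, prev_node, next_node, threshold):
    """Return True if moving from prev_node to next_node counts as rare."""
    total_inputs = sum(node_counts.values())
    node_freq = node_counts.get(next_node, 0) / total_inputs if total_inputs else 0.0

    # Total number of recorded transitions that leave prev_node.
    leaving = sum(c for (src, _), c in transition_counts.items() if src == prev_node)
    arc = transition_counts.get((prev_node, next_node), 0)
    transition_prob = arc / leaving if leaving else 0.0

    return node_freq < threshold or transition_prob < threshold


# Made-up counts for three nodes.
node_counts = {"N1": 60, "N2": 40, "N3": 1}
transition_counts = {
    ("N1", "N1"): 58, ("N1", "N2"): 2,
    ("N2", "N2"): 37, ("N2", "N1"): 2, ("N2", "N3"): 1,
}

common = is_rare(node_counts, transition_counts, "N1", "N1", threshold=0.05)
rare_arc = is_rare(node_counts, transition_counts, "N1", "N2", threshold=0.05)
rare_node = is_rare(node_counts, transition_counts, "N2", "N3", threshold=0.05)
print(common, rare_arc, rare_node)
```

The second case shows why both tests are needed: N2 is a frequently visited node, but reaching it from N1 happens in only 2 of 60 departures, so the transition itself is rare.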
25
Sublinear Growth Rate
26
Rare Event in Cisco Data
27
References
1. VoIP Mechanic, “What is VoIP?, a tutorial,” http://www.voipmechanic.com/what-is-voip.htm.
2. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.
3. Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml.
4. Cisco, “Voice Over IP - Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml.
5. “VoIPThink,” http://www.en.voipforo.com, Accessed February 1, 2012.
6. Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
7. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) (Extended version submitted to Journal of Computers.)
8. Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.