Date: 21st of May, 2014. Shri Ramdeo Baba College of Engineering and Management. Presentation by: Rimjhim Singh. Under the guidance of: Dr. M.B. Chandak.


1 Date: 21st of May, 2014. Shri Ramdeo Baba College of Engineering and Management. Presentation by: Rimjhim Singh. Under the guidance of: Dr. M.B. Chandak. A Technical Seminar on

2
• Stream Data Classification.
• Novel Class Detection.
• Data Generation.
• Training Classifiers.
• Steps Involved.
• Applications.
• Conclusion.
• Future Scope.

3
• Stream data: a sequence of data or packets.
• Managing online transactions requires classification of data.
• Minimize the space and time required.
• Dynamic nature of data.

4
• Intrusion detection:
  - Data arriving on a network may also contain attacks, viruses, worms, etc.
  - Hence we need to classify them and identify the cause of their arrival.
  - Stream data classification can be used here.

5
• Infinite length:
  - Fast and continuous.
  - Impractical to store.
  - Requires incremental learning.
• Concept drift:
  - The underlying concept of the stream changes.
  - The classifier needs to be updated.
  - Classifiers must adapt to changes.
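One way to picture how an ensemble adapts to drift without storing the whole stream is sketched below. It is only an illustration under assumptions not stated on the slide: models are plain callables, the ensemble has a fixed maximum size, and the model with the highest error on the latest labelled chunk is the one discarded. The names `update_ensemble` and `max_models` are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Model = Callable[[object], Hashable]  # assumption: a model maps an instance to a label


def update_ensemble(
    ensemble: List[Model],
    new_model: Model,
    labeled_chunk: Sequence[Tuple[object, Hashable]],
    max_models: int,
) -> List[Model]:
    """Add a freshly trained model; if the ensemble is too large, drop the worst one."""
    if not labeled_chunk:
        raise ValueError("labeled_chunk must not be empty")

    def error(model: Model) -> float:
        # Fraction of the latest labelled chunk that the model misclassifies.
        wrong = sum(1 for x, y in labeled_chunk if model(x) != y)
        return wrong / len(labeled_chunk)

    candidates = ensemble + [new_model]
    if len(candidates) <= max_models:
        return candidates
    worst = max(candidates, key=error)
    return [m for m in candidates if m is not worst]
```

Only the most recent labelled chunk is needed for the update, which fits the "impractical to store" constraint.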

6
• Concept evolution:
  - New classes evolve in the data.
  - Example: during intrusion detection in a network, a new type of attack evolves.
• Feature evolution:
  - New features evolve.
  - Example: text streams on Twitter.
• Labelling of data:
  - Difficult process.
  - Data arrives at high speed.

7
• Novel class:
  - Let M be the current ensemble of classification models. A class c is an existing class if at least one of the models Mi in M has been trained with class c. Otherwise, c is a novel class.
• A single model or an ensemble of models can be used.
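The definition above translates almost directly into code. In this minimal sketch each model Mi is represented only by the set of class labels it was trained with; that representation and the function names are assumptions made for illustration.

```python
from typing import Iterable, Set


def is_existing_class(label: str, ensemble: Iterable[Set[str]]) -> bool:
    """True if at least one model in the ensemble was trained with `label`."""
    return any(label in trained_labels for trained_labels in ensemble)


def is_novel_class(label: str, ensemble: Iterable[Set[str]]) -> bool:
    """True if no model in the ensemble has seen `label` during training."""
    return not is_existing_class(label, ensemble)


# Hypothetical ensemble M of three models, each given by its training labels.
M = [{"normal", "dos"}, {"normal", "probe"}, {"normal", "dos"}]
```

With this ensemble, "probe" is an existing class because one model was trained with it, while a label such as "worm" would be novel.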

8
• Chunks of data are created.
• Recent chunks are classified.
• Labelling is done.
• Data is ready for training.
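The first step, cutting the stream into chunks, can be sketched with a generator so that the whole stream is never held in memory. The chunk size is a free parameter here; the slides do not give a value.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_stream(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `chunk_size` items from a stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # stream exhausted
        yield chunk


# Seven items with chunk_size=3 give two full chunks and one partial chunk.
chunks = list(chunk_stream(range(7), 3))
```

Each yielded chunk would then be classified, labelled, and handed to the training step.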

9
• K clusters are built.
• Cluster summaries are saved.
• The summaries are also known as pseudopoints.
• Each summary contains:
  - centroid of the cluster.
  - radius of the cluster.
  - frequency of data points.
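A cluster summary of this kind can be sketched as a small record. The sketch assumes the cluster memberships already come from some K-means run, takes the radius to be the distance from the centroid to the farthest member, and stores the frequency per class label; these choices and the names `Pseudopoint` and `summarize_cluster` are assumptions, not details given on the slide.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Sequence, Tuple


@dataclass
class Pseudopoint:
    centroid: Tuple[float, ...]
    radius: float
    class_freq: Counter  # number of member points per class label


def summarize_cluster(
    points: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Pseudopoint:
    """Replace the raw points of one cluster by a compact summary."""
    if not points or len(points) != len(labels):
        raise ValueError("need a non-empty cluster with one label per point")
    n, dim = len(points), len(points[0])
    centroid = tuple(sum(p[i] for p in points) / n for i in range(dim))
    # Assumed definition: radius is the distance to the farthest member.
    radius = max(math.dist(centroid, p) for p in points)
    return Pseudopoint(centroid, radius, Counter(labels))
```

Once the summary is stored, the raw points can be discarded, which keeps memory use bounded by the number of clusters rather than the number of instances.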

10
• Classification of a test instance Xj by Mi:
  - Let h ∈ Mi be the pseudopoint whose centroid is closest to Xj; the predicted class is the one with the highest frequency in h.
  - The final class of the point is decided by the voting of all models.
• Decision boundary of Mi:
  - Equal to the union of the feature spaces encompassed by its pseudopoints.
• Decision boundary of M:
  - Equal to the union of the decision boundaries of all Mi, where Mi ∈ M.
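The classification rule and the decision boundary test can be sketched as follows. A model is represented here as a list of `(centroid, radius, class_freq)` tuples and the vote is a simple majority; both are assumptions for illustration, and ties are not handled specially.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

PseudoPt = Tuple[Sequence[float], float, Dict[str, int]]  # (centroid, radius, class_freq)
ModelT = List[PseudoPt]


def predict_model(model: ModelT, x: Sequence[float]) -> str:
    """Predict with one model: majority class of the nearest pseudopoint."""
    _, _, freq = min(model, key=lambda h: math.dist(h[0], x))
    return max(freq, key=freq.get)


def predict_ensemble(ensemble: List[ModelT], x: Sequence[float]) -> str:
    """Predict with the ensemble: majority vote over all models."""
    votes = Counter(predict_model(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]


def inside_decision_boundary(ensemble: List[ModelT], x: Sequence[float]) -> bool:
    """True if x falls inside the hypersphere of any pseudopoint of any model."""
    return any(
        math.dist(centroid, x) <= radius
        for model in ensemble
        for centroid, radius, _ in model
    )
```

An instance for which `inside_decision_boundary` returns False lies outside the boundary of M and becomes a candidate outlier for the detection steps that follow.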

11
• Lossy Fixed:
  - The same feature set is used.
• Lossy Local:
  - Each model or training chunk has its own feature set.
• Lossless Homogenizing:
  - Both the model and the incoming instance expand their feature sets.
  - The union of the feature sets is taken.
  - Best technique.
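The difference between the lossy and lossless conversions can be sketched with instances stored as feature-to-value dictionaries. Filling a missing feature with 0.0 is an assumption of this sketch, and `project` and `homogenize` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Tuple

Instance = Dict[str, float]


def project(instance: Instance, features: Iterable[str]) -> List[float]:
    """Lossy conversion: keep only `features`; other features of the instance are dropped.

    Features absent from the instance are filled with 0.0 (assumption).
    """
    return [instance.get(f, 0.0) for f in features]


def homogenize(
    model_centroid: Instance, instance: Instance
) -> Tuple[List[str], List[float], List[float]]:
    """Lossless conversion: expand both sides to the union of their feature sets."""
    union = sorted(set(model_centroid) | set(instance))
    return union, project(model_centroid, union), project(instance, union)


# Hypothetical text-stream example: the instance carries a feature the model never saw.
centroid = {"attack": 1.0, "login": 0.5}
tweet = {"login": 1.0, "worm": 2.0}
features, c_vec, x_vec = homogenize(centroid, tweet)
```

Projecting `tweet` onto the model's own features would silently drop "worm", whereas the homogenized vectors keep it, so a newly evolved feature still contributes to the distance computation.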

12
• Outlier Detection using Adaptive Threshold.
• Novel Class Detection.
• Simultaneous Novel Class Detection.

13
• Check whether the instance is an outlier.
  - F_outlier or Outlier.
• An adaptive threshold is used.
• Lower false alarm rate:
  - Marginal false-novel instance.
  - Marginal false-existing instance.
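The slide does not spell out how the threshold adapts, so the following is only one plausible sketch. It assumes an instance gets a weight `exp(radius - distance)` relative to its nearest pseudopoint, is treated as an outlier when that weight falls below the threshold, and that the threshold is nudged only by marginal errors (those whose weight lies close to it). The weight function, `margin`, and `step` are all assumptions.

```python
import math
from typing import Optional


def weight(distance: float, radius: float) -> float:
    """Assumed weight: 1.0 on the hypersphere surface, decaying outside it."""
    return math.exp(radius - distance)


def is_outlier(distance: float, radius: float, threshold: float) -> bool:
    """An instance is an outlier if its weight falls below the threshold."""
    return weight(distance, radius) < threshold


def adapt_threshold(
    threshold: float,
    w: float,
    error: Optional[str],
    margin: float = 0.05,
    step: float = 0.01,
) -> float:
    """Adjust the threshold after the true label of an instance becomes known.

    error is "false_novel" (existing instance flagged as novel),
    "false_existing" (novel instance accepted as existing), or None.
    """
    if error is None or abs(threshold - w) >= margin:
        return threshold  # correct prediction or non-marginal error: no change
    if error == "false_novel":
        return threshold - step  # loosen: accept slightly more distant instances
    if error == "false_existing":
        return threshold + step  # tighten: flag slightly closer instances
    raise ValueError(f"unknown error type: {error}")
```

Reacting only to marginal errors keeps the threshold from swinging on instances that lie far from the boundary, which is one way to keep the false alarm rate low.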

14
• F_outliers occur for three reasons:
  - Noise, concept drift, or concept evolution.
• Identify the F_outliers that occur due to concept evolution.
• Here we need to calculate:
  - Distance between an outlier and the existing class pseudopoints.
  - Cohesion among the different outliers in the buffer.
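The two quantities can be combined into a single silhouette-style score. This is a sketch under assumptions: cohesion is taken as the mean distance to the `q` nearest other outliers in the buffer, separation as the distance to the nearest existing pseudopoint centroid, and the normalisation is the usual silhouette form. The function name and `q` are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def novelty_score(
    x: Point,
    other_outliers: Sequence[Point],
    existing_centroids: Sequence[Point],
    q: int = 3,
) -> float:
    """Score in [-1, 1]; positive means x is closer to fellow outliers
    than to any existing class."""
    if not other_outliers or not existing_centroids:
        raise ValueError("need at least one other outlier and one existing centroid")
    nearest = sorted(math.dist(x, o) for o in other_outliers)[:q]
    cohesion = sum(nearest) / len(nearest)  # small value = tight group of outliers
    separation = min(math.dist(x, c) for c in existing_centroids)
    denom = max(cohesion, separation)
    return 0.0 if denom == 0 else (separation - cohesion) / denom
```

Outliers caused by noise tend to be isolated and score low, while a group of mutually close outliers far from every existing class scores high, which is the pattern expected from concept evolution.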

15
• Multiple novel classes may occur simultaneously.
• Principle:
  - Cohesion between instances of the same class should be high.
  - Distance between instances of different classes should be large.
• Graphs are used.
• Two phases:
  1. Separation phase.
  2. Merging phase.
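The two phases can be illustrated with a deliberately simplified graph procedure; it is not claimed to be the exact method behind the slide. Here the separation phase links outliers closer than `link_dist` and takes connected components, and the merging phase joins components whose centroids lie within `merge_dist`. Both thresholds and all function names are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def separate(points: List[Point], link_dist: float) -> List[List[Point]]:
    """Separation phase: connected components of the 'closer than link_dist' graph."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= link_dist:
            parent[find(i)] = find(j)

    groups: Dict[int, List[Point]] = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def centroid(group: List[Point]) -> Tuple[float, ...]:
    return tuple(sum(p[k] for p in group) / len(group) for k in range(len(group[0])))


def merge(groups: List[List[Point]], merge_dist: float) -> List[List[Point]]:
    """Merging phase: greedily join components whose centroids are close."""
    merged: List[List[Point]] = []
    for group in groups:
        for target in merged:
            if math.dist(centroid(target), centroid(group)) <= merge_dist:
                target.extend(group)
                break
        else:
            merged.append(list(group))
    return merged
```

Each group that survives the merging phase is a candidate novel class, so two well-separated groups of outliers are reported as two novel classes rather than one.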

16
• Network security.
• Social media.
• Credit card fraud, etc.

17
• To classify and detect novel classes in feature-based stream data more efficiently using a tool.

18
• The majority of the algorithms used for "classification and detection of novel classes" suffer from either feature evolution or a high false alarm rate.
• The methodology adapts properly to normal concept drift, but it takes time to handle abrupt drift.
• Multiple novel classes are detected and separated efficiently.

19
• Work can be done on making the cluster size dynamic and adaptive.
• Work can be done on handling abrupt drift efficiently.
• If an existing class is divided into two, work can be done on judging whether the two parts share the same feature space and whether they are novel.

