PhD student: TA Minh Thuy (USTH 2010). Thesis director: Prof. LE Thi Hoai An. Co-director: Dr. Lydia Boudjeloud-Assala. LITA, EA3097 - UFR MIM University.

Presentation transcript:

PhD student: TA Minh Thuy (USTH 2010)
Thesis director: Prof. LE Thi Hoai An
Co-director: Dr. Lydia Boudjeloud-Assala
LITA, EA3097 - UFR MIM, University Paul Verlaine - Metz - France
Thesis title: Optimization and operations research techniques for mining evolving and temporal data (Techniques d’optimisation et de recherche opérationnelle en fouille de données évolutives et temporelles)

About me
Objective: develop new models and new optimization methods.
Problems: unsupervised classification and variable selection for mining evolving and temporal data (data streams).
Start date: 1 Dec 2010
Team: Algorithms and Optimization
Category: Information Technology
Fields of research: Data Mining, Data Stream, Clustering, Classification, Feature Selection

Context
For many recent applications, the concept of a data stream is more appropriate than that of a data set. The volume of such data is so large that it may be impossible to store it on disk. Furthermore, even when the data can be stored, the incoming volume may be so large that no record can be processed more than once.
Data in streams also exhibit temporal correlations. Such correlations can help detect important characteristics of data evolution and can be used to develop efficient mining algorithms.

Context
The stream model is motivated by emerging applications involving massive data sets. Examples: telephone records, customer click streams, multimedia data, financial transactions, ...
In these cases, the data evolve continuously. The evolution may come from:
- the dynamism of the services: content, structure, promotions, ...
- changes in user behavior or client interest, ...
- time: time of the day, day of the week, ...
- events: summer vacations, New Year, ...
Data streams therefore pose special challenges for data mining algorithms. Mining algorithms must be designed to account for changes in the underlying structure of the data stream.

Problems
Problem 1: Clustering data streams.
Existing methods for mining data streams focus on the whole period of data. Consequently, they detect only the behaviors that are predominant over the entire period of analysis; behaviors occurring in short periods of time are not detected.
Model for the data stream clustering problem: fixed windows. The analyzed time period is divided into more significant sub-periods, with the aim of detecting the evolution of old patterns or the emergence of new ones, which would not have been revealed by a global analysis over the whole time period.
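The fixed-window idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `fixed_windows` and `kmeans_1d`, the toy stream, and the use of plain Lloyd k-means on scalar values are all hypothetical stand-ins, not the DC programming approach studied in the thesis.

```python
from typing import Iterable, Iterator, List


def fixed_windows(stream: Iterable[float], window_size: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping windows of `window_size` records."""
    window: List[float] = []
    for record in stream:
        window.append(record)
        if len(window) == window_size:
            yield window
            window = []
    if window:  # last, possibly shorter, window
        yield window


def kmeans_1d(values: List[float], k: int, iterations: int = 20) -> List[float]:
    """Plain Lloyd k-means on scalars; returns the sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return sorted(centers)


# Toy stream (assumed data): the upper group moves from about 9 to about 5.
stream = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8, 1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
per_window = [kmeans_1d(w, k=2) for w in fixed_windows(stream, window_size=6)]
global_view = kmeans_1d(stream, k=2)
```

On this toy stream, clustering each window separately shows the upper cluster moving between the two sub-periods, while clustering the whole stream at once blends the two positions into a single center that matches neither sub-period.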

Problems
Problem 2: Detecting changes in data streams.
In a data stream, the data patterns may evolve over time. How do the data change over time?
- Disappearance of a cluster of behavior
- Appearance of a cluster of behavior
- Splitting of a cluster of behavior
- Merging of two or more clusters of behavior
- No change
Model for the data stream change detection problem: sliding windows.
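As a rough illustration of the five change types, the sketch below generates overlapping sliding windows and compares the cluster centers found in two consecutive windows. Everything here is an assumption made for the example: the names `sliding_windows` and `describe_changes`, the scalar cluster centers, and the rule that two centers correspond when they lie within `radius` of each other. The slides do not specify how changes are actually detected.

```python
from collections import deque
from typing import Iterable, Iterator, List, Optional, Tuple


def sliding_windows(stream: Iterable[float], size: int, step: int = 1) -> Iterator[List[float]]:
    """Yield overlapping windows of `size` records, advancing by `step` records."""
    window: deque = deque(maxlen=size)  # the oldest record drops out automatically
    for i, record in enumerate(stream, start=1):
        window.append(record)
        if i >= size and (i - size) % step == 0:
            yield list(window)


def describe_changes(
    old_centers: List[float], new_centers: List[float], radius: float
) -> List[Tuple[str, Optional[float]]]:
    """Label the changes between two sets of cluster centers (hypothetical rule)."""
    events: List[Tuple[str, Optional[float]]] = []
    for old in old_centers:
        matches = [new for new in new_centers if abs(new - old) <= radius]
        if not matches:
            events.append(("disappearance", old))
        elif len(matches) > 1:
            events.append(("split", old))
    for new in new_centers:
        matches = [old for old in old_centers if abs(new - old) <= radius]
        if not matches:
            events.append(("appearance", new))
        elif len(matches) > 1:
            events.append(("merge", new))
    return events or [("no change", None)]


windows = list(sliding_windows(range(6), size=4, step=2))
moved = describe_changes([1.0, 9.0], [1.1, 5.0], radius=1.0)
split = describe_changes([5.0], [4.0, 6.0], radius=1.5)
merged = describe_changes([4.0, 6.0], [5.0], radius=1.5)
stable = describe_changes([1.0], [1.2], radius=0.5)
```

The distance threshold is the weak point of this sketch: a cluster that drifts farther than `radius` between two windows is reported as one disappearance plus one appearance, so the window step and the threshold would have to be tuned together.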

Problems
Problem 3: Feature selection based clustering.
An object can be represented by variables of different types (quantitative, qualitative or structured). The nature of the variables influences the definition of similarity between objects, so their choice is very important. The question is how to select the relevant variables and eliminate the redundant ones.
Applications include:
- medical diagnosis (cancer risk assessment, detection of cardiac arrhythmia, ...)
- text categorization (spam detection, classification of web pages, ...)
- pattern recognition (face recognition, handwritten digits, ...)
- ...
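To make "relevant" and "redundant" concrete, here is a minimal filter-style sketch for quantitative variables only. It drops constant variables (no information for clustering) and variables that are almost perfectly correlated with one already kept. The function names, the 0.95 threshold, and the toy table are assumptions for illustration; this simple heuristic is not the feature selection model proposed in the thesis.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def select_features(columns: Dict[str, List[float]], max_corr: float = 0.95) -> List[str]:
    """Keep variables that are neither constant nor redundant with a kept one."""
    kept: List[str] = []
    for name, values in columns.items():
        if len(set(values)) < 2:  # constant variable: cannot separate objects
            continue
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue  # redundant with an already selected variable
        kept.append(name)
    return kept


# Toy table (assumed data): height is recorded twice, in two different units.
table = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "constant": [1.0, 1.0, 1.0, 1.0],
    "weight_kg": [70.0, 52.0, 88.0, 61.0],
}
selected = select_features(table)
```

Because the filter is greedy, the result depends on the order of the columns: of two redundant variables, the one listed first is kept.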

Methodology
We use mathematical techniques, including optimization techniques, to address data mining problems. Many real-world optimization problems are nonconvex. To solve nonconvex optimization problems, we study DC (Difference of Convex functions) programming and DCA (DC Algorithms).
DC programming and DCA were introduced in 1985 by Pham Dinh Tao and have been developed by Le Thi Hoai An and Pham Dinh Tao since 1994; they have become classic and are now increasingly popular.
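The general DCA scheme can be shown on a toy problem. A DC program minimizes f = g - h with g and h convex. At each iteration, DCA takes y_k in the subdifferential of h at x_k, then takes x_{k+1} as a minimizer of the convex function g(x) - <x, y_k>. The objective f(x) = x^4 - 2x^2, its decomposition g(x) = x^4 and h(x) = 2x^2, and the function name `dca_quartic` are assumptions chosen so that each step has a closed form; they do not come from the slides.

```python
import math


def dca_quartic(x0: float, iterations: int = 50) -> float:
    """DCA on f(x) = x**4 - 2*x**2 with g(x) = x**4 and h(x) = 2*x**2."""
    x = x0
    for _ in range(iterations):
        # Step 1: linearize h at the current point, y_k = h'(x_k) = 4 * x_k.
        y = 4.0 * x
        # Step 2: minimize the convex function g(x) - y * x.
        # Its stationarity condition 4 * x**3 = y gives the real cube root of y / 4.
        x = math.copysign(abs(y / 4.0) ** (1.0 / 3.0), y)
    return x


right = dca_quartic(0.5)    # starts right of 0
left = dca_quartic(-2.0)    # starts left of 0
stuck = dca_quartic(0.0)    # starts exactly at the critical point x = 0
```

This function has two global minimizers, x = 1 and x = -1, and the iterates move toward the one on the same side as the starting point. Started exactly at x = 0, the iteration stays there, which illustrates that DCA converges to a critical point and that the starting point matters.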

Results
TA Minh Thuy, LE-THI Hoai An, Lydia Boudjeloud-Assala: Clustering Data Stream Based on Sub-Windows: A DC Programming Approach. 15th Austrian-French-German Conference on Optimization (AFG11), Toulouse, France, September 2011, pp