MINING MULTI-LABEL DATA BY GRIGORIOS TSOUMAKAS, IOANNIS KATAKIS, AND IOANNIS VLAHAVAS Published July 7, 2010 Team Members: Kristopher Tadlock, Jimmy Sit, Kevin Mack

BACKGROUND AND PROBLEM DEFINITION “A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label l from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label” (Tsoumakas et al.). Applications include ranking web pages, which are often multi-labeled. For example, “cooking”, “food network”, and “iron chef” might all apply to the same page. How do you rank and classify that page alongside other pages that share some, but not all, of the same labels?
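To make the setting concrete, here is a minimal sketch (our illustration, not code from the paper) of the usual representation of multi-label targets: a binary indicator matrix with one column per label in L. The pages and labels below are the hypothetical web-page example above.

```python
import numpy as np

# Hypothetical toy data: each web page carries a set of labels Y ⊆ L.
L = ["cooking", "food network", "iron chef"]
Y = [
    {"cooking", "food network", "iron chef"},  # page 0: all three apply
    {"cooking"},                               # page 1: only one label
    {"food network", "iron chef"},             # page 2: two of the three
]

# Binary indicator matrix: one row per example, one column per label in L.
Y_bin = np.array([[1 if label in y else 0 for label in L] for y in Y])
print(Y_bin)
# [[1 1 1]
#  [1 0 0]
#  [0 1 1]]
```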

TECHNICAL HIGHLIGHTS OF PROBLEM SOLVING
Problem Transformation: divide the problem into several single-label problems and solve them using known algorithms.
Algorithm Adaptation: change an existing algorithm so you can use it on a multi-label problem.
Dimensionality Reduction: reduce the number of random variables in the data set, or reduce the number of dimensions in the label space. The goal here is to remove noise so you can focus on the relationships that matter or concern you.
Evaluation Measures: how good a job did you do? How accurately does your model classify examples? E.g., how often are labels misclassified? How often does a less important label get a higher rank than a more important label? (A sketch of one such measure follows below.)
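As a concrete example of an evaluation measure, here is a minimal sketch of Hamming loss, one standard multi-label measure: the fraction of example-label cells where the prediction disagrees with the truth. The matrices are hypothetical.

```python
import numpy as np

# Hypothetical true and predicted indicator matrices (3 examples, 3 labels).
y_true = np.array([[1, 1, 0],
                   [0, 1, 1],
                   [1, 0, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 1]])

# Hamming loss: fraction of example-label cells that are misclassified.
print(np.mean(y_true != y_pred))  # 2 wrong cells out of 9 -> 0.2222...
```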

ILLUSTRATION OF METHODS INTRODUCED
Problem Transformation: divide the problem into several single-label problems and solve them using known algorithms (see the sketch after this slide).
Label Powerset: treats each distinct multi-label set as if it were a single label, then ranks label sets by highest support. Similar to the first step of Apriori.
Binary Relevance: assigns a + or − classifier to each label. If an instance has that label it gets a +, if not it gets a −. Similar to OneR.
Algorithm Adaptation: change an existing algorithm so you can use it on a multi-label problem.
Adapted C4.5 tree: multi-labels sit at the leaves, and splits are chosen to reduce a modified entropy, entropy(S) = −Σj [p(λj) log p(λj) + q(λj) log q(λj)], where p(λj) is the relative frequency of label λj and q(λj) = 1 − p(λj).
ML-kNN: as in regular kNN, find the k nearest neighbors of an instance; then decide each label by the maximum a posteriori principle, combining what is known before seeing the neighbors (prior probabilities) with the evidence the neighbors provide (posterior probabilities).
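Here is a minimal sketch of the two transformation methods, assuming hypothetical random data and scikit-learn base learners (the paper does not prescribe a particular base classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # hypothetical feature matrix
Y = (rng.random((100, 3)) > 0.6).astype(int)  # hypothetical 3-label indicator matrix

# Binary Relevance: one independent +/- classifier per label.
br_models = [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]
br_pred = np.column_stack([m.predict(X) for m in br_models])

# Label Powerset: encode each distinct label *set* as one class value,
# then train a single ordinary multi-class learner on those classes.
lp_classes = np.array([int("".join(map(str, row)), 2) for row in Y])
lp_model = DecisionTreeClassifier().fit(X, lp_classes)
```

Decoding an LP prediction back into a label set simply reverses the encoding; note that LP can only ever predict label sets it saw during training.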

DIMENSIONALITY REDUCTION TECHNIQUES
Feature Selection: select a subset of the dimensions for some purpose, e.g. to minimize a loss function.
Wrapper: a guided search over feature subsets, selected according to some criterion.
Filter: look for something specific in the data set itself, e.g. an informed selection based on the result of LP learning.
Feature Extraction: transform the data set into a lower-dimensional data set using various statistical and linear-algebra techniques (see the sketch below).
Exploiting Label Structure: create a general-to-specific tree. An example cannot be associated with a label (some leaf node) without also being associated with that label's parent nodes; the relationship is created by tracing the path from root to leaf.
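As an illustration of feature extraction, this sketch projects a hypothetical feature matrix onto a lower-dimensional space with PCA; PCA here stands in for the various statistical and linear-algebra techniques the paper surveys.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))  # hypothetical high-dimensional feature matrix

# Feature extraction: keep the 10 directions of highest variance.
X_reduced = PCA(n_components=10).fit_transform(X)
print(X_reduced.shape)  # (100, 10)
```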

SCALING UP PROBLEMS Analyzing a complicated data set with thousands of labels runs into problems. The number of possible label combinations is far larger than the number of actual training examples, so there are far more label sets than there are examples to learn them from, and the output becomes too complex to be helpful. Problem complexity and performance: it can take too much memory and/or processing time to classify everything. One technique for reducing complexity is a hierarchy tree, such as HOMER (Hierarchy Of Multi-label classifiERs). Each node handles a subset of the label set, and a predictive clustering of the labels decides how to divide them up as you descend the tree (see the sketch below).
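A minimal sketch of the partitioning idea behind HOMER, on a hypothetical indicator matrix: labels are grouped by the similarity of their occurrence patterns (HOMER proper uses a balanced k-means; plain k-means stands in here), and each group becomes a child node whose meta-label means "the example has at least one label from this group".

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
Y = (rng.random((200, 12)) > 0.7).astype(int)  # hypothetical: 200 examples, 12 labels

# Cluster the labels (columns) by their occurrence pattern across examples.
k = 3
groups = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y.T)

# Meta-label per child node: 1 if the example has any label from that group.
# A root classifier predicts meta_Y; each child node recurses on its own labels.
meta_Y = np.column_stack([Y[:, groups == g].max(axis=1) for g in range(k)])
```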

MULTI-LABEL DATA MINING SOFTWARE BoosTexter, Matlab, Mulan

TEAM’S OPINION ON THE METHOD/RESEARCH WORK Thorough coverage of the spectrum of techniques and considerations in multi-label data mining; a great place to discover new techniques and algorithms. It didn’t go into much detail on any one algorithm, so it seems best suited as a reference or an introduction. It also didn’t compare or contrast the methods: we were left unsure when to prefer problem transformation, algorithm adaptation, or dimensionality reduction.