TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2


MyTeam: TeamMember1, TeamMember2
Machine Learning Project 2016/2017-2

Titanic Kaggle Challenge
Task: what sorts of people were likely to survive? Binary classification.
Dataset: instances are passengers. Basic features: ticket class, sex, age, number of siblings/spouses aboard, number of parents/children aboard, ticket number, passenger fare, port of embarkation.
Tools: Python and scikit-learn
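The dataset schema can be sketched with pandas; the column names follow the Kaggle competition, but the rows below are illustrative, not real passenger records:

```python
import pandas as pd

# A few rows in the Kaggle Titanic schema (values are made up for illustration)
df = pd.DataFrame({
    "Pclass":   [3, 1, 3],                       # ticket class
    "Sex":      ["male", "female", "female"],
    "Age":      [22.0, 38.0, 26.0],
    "SibSp":    [1, 1, 0],                       # siblings/spouses aboard
    "Parch":    [0, 0, 0],                       # parents/children aboard
    "Fare":     [7.25, 71.28, 7.93],             # passenger fare
    "Embarked": ["S", "C", "S"],                 # port of embarkation
    "Survived": [0, 1, 1],                       # binary target
})

X = df.drop(columns="Survived")
y = df["Survived"]
```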

Evaluation methodology
75/25 train/eval split:
  Train set: 891 instances
  Evaluation set: 419 instances
Evaluation metric: accuracy (the ratio of correctly classified instances)
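A split like this maps to scikit-learn's train_test_split; the data and the random_state below are illustrative assumptions, not the project's code:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy data standing in for the feature matrix and labels
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]

# 75/25 split as described on the slide (random_state assumed for reproducibility)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Accuracy = correctly classified / total
print(accuracy_score([0, 1, 1, 0], [0, 1, 0, 0]))  # 3 of 4 correct -> 0.75
```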

Baseline
Most-frequent-class classifier: 64.13% accuracy
Baseline system: Naive Bayes classifier with the features as they are: 78.48% accuracy
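A most-frequent-class baseline corresponds to scikit-learn's DummyClassifier, and the Naive Bayes baseline to one of the naive_bayes estimators. A minimal sketch on synthetic data (the Titanic features themselves are not reproduced here, and GaussianNB is an assumption about which variant was used):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
# Imbalanced labels, roughly like Titanic's ~64% majority class
y = (rng.rand(200) < 0.36).astype(int)

# Most-frequent-class baseline: always predicts the majority class
dummy = DummyClassifier(strategy="most_frequent").fit(X, y)

# Naive Bayes on the features as they are
nb = GaussianNB().fit(X, y)

print(dummy.score(X, y))  # equals the majority-class frequency
```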

Feature Extraction (TeamMember1)
Cabin split into deck level and number, e.g. B45 -> B, 45 (2 features)
Missing age attributes: replaced with the mean age, grouped by sex and ticket type
Age bins: child < 18 <= adult < 60 <= senior
Title as a new feature: Dr, Mr, Mrs, etc.
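The extraction steps above can be sketched with pandas; the toy rows, column names, and the title regular expression are assumptions for illustration, not the slides' actual code:

```python
import pandas as pd

df = pd.DataFrame({
    "Cabin":  ["B45", "C23", None, None],
    "Name":   ["Smith, Mr. John", "Doe, Mrs. Jane",
               "Roe, Miss. Ann", "Kid, Master. Tim"],
    "Sex":    ["male", "female", "female", "male"],
    "Pclass": [1, 1, 1, 3],
    "Age":    [40.0, None, 30.0, 8.0],
})

# Split cabin into deck level and number, e.g. B45 -> B, 45
df["CabinDeck"] = df["Cabin"].str[0]
df["CabinNum"] = pd.to_numeric(df["Cabin"].str[1:], errors="coerce")

# Replace missing ages with the mean of the (sex, ticket class) group
df["Age"] = df.groupby(["Sex", "Pclass"])["Age"].transform(
    lambda s: s.fillna(s.mean()))

# Age bins: child < 18 <= adult < 60 <= senior
df["AgeBin"] = pd.cut(df["Age"], bins=[0, 18, 60, 120],
                      labels=["child", "adult", "senior"], right=False)

# Title extracted from the name (Mr, Mrs, Dr, ...)
df["Title"] = df["Name"].str.extract(r",\s*([A-Za-z]+)\.")
```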

Extra techniques (TeamMember1)
Feature selection: keep only the k best features
Feature scoring: Chi2, PMI
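Keeping only the k best features by a chi-squared score maps to scikit-learn's SelectKBest; scikit-learn has no PMI scorer (mutual_info_classif would be the closest built-in substitute), so the PMI scoring was presumably custom. A minimal sketch on synthetic non-negative features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(100, 6))   # chi2 requires non-negative features
y = rng.randint(0, 2, size=100)

# Keep only the k best features as scored by chi2
selector = SelectKBest(score_func=chi2, k=3).fit(X, y)
X_sel = selector.transform(X)
print(X_sel.shape)  # (100, 3)
```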

Results -- Features (TeamMember1)

Results – Extra techniques (TeamMember1)

Machine Learning models (TeamMember2)
Naive Bayes, SVM, neural network (3 layers), C4.5, RandomForestClassifier, k-NN
GridSearch for parameter tuning of each classifier
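Per-classifier GridSearch tuning can be sketched as follows for one of the listed models; the parameter grid and data are hypothetical, since the grids actually used are not on the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Titanic feature matrix
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Exhaustive search over a (hypothetical) parameter grid with cross-validation
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to each classifier on the list, with a grid appropriate to its hyperparameters.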

Extra techniques (TeamMember2)
Dimensionality reduction: PCA, SVD, t-SNE
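All three reductions are available in scikit-learn; note that projecting to 6 dimensions with t-SNE (as the best system below does) requires the exact method, since the default barnes_hut algorithm only supports fewer than 4 output dimensions. A sketch on random data:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
X = rng.rand(60, 10)   # toy stand-in for the extracted feature matrix

X_pca = PCA(n_components=6).fit_transform(X)
X_svd = TruncatedSVD(n_components=6).fit_transform(X)
# t-SNE to 6 dimensions: barnes_hut only supports < 4 output dimensions
X_tsne = TSNE(n_components=6, method="exact", random_state=0).fit_transform(X)
print(X_pca.shape, X_svd.shape, X_tsne.shape)
```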

Results – Models (TeamMember2)

Results – Extra techniques (TeamMember2)

Results – class level evaluation (TeamMember2)
Best system: features …, t-SNE to 6 dimensions, Random Forest; accuracy of 85.2%

Class  Precision  Recall  F1
0      0.78       0.92    0.85
1      0.80       0.54    0.64
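Per-class precision, recall, and F1 like the table above come from scikit-learn's metrics; the labels and predictions below are illustrative, not the project's actual output:

```python
from sklearn.metrics import precision_recall_fscore_support

# Illustrative predictions on a binary task
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1, 0, 1]

# One value per class: index 0 = class 0 (died), index 1 = class 1 (survived)
prec, rec, f1, support = precision_recall_fscore_support(y_true, y_pred)
print(prec, rec, f1)
```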

Conclusions
Best features: cabin level, title
Best classifier: RandomForest
Feature selection and dimension reduction could further help: 78.5% -> 85.2%