Titanic: Machine Learning from Disaster

Kaggle Competition. Titanic: Machine Learning from Disaster

Kaggle: What is Kaggle? A data science competition platform. You upload your predictions; Kaggle scores your solution and shows your score on the leaderboard.

Registration Site: https://www.kaggle.com/competitions Account: IKDD1(Group Number)

Titanic Competition url: https://www.kaggle.com/c/titanic Data url: https://www.kaggle.com/c/titanic/data Leaderboard: https://www.kaggle.com/c/titanic/leaderboard

Classification

Prediction

Titanic Attribute Description:

Decision Tree

Sklearn: a Python tool. Simple and efficient tools for data mining and data analysis! Decision tree URL: http://scikit-learn.org/stable/modules/tree.html
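A minimal sketch of how a scikit-learn decision tree might be fitted for this kind of task. The rows and labels below are invented for illustration and are not taken from the Kaggle data; the choice of columns (Pclass, Sex encoded as 0/1, Fare) is an assumption.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy rows invented for illustration: [Pclass, Sex (0=male, 1=female), Fare]
X_train = [
    [3, 0, 7.25],
    [1, 1, 71.28],
    [3, 1, 7.92],
    [1, 1, 53.10],
    [3, 0, 8.05],
    [2, 0, 13.00],
    [2, 1, 30.07],
    [1, 0, 51.86],
]
# 1 = survived, 0 = did not survive (labels are also invented)
y_train = [0, 1, 1, 1, 0, 0, 1, 0]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict for two unseen (hypothetical) passengers
X_new = [[3, 0, 7.75], [1, 1, 80.00]]
predictions = clf.predict(X_new)
print(predictions)
```

With the real competition data, the same two calls (`fit`, then `predict`) would be applied to the training and test files after encoding text columns as numbers.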

Provided by Kaggle gendermodel - python genderclassmodel - python myfirstforest - python
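The name gendermodel suggests a baseline that predicts from sex alone. A minimal sketch of such a rule, assuming it predicts survival for every female passenger and non-survival for every male passenger. This is an assumption about the idea, not a copy of the Kaggle script, and the sample rows are invented.

```python
def gender_baseline(passenger):
    """Predict 1 (survived) for female passengers and 0 otherwise."""
    return 1 if passenger["Sex"] == "female" else 0

# Hypothetical rows using the column names PassengerId and Sex
passengers = [
    {"PassengerId": 892, "Sex": "male"},
    {"PassengerId": 893, "Sex": "female"},
    {"PassengerId": 894, "Sex": "male"},
]

baseline_predictions = [gender_baseline(p) for p in passengers]
print(baseline_predictions)
```

A rule this simple is useful mainly as a reference score that any trained classifier should beat.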

Homework 1: Registration. Apply a simple algorithm to build the classifier. Use the classifier to predict which passengers survived. Submit the result to Kaggle. Deadline: next Thursday (11/19)
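For the submission step, the Titanic competition expects a CSV file with a header row and the two columns PassengerId and Survived. A small sketch of writing that format, using invented predictions; the helper name `write_submission` is hypothetical.

```python
import csv
import io

def write_submission(rows, out):
    """Write (PassengerId, Survived) pairs as CSV with a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id, survived in rows:
        writer.writerow([passenger_id, int(survived)])

# Hypothetical predictions; in practice these come from the classifier
example_rows = [(892, 0), (893, 1), (894, 0)]

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# pass open("submission.csv", "w", newline="") instead.
buffer = io.StringIO()
write_submission(example_rows, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```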

Homework 2: Oral report. An illustration of an x-level decision tree. Deadline: next Thursday (11/26)
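One possible way to produce an illustration of a tree with a fixed number of levels is to cap the depth with `max_depth` and print the fitted rules with scikit-learn's `export_text`. The data below is invented, and reading "x-level" as a depth limit is an assumption.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy rows invented for illustration: [Pclass, Sex (0=male, 1=female)]
X = [[3, 0], [1, 1], [3, 1], [1, 0], [2, 0], [2, 1], [3, 0], [1, 1]]
y = [0, 1, 0, 0, 0, 1, 0, 1]

# max_depth limits how many levels of splits the tree may have
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# export_text renders the fitted tree as indented if/else rules
rules = export_text(tree, feature_names=["Pclass", "Sex"])
print(rules)
```

Changing `max_depth` to another value of x gives a shallower or deeper tree to present.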

Final project: Registration. Try different algorithms to build the best classifier. Use the classifier to predict which passengers survived. Submit the result to Kaggle.

Final project: Deadline: 12/2 23:59. Submission: submit the results to Kaggle and email your project to sydang.ncku@gmail.com. Project file content: code, prediction result, report.

Grading: Homework 1: 20%. Homework 2: 10%. Final project: 70% (ranking: 30%, algorithm and coding: 30%, report: 10%).

Report: the details of your best method; a description of the methods that you tried; the important attributes or surprising features you found.

XGBoost: a general-purpose gradient boosting library, including generalized linear models and gradient boosted decision trees. SITE: http://dmlc.ml/

tslm: a linear model with time series components. SITE: http://www.inside-r.org/packages/cran/forecast/docs/tslm

randomForest: Random Forest (RF) is a powerful classification tool. When given a set of data, RF generates a forest of classification trees rather than a single classification tree. Each of these trees generates a classification for a given set of attributes. The classification from each tree can be thought of as a vote; the class with the most votes becomes the forest's classification. SITE: http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
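The voting step described above can be sketched in a few lines. This shows only how per-tree classifications are combined, not how the trees are grown; the votes are invented and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical votes from five trees for one passenger (1 = survived)
tree_votes = [1, 0, 1, 1, 0]
forest_prediction = majority_vote(tree_votes)
print(forest_prediction)
```

Using an odd number of trees avoids ties in a two-class problem; in this sketch a tie would go to whichever label was seen first.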

Important attributes: Pclass, Sex, Fare, Embarked

Important attributes: Title ('Capt', 'Don', 'Major', 'Sir', 'Dona', 'Lady', 'the Countess', 'Jonkheer'); Mother (Sex='female' & Parch>0 & Age>18 & Title!='Miss'); Child (Parch>0 & Age<=18); FamilyNum (Parch+SibSp+1); Pclass (Pclass & age & sex)
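A rough sketch of how the Title, Mother, Child and FamilyNum attributes could be computed for one passenger record. It assumes names follow the "Surname, Title. Given names" pattern and that a missing Age (None) makes both Mother and Child false; the helper names and sample rows are hypothetical.

```python
import re

def extract_title(name):
    """Pull the title out of a name like 'Braund, Mr. Owen Harris'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else ""

def add_features(p):
    """Return a copy of passenger dict p with the derived attributes added."""
    out = dict(p)
    out["Title"] = extract_title(p["Name"])
    out["FamilyNum"] = p["Parch"] + p["SibSp"] + 1
    age_known = p["Age"] is not None  # assumption: unknown age -> False flags
    out["Mother"] = (p["Sex"] == "female" and p["Parch"] > 0
                     and age_known and p["Age"] > 18
                     and out["Title"] != "Miss")
    out["Child"] = p["Parch"] > 0 and age_known and p["Age"] <= 18
    return out

# Invented rows for illustration
mother = add_features({"Name": "Doe, Mrs. Jane", "Sex": "female",
                       "Age": 30, "Parch": 2, "SibSp": 1})
child = add_features({"Name": "Doe, Master. Tom", "Sex": "male",
                      "Age": 5, "Parch": 2, "SibSp": 1})
print(mother["Title"], mother["FamilyNum"], mother["Mother"], child["Child"])
```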