Titanic: Machine Learning from Disaster
Aleksandr Smirnov, Dylan Kenny, Matthew Kiggans
Louisiana State University, MATH 4020, Professor Peter Wolenski

Introduction

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

The goal of the project is to predict the survival of passengers based on a set of data. To do this we train a prediction system and evaluate how accurately it predicts survival. The training data, provided as a ".csv" file, contained 891 passenger observations. Each observation recorded the passenger's identification number, survival status, passenger class, name, sex, age, number of siblings and spouses aboard, number of parents and children aboard, ticket number, fare, cabin, and port of embarkation.

Feature Engineering

To achieve the best results, the data should be put into a format that works effectively and efficiently with the programming language Python. Feature engineering covered converting alphabetic values into numeric ones, extracting a title from each passenger's name and a deck label from the ticket number, calculating family size, and filling in missing values for age and fare; missing ages were filled in using a linear regression algorithm. The final data set was machine friendly.

Prediction Algorithms

Decision trees show possible outcomes by testing the data against a sequence of conditions. The branches of the tree are chosen using Shannon entropy or the Gini function, and nodes are chosen to maximize the information gain criterion. The goal is to keep splitting until the lowest entropies are reached, which gives the best prediction.

[Figures: the Shannon entropy and Gini splitting criteria, and an example decision tree that splits first on sex (female / male) and then on passenger class (class 1 or 2 vs. class 3; class 1 vs. class 2 or 3).]

As the example tree shows, an accurate prediction can already be made about the survival of females traveling in class 1 or 2.

Final Results

- Performed feature engineering techniques:
  - changed alphabetic values to numeric,
  - calculated family size,
  - extracted the title from the name and a deck label from the ticket number,
  - used a linear regression algorithm to fill in missing ages.
- Used several prediction algorithms in Python:
  - decision tree,
  - random forests,
  - extra trees.
- Achieved our best score of % correct predictions.

Conclusions

Decision trees are the basis of the prediction method used. The accuracy of this method can be increased by growing more trees and averaging their results. There are two effective ways of doing so: Random Forests grow each tree from a randomly selected subset of the training data, while Extra Trees both select random subsets of the data and use random splits at each node of their trees. Averaging the results of several decision trees also helps to deal with overfitting. Below is a graphical example of overfitting in linear regression: while the curve perfectly fits the training data, it performs poorly on the test data.

[Figure: overfitting in a linear regression fit.]
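
The poster itself shows no code, but the feature-engineering steps described above map naturally onto pandas. The following is a minimal sketch only: the column names are the standard Kaggle "train.csv" fields, and the title regex and choice of regression features are illustrative assumptions, not the authors' actual code.

    # Minimal feature-engineering sketch (assumed Kaggle "train.csv" column
    # names; not the authors' actual code).
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("train.csv")

    # Change alphabetic values to numeric.
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Embarked"] = df["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})

    # Calculate family size from the siblings/spouses and parents/children counts.
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

    # Extract the title (Mr, Mrs, Miss, ...) from the passenger's name and
    # encode it as an integer code.
    df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
    df["Title"] = df["Title"].astype("category").cat.codes

    # Fill missing fares with the median, then fit a linear regression on the
    # rows with known ages and use it to fill in the missing ages.
    df["Fare"] = df["Fare"].fillna(df["Fare"].median())
    features = ["Pclass", "Sex", "SibSp", "Parch", "Fare", "FamilySize", "Title"]
    known = df[df["Age"].notna()]
    missing = df[df["Age"].isna()]
    reg = LinearRegression().fit(known[features], known["Age"])
    df.loc[df["Age"].isna(), "Age"] = reg.predict(missing[features])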
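
The two split criteria named above have standard definitions: Shannon entropy H = -sum_i p_i log2(p_i) and Gini impurity G = 1 - sum_i p_i^2, where p_i is the fraction of passengers of class i (survived / did not survive) at a node. A small illustrative sketch, with made-up class counts rather than figures from the poster:

    # Shannon entropy and Gini impurity of a node, given its class counts.
    import numpy as np

    def entropy(counts):
        """Shannon entropy -sum(p * log2(p)), skipping empty classes."""
        p = np.asarray(counts, dtype=float)
        p = p[p > 0] / p.sum()
        return -np.sum(p * np.log2(p))

    def gini(counts):
        """Gini impurity 1 - sum(p**2) over the class fractions."""
        p = np.asarray(counts, dtype=float) / np.sum(counts)
        return 1.0 - np.sum(p ** 2)

    # Made-up example counts (survived, did not survive) at two nodes:
    print(entropy([10, 0]), gini([10, 0]))  # pure node: 0.0, 0.0
    print(entropy([5, 5]), gini([5, 5]))    # 50/50 node: 1.0, 0.5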
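
Similarly, the decision tree, random forest, and extra trees models listed in the results are all available in scikit-learn. The sketch below assumes the engineered DataFrame df from the feature-engineering sketch above; the feature list and hyperparameters (tree depth, number of trees) are illustrative guesses, and the cross-validated accuracy it prints is not the authors' reported score.

    # Decision tree, random forest, and extra trees on the engineered features
    # (df is the DataFrame from the feature-engineering sketch above; the
    # hyperparameters are illustrative, not taken from the poster).
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
    from sklearn.model_selection import cross_val_score

    X = df[["Pclass", "Sex", "Age", "Fare", "SibSp", "Parch",
            "FamilySize", "Title", "Embarked"]]
    y = df["Survived"]

    models = {
        "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "Extra trees": ExtraTreesClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f} mean cross-validated accuracy")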