Kaggle competition Airbnb Recruiting: New User Bookings

Kaggle competition Airbnb Recruiting: New User Bookings
Advanced Network Database Lab
Where will a new guest book their first travel experience?

Registration
Site: https://www.kaggle.com/competitions
Account: IKDD1 (Group Number)

Airbnb
AirBed&Breakfast: book rooms with locals, rather than hotels
https://www.airbnb.com.tw/

Airbnb
Competition url: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings
Data url: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/data
Leaderboard: https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/leaderboard

Data Attribute

Classification

Prediction

Decision Tree

Sklearn – Python tool
Simple and efficient tools for data mining and data analysis!
Decision tree url: http://scikit-learn.org/stable/modules/tree.html
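To make the decision tree idea concrete, here is a minimal sketch using scikit-learn's `DecisionTreeClassifier`. The eight toy users, the two feature names (`age`, `signup_on_mobile`) and the country labels are made up for illustration and are not taken from the competition data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (made up for illustration): columns are [age, signup_on_mobile]
X = np.array([
    [22, 1], [25, 1], [31, 0], [35, 0],
    [41, 0], [45, 1], [52, 0], [58, 1],
])
# Hypothetical labels: destination country code of the first booking
y = np.array(["US", "US", "US", "FR", "FR", "FR", "NDF", "NDF"])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Print the learned if/else rules as text
print(export_text(clf, feature_names=["age", "signup_on_mobile"]))

# Predict the label for two unseen users
new_users = np.array([[24, 0], [55, 1]])
predictions = clf.predict(new_users)
print(predictions)
```

The printed rules show how the tree splits on feature thresholds; on real data the features would first need cleaning and encoding.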

Homework 1
Registration
Apply a simple algorithm to build the classifier
Use the classifier to predict the country in which a new user will make his or her first booking
Submit the result to Kaggle
Deadline: next Thursday (12/10)
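The Homework 1 steps (build a classifier, predict, prepare a submission) might be sketched as follows. Everything concrete here is an assumption for illustration: the records are invented, the field names (`age`, `signup_method`) are hypothetical, and the `id,country` output layout should be checked against the competition's sample submission file.

```python
import csv
import io

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Made-up training records; field names and values are illustrative only.
train_rows = [
    {"id": "u1", "age": 25, "signup_method": "basic", "country": "US"},
    {"id": "u2", "age": 31, "signup_method": "facebook", "country": "US"},
    {"id": "u3", "age": 44, "signup_method": "basic", "country": "FR"},
    {"id": "u4", "age": 47, "signup_method": "facebook", "country": "FR"},
    {"id": "u5", "age": 60, "signup_method": "basic", "country": "NDF"},
    {"id": "u6", "age": 63, "signup_method": "facebook", "country": "NDF"},
]
test_rows = [
    {"id": "t1", "age": 28, "signup_method": "facebook"},
    {"id": "t2", "age": 61, "signup_method": "basic"},
]


def features(row):
    """Keep only the predictor fields (drop the id and the label)."""
    return {k: v for k, v in row.items() if k not in ("id", "country")}


# DictVectorizer one-hot encodes string fields and passes numbers through.
vectorizer = DictVectorizer(sparse=False)
X_train = vectorizer.fit_transform([features(r) for r in train_rows])
y_train = [r["country"] for r in train_rows]
X_test = vectorizer.transform([features(r) for r in test_rows])

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
predicted = clf.predict(X_test)

# Write one "id,country" line per test user (assumed layout).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["id", "country"])
for row, country in zip(test_rows, predicted):
    writer.writerow([row["id"], country])

submission_csv = buffer.getvalue()
print(submission_csv)
```

To produce an actual file, the same rows could be written to an opened file handle instead of the in-memory buffer.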

Homework 2
Oral report
Deadline: next Thursday (12/17)

Homework 3
Try different algorithms to build the best classifier
Use the classifier to predict the country of each new user's first booking
Submit the result to Kaggle

Final project
Deadline: 12/23 23:59
Submission: submit the results to Kaggle and email your project to cwchang.ncku@gmail.com
Project file content: code, prediction result, report

Report
The details of your best method
A description of the methods that you tried
The important attributes or surprising features you found

Grading
Homework 1: 20%
Homework 2: 10%
Final project: 70% (ranking: 30%, algorithm and coding: 30%, report: 10%)

XGBoost
General-purpose gradient boosting library, including generalized linear models and gradient boosted decision trees
SITE: http://dmlc.ml/
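The idea behind gradient boosted decision trees can be sketched by hand with small scikit-learn regression trees. This is a simplified illustration of boosting with squared-error loss on synthetic data, not XGBoost's API or its regularized objective; the learning rate, tree depth, and number of rounds are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (made up): a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3
n_rounds = 20

# Start from a constant prediction (the mean of the targets).
base = y.mean()
prediction = np.full_like(y, base)
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    # The residual is the negative gradient of 0.5 * squared error.
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)
    # Add a damped correction from the new tree.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))


def boosted_predict(X_new):
    """Sum the constant start value and every tree's scaled correction."""
    out = np.full(X_new.shape[0], base)
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out


print(f"training MSE: {mse_history[0]:.3f} -> {mse_history[-1]:.3f}")
```

Each round fits a shallow tree to what the current ensemble still gets wrong, so the training error shrinks step by step.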

tslm
A linear model with time series components
SITE: http://www.inside-r.org/packages/cran/forecast/docs/tslm
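`tslm` itself is an R function, but the underlying idea (a linear regression on a trend term plus seasonal indicator variables) can be sketched in Python with NumPy least squares. The series below is synthetic and noise-free, the season length of 4 is an assumption, and the time index starts at 0 here, which may differ from R's convention.

```python
import numpy as np

period = 4                      # assumed season length (e.g. quarters)
t = np.arange(24)               # time index
season = t % period
true_seasonal = np.array([0.0, 2.0, -1.0, 3.0])
# Synthetic series: intercept 10, slope 0.5, plus seasonal offsets
y = 10.0 + 0.5 * t + true_seasonal[season]

# Design matrix: intercept, trend, and dummies for every season except
# the first, which serves as the baseline absorbed into the intercept.
dummies = (season[:, None] == np.arange(1, period)[None, :]).astype(float)
X = np.column_stack([np.ones_like(t, dtype=float), t, dummies])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, seasonal = coef[0], coef[1], coef[2:]
print(intercept, slope, seasonal)

# Forecast the next `period` steps by extending the trend and the season.
t_future = np.arange(24, 24 + period)
future_dummies = (
    (t_future % period)[:, None] == np.arange(1, period)[None, :]
).astype(float)
X_future = np.column_stack([np.ones(period), t_future, future_dummies])
forecast = X_future @ coef
print(forecast)
```

Because the synthetic series has no noise, the fitted coefficients recover the values used to generate it; on real data they would only be estimates.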

H2o.randomForest
Random Forest (RF) is a powerful classification tool. When given a set of data, RF generates a forest of classification trees, rather than a single classification tree. Each of these trees generates a classification for a given set of attributes. The classification from each H2O tree can be thought of as a vote; the class with the most votes becomes the final classification.
SITE: http://docs.h2o.ai/h2oclassic/datascience/rf.html
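The voting scheme described above can be sketched with scikit-learn decision trees trained on bootstrap samples. This is a hand-rolled illustration on synthetic data, not H2O's implementation; the number of trees (25) and the dataset shape are arbitrary.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem (made up for illustration)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

rng = np.random.default_rng(0)
forest = []
for i in range(25):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Each split considers a random subset of the features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    forest.append(tree.fit(X_train[idx], y_train[idx]))

# Each tree casts one vote per test row; shape is (n_trees, n_test).
votes = np.array([tree.predict(X_test) for tree in forest])


def majority(column):
    """Return the most common label among one row's votes."""
    return Counter(column.tolist()).most_common(1)[0][0]


forest_pred = np.array([majority(votes[:, j]) for j in range(votes.shape[1])])
accuracy = float(np.mean(forest_pred == y_test))
print(f"majority-vote accuracy: {accuracy:.2f}")
```

Library implementations may combine trees differently (for example by averaging class probabilities), so results can differ slightly from this plain majority vote.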