Data Analysis of Tennis Matches
Fatih Çalışır



Domain of the Data: 4 Types of Tennis Tournaments
1. ATP World Tour 250
   - ATP 250 Brisbane
   - ATP 250 Sydney
   - ...
2. ATP World Tour 500
   - ATP 500 Memphis
   - ATP 500 Dubai

Domain of the Data
3. ATP World Tour 1000
   - ATP 1000 Paris
   - ATP 1000 Shanghai
   - ...
4. Grand Slams
   - Australian Open
   - Roland Garros
   - Wimbledon
   - US Open

Domain of the Data
- Men’s Singles
- Year
- ATP 500 Tournaments
- 9 ATP 1000 Tournaments
- 4 Grand Slams

Source of Data
- Internet
- Official Websites of the Players
- ATP (Association of Tennis Professionals) Home Page
- 2010 Result Archive

Data Construction
- From different tables
- Each table from a different website
- Combining easily

Data Construction: Players Table

Data Construction: Tournament Results Table

Data Construction: Tournament Info Table

Data Construction: Final Data Table
- 29 features
- 1453 instances
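
The step from the three source tables to one final table can be pictured as a pair of key lookups. The following is a minimal sketch only: the slides do not list the real columns or the tool used, so every field name (`player`, `opponent`, `tournament`, `height_cm`, `level`, and so on) and every row below is a made-up placeholder.

```python
# Hypothetical schemas and toy rows; the slides do not list the real columns.
players = {
    "Player A": {"height_cm": 185, "weight_kg": 80},
    "Player B": {"height_cm": 190, "weight_kg": 85},
}
tournaments = {
    "Tournament X": {"level": "ATP 500", "surface": "Hard"},
}
results = [
    {"tournament": "Tournament X", "player": "Player A",
     "opponent": "Player B", "won": True},
]


def build_final_table(results, players, tournaments):
    """Join each match result with player, opponent, and tournament info."""
    rows = []
    for match in results:
        player = players.get(match["player"])
        opponent = players.get(match["opponent"])
        info = tournaments.get(match["tournament"])
        if player is None or opponent is None or info is None:
            continue  # skip matches that cannot be joined on every key
        row = dict(match)
        row.update({f"player_{k}": v for k, v in player.items()})
        row.update({f"opponent_{k}": v for k, v in opponent.items()})
        row.update(info)
        rows.append(row)
    return rows


final_table = build_final_table(results, players, tournaments)
```

Each output row then carries the match outcome plus the attributes of both players and of the tournament, which is one plausible way to arrive at a wide table of the kind described on the slide.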

Aim of the Project
- Classification
- Finding weights for attributes

Missing Values
- Players’ Height
- Players’ Weight
- Players’ BMI
- Players’ Date of Turning Professional

Missing Values
- Players’ Height
  - Consider players with the same weight
  - Take the average
- Players’ Weight
  - Consider players with the same height
  - Take the average

Missing Values
- Players’ Height and Weight
  - If both of them are missing
  - Remove the row
- Players’ Date of Turning Professional
  - Consider players with the same age
  - Take the average
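
The imputation rules on these slides can be sketched as a group-mean fill. This is an illustration under assumptions: the field names (`height`, `weight`, `age`, `turned_pro`), the use of `None` for a missing value, and the toy rows are all hypothetical, and the slides do not say what happens when no player shares the same key value (here the value simply stays missing).

```python
from collections import defaultdict
from statistics import mean


def impute_from_peers(rows, target, key):
    """Fill a missing `target` with the mean over rows sharing the same `key`."""
    groups = defaultdict(list)
    for r in rows:
        if r[target] is not None and r[key] is not None:
            groups[r[key]].append(r[target])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None and r[key] in groups:
            r[target] = mean(groups[r[key]])
        filled.append(r)
    return filled


def clean(rows):
    # Rows missing both height and weight are removed.
    rows = [r for r in rows
            if not (r["height"] is None and r["weight"] is None)]
    rows = impute_from_peers(rows, "height", "weight")
    rows = impute_from_peers(rows, "weight", "height")
    rows = impute_from_peers(rows, "turned_pro", "age")
    return rows


# Toy rows (made up for illustration).
raw = [
    {"height": 185, "weight": 80, "age": 25, "turned_pro": 2003},
    {"height": 189, "weight": 80, "age": 25, "turned_pro": 2005},
    {"height": None, "weight": 80, "age": 25, "turned_pro": None},
    {"height": 185, "weight": None, "age": 30, "turned_pro": 1998},
    {"height": None, "weight": None, "age": 28, "turned_pro": 2000},
]
cleaned = clean(raw)
```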

Data Understanding: Min, max, median, and average values for numeric attributes
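
As a small illustration of this summary, the four statistics can be computed per numeric attribute as follows. The `summarize` helper and the sample heights are hypothetical; the slide's actual values are in a figure that is not part of the transcript.

```python
from statistics import mean, median


def summarize(values):
    """Return min, max, median, and mean of a numeric attribute, ignoring None."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "median": median(present),
        "mean": mean(present),
    }


heights = [178, 185, 188, 191, None, 198]  # toy values, not from the slides
summary = summarize(heights)
```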

Data Understanding: Occurrence table for categorical and numeric attributes

Data Understanding: Histogram for numeric attributes

Data Understanding: Box plot of the main characteristics of numeric attributes

Data Understanding: Scatter plot to relate two attributes

Feature Selection: Linear Correlation
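
Linear correlation between two numeric attributes is usually measured with the Pearson coefficient. The sketch below assumes that is what the slide refers to; the slides do not state how the coefficient was turned into a selection decision (for example, a threshold for dropping one of two strongly correlated attributes), so none is shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values: the second attribute is an exact multiple of the first.
r_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson([1, 2, 3], [3, 2, 1])
```

A coefficient near +1 or -1 indicates that one attribute is almost a linear function of the other, so keeping both adds little information.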

Feature Selection
- Backward Elimination
- Naive Bayes for Ranking
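
One way to read this slide is a greedy backward elimination in which each candidate attribute subset is scored by a Naive Bayes classifier. The sketch below follows that reading and is an assumption, not the author's exact procedure: it uses validation accuracy as the score, a categorical Naive Bayes with Laplace smoothing, and hypothetical attribute names (`higher_rank`, `noise`) with toy rows.

```python
from collections import Counter, defaultdict
from math import log


def nb_accuracy(train, test, features, label="won"):
    """Accuracy of a categorical Naive Bayes (Laplace-smoothed) on `test`."""
    class_counts = Counter(r[label] for r in train)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    domains = defaultdict(set)           # feature -> values seen in training
    for r in train:
        for f in features:
            value_counts[(f, r[label])][r[f]] += 1
            domains[f].add(r[f])

    def predict(row):
        best, best_score = None, float("-inf")
        for c, n_c in class_counts.items():
            score = log(n_c / len(train))
            for f in features:
                count = value_counts[(f, c)][row[f]]
                score += log((count + 1) / (n_c + len(domains[f])))
            if score > best_score:
                best, best_score = c, score
        return best

    return sum(predict(r) == r[label] for r in test) / len(test)


def backward_elimination(train, valid, features, label="won"):
    """Drop one feature at a time while validation accuracy does not fall."""
    selected = list(features)
    best = nb_accuracy(train, valid, selected, label)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        scored = [
            (nb_accuracy(train, valid, [g for g in selected if g != f], label), f)
            for f in selected
        ]
        acc, removable = max(scored)
        if acc < best:
            break
        best = acc
        selected.remove(removable)
    return selected, best


# Toy rows: `higher_rank` determines the label, `noise` is unrelated to it.
data = [
    {"higher_rank": "yes", "noise": "a", "won": True},
    {"higher_rank": "yes", "noise": "b", "won": True},
    {"higher_rank": "no", "noise": "a", "won": False},
    {"higher_rank": "no", "noise": "b", "won": False},
]
selected, accuracy = backward_elimination(data, data, ["higher_rank", "noise"])
```

On the toy rows the uninformative attribute is removed and the predictive one is kept. Ties are broken in favour of the smaller attribute set, which is a design choice of this sketch.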

Feature Selection
- 28 attributes reduced to 19 attributes
- Attributes are meaningful

Weight of Attributes: RIMARC to find weights

Classification
- KNIME
- Decision Tree – C4.5
- Gain Ratio Quality Measure
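
Gain ratio, the quality measure named on this slide, is the information gain of a split divided by the entropy of the split itself. The sketch below computes it for a categorical attribute only; it is not KNIME's implementation, and the attribute names and rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, feature, label):
    """Information gain of `feature` divided by its split information."""
    labels = [r[label] for r in rows]
    partitions = {}
    for r in rows:
        partitions.setdefault(r[feature], []).append(r[label])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy([r[feature] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


# Toy rows: `higher_rank` separates the classes, `surface` does not.
matches = [
    {"higher_rank": "yes", "surface": "hard", "won": True},
    {"higher_rank": "yes", "surface": "clay", "won": True},
    {"higher_rank": "no", "surface": "hard", "won": False},
    {"higher_rank": "no", "surface": "clay", "won": False},
]
ratio_rank = gain_ratio(matches, "higher_rank", "won")
ratio_surface = gain_ratio(matches, "surface", "won")
```

Dividing by the split information penalises attributes with many distinct values, which plain information gain tends to favour.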

Classification
- 1017 instances for training
- 436 instances for testing
- 842 positive instances
- 611 negative instances
- Training and test data are randomly selected
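
The counts on this slide correspond to a roughly 70/30 split of the 1453 instances (1017 + 436 = 1453, and 842 + 611 = 1453). A minimal sketch of such a random split follows; the seed and the use of plain shuffling are assumptions, since the slides do not say whether the split was stratified by class.

```python
import random


def random_split(rows, n_train, seed=0):
    """Shuffle a copy of `rows` and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in labels with the class balance given on the slide.
labels = [1] * 842 + [0] * 611
train, test = random_split(labels, n_train=1017)
```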

Classification: Decision Tree

Classification

Classification: Confusion Matrix

Classification

Classification: Accuracy Statistics
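
The confusion matrix and the statistics derived from it can be sketched as below. The actual figures from the slides are not in the transcript, so the labels here are toy values, and the choice of precision, recall, F-measure, and accuracy is an assumption about which statistics were reported.

```python
def confusion_matrix(actual, predicted, positive=True):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy_statistics(tp, fp, fn, tn):
    """Precision, recall, F-measure, and accuracy from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Toy labels (made up for illustration).
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
counts = confusion_matrix(actual, predicted)
stats = accuracy_statistics(*counts)
```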

Classification: Naive Bayes Classifier
- Confusion Matrix

Classification

Classification: Accuracy Statistics

Classification: C4.5 vs Naive Bayes
- Decision Tree (C4.5)
- Naive Bayes

Classification: C4.5 vs Naive Bayes
- Decision Tree (C4.5)
- Naive Bayes