Predicting the winner of the C.Y. (Cy Young) award. Advisor: Dr. 黃三益. Team members: 尹川, 陳隆賢, 陳偉聖.

Presentation transcript:

1 Predicting the winner of the C.Y. (Cy Young) award. Advisor: Dr. 黃三益. Team members: 尹川, 陳隆賢, 陳偉聖.

2 Introduction. Baseball in Taiwan: CPBL (Chinese Professional Baseball League). Baseball in the USA: MLB (Major League Baseball). Cy Young Award, given since 1956: voted by the Baseball Writers' Association of America using weighted scores; each league has one winner per year.

3 Measurements. There are no definite rules for judging. Nevertheless, many measurements can be used to judge whether a pitcher is good: Wins, ERA, WHIP, G/F, etc.
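As an illustration of the measurements listed above, here is a minimal Python sketch of the standard definitions of ERA, WHIP and K/9. The function names and the sample season line are hypothetical, not taken from the slides; G/F is left out because the slides do not define it.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert box-score innings notation to a real number.

    In box scores "200.1" means 200 innings plus 1 out, i.e. 200 + 1/3.
    """
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


def k_per_9(strikeouts: int, innings: float) -> float:
    """Strikeouts per nine innings."""
    return 9 * strikeouts / innings


# Hypothetical season line, for illustration only.
ip = innings_to_float("200.0")
print(era(60, ip), whip(50, 150, ip), k_per_9(200, ip))
```

Lower ERA and WHIP are better for a pitcher, while higher K/9 is better, so a model has to learn the direction of each attribute from the data.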

4 Aim of the study. To analyze the historical statistics of pitchers. To build a predictive model. To predict the Cy Young Award winner in future years.

5 Data mining procedure Ten data mining methodology steps

6 Step 1: Translate the Problem. Directed data mining problem: the target variable is the Cy Young Award; classification; decision tree. Purposes: gambling games; predictive activities.

7 Step 2: Select Appropriate Data. MLB statistics only (1871~2006). Cy Young Award: 1956~2006 total records. List of Cy Young Award winners. "Time" factor: 1999 is the dividing year, because new statistical items emerged then. Variables: remove the items that are not representative of a pitcher.

8 Step 3: Get to Know the Data. All the data we used come from the MLB official site. These data have been publicly available for many years. The quality of the data is very good. Some attributes have values only since 1999.

9 Step 4: Create a Model Set. We divide the data into training data and testing data. We do not create a balanced sample. The MLB records are not seasonal data. We use the data from 1999 onward.
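The slide does not say how the training and testing data were separated. One plausible approach, sketched here under that assumption, is to hold out the most recent seasons. The function name, the `year` field and the `test_from` cutoff of 2005 are all hypothetical; only the 1999 starting year comes from the slides.

```python
def split_by_year(records, first_year=1999, test_from=2005):
    """Split pitcher-season records into training and testing sets by season.

    first_year: earliest season kept (1999, per the slides).
    test_from:  first season held out for testing (an assumed value).
    """
    train = [r for r in records if first_year <= r["year"] < test_from]
    test = [r for r in records if r["year"] >= test_from]
    return train, test


# Hypothetical records: one placeholder row per season.
records = [{"year": y, "player": f"pitcher_{y}"} for y in range(1995, 2007)]
train, test = split_by_year(records)
print(len(train), len(test))
```

Splitting by season rather than at random keeps every testing record later than every training record, which matches the stated aim of predicting future winners.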

10 Step 5: Fix Problems with the Data. These data are taken from the MLB official site. No missing values. Single source.

11 Step 6: Transform Data to Bring Information to the Surface. There are no combinations of attributes. We delete some attributes. We add an attribute, Year. We add an attribute, CyYoungAward_Winner, for classification.
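The two added attributes can be sketched as follows. This is a hedged illustration: `Year` and `CyYoungAward_Winner` are the attribute names from the slide, and WINNER / NONWINNER are the class labels shown in the confusion matrices, but the function, the `player` field and the placeholder names are hypothetical.

```python
def label_records(season_stats, year, winners):
    """Return copies of the rows with Year and CyYoungAward_Winner added.

    season_stats: list of dicts, one per pitcher, for a single season.
    winners:      set of (year, player) pairs taken from the winners list.
    """
    labeled = []
    for row in season_stats:
        new_row = dict(row)  # copy so the raw data stays untouched
        new_row["Year"] = year
        is_winner = (year, row["player"]) in winners
        new_row["CyYoungAward_Winner"] = "WINNER" if is_winner else "NONWINNER"
        labeled.append(new_row)
    return labeled


# Placeholder data, for illustration only.
winners = {(2006, "Pitcher A")}
stats_2006 = [{"player": "Pitcher A", "W": 19}, {"player": "Pitcher B", "W": 12}]
for row in label_records(stats_2006, 2006, winners):
    print(row)
```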

12 Step 7: Build Models. Tools Used; Weka Crash Problem; Blank Attributes; Build Model; Handling Blank Attributes.

13 Tools Used

14 Weka Crash Problem. Raw data: data instances, 42 attributes. Weka crashed during model construction. Fix: give Weka more memory.

15 Blank Attributes

16 Build Model. MLB 1956~2006, with blank attributes: ADTree. MLB 1956~2006, without blank attributes: ADTree. MLB 1999~2006: ADTree.
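The slides do not show how the "without blank attributes" data set was produced. A minimal sketch of one way to do it, assuming a blank is stored as `None` or an empty string and that every record has the same keys, is to drop any attribute that is blank in at least one record. The function name and sample rows are hypothetical.

```python
def drop_blank_attributes(records):
    """Remove every attribute that is blank (None or "") in any record."""
    blank = {
        key
        for row in records
        for key, value in row.items()
        if value is None or value == ""
    }
    return [{k: v for k, v in row.items() if k not in blank} for row in records]


# Placeholder rows: WHIP is blank for the older season.
rows = [
    {"Year": 1980, "W": 20, "ERA": 2.5, "WHIP": None},
    {"Year": 2003, "W": 18, "ERA": 3.1, "WHIP": 1.1},
]
print(drop_blank_attributes(rows))
```

Restricting the data to 1999~2006 is the alternative: it keeps all attributes but discards the older seasons instead.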

17 Handling Blank Attributes

18 1956~2006, with blank attributes, ADTree

19 1956~2006, with blank attributes, ADTree === Confusion Matrix === NONWINNER WINNER <-- classified as NONWINNER 5834 WINNER

20 1956~2006, without blank attributes, ADTree

21 1956~2006, without blank attributes, ADTree === Confusion Matrix === NONWINNER WINNER <-- classified as NONWINNER 6230 WINNER

22 1999~2006, ADTree

23 1999~2006, ADTree === Confusion Matrix === NONWINNER WINNER <-- classified as NONWINNER 133 WINNER

24 Step 8: Assess Models (1/2). Not good enough for gambling. === Confusion Matrix === NONWINNER WINNER <-- classified as NONWINNER 6230 WINNER === Confusion Matrix === NONWINNER WINNER <-- classified as NONWINNER 133 WINNER
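To make the "not good enough for gambling" judgment concrete, the sketch below computes accuracy, precision and recall for the WINNER class from a 2x2 confusion matrix. The counts are hypothetical placeholders, not the values from the slides, and the function name is an assumption.

```python
def winner_metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall for the WINNER class.

    tn: NONWINNER classified as NONWINNER
    fp: NONWINNER classified as WINNER
    fn: WINNER classified as NONWINNER
    tp: WINNER classified as WINNER
    """
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: winners are a tiny fraction of all pitcher-seasons.
accuracy, precision, recall = winner_metrics(tn=990, fp=2, fn=5, tp=3)
print(accuracy, precision, recall)
```

Because almost every pitcher is a NONWINNER, overall accuracy stays high even when most winners are missed, so precision and recall on the WINNER class are the more informative figures for a betting use case.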

25 Step 8: Assess Models (2/2). Some attributes are more important. Number of appearances of attributes in different models. Attributes: W, BB, WPCT, OBA, WHIP, K/9, ERA, GF. Models: 1956~2006 ADTree; 1956~2006 Without Blank Attributes ADTree; 1999~2006 ADTree; ~2006 Without Blank Attributes J

26 Step 9: Deploy Models. Implement a computer program with the built model, so that the Cy Young Award winner can be predicted more easily.

27 Step 10: Assess Results. Compare the predicted winner directly with the actual Cy Young Award winner. Not "business" but "interest": the assessment relies on personal judgment.
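The direct comparison described above could be automated with a small check like the following. This is a sketch only: the function name, the (year, league) keys and the pitcher names are hypothetical placeholders.

```python
def hit_rate(predicted, actual):
    """Fraction of (year, league) awards where the predicted winner was right.

    Both arguments map (year, league) to a pitcher name; only keys present
    in both mappings are compared.
    """
    keys = predicted.keys() & actual.keys()
    if not keys:
        return 0.0
    hits = sum(predicted[k] == actual[k] for k in keys)
    return hits / len(keys)


# Placeholder predictions and results, for illustration only.
predicted = {(2007, "AL"): "Pitcher A", (2007, "NL"): "Pitcher C"}
actual = {(2007, "AL"): "Pitcher A", (2007, "NL"): "Pitcher D"}
print(hit_rate(predicted, actual))
```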

28 Conclusions. We used classification techniques to build a predictive model. The accuracy of the built model is not high. There are some factors that we did not consider. The model cannot be used where real benefits are at stake. Just for fun.