Data Mining: Know thy tools. Stop treating data miners as black boxes. Looking inside is (1) fun, (2) easy, (3) needed.

Presentation transcript:

Data Mining

Know thy tools
Stop treating data miners as black boxes. Looking inside is (1) fun, (2) easy, (3) needed.

INFOGAIN (the Fayyad and Irani MDL discretizer) in 55 lines
Input: [ (1,X), (2,X), (3,X), (4,X), (11,Y), (12,Y), (13,Y), (14,Y) ]
Output: 1, 11
$E = -\sum p \log_2(p)$
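The slide's own 55-line listing did not survive in this transcript. As a rough stand-in, here is a minimal sketch of an entropy-based discretizer with the Fayyad and Irani MDL stopping rule. The function names (`entropy`, `best_split`, `mdl_accepts`, `discretize`) are illustrative assumptions, not the presenter's code, and the output is read here as the lower bound of each bin.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """E = -sum(p * log2(p)) over the class proportions p."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(pairs):
    """Return (index, expected entropy) of the best cut, or None if no cut exists."""
    labels = [label for _, label in pairs]
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two equal values
        e = (i / n) * entropy(labels[:i]) + ((n - i) / n) * entropy(labels[i:])
        if best is None or e < best[1]:
            best = (i, e)
    return best


def mdl_accepts(pairs, i, e_split):
    """Fayyad-Irani test: keep the cut only if its gain beats the MDL cost."""
    labels = [label for _, label in pairs]
    left, right = labels[:i], labels[i:]
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    gain = entropy(labels) - e_split
    delta = log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (log2(n - 1) + delta) / n


def discretize(pairs):
    """Recursively split (value, label) pairs; return the lower bound of each bin."""
    pairs = sorted(pairs)
    found = best_split(pairs)
    if found is None or not mdl_accepts(pairs, found[0], found[1]):
        return [pairs[0][0]]  # no worthwhile cut: one bin starting at the minimum
    i = found[0]
    return discretize(pairs[:i]) + discretize(pairs[i:])


data = [(1, "X"), (2, "X"), (3, "X"), (4, "X"),
        (11, "Y"), (12, "Y"), (13, "Y"), (14, "Y")]
print(discretize(data))  # [1, 11]
```

On the slide's input the only accepted cut falls between 4 and 11: the gain there is 1 bit against an MDL threshold of roughly 0.45, while the two pure halves offer no further gain, so the sketch stops with two bins. It assumes a non-empty input.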

It doesn't matter what you do but does matter who does it!
Martin Shepperd, Brunel University, West London, UK

Systematic Review
Conducted by Tracy Hall and David Bowes: T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic literature review on fault prediction performance in software engineering”, accepted for publication in TSE (download from BURA).
Located 208 relevant primary studies.
Due to reporting requirements, used 18 studies that contain 194 results (binary classifiers, confusion matrix, context details).

Matthews correlation coefficient
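Only the slide title survives here. For reference, the standard definition for a binary confusion matrix is $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. A minimal sketch follows; the function name `mcc` is illustrative, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something the slides state.

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention: undefined cases are reported as 0
    return (tp * tn - fp * fn) / denom


print(mcc(10, 0, 0, 10))  # 1.0, perfect prediction
print(mcc(0, 10, 10, 0))  # -1.0, every prediction wrong
print(mcc(5, 5, 5, 5))    # 0.0, no better than chance
```

The value ranges from -1 to +1 and uses all four cells of the confusion matrix, so results from studies with different class balances can be compared on the same scale.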

(iv) Research Group 9

ANOVA Results

Factor            % of variance
Author group      61%
Metric family     3%
Author/metric     9%
Everything else   8% (but not significant)
Residuals         19%

Final word
We cannot ignore the fact that the main determinant of a validation study result is which research group undertakes it.
