
The Inductive Software Engineering Manifesto: Principles for Industrial Data Mining
Paper authored by: Menzies & Kocaganeli (Lane Dept of CS/EE, WVU); Bird, Zimmermann, & Schulte (Microsoft Research)
Presentation by: Ebeid Soliman & Mason Schoolfield

Motivation
- The paper reflects the authors' applied data mining work and their discussions with researchers and software engineering practitioners.
- It documents methods and experience from industrial practitioners.
- The principal question: what characterizes the difference between academic and industrial data mining?
- Motivation: successful data mining projects in industry.

Inductive Software Engineering
- "A branch of software engineering that focuses on the delivery of data mining based software applications to users"
- Understand user goals to inductively generate the models that most matter to the user.
- Industrial practitioners are focused on users, whereas academic data mining research is focused on algorithms.

Industrial Data Mining: 7 Principles
1. Users before algorithms
2. Plan for scale
3. Early feedback
4. Be open-minded
5. Do smart learning
6. Live with the data you have
7. Broad skill set, big toolkit

Users before algorithms
- Guiding principle: users before algorithms.
- Mining algorithms are only good if users fund their use in real-world applications.

Users before algorithms: hallmarks of good interaction meetings
- Users bring senior management to the meetings.
- Users keep interrupting (you or each other) and debating your results. This indicates that:
  - the users understand your explanation of the results;
  - your results are touching on issues that concern them.
- Users begin to offer more data sources for analysis.
- Users invite you to their workspace to show how to do part of the analysis.

Plan for scale: Knowledge Discovery in Databases (KDD)
- KDD: the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Repetition is required.
- [Figure: steps that compose the KDD process (Fayyad 1996)]

Plan for scale
- Most data mining is data pre-processing.
- Gaining access to databases in business groups is time consuming.
- To ensure repeatability, automate as many KDD steps as possible.
- Data mining methods are repeated multiple times in order to:
  - answer user questions;
  - enhance the data mining method or fix bugs;
  - deploy to different user groups.
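
One way to read "automate as many KDD steps as possible" is to script each step as a function and chain them, so the whole run can be repeated on demand. The sketch below is only an illustration: the step names, the record fields (`module`, `loc`, `defects`), and the derived `density` feature are hypothetical and do not come from the paper.

```python
def select(rows):
    """Selection: keep only records that carry the (hypothetical) target field."""
    return [r for r in rows if "defects" in r]

def preprocess(rows):
    """Pre-processing: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def transform(rows):
    """Transformation: derive a defects-per-line feature (an assumed example)."""
    return [{**r, "density": r["defects"] / r["loc"]} for r in rows]

def run_pipeline(rows, steps):
    """Apply each step in order so the same run can be repeated unchanged."""
    for step in steps:
        rows = step(rows)
    return rows

PIPELINE = [select, preprocess, transform]

raw = [
    {"module": "a", "loc": 100, "defects": 5},
    {"module": "b", "loc": None, "defects": 1},  # missing value, dropped
    {"module": "c", "loc": 200},                 # no target field, dropped
]
result = run_pipeline(raw, PIPELINE)
```

Because the pipeline is just a list of functions, re-running it for a new user question or a new user group means changing one step and calling `run_pipeline` again.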

Plan for scale: observed phases
- Scout: rapid prototyping; apply many methods to the data, explore a range of hypotheses, and gain user interest (get feedback).
- Survey: experiment to find stable models, focusing on user goals.
- Build: integrate models into a deployment framework suitable for the target user base.
- Team size doubles after scouting and doubles again after surveying, which has time implications.

Early feedback
- Simplicity first: before conducting very elaborate studies, try applying very simple tools to gain rapid early feedback.
- Get feedback early and often.
- Discretize continuous attributes (determine what is ignorable).
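
As an illustration of a "very simple tool", the sketch below discretizes a continuous attribute into equal-width bins. The slides do not say which discretization method the authors use; equal-width binning and the function name `equal_width_bins` are assumptions chosen for brevity.

```python
def equal_width_bins(values, n_bins=3):
    """Map each number to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; everything lands in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

bins = equal_width_bins([0, 1, 9, 10], n_bins=2)
```

An attribute whose bins show no relationship to the target is a candidate for being ignored in the early rounds.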

Be open-minded
- Avoid a fixed hypothesis.
- Avoid a fixed approach, particularly for data that has not been mined before.
- Initial results are important and can change the goals.

Smart learning
- Inductive agents, human or otherwise, make errors.
- Don't torture the data to meet preconceptions, but it can be OK to go "fishing".
- Important outcomes are riding on your conclusions, so check and validate:
  - check the variance before concluding, since a result may be based on statistical noise;
  - check conclusion stability against different sample sizes;
  - check conclusion support, to avoid conclusions based on a small percentage of the data.
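
The three checks above can be sketched with the Python standard library. This is a minimal illustration under assumptions: `stability_report` and `support` are hypothetical helper names, and repeated random subsampling is one of several ways to estimate variance and stability.

```python
import random
import statistics

def stability_report(data, statistic, sizes, repeats=30, seed=1):
    """For each sample size, recompute `statistic` on random subsamples and
    return (mean, standard deviation) of the estimates."""
    rng = random.Random(seed)
    report = {}
    for n in sizes:
        estimates = [statistic(rng.sample(data, n)) for _ in range(repeats)]
        report[n] = (statistics.mean(estimates), statistics.stdev(estimates))
    return report

def support(data, predicate):
    """Fraction of the data that a conclusion actually rests on."""
    return sum(1 for x in data if predicate(x)) / len(data)

# Hypothetical measurements, used only to exercise the helpers.
measurements = [float(i % 7) for i in range(200)]
report = stability_report(measurements, statistics.mean, sizes=[10, 50, 150])
share_above_five = support(measurements, lambda x: x > 5)
```

A conclusion whose estimate shifts markedly between sample sizes, or whose spread is large relative to the effect, is a candidate for statistical noise; a low `support` value means it rests on a small slice of the data.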

Smart learning
- Prevent spurious conclusions by carefully controlling data collection and focusing on a small space of hypotheses (if you can).
- Rule learners such as RIPPER and INDUCT check rules against randomly generated alternatives (if the probabilities are the same, you can delete the rule).
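
The idea of comparing a rule with random alternatives can be illustrated with a label-shuffling test. This is not the procedure used inside RIPPER or INDUCT; it is a swapped-in permutation test, and the names `rule_precision` and `beats_random` and the 0.05 threshold are assumptions.

```python
import random

def rule_precision(rows, labels, rule):
    """Fraction of rows covered by `rule` whose label is 1."""
    covered = [lab for row, lab in zip(rows, labels) if rule(row)]
    return sum(covered) / len(covered) if covered else 0.0

def beats_random(rows, labels, rule, trials=200, seed=1, alpha=0.05):
    """Return True if the rule does better on the real labels than it
    does on shuffled labels in all but a fraction `alpha` of trials."""
    rng = random.Random(seed)
    observed = rule_precision(rows, labels, rule)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if rule_precision(rows, shuffled, rule) >= observed:
            at_least_as_good += 1
    return at_least_as_good / trials < alpha

# Hypothetical data: the label is 1 exactly when the value is 20 or more.
rows = list(range(40))
labels = [1 if x >= 20 else 0 for x in rows]
keep_informative = beats_random(rows, labels, lambda x: x >= 20)
keep_trivial = beats_random(rows, labels, lambda x: True)
```

A rule that performs no better than it does on shuffled labels (here, the rule that covers every row) is a candidate for deletion.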

Live with the data you have
- Collecting data comes at a cost.
- Go mining with the data you have, not the data you hope to have at a later date.
- Remove spurious data: conduct instance or feature selection studies.
- 80 to 90% of rows and all but the square root of the number of columns can be deleted before compromising the performance of the learned model.
- Be respectful but doubtful of all user-suggested domain hypotheses.
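
To make the row and column arithmetic concrete, the sketch below keeps a random 20% of the rows and the square root of the number of columns. The selection methods are stand-ins, not the authors': rows are sampled uniformly at random, and columns are ranked by variance, which is a crude proxy for a real feature selection study. The name `prune_table` is hypothetical.

```python
import math
import random
import statistics

def prune_table(rows, keep_row_fraction=0.2, seed=1):
    """Return (pruned_rows, kept_column_indexes) for a table of numeric rows."""
    rng = random.Random(seed)
    n_rows = max(1, round(len(rows) * keep_row_fraction))
    sampled = rng.sample(rows, n_rows)

    n_cols = len(rows[0])
    n_keep = max(1, round(math.sqrt(n_cols)))
    # Assumed ranking: higher variance first, measured on the full table.
    variances = [statistics.pvariance([r[c] for r in rows]) for c in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda c: variances[c], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [[r[c] for c in keep] for r in sampled], keep

# Hypothetical table: 50 rows, 4 columns, two of which are constant.
table = [[i, 0, i * 10, 1] for i in range(50)]
pruned, kept_columns = prune_table(table)
```

Whether a model trained on the pruned table performs as well as one trained on the full table is an empirical question to check on each data set.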

Broad skill set, big toolkit
- Try multiple inductive technologies.
- Inductive engineers generate novel and insightful feedback for users.
- Researchers can work to perfect a single algorithm.
- Big ecology: use tools supported by a large ecosystem of developers who are constantly building new modules (e.g. R, WEKA, MATLAB).

What does this mean for industry?
- Implications for project management: scouting takes weeks, surveying takes months, and building takes years.
- Implications for training:
  - communication skills;
  - results briefing;
  - scripting.

Research to help industry
Research themes to benefit industrial data mining:
- Analysis patterns for inductive engineers (like design patterns for developers)
- Design patterns for data miners
- Optimizations of learning algorithms
- Anomaly detectors
- Business-aware learners

Final notes
- Conclusion: be user-focused and keep these principles in mind.
- Hopefully these generalities will be helpful.
- Share your experiences and knowledge so that industrial inductive engineering can mature.