Profiting from Data Mining
Gio Wiederhold, November 2003

Slide 2: Steps Needed to Profit
1. Obtaining relevant data
  – Always incomplete
2. Extracting relationships
  – Imputing causality
3. Finding applicability
  – Determining leverage points
4. Inventing candidate actions
  – Assessing likely outcomes and benefits
5. Selecting the action to be taken
  – Measuring the outcome
→ Collecting data for the next round
(Diagram label: model based)

Slide 3: Today's Problem: Disjointness
1. Database administrators
  – Focus on data collection, organization, currency
2. Analysts
  – Focus on slicing, dicing, relationships
3. Middle managers
  – Focus on their costs, profits
4. MBAs
  – Focus on business models, planning
5. Executives
  – Must make decisions based on diverse inputs

Slide 4: 1. Data Collection
Two choices:
1. (rare) Collect data specifically for analysis
  + allows careful design to model causes and effects, e.g. Purchase = f(price, color, size, customer income, gender, ...)
  - costly
  - often small, to keep collection manageable
  - imposes delays
2. (common) Use data collected for other purposes
  + takes advantage of what is readily available
  + low cost
  - requires filtering, reformatting, integration
  - incomplete: rarely covers all causes and effects
  - biased: missing categories
    » e.g. only people with phones or cars; only people shopping in supermarkets

Slide 5: 1a. Data Integration
Needed when sources have inadequate coverage, with distinct DBs for
  – Prices, number purchased
  – Customer segments (supermarket, stores, on-line)
Integration implies some expectations:
  – Append attributes where keys match: Joe
    » include semantic match: Joe = …
  – Append rows where key types match: customer
    » include semantic match: customer = owner
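The two integration operations on this slide, appending attributes where keys match (a join) and appending rows where key types match (a union), can be sketched in plain Python. This is a minimal illustration, not a tool from the talk: the alias tables `KEY_ALIASES` and `COLUMN_ALIASES` are hypothetical stand-ins for semantic-match knowledge, and all record values are invented.

```python
# Hypothetical semantic-match knowledge.
KEY_ALIASES = {"Joseph": "Joe", "Joey": "Joe"}  # instance level: same entity, different spelling
COLUMN_ALIASES = {"owner": "customer"}          # schema level: same role, different attribute name


def canonical_key(value):
    """Map a key value to its canonical form, if an alias is known."""
    return KEY_ALIASES.get(value, value)


def append_attributes(left, right, key):
    """Join: extend each left record with right's attributes where canonical keys match.

    If several right records share a canonical key, the last one wins.
    """
    index = {canonical_key(r[key]): r for r in right}
    merged = []
    for rec in left:
        combined = dict(rec)
        match = index.get(canonical_key(rec[key]))
        if match:
            for attr, value in match.items():
                if attr != key:  # keep the left record's own key value
                    combined[attr] = value
        merged.append(combined)
    return merged


def append_rows(table_a, table_b):
    """Union: stack rows after renaming columns to a shared vocabulary."""
    def rename(rec):
        return {COLUMN_ALIASES.get(col, col): v for col, v in rec.items()}
    return [rename(r) for r in table_a] + [rename(r) for r in table_b]
```

For example, a price record for "Joe" picks up the segment stored under "Joseph", and dealer rows keyed by `owner` stack under store rows keyed by `customer`.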

Slide 6: 2. Data Analysis
Find relationships
  – already known: ignore, or adjust in the next round
    » requires comparison with expert knowledge
    » now have quantification
  – unknown
    » uninteresting per expert
    » interesting per expert

Slide 7: 3. Establish Causality
Already known: prior model
  – But is it complete, i.e., does it explain all effects?
Analyze relationships
  – use expertise to decide direction
    » often obvious: "common world knowledge"
    » sometimes ambiguous: smoking → cancer → not smoking
    » often the major true cause is not captured in the data
      e.g. purchase of Chinese vs. other food: food color 10%, food price 20%, buyer gender 2%, unknown 75%
      guess: ethnicity, income
      invent surrogates: names, ZIP codes; use temporal information

Slide 8: Establishing Causality Is Risky
Mined: Volvos have fewer accidents.
1. Is a Volvo a safe car?
2. What causes accidents? Drivers!
3. Who buys Volvos? Careful drivers!
4. Must determine the effect of safe drivers:
  – percentage of safe drivers overall
  – percentage of safe drivers with Volvos
5. How much of the accident rate is now explained?
  The unexplained difference can be attributed to the car.
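The Volvo reasoning amounts to stratifying by driver type: compute the accident rate that the Volvo owners' driver mix alone would produce, then treat whatever remains as the car's effect. The sketch assumes just two driver classes whose accident rates do not depend on the car, and every number in it is hypothetical.

```python
def expected_rate(p_safe, rate_safe, rate_unsafe):
    """Accident rate implied by the driver mix alone (car assumed to have no effect)."""
    return p_safe * rate_safe + (1 - p_safe) * rate_unsafe


def decompose(observed_group, p_safe_group, p_safe_all, rate_safe, rate_unsafe):
    """Split a group's lower accident rate into a driver-mix part and a residual."""
    baseline = expected_rate(p_safe_all, rate_safe, rate_unsafe)        # all cars
    drivers_only = expected_rate(p_safe_group, rate_safe, rate_unsafe)  # Volvo owners' mix
    explained_by_drivers = baseline - drivers_only
    residual_for_car = drivers_only - observed_group  # the unexplained difference
    return baseline, drivers_only, explained_by_drivers, residual_for_car


# Hypothetical figures, accidents per 100 drivers per year.
RATE_SAFE, RATE_UNSAFE = 2.0, 6.0
P_SAFE_OVERALL, P_SAFE_VOLVO = 0.5, 0.8
OBSERVED_VOLVO = 2.5

if __name__ == "__main__":
    print(decompose(OBSERVED_VOLVO, P_SAFE_VOLVO, P_SAFE_OVERALL, RATE_SAFE, RATE_UNSAFE))
```

With these made-up figures the driver mix alone lowers the expected rate from 4.0 to about 2.8, which accounts for most of the gap to the observed 2.5; only about 0.3 is left to attribute to the car.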

Slide 9: Change Causes to Create Effects
To use the results of data mining, we have to understand the direction of relationships.
(Diagram labels: controllable causes; external causes; hidden; captured by data → model; interesting beneficial effects; side effects)

Slide 10: Causes Provide the Leverage
Language of the analyst / language of modeling
Many causes: independent variables
  – A few may be controllable
  – Some may be controlled by our competition
  – Others are forces of nature
Even more effects: dependent variables
  – A few may be desired
  – Some may be disastrous
  – Many are poorly understood
Intermediate effects
  – Provide a means for measuring effectiveness
  – Allow correction of actions taken

Slide 11: Planning & Assessment
Analyze alternatives: current capabilities, future expectations
Process tasks:
  – List resources
  – Enumerate alternatives
  – Prune alternatives
  – Compare alternatives now
  – Predict the future
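The process tasks can be made concrete for a toy case: list a resource (a budget), enumerate every combination of candidate actions, prune the combinations that exceed the budget, and compare the survivors by their estimated benefit now. The action names, costs, and benefits are hypothetical, and summing benefits assumes the actions do not interact; predicting the future is a separate step.

```python
from itertools import combinations

# Hypothetical candidate actions: name -> (resource cost, estimated benefit now).
ACTIONS = {
    "cut_price": (30, 50),
    "new_color": (20, 25),
    "ad_campaign": (40, 45),
}
BUDGET = 60  # the listed resource


def enumerate_alternatives(actions):
    """Every subset of actions, including doing nothing."""
    names = sorted(actions)
    for k in range(len(names) + 1):
        yield from combinations(names, k)


def feasible(alt, actions, budget):
    """Pruning test: does the alternative fit within the resource?"""
    return sum(actions[a][0] for a in alt) <= budget


def benefit(alt, actions):
    """Additive benefit estimate (assumes no interaction between actions)."""
    return sum(actions[a][1] for a in alt)


def plan(actions, budget):
    """Enumerate, prune, and compare; return the best alternative now."""
    candidates = [a for a in enumerate_alternatives(actions) if feasible(a, actions, budget)]
    return max(candidates, key=lambda a: benefit(a, actions))
```

Exhaustive enumeration is only practical for a handful of actions; the point here is the order of the tasks, not the search strategy.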

Slide 12: Prediction Requires Tools
(Figure: a book, Alfred Knopf, 1997)

Slide 13: Simulations Predict
1. Back-of-the-envelope
  – Common
  – Adequate if the model is simple
  – Assumptions are easily forgotten after some time and are not distinguished from data: "Why are we doing this?"
2. Spreadsheets
  – Most common computing tool
  – A specialist modeler can help
  – New, recent data can be pasted in
  – Awkward for the tree of future alternatives
3. Constructed to order
  – Costly, powerful technology
  – Specialist modelers required
  – Expressive simulation languages
  – Requires specialists to set up, run, and rerun with new data

Slide 14: Simulation Results: Likelihoods
(Diagram: a tree of alternatives branching from now along the time axis, through the next period and subsequent periods; uncertainty increases with distance from now.)
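One way to see why uncertainty increases with each period is to expand the tree of alternatives and multiply conditional likelihoods along each path. The sketch assumes, purely for illustration, that every state branches into the same three hypothetical alternatives with fixed likelihoods. The number of futures grows geometrically, the most likely single future becomes less likely, and the Shannon entropy of the set of futures grows each period.

```python
import math

# Hypothetical alternatives per period and their conditional likelihoods.
BRANCHES = {"up": 0.5, "flat": 0.3, "down": 0.2}


def expand(paths):
    """Extend every (path, likelihood) pair by one period; likelihoods multiply."""
    return [(path + (alt,), p * q) for path, p in paths for alt, q in BRANCHES.items()]


def futures(periods):
    """All paths of the given length, starting from 'now' with likelihood 1."""
    paths = [((), 1.0)]
    for _ in range(periods):
        paths = expand(paths)
    return paths


def entropy_bits(paths):
    """Shannon entropy of the distribution over futures, a measure of uncertainty."""
    return -sum(p * math.log2(p) for _, p in paths if p > 0)


if __name__ == "__main__":
    for n in range(1, 4):
        fs = futures(n)
        print(n, len(fs), max(p for _, p in fs), round(entropy_bits(fs), 3))
```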

Slide 15: Simulation Services
Wide variety, but a common principle: Inputs → Model → Output (time, $, place, ...)
1. Spreadsheets
  – Identify independent, controllable, and resulting values
2. Execution specific to a query: what-if assessment
  – may require HPC power for adequate response
3. Continuously executing: weather prediction
  – Search for best match (location, time)
4. Past simulation results collected for future use
  – Typically sparse: the dimension of the futures is too large
  – e.g. tables in a design handbook: materials
  – Perform interpolations or extrapolations to match query parameters
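Answering a query from sparse stored results (item 4) can be illustrated with linear interpolation in a one-dimensional table, falling back to linear extrapolation from the outermost pair when the query lies outside the table. The table contents are hypothetical; real handbook tables are often multidimensional and may call for other interpolation schemes.

```python
from bisect import bisect_left

# Hypothetical stored results: (parameter value, simulated output), sorted by parameter.
TABLE = [(0.0, 10.0), (10.0, 14.0), (20.0, 22.0)]


def lookup(x, table=TABLE):
    """Match a query parameter to sparse stored results by linear inter-/extrapolation."""
    if len(table) < 2:
        raise ValueError("need at least two stored results")
    xs = [p for p, _ in table]
    i = bisect_left(xs, x)
    if i < len(xs) and xs[i] == x:
        return table[i][1]  # exact hit: no interpolation needed
    # Choose the bracketing pair; beyond either end, reuse the outermost pair.
    i = min(max(i, 1), len(table) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Extrapolated answers deserve less trust than interpolated ones, which is one reason a service might report how a value was obtained along with the value.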

Slide 16: Specify the Value of Effects
Still needed: the value of alternative outcomes
Decision maker / owner input:
  – Benefits and costs
  – Potential profit
  – Correct for risk, and adjust to present value
(Diagram: values attached to the futures, along a time axis from past through now)
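Correcting for risk and adjusting to present value can be sketched with ordinary discounting of net values (benefits minus costs) per period. Adding a risk premium to the discount rate is one common convention and is an assumption here; a certainty-equivalent adjustment would be an alternative.

```python
def present_value(benefits, costs, rate, risk_premium=0.0):
    """Net present value of per-period benefits and costs.

    Index t = 0 is 'now'; each later period is discounted by (1 + rate + risk_premium) ** t.
    Adding a risk premium to the rate is one way (assumed here) to correct for risk.
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same periods")
    r = rate + risk_premium
    return sum((b - c) / (1 + r) ** t for t, (b, c) in enumerate(zip(benefits, costs)))
```

For example, with hypothetical figures, paying 100 now to receive 220 one period later is worth about 100 at a 10% rate, and less once a 5% risk premium is added.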

Slide 17: Having It All Together
  – Relationships from analyses of past data
  – Data representing the current state
  – List of actionable alternatives
  – Tree of subsequent alternatives
  – Probabilities of those alternatives
  – Values of the outcomes
  – Ability to predict the likelihood of futures

Slide 18: Vision: Putting It All Together
Combine results mined from past data, current observations, and predictions into the future.
(Diagram labels: time; support specialists; decision maker)

Slide 19: Needed: Information Systems That Also Project Seamlessly into the Futures
Support of decision making requires dealing with the futures as well as the past.
  – Databases deal well with the past
  – Streaming sensors supply the current status
  – Spreadsheets and simulations deal with the likely futures
Future information systems should combine all these sources.
(Diagram: time axis, past → now → future)

Slide 20: Connecting It All
Build super systems
  – Coherent, consistent
  – Expensive
  – Unmaintainable
  – Too many cooks:
    » Database folk
    » Data miners
    » Analysts
    » Planners
    » Simulation specialists
    » Decision makers
Develop interfaces
  – Incremental
  – Composable as needed
  – Heterogeneous
Interfaces required: metadata
  – Database to miners: SQL
  – Mined results to analysts: XML?
  – Analysts to planners: ?
  – Planners to simulations: SimQL
  – Decision makers: new tools!

Slide 21: Interfaces Enable Integration. New: SimQL to Access Simulations
  – Past: databases and schemas, accessed via SQL or XML
  – Now: message systems, sensors: streaming data
  – Futures: simulations, accessed via SimQL and schema-compliant wrappers

Slide 22: SimQL Proof-of-Concept Implementation
(Diagram components: parser, metadata manager, query manager, schema manager, metadata, wrapped simulations.
Development interaction with the developer: schema commands, filing of access specs, help, error reports.
Production interaction with the customer: queries, use of access specs, help, initiation and results of simulations.)
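The division of labor in this diagram (developers file access specs through a schema manager; customer queries are checked against those specs before a wrapped simulation is initiated) can be sketched in Python. This is not SimQL's syntax or implementation, which the slide does not show: queries are passed as keyword arguments instead of parsed text, and the `AccessSpec` fields and the `toy_weather` wrapper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class AccessSpec:
    """Hypothetical access spec filed by a developer for one wrapped simulation."""
    inputs: Dict[str, type]    # parameter name -> expected type
    outputs: Tuple[str, ...]   # result fields exposed to customers
    run: Callable[..., dict]   # wrapper that initiates the simulation


class SchemaManager:
    def __init__(self):
        self._specs: Dict[str, AccessSpec] = {}

    def file(self, name: str, spec: AccessSpec) -> None:
        """Development interaction: file an access spec."""
        self._specs[name] = spec

    def lookup(self, name: str) -> AccessSpec:
        if name not in self._specs:
            raise KeyError(f"unknown simulation: {name}")
        return self._specs[name]


class QueryManager:
    def __init__(self, schemas: SchemaManager):
        self.schemas = schemas

    def query(self, name: str, **params) -> dict:
        """Production interaction: validate parameters, run, return declared outputs only."""
        spec = self.schemas.lookup(name)
        missing = set(spec.inputs) - set(params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")  # error report
        for key, typ in spec.inputs.items():
            if not isinstance(params[key], typ):
                raise TypeError(f"{key} must be {typ.__name__}")
        result = spec.run(**{k: params[k] for k in spec.inputs})
        return {k: result[k] for k in spec.outputs}


def toy_weather(location: str, hours: int) -> dict:
    """Stand-in for a wrapped simulation; returns fixed made-up values."""
    return {"temp_c": 15.0, "rain_prob": 0.3, "internal_state": [location, hours]}
```

Filtering the result down to the declared outputs keeps `internal_state` hidden, one possible answer to the question of how little of the model needs to be exposed.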

Slide 23: Demonstration of SimQL
Test applications:
  – Business planning spreadsheets
  – Weather on the Internet
  – Engineering simulation wrapper
  – Shipping location database
(Diagram labels: simple GUI; common language requirements)

Slide 24: Information System Use of Simulation Results
Simulation results are mapped to alternative courses of action.
The information system should support model-driven computation and recomputation of likelihoods.
Likelihoods change as now moves forward and eliminates earlier alternatives.
(Diagram axes: time; probability)

Slide 25: The Likelihoods Multiply Out to the End Effects; Their Values Can Then Be Applied to Earlier Nodes
(Diagram: tree of next-period and subsequent-period alternatives along the time axis, past → now → future, with a probability and a value at each node.)
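The slide's two steps, multiplying likelihoods out to the end effects and then applying their values to earlier nodes, can be sketched for a small tree of chance nodes. The tree and its values are hypothetical. The tests confirm that the rolled-back value at the root equals the likelihood-weighted sum over the end effects.

```python
# A node is either a leaf value (a number) or a list of (probability, child) pairs.
# Hypothetical tree; the probabilities at each node sum to 1.
TREE = [
    (0.6, [(0.5, 100.0), (0.5, 40.0)]),
    (0.4, [(0.9, 10.0), (0.1, -50.0)]),
]


def rollback(node):
    """Apply end-effect values to earlier nodes: each node gets the expected value of its children."""
    if isinstance(node, (int, float)):
        return float(node)
    return sum(p * rollback(child) for p, child in node)


def leaves(node, likelihood=1.0):
    """Multiply likelihoods out along each path; yield (path likelihood, end-effect value)."""
    if isinstance(node, (int, float)):
        yield likelihood, float(node)
        return
    for p, child in node:
        yield from leaves(child, likelihood * p)
```

A decision node, where the decision maker picks among alternatives, would typically take the maximum over its children instead of the expectation; that extension is not shown.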

Slide 26: Recomputation Is Needed at the Next Time Phase
Re-assess as time marches forward!
(Diagram: a pruned bush of alternatives along the time axis, past → now → future; sources: databases, ...; messages, sensors; spreadsheets, other simulations.)
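When now moves forward and an observation eliminates some of the current alternatives, one simple way to recompute is to drop those alternatives and renormalize the remaining likelihoods, which is conditioning on the observation. The branch labels and likelihoods are hypothetical; any values rolled back through the tree would then be recomputed from the renormalized likelihoods.

```python
def prune(branches, ruled_out):
    """Drop alternatives eliminated as 'now' moves forward; renormalize the rest.

    branches: list of (label, likelihood, subtree) tuples for the current period.
    Renormalizing is conditioning on the observation that only the kept
    alternatives are still possible.
    """
    kept = [(label, p, sub) for label, p, sub in branches if label not in ruled_out]
    total = sum(p for _, p, _ in kept)
    if total == 0:
        raise ValueError("observation rules out every modeled alternative; the model needs revision")
    return [(label, p / total, sub) for label, p, sub in kept]
```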

Slide 27: Even the Present Needs SimQL
Not all data are current:
  – the last recorded observations lie in the past
  – simple simulations extrapolate the data to the point in time needed for situational assessment
Questions:
  – Is the delivery truck in X?
  – Is the right stuff on the truck?
  – Will the crew be at X?
  – Will the forces be ready to accept delivery?
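The simplest such simulation extrapolates the last recorded observations to the point in time of interest. The sketch assumes a truck that reports its distance along a route and moves at the constant speed implied by its last two position fixes; the `Fix` record and the numbers used in the tests are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    """A recorded observation: time in hours, distance along the route in km."""
    t: float
    x: float


def extrapolate(prev: Fix, last: Fix, now: float) -> float:
    """Constant-speed estimate of the position at `now` from the last two fixes."""
    if last.t == prev.t:
        raise ValueError("need two fixes at different times")
    speed = (last.x - prev.x) / (last.t - prev.t)
    return last.x + speed * (now - last.t)


def arrived(prev: Fix, last: Fix, now: float, x_target: float) -> bool:
    """Answer 'Is the delivery truck at X?' from extrapolated, not observed, data."""
    return extrapolate(prev, last, now) >= x_target
```

The answer is only as good as the constant-speed assumption, so an information system using it should also report when the last real observation was made.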

Slide 28: Integrative Information Systems: Research Questions
  – What human interfaces can support the decision maker?
  – How can we move seamlessly from the past to the future?
  – What system interfaces are good now and stay adaptable?
  – How can multiple futures be managed (indexed)?
  – How can multiple futures be compared and selected?
  – How should joint uncertainty be computed?
  – How can the NOW point be moved automatically?

Slide 29: SimQL Research Questions
  – How little of the model needs to be exposed?
  – How can defaults be set rationally?
  – How should expected execution cost be reported?
  – How should uncertainty be reported?
  – Are there differences among application areas that require different language structures?
  – Are there differences among application areas that require different language features?
  – How will the language interface support effective partitioning and distribution?

Slide 30: Moving to a Service Paradigm
Interfaces define service potentials:
  – The server is an independent contractor and defines the service
  – The client selects a service and specifies parameters
  – The server's success depends on the value provided
  – Some form of payment is due for services
Databases are a current example. Simulations have the same potential.

Slide 31: Summary of SimQL
A new service for decision making: follows the database paradigm
  – (by about 25 years)
Coherence in prediction
  – displacement of ad-hoc practices
Seamless information integration
  – single paradigm for decision makers
Simulation industry infrastructure
  – investment has a potential market
  – should follow the database industry model: interfaces promote new industries

Slide 32: Summary: Today, decision-making support is disjoint; each community improves its own area and ignores the others.
  – Extensions for network support are also disjoint
  – They do not interoperate
(Diagram labels: distribution, databases, simulation, planning, science)

Slide 33: The Decision Maker Has Few Tools
  – Databases
  – Data integration: distributed, heterogeneous
  – Spreadsheets: planning of allocations
  – Other simulations: various point assessments
  – Intuition + organized support
(Diagram: disjointed support along the time axis, past → now → future)

Slide 34: Coda: Put Relevant Work Together and Move On
Support integration of results mined from past data, current observations, and predictions about the futures.
(Diagram labels: databases, data mining, modeling tools, simulation support services, service interfaces, human interfaces, decision maker, real information systems)