Mining data with PolyAnalyst Your Knowledge Partner TM

Mining data with PolyAnalyst Your Knowledge Partner TM www.megaputer.com

Outline: Data Mining in the BI chain; PolyAnalyst overview; learning algorithms; additional features; future developments.

Data Mining in the BI chain

DM in Decision Making: consider a fragment of the BI chain, Data → Knowledge → Decision → Action. Data is what we can capture and store; knowledge is what enables informed decisions. Problem: how do we get from data to knowledge? Solution: Data Mining (Machine Learning).

Data Mining: "Data Mining is the process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from databases that is used to make crucial business decisions." -- G. Piatetsky-Shapiro, KDnuggets editor, www.kdnuggets.com. Key qualities: valid, novel, actionable, comprehensible.

Data Mining vs. OLAP: OLAP helps prove or reject your hypotheses by dissecting data along different dimensions, but you have to guess the answer first! Data Mining automatically develops and tests numerous hypotheses by learning from historical data; it analyzes raw data.

Business Intelligence Chain: consider direct marketing automation, where we analyze data and then integrate the results into applications.

Data Mining Tasks: predicting, classifying, clustering, segmenting, explaining, associating, visualizing, link analysis, text mining.

Fields of application

What makes DM hard? An unfamiliar concept and lack of experience; results require interpretation by an analyst; poor integration into existing applications; difficulty processing very large databases; the need to learn a new application; high cost.

Megaputer response
Challenge: unfamiliar concept and lack of experience. Response: the Collaborative Appliance Program, which combines Megaputer analysts' expertise in data mining with the customer's knowledge of the business project.
Challenge: results require interpretation by an analyst. Response: simple reporting and batch processing capabilities.
Challenge: poor integration into existing applications. Response: easy scoring of external data with a few mouse clicks.
Challenge: difficulty processing very large databases. Response: In-Place Data Mining.
Challenge: the need to learn a new application. Response: an SDK of easy-to-integrate PolyAnalyst COM components.
Challenge: high cost. Response: a flexible licensing mechanism.

PolyAnalyst overview

What is PolyAnalyst? A multi-strategy data mining suite: the largest selection of ML algorithms for diverse business tasks, plus structured data and text processing tools. Ease of use: friendly data manipulation and visualization. Deep integration: applying models to external databases through the OLE DB protocol, exporting models to XML, and COM components. Best price/performance ratio.

Key differentiators of PolyAnalyst: integrated analysis of structured (numeric and categorical) and unstructured (text) data; an easy-to-learn, easy-to-operate visual analytical interface; the largest selection of powerful machine learning algorithms; mouse-driven application of predictive models to data in any external system through a standard OLE DB link; simple integration with external applications through an SDK of COM components; In-Place Data Mining capabilities for processing huge databases; step-by-step tutorials based on real-world case studies; rich data manipulation and visualization tools; reusable analytical scripts for batch data mining; the best price/performance ratio.

PolyAnalyst customer base: 300+ installations. Sample customers: Boeing (USA), 3M (USA), Chase Manhattan Bank (USA), McKinsey & Company (USA), Siemens (Germany), Lockheed Martin (USA), Allstate Insurance (USA), ICICI Bank (India), Mars (USA), Taco Bell (USA), DuPont (USA), Asea Skandia (Sweden), France Telecom (France), Cambridge Technology Partners (USA), Carlson Marketing (USA), Central Bank (Russia), US Navy (USA), KPN Research (Netherlands), Alka Insurance (Denmark), National Cancer Institute (USA).

PolyAnalyst workplace (screenshot), with labeled elements: control buttons, project navigation tree, Data and Results pane, objects and collections represented by icons, exploration engine report fragment, and the PolyAnalyst log journal.

PolyAnalyst provides: access to data held in a database or data warehouse (numerical, categorical, yes/no, and date variables); data manipulation and visualization; 14 machine learning algorithms; convenient reporting and output of results; integration with external applications.

PolyAnalyst machine learning algorithms

“Probably one [of] the most impressive characteristic[s] of PolyAnalyst is the sheer number of data mining tasks it can tackle.” -- Mario Apicella, Technology Analyst, InfoWorld Test Center, July 3, 2000

Learning algorithms: Find Laws (SKAT algorithm); Cluster (localization of anomalies); Find Dependencies (n-dimensional distributions); Classify (fuzzy logic modeling); Decision Tree (Information Gain criterion); PolyNet Predictor (GMDH-neural net hybrid); Market Basket Analysis (association rules); Memory Based Reasoning (k-NN + GA); Linear Regression (stepwise and rule-enriched); Discriminate (unsupervised classification); Summary Statistics (data summarization); Link Analysis (visual correlation analysis); Text Mining (semantic text analysis).

Cluster (FC): identifies clusters of similar records; selects the best variables for clustering; suggests the number of clusters; separates clusters of records into new data sets for further investigation, as preprocessing for other algorithms.

Cluster (continued): groups of similar records (chart).

Cluster (continued): based on analyzing distributions in hypercubes of all variables rather than on measuring distances between points; hence, independent of rescaling of the variable axes. Finds only clusters actually present in the data, against a background of uniformly distributed cases.
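The slide contrasts hypercube-density clustering with distance-based clustering but does not give Megaputer's procedure. As a rough illustration only, the sketch below uses a generic grid-density approach (a swapped-in technique, not PolyAnalyst's algorithm): each variable's observed range is cut into equal-width bins, cells whose counts clearly exceed what a uniform background would predict are marked dense, and touching dense cells are merged into clusters. Because bins follow each variable's own min-max range, multiplying an axis by a positive constant leaves the labels unchanged. The name `grid_clusters` and the `bins` and `density_factor` parameters are hypothetical.

```python
from collections import deque
from itertools import product


def grid_clusters(points, bins=10, density_factor=3.0):
    """Label each point with a cluster id, or -1 for background.

    Hypothetical grid-density heuristic: equal-width bins per variable,
    cells denser than density_factor times the uniform expectation are
    'dense', and dense cells that touch are merged into one cluster.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        cell = []
        for d in range(dims):
            span = highs[d] - lows[d] or 1.0  # guard against a constant variable
            idx = int((p[d] - lows[d]) / span * bins)
            cell.append(min(idx, bins - 1))  # the maximum value goes to the last bin
        return tuple(cell)

    cells = [cell_of(p) for p in points]
    counts = {}
    for c in cells:
        counts[c] = counts.get(c, 0) + 1

    # Expected count per cell if the points were spread uniformly over the grid.
    expected = len(points) / bins ** dims
    dense = {c for c, n in counts.items() if n >= density_factor * expected}

    # Connected components of dense cells; neighbours differ by at most 1 per axis.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    cluster_of_cell = {}
    next_id = 0
    for start in dense:
        if start in cluster_of_cell:
            continue
        cluster_of_cell[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(cur, off))
                if nb in dense and nb not in cluster_of_cell:
                    cluster_of_cell[nb] = next_id
                    queue.append(nb)
        next_id += 1

    return [cluster_of_cell.get(c, -1) for c in cells]
```

Records in sparse cells are reported as background (-1) rather than being forced into the nearest cluster, which mirrors the slide's point that only clusters actually present in the data are found. The neighbour search grows as 3^d, so this toy version is only practical for a handful of variables.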

Classify (CL): fuzzy-logic-based classification. The membership function is modeled by Find Laws, PolyNet Predictor, or Linear Regression. Provides record scoring, visualized with Lift and Gain charts. Assigns records to one of two classes and reports the classification rule used.

Classify (continued): the PolyAnalyst Lift chart (mass mailing vs. targeted mailing; y-axis: % of maximal possible response) illustrates the increase in response to a campaign based on the discovered model instead of random mailing. The PolyAnalyst Gain chart (mass mailing vs. targeted mailing; y-axis: profit in $) helps optimize the profit obtained in a direct marketing campaign.
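Lift and Gain charts can be built from any model that scores records. The sketch below is a generic illustration, not PolyAnalyst's implementation: records are sorted by model score, and for each top fraction of the list it reports the share of all responders reached (cumulative gain) and the ratio of that share to the fraction mailed (lift over random mailing). The function name `lift_table` and the bucket granularity are assumptions.

```python
def lift_table(scores, responded, buckets=10):
    """Return (fraction_mailed, cumulative_gain, lift) for each bucket.

    scores    -- model scores, higher means more likely to respond
    responded -- 1 if the record actually responded, else 0
    Assumes at least one responder, otherwise the gain is undefined.
    """
    ranked = [r for _, r in sorted(zip(scores, responded), key=lambda t: t[0], reverse=True)]
    total_responders = sum(ranked)
    n = len(ranked)
    rows = []
    for b in range(1, buckets + 1):
        cutoff = round(n * b / buckets)  # number of top-ranked records mailed
        captured = sum(ranked[:cutoff])
        fraction = cutoff / n
        gain = captured / total_responders  # share of all responders reached
        rows.append((fraction, gain, gain / fraction))  # lift vs. random mailing
    return rows
```

A lift of 1.0 means the model does no better than mass mailing; the last bucket always has lift 1.0 because everyone is mailed. A profit curve follows by multiplying captured responders by revenue per response and subtracting the mailing cost per record.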

Decision Tree (DT): intuitively classifies cases into selected categories; based on the Information Gain splitting criterion; the fastest algorithm in PolyAnalyst; scales linearly with the number of records.
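The Information Gain criterion mentioned above scores a candidate split by how much it reduces the entropy of the class labels. A minimal sketch for categorical columns follows; the function names are hypothetical, and how PolyAnalyst treats numeric variables is not covered here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on `feature_values`."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Return (index of the column with the highest gain, list of all gains)."""
    n_cols = len(rows[0])
    gains = [information_gain(labels, [r[i] for r in rows]) for i in range(n_cols)]
    return max(range(n_cols), key=lambda i: gains[i]), gains
```

A tree grower would call `best_split` at each node, partition the records on the winning column, and recurse until the nodes are pure or too small.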

Decision Tree (continued): classification tree and node characteristics (screenshot).

Decision Forest (DF): the most efficient classification algorithm for tasks with multiple target categories. Transforms the task of assigning records to one of N classes into N two-class tasks. Develops the best collection of N classification trees, whose leaves contain the probabilities of records belonging to the corresponding classes. Scales linearly with the number of records.
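The N-class to N two-class decomposition can be sketched as follows, assuming one-level trees (decision stumps) on numeric features and Gini impurity as the split score. Both are simplifications chosen for brevity, not PolyAnalyst's tree-growing procedure. Each stump's leaves store the share of positives, and prediction picks the class whose tree reports the highest probability. `fit_stump`, `fit_forest`, and `predict` are hypothetical names.

```python
def gini(pos, total):
    """Gini impurity of a two-class node with `pos` positives out of `total`."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2 * p * (1 - p)


def fit_stump(X, y):
    """Fit a one-split tree for 0/1 labels y.

    Returns (feature, threshold, p_left, p_right): records with
    x[feature] <= threshold go left; p_* is that leaf's share of positives.
    """
    n = len(X)
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            score = (len(left) * gini(sum(left), len(left))
                     + len(right) * gini(sum(right), len(right))) / n
            if best is None or score < best[0]:
                p_left = sum(left) / len(left)  # never empty: t is a data value
                p_right = sum(right) / len(right) if right else p_left
                best = (score, f, t, p_left, p_right)
    return best[1:]


def fit_forest(X, labels):
    """Train one stump per class on the task 'this class vs. all others'."""
    classes = sorted(set(labels))
    return {c: fit_stump(X, [1 if l == c else 0 for l in labels]) for c in classes}


def predict(forest, x):
    """Pick the class whose stump gives the highest leaf probability."""
    def prob(stump):
        f, t, p_left, p_right = stump
        return p_left if x[f] <= t else p_right
    return max(forest, key=lambda c: prob(forest[c]))
```

For example, with one feature and three well-separated groups labeled "a", "b", and "c", each class gets its own stump, and a new record is assigned to whichever class's leaf reports the largest probability.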

Link Analysis (LK): reveals pairs of correlated objects. Used in fraud detection, text analysis, and other correlation analysis tasks.
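The slide describes Link Analysis only as revealing "pairs of correlated objects". One simple way to obtain such pairs, assumed here purely for illustration, is to compute the phi coefficient (Pearson correlation of two yes/no columns) for every pair of indicator columns and keep the pairs above a threshold. `correlated_pairs`, `min_phi`, and the column names in the example are hypothetical.

```python
import math
from itertools import combinations


def phi(a, b):
    """Phi coefficient (Pearson correlation) of two equal-length 0/1 lists."""
    n = len(a)
    n11 = sum(1 for x, y in zip(a, b) if x and y)  # both objects present
    n1_ = sum(a)
    n_1 = sum(b)
    denom = math.sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    return 0.0 if denom == 0 else (n * n11 - n1_ * n_1) / denom


def correlated_pairs(columns, min_phi=0.5):
    """Return (name_a, name_b, phi) for pairs with phi >= min_phi, strongest first."""
    result = []
    for (na, a), (nb, b) in combinations(columns.items(), 2):
        r = phi(a, b)
        if r >= min_phi:
            result.append((na, nb, r))
    return sorted(result, key=lambda t: -t[2])
```

In a fraud setting, each column could flag whether a given provider or address appears in a claim; strongly correlated pairs are then candidates for drawing as links.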

Text Analysis (TA): extracts key concepts from natural-language notes; tags individual records with the main concepts encountered; recognizes synonyms and other semantic relations; can perform user-focused or unsupervised analysis; integrates text analysis with the other machine learning algorithms of PolyAnalyst; facilitates categorization of textual documents.

Text Analysis (continued)

Basket Analysis (BA): used in retailing, fraud detection, and medicine. Identifies groups of products that sell well together in transactional data; finds directed association rules for each of these groups; groups baskets containing similar sets of products. Rules are characterized by support, confidence, and improvement. Based on new mathematics: works 10 to 50 times faster than traditional algorithms.
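The three rule measures named above can be computed directly from a list of baskets. In this sketch, "improvement" is assumed to mean the rule's confidence divided by the base rate of the consequent (often called lift), so values above 1.0 indicate a positive association; PolyAnalyst's exact definition is not given on the slide. The function `rule_metrics` and the product names are hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and improvement of the rule antecedent -> consequent.

    baskets    -- list of sets of product names
    antecedent -- set of products on the left-hand side (may hold several items)
    consequent -- set of products on the right-hand side
    """
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent <= b)
    has_c = sum(1 for b in baskets if consequent <= b)
    has_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = has_both / n
    confidence = has_both / has_a if has_a else 0.0
    # Assumed definition: confidence relative to the consequent's base rate.
    improvement = confidence / (has_c / n) if has_c else 0.0
    return support, confidence, improvement
```

Because the antecedent is a set, the same function scores many-to-one rules such as {cooker, cooker fan} -> {microwave}.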

Basket Analysis (continued): groups of products that sell well together; directed association rules (screenshots).

Basket Analysis (continued): works with both transactional and flat data formats; easily finds many-to-one rules. “I would like to continue working together with Megaputer on other CTP customers’ projects (mainly Swedish and Danish banks).” -- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB

Find Laws (FL): models relationships hidden in data; presents discovered knowledge explicitly; searches the space of all possible hypotheses. “The unique Find Laws algorithm along with an easy to use interface made PolyAnalyst the only choice for our environment.” -- James Farkas, Senior Navigation Engineer, The Boeing Company

Find Laws (continued): FL is based on Megaputer’s unique Symbolic Knowledge Acquisition Technology (SKAT). A good introduction to SKAT: PCAI magazine, January 1999, pp. 48-52.

Find Dependencies (FD): determines the most influential variables; detects multidimensional dependencies; predicts the target variable in table format; used as preprocessing for FL.

Find Dependencies (continued): predicted sales per employee (chart).

PolyNet Predictor (PN): predicts values of continuous attributes using a hybrid GMDH-neural network method; works well with large amounts of data; the network with the best architecture is built automatically.
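PolyNet Predictor's internals are not described here. As a hedged illustration of the GMDH half of the hybrid only, the sketch below builds one layer in the classic Ivakhnenko style: for every pair of inputs it fits a quadratic polynomial by least squares on a training split, then keeps the pair with the smallest error on a separate validation split (the "external criterion" that lets GMDH choose its own structure). A full GMDH network would stack such layers. All names are hypothetical.

```python
from itertools import combinations


def _features(a, b):
    """Ivakhnenko quadratic terms for one pair of inputs."""
    return [1.0, a, b, a * b, a * a, b * b]


def _solve(A, v):
    """Solve the square system A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def _fit(rows, y):
    """Least-squares weights from the normal equations (F^T F) w = F^T y."""
    k = len(rows[0])
    ftf = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    fty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return _solve(ftf, fty)


def gmdh_layer(X_train, y_train, X_val, y_val):
    """Return (validation_mse, (i, j), weights) for the best pair of inputs."""
    best = None
    for i, j in combinations(range(len(X_train[0])), 2):
        w = _fit([_features(x[i], x[j]) for x in X_train], y_train)
        pred = [sum(wk * fk for wk, fk in zip(w, _features(x[i], x[j]))) for x in X_val]
        mse = sum((p - t) ** 2 for p, t in zip(pred, y_val)) / len(y_val)
        if best is None or mse < best[0]:
            best = (mse, (i, j), w)
    return best
```

Selecting on validation rather than training error is the design choice that keeps GMDH from overfitting as layers are added.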

Memory Based Reasoning (MB): performs classification into multiple categories, based on identifying similar cases in previous history; uses genetic algorithms to find the most suitable metric for the problem.
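The core of memory-based reasoning is a k-nearest-neighbour vote over past cases. The sketch below shows that vote under a weighted Euclidean metric. The per-feature `weights` are assumed to be the part a genetic algorithm would tune; that search is omitted here. Function names are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_classify(history, query, weights, k=3):
    """Majority category among the k most similar historical cases.

    history -- list of (feature_vector, category) pairs
    """
    nearest = sorted(history, key=lambda case: weighted_distance(case[0], query, weights))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]
```

Why the metric matters: in the example below, the second feature is large-scale noise. With equal weights it dominates the distance and the vote comes out "high"; giving it zero weight lets the informative first feature decide, and the answer becomes "low". Searching for such weights is the job the slide assigns to genetic algorithms.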

Discriminate (DS): determines which features of a selected data set distinguish it from the rest of the data; requires no target variable; can be powered by Find Laws, PolyNet Predictor, or Linear Regression.

Linear Regression (LR): correctly incorporates categorical and yes/no variables in the analysis; stepwise linear regression includes only influential variables; can be used as a preprocessing and benchmarking module.
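"Only influential variables included" is what stepwise selection achieves. A minimal forward-selection sketch follows. The stopping rule is an assumption, not PolyAnalyst's documented criterion: a variable is added only if it raises R² by at least `min_gain`. All names are hypothetical. Ordinary least squares is solved through the normal equations with a small Gaussian elimination routine, so no third-party libraries are needed.

```python
def _solve(A, v):
    """Solve the square system A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def _rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((t - sum(b * f for b, f in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the column that lowers the RSS most; stop when the
    reduction is below min_gain of the total sum of squares (that is, when
    R^2 would rise by less than min_gain). Returns chosen column indices."""
    selected = []
    total_ss = _rss(X, y, [])  # intercept-only model
    current = total_ss
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {c: _rss(X, y, selected + [c]) for c in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain * total_ss:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected
```

Measuring each gain against the total sum of squares, rather than against the current RSS, keeps the procedure from adding noise variables once the fit is already near-perfect.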

PolyAnalyst features in more detail

Data Analysis Project Workflow: access data; understand, clean, and transform data; run machine learning analysis; visualize, report, and share results; integrate results into existing business processes.

Data Access: ODBC-compliant databases (Oracle, DB2, Informix, Sybase, MS SQL Server, etc.); dedicated access to IBM Visual Warehouse and Oracle Express; OLE DB (can do In-Place Data Mining); CSV or DBF files. Data can be appended to the project when necessary.

Data cleansing and manipulation: SQL querying through OLE DB; record selection according to multiple criteria; union, intersection, or complement of data sets; aggregation of categorical values; visual drill-through; filtering of exceptional records; split into n-tile percentage intervals; random sampling.
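Two of these operations, splitting into n-tile percentage intervals and random sampling, are easy to state precisely. A minimal sketch follows, assuming "n-tile" means groups of (nearly) equal record counts after sorting on one field. The function names are hypothetical.

```python
import random


def ntile_split(records, key, n):
    """Split records into n groups of (nearly) equal size by ascending key value."""
    ordered = sorted(records, key=key)
    size = len(ordered)
    return [ordered[i * size // n:(i + 1) * size // n] for i in range(n)]


def random_sample(records, fraction, seed=None):
    """Simple random sample without replacement containing `fraction` of the records."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))
```

For example, `ntile_split(customers, key=lambda c: c["income"], n=4)` would yield income quartiles. Passing a fixed `seed` to `random_sample` makes a training sample reproducible.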

Visualization: histograms; line and scatter plots with zoom and drill-through capabilities; snake charts; interactive 3D charts; interactive rule graphs with sliders for visualizing multi-variable relations; frequency charts for categorical, integer, or yes/no variables; Lift and Gain charts for marketing applications.

Histograms and Frequencies: a histogram displays the distribution of a numerical variable; a frequencies chart displays the distribution of a categorical or yes/no variable.

2D charts and Rule-graphs: sliders help visualize the effects of the other variables in models with more than two dimensions. The Find Laws model (red line) of a product's market share as a function of price predicts a dramatic change in the formula when the product goes on promotion.

Snake-charts: quickly compare several data sets qualitatively across all their attributes (chart: all variables on one axis, a "Low" to "High" scale on the other, one line per compared data set).

Interactive 3D charts: you can use the mouse to rotate the 3D cube.

PolyAnalyst integration features

Integration objectives: use models to easily score data in various external databases; deliver models to external applications in a format they understand (XML); analyze very large databases in their entirety; integrate dedicated machine learning components into existing decision support systems.

Applying models externally: PolyAnalyst can apply predictive models directly to data in any external source through the standard OLE DB protocol. PolyAnalyst can also export models in XML (PMML) format for incorporation into external decision support applications.
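As an illustration of what a PMML export contains, the sketch below writes a simple linear regression model as a PMML `RegressionModel` document using Python's standard `xml.etree.ElementTree`. The element layout follows the DMG PMML 4.4 schema; this version is an assumption, since the PMML version PolyAnalyst emitted is not stated. The field names and coefficients in the usage example are made up, and `regression_to_pmml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"


def _q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def regression_to_pmml(target, intercept, coefficients):
    """Build a PMML document for target = intercept + sum(coef * field)."""
    ET.register_namespace("", PMML_NS)  # serialize PMML as the default namespace
    root = ET.Element(_q("PMML"), version="4.4")
    ET.SubElement(root, _q("Header"), description="Linear regression exported for scoring")
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, _q("DataField"), name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="target")
    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(intercept))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name, coefficient=repr(coef))
    return ET.tostring(root, encoding="unicode")
```

Usage: `regression_to_pmml("sales", 1.5, {"price": -0.8, "promo": 2.0})` returns an XML string that any PMML-aware scoring engine could, in principle, load without knowing which tool trained the model. That independence from the training tool is the point of exporting to a standard format.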

Analyzing large databases: traditional data mining vs. In-Place Data Mining (diagram).
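The general idea behind in-place mining can be illustrated without PolyAnalyst: push the heavy aggregation into the database as SQL and move only a few summary numbers back to the client. The sketch below uses SQLite from Python's standard library and a hypothetical `sales` table. It fits a one-variable least-squares line from five SQL aggregates, so the individual rows never leave the database. This is an assumption-laden illustration of the principle, not how PolyAnalyst's OLE DB mechanism works.

```python
import sqlite3


def fit_line_in_place(conn, table, x_col, y_col):
    """Least-squares fit y = a + b*x using only aggregates computed by the database.

    Table and column names are interpolated into the SQL text, so they must be
    trusted identifiers, never user input.
    """
    n, sx, sy, sxx, sxy = conn.execute(
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {x_col}), SUM({x_col} * {y_col}) FROM {table}"
    ).fetchone()
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope from sufficient statistics
    a = (sy - b * sx) / n                          # intercept
    return a, b
```

Only five numbers cross the connection regardless of table size, which is why this style of computation scales to databases that could not be copied into an analysis tool's memory.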

PolyAnalyst COM: a kit of COM-based data mining components (see DM Review magazine, January 2000, p. 42, and PCAI magazine, March 1999, p. 16). Benefits: develop new applications quickly and effortlessly; incorporate third-party components; choose the best components from different vendors; extend functionality by adding new components; build cross-platform applications; integrate with the simplest tools (Visual Basic).

PolyAnalyst COM (continued): offers individual machine learning engines for integration with external applications. The hard analytical work is performed behind the scenes by integrated PolyAnalyst machine learning components; the main program instructs PolyAnalyst on how to access the stored data; users see only their familiar interface, enhanced by a few new buttons.

PolyAnalyst platforms. Standalone systems: PolyAnalyst (Windows 9x/NT/2000/XP); PolyAnalyst Pro (Windows NT/2000 Pro/XP Pro); PolyAnalyst XL (add-ins for MS Excel). Client/server system: PolyAnalyst Knowledge Server (Windows NT); client (Windows 9x/NT/2000 or OS/2).

Customer quotes

PolyAnalyst supports medical projects at 3M: “Analytical engines do an excellent job of finding relations amongst many fields without overfitting.” -- Timothy Nagle, Consulting Scientist, 3M Corporation, St. Paul, MN, USA

PolyAnalyst helps improve the flight control system at Boeing: “PolyAnalyst provides quick and easy access for inexperienced users to powerful modeling tools.” -- James Farkas, Senior Navigation Engineer, The Boeing Company, Kent, WA, USA

PolyAnalyst facilitates marketing research at Indiana University: "PolyAnalyst provides a unique and powerful set of tools for data mining applications, including promotion response analysis, customer segmentation and profiling, and cross-selling analysis." -- Raymond Burke, E.W. Kelley Professor of BA, Kelley Business School, Indiana University, Bloomington, IN, USA

PolyAnalyst helps medical research at the University of Wisconsin-Madison: "PolyAnalyst suite enabled our researchers to search their data for rules and structure while providing a symbolic knowledge of the structure, the detail they needed." -- Prof. Roger L. Brown, Director of RDSU, University of Wisconsin, Madison, WI, USA

PolyAnalyst provides efficient machine learning algorithms: "PolyAnalyst focuses more effectively on data discovery than its competition." -- Mario Apicella, Technology Analyst, InfoWorld Test Center

PolyAnalyst future developments

Future developments: further support for OLE DB for DM (nested tables); new machine learning algorithms (time series analysis, Kohonen maps); enhanced data import and manipulation; visual development of workflow scripts; new push-button vertical applications.

PolyAnalyst -- WebAnalyst: PolyAnalyst supports visual project development when used on top of WebAnalyst, a new Megaputer web-enabled enterprise server.

PolyAnalyst evaluation Download a FREE evaluation copy of PolyAnalyst from www.megaputer.com and enjoy using it hands-on following the provided step-by-step lessons, or exploring your own data.

Any Questions? Call Megaputer at (812) 330-0110 or write to info@megaputer.com. 120 W Seventh Street, Suite 310, Bloomington, IN 47404 USA

Case 1: Asea Skandia (Sweden)

Asea Skandia: established in 1907; the largest Swedish distributor of electrical equipment; about 1,400 employees and a turnover of SEK 5.1 billion; about ten thousand product names; not yet strong in CRM and database marketing; had only transactional data in a database.

Groups of products offered
Home Appliances: 90 Cookers, cooker fans, microwave ovens; 91 Fridges/chillers/freezers; 92 Washing machines, dishwashers, dryers; 93 Sauna units, fans; 94 Small appliances
Lighting: 17 IR, RF and bus control systems; 19 Light reg., timers, plugs, CCE-con., car heaters; 70 Interior light fittings; 72 Industrial light fittings; 73 Emergency luminaires; 74 Spotlights and downlights, lighting tracks; 75 Decorative interior light fittings; 77 Exterior light fittings; 79 Accessories and spare parts; 80 Fluorescent lamps and other discharge lamps; 81 Incandescent filament and halogen lamps; 82 Special lamps
Ventilation and Sheet Metal: 15 Fastening and fixings, protective equipment; 16 Tools, implements, protective equipment & clothing; 66 Ventilation; 67 Sheet metal for buildings
Telecommunications: 48 Low current cable; 49 Data and optical fiber cable; 50 Network material; 51 Local data networks; 52 Power supply; 53 Signalling equipment; 55 Distress signal systems; 57 Telephony; 58 Internal communication systems; 60 Aerial equipment; 62 Sound and time distribution systems; 63 Safety and security systems; 64 Service alarm systems
Electrical Equipment: 1 Power and control cables; 2 Electrical installation, wiring and flexible cable; 6 Material kits, cable protection, lightning equipment; 7 Terminations, joints, cabinets and electrical tape; 8 Contact crimping; 9 Electric meters; 11 Cable ladders, trays, trunking, cable trolleys; 14 Conduit, boxes, glands, fire protection; 18 Switch systems; 20 Fuses with accessories; 21 Miniature circuit breaker systems; 22 Distribution board systems IP20-IP43; 23 Distribution board systems IP43-IP65; 25 Equipment boxes, equipment cabinets; 26 Distribution board accessories; 28 Switchgear components, capacitors, busbar trunking; 29 Connection terminals and marking materials; 31 Motor, safety, load and MCCB breakers; 32 Contactors and starters; 35 Motors; 37 Push switches; 38 Sensors, monitors and regulators; 40 Relays, time relays; 42 Metering instruments; 43 Spare parts for consumer goods; 45 Programmable control system; 85 Radiators and thermostats; 87 Fan heaters; 88 Water heaters and electric boilers; 89 Heating cable

Asea Skandia, CTP, and Megaputer (continued): predicting cross-sell opportunities was possible, but closer cooperation with the client was necessary, so Megaputer teamed with Cambridge Technology Partners (Sweden). Data was disguised prior to the analysis.
Asea Skandia: identified a new opportunity; hired a consultant; helped aggregate products into groups; incorporated the results into marketing activities.
CTP: identified the most suitable solution provider; worked with the client; collected the available data; aggregated the data into product categories; presented Megaputer's results to the client.
Megaputer: determined the business potential of the data; developed a data exploration strategy; carried out Market Basket Analysis; provided actionable results to CTP.

PolyAnalyst MBA: works 10 to 50 times faster than traditional algorithms; easily finds many-to-one rules.