Mining data with PolyAnalyst Your Knowledge Partner TM

Presentation on theme: "Mining data with PolyAnalyst Your Knowledge Partner TM"— Presentation transcript:

1 Mining data with PolyAnalyst Your Knowledge Partner TM

2 Outline
Data Mining in BI chain
PolyAnalyst overview
Learning algorithms
Additional features
Future developments

3 Data Mining in BI chain

4 DM in Decision Making
Consider a fragment of the BI chain: Data, Knowledge, Decision, Action
Data is what we can capture and store
Knowledge is what provides for informed decisions
Problem: How to get from Data to Knowledge?
Solution: Data Mining (Machine Learning)

5 Data Mining
"Data Mining is the process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from databases that is used to make crucial business decisions." -- G. Piatetsky-Shapiro, KDNuggets editor
Valid, Novel, Actionable, Comprehensible

6 Data Mining vs. OLAP
OLAP
- Helps prove or reject your hypotheses by dissecting data along different dimensions
- But you have to guess the answer first!
Data Mining
- Automatically develops and tests numerous hypotheses by learning from historical data
- Analyzes raw data

7 Business Intelligence Chain
Consider direct marketing automation: analyze data, integrate applications

8 Data Mining Tasks
Predicting, Classifying, Clustering, Segmenting, Explaining, Associating, Visualizing, Link Analysis, Text Mining

9 Fields of application

10 What makes DM hard?
Unfamiliar concept and lack of experience
Results require interpretation by an analyst
Poor integration in existing applications
Difficulty processing very large databases
Necessity to learn a new application
High cost

11 Megaputer response
Challenge: Unfamiliar concept and lack of experience
Response: Collaborative Appliance Program, which combines Megaputer analysts' expertise in data mining with the customer's knowledge of the business project
Challenge: Results require interpretation by an analyst
Response: Simple reporting and batch processing capabilities
Challenge: Poor integration in existing applications
Response: Easy scoring of external data with a few mouse clicks
Challenge: Difficulty processing very large databases
Response: In-Place Data Mining
Challenge: Necessity to learn a new application
Response: An SDK of easy-to-integrate PolyAnalyst COM components
Challenge: High cost
Response: Flexible licensing mechanism

12 PolyAnalyst overview

13 What is PolyAnalyst?
Multi-strategy data mining suite: the largest selection of ML algorithms for diverse business tasks; structured data and text processing tools
Ease-of-use: friendly data manipulation and visualization
Deep integration: applying models to external DB through the OLE DB protocol; exporting models to XML; COM components
Best Price/Performance ratio

14 Key differentiators of PolyAnalyst
Integrated analysis of structured (numeric and categorical) and unstructured (text) data
Easy to learn and operate visual analytical interface
The largest selection of powerful machine learning algorithms
Mouse-driven application of predictive models to data in any external system through a standard OLE DB link
Simple integration with external applications: SDK of COM components
In-Place Data Mining capabilities for processing huge databases
Step-by-step tutorials based on real-world case studies
Rich data manipulation and visualization tools
Reusable analytical scripts for batch process data mining
The best Price/Performance ratio

15 PolyAnalyst customer base: 300+ installations
Sample customers: Boeing (USA), 3M (USA), Chase Manhattan Bank (USA), McKinsey & Company (USA), Siemens (Germany), Lockheed Martin (USA), Allstate Insurance (USA), ICICI Bank (India), Mars (USA), Taco Bell (USA), DuPont (USA), Asea Skandia (Sweden), France Telecom (France), Cambridge Technology Partners (USA), Carlson Marketing (USA), Central Bank (Russia), US Navy (USA), KPN Research (Netherlands), Alka Insurance (Denmark), National Cancer Institute (USA)

16 PolyAnalyst workplace
Control buttons
Project navigation tree
Data and Results pane
Objects and Collections represented by icons
Exploration engine report fragment
PolyAnalyst log journal

17 PolyAnalyst provides
Access to data held in a database or data warehouse: numerical, categorical, yes/no, date
Data manipulation and visualization
14 machine learning algorithms
Convenient results reporting and output
Integration with external applications

18 PolyAnalyst machine learning algorithms

19 “Probably one of the most impressive characteristics of PolyAnalyst is the sheer number of data mining tasks it can tackle.” Mario Apicella, Technology Analyst, InfoWorld Test Center, July 3, 2000

20 Learning algorithms
Find Laws (SKAT algorithm)
Cluster (Localization of anomalies)
Find Dependencies (n-dimensional distributions)
Classify (Fuzzy logic modeling)
Decision Tree (Information Gain criterion)
PolyNet Predictor (GMDH-Neural Net hybrid)
Market Basket Analysis (Association rules)
Memory Based Reasoning (k-NN + GA)
Linear Regression (Stepwise and rule-enriched)
Discriminate (Unsupervised classification)
Summary Statistics (Data summarization)
Link Analysis (Visual correlation analysis)
Text Mining (Semantic text analysis)

21 Cluster (FC)
Identifies clusters of similar records
Selects best variables for clustering
Suggests the number of clusters
Separates clusters of records in new data sets for further investigation (preprocessing for other algorithms)

22 Cluster (continued)
Groups of similar records

23 Cluster (continued)
Based on analyzing distributions in hypercubes of all variables rather than on measuring distances between points
Hence, independent of rescaling of the variable axes
Finds only clusters actually present in data, against the background of uniformly distributed cases
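The idea of looking at how records fill hypercubes, rather than at distances between points, can be illustrated with a minimal grid-counting sketch. This is not the PolyAnalyst Cluster algorithm: the function name `dense_cells`, the equal-width binning, and the `factor` threshold are all assumptions made for illustration. It only shows why a cell-based view does not change when a variable is rescaled.

```python
from collections import Counter

def dense_cells(records, bins=4, factor=2.0):
    """Return grid cells that hold at least `factor` times the count
    expected if records were spread uniformly (illustrative only)."""
    dims = len(records[0])
    lows = [min(r[d] for r in records) for d in range(dims)]
    highs = [max(r[d] for r in records) for d in range(dims)]

    def cell(record):
        index = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # Position relative to the observed range, so units do not matter.
            pos = 0.0 if span == 0 else (record[d] - lows[d]) / span
            index.append(min(int(pos * bins), bins - 1))
        return tuple(index)

    counts = Counter(cell(r) for r in records)
    expected = len(records) / bins ** dims  # uniform background per cell
    return {c: n for c, n in counts.items() if n >= factor * expected}
```

Multiplying one variable by a constant leaves every record in the same cell, so the set of dense cells is unchanged; a distance-based method would generally need the variables to be normalized first.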

24 Classify (CL)
Fuzzy-logic based classification
The membership function is modeled by either Find Laws, PolyNet Predictor, or LR
Provides record scoring, with Lift and Gain charts used for visualization
Assigns records to one of two classes and provides the classification rule used

25 Classify (continued)
PolyAnalyst Lift chart illustrates an increase in the response to a campaign based on the discovered model instead of random mailing (axis: % of maximal possible response; curves: Mass mailing, Targeted mailing)
PolyAnalyst Gain chart helps optimize the profit obtained in a direct marketing campaign (axis: Profit ($); curves: Mass mailing, Targeted mailing)
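The two curves described on this slide can be reproduced from scored records with a few lines of code. The sketch below is an illustration under stated assumptions, not PolyAnalyst output: the function names and the `revenue_per_response` and `cost_per_mail` parameters are hypothetical, and the charts in the product may be computed differently.

```python
def response_curve(scores, responded):
    """Fraction of all responders reached after mailing the k best-scored
    records, for k = 1..n (the targeted-mailing curve of a Lift chart)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(responded)
    reached, curve = 0, []
    for i in order:
        reached += responded[i]
        curve.append(reached / total)
    return curve

def profit_curve(scores, responded, revenue_per_response, cost_per_mail):
    """Cumulative profit after mailing the k best-scored records
    (the targeted-mailing curve of a Gain chart)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    profit, curve = 0.0, []
    for i in order:
        profit += revenue_per_response * responded[i] - cost_per_mail
        curve.append(profit)
    return curve
```

The mass-mailing baseline is the straight line from zero to the full-list value; the mailing size that maximizes `profit_curve` is the point where a targeted campaign should stop.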

26 Decision Tree (DT)
Intuitively classifies cases into selected categories
Based on the Information Gain splitting criterion
The fastest algorithm in PolyAnalyst
Scales linearly with increasing number of records

27 Decision Tree (continued)
Node characteristics Classification tree

28 Decision Forest (DF)
The most efficient classification algorithm for tasks with multiple target categories
Transforms the task of categorizing data records to N classes into the problem of solving N tasks of categorizing records to two classes
Develops the best collection of N classification trees, with leaves containing probabilities of classifying records in the corresponding classes
Scales linearly with increasing number of records
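The decomposition of one N-class task into N two-class tasks can be sketched as follows. This is a generic one-vs-rest illustration, not the Decision Forest implementation: the function names are hypothetical, the per-class models are passed in as plain callables, and choosing the class with the highest probability is an assumption about how the N results are combined.

```python
def one_vs_rest_targets(labels):
    """Build one yes/no target list per class: N binary tasks for N classes."""
    classes = sorted(set(labels))
    return {c: [label == c for label in labels] for c in classes}

def predict(record, scorers):
    """Pick the class whose binary model assigns the highest probability.

    `scorers` maps each class to a callable returning a probability;
    in a decision forest each callable would be one trained tree.
    """
    return max(scorers, key=lambda c: scorers[c](record))
```

Each binary target list would be used to train one tree whose leaves hold class probabilities; `predict` then compares those probabilities across the N trees.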

29 Link Analysis (LK) Reveals pairs of correlated objects
Used in Fraud Detection, Text Analysis and other correlation analysis tasks

30 Text Analysis (TA)
Extracts key concepts from natural language notes
Tags individual records with the main encountered concepts
Recognizes synonyms and other semantic relations
Can perform user-focused or unsupervised analysis
Integrates the analysis of text with the power of other machine learning algorithms of PolyAnalyst
Facilitates categorization of textual documents

31 Text Analysis (continued)

32 Basket Analysis (BA)
Is used in Retailing, Fraud Detection and Medicine
Identifies groups of products that sell well together in transactional data
Finds directed association rules for each of these groups
Groups baskets containing similar sets of products
Rules are characterized by Support, Confidence, and Improvement
Based on new mathematics: works 10 to 50 times faster than traditional algorithms
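The three rule measures named on this slide can be computed directly from a list of baskets. The sketch below uses the common textbook definitions and assumes that "Improvement" corresponds to what is usually called lift (confidence divided by the consequent's overall frequency); the slide does not give PolyAnalyst's exact formulas, and the function name is hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and improvement of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent <= set(b))
    has_c = sum(1 for b in baskets if consequent <= set(b))
    has_both = sum(1 for b in baskets if (antecedent | consequent) <= set(b))
    if has_a == 0 or has_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = has_both / n               # share of baskets containing the whole rule
    confidence = has_both / has_a        # P(consequent | antecedent)
    improvement = confidence / (has_c / n)  # how much better than chance
    return support, confidence, improvement
```

An improvement above 1 means the antecedent makes the consequent more likely than it is across all baskets, which is what makes a rule useful for cross-selling.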

33 Basket Analysis (continued)
Groups of products sold together well Directed Association Rules

34 Basket Analysis (continued)
Works with both transactional and flat data formats
Easily finds many-to-one rules
“I would like to continue working together with Megaputer on other CTP customers’ projects (mainly Swedish and Danish Banks).” -- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB

35 Find Laws (FL)
Models relationships hidden in data
Presents discovered knowledge explicitly
Searches the space of all possible hypotheses
“The unique Find Laws algorithm along with an easy to use interface made PolyAnalyst the only choice for our environment.” -- James Farkas, Senior Navigation Engineer, The Boeing Company

36 Find Laws (continued)
FL is based on Megaputer’s unique Symbolic Knowledge Acquisition Technology (SKAT)
A good introduction to SKAT: PCAI magazine, January 99, p

37 Find Dependencies (FD)
Determines most influential variables
Detects multi-dimensional dependencies
Predicts target variable in a table format
Used as preprocessing for FL

38 Find Dependencies (continued)
Predicted Sales per Employee

39 PolyNet Predictor (PN)
Predicts values of continuous attributes
Hybrid GMDH-Neural Network method
Works well with large amounts of data
The best network architecture is built automatically

40 Memory Based Reasoning (MB)
Performs classification into multiple categories
Based on identifying similar cases in the previous history
Uses Genetic Algorithms to find the most suitable metric for the problem

41 Discriminate (DS)
Determines what features of a selected data set distinguish it from the rest of the data
Requires no target variable
Can be powered by Find Laws, PolyNet Predictor, or Linear Regression

42 Linear Regression (LR)
Incorporates categorical and yes/no variables in the analysis correctly
Stepwise Linear Regression: only influential variables included
Can be used as a preprocessing and benchmarking module

43 PolyAnalyst features in more detail

44 Data Analysis Project Workflow
Access data
Understand, clean and transform data
Run machine learning analysis
Visualize, report and share results
Integrate results in existing business process

45 Data Access
ODBC-compliant databases: Oracle, DB2, Informix, Sybase, MS SQL Server, etc.
Dedicated access: IBM Visual Warehouse, Oracle Express
OLE DB (can do In-Place Data Mining)
CSV or DBF files
Data can be appended to the project when necessary

46 Data cleansing and manipulation
SQL querying through OLE DB
Records selection according to multiple criteria
Union, intersection, or complement of data sets
Categorical values aggregation
Visual Drill-through
Exceptional records filtering
Split into n-tile percentage intervals
Random sampling
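Two of the operations listed above, the n-tile split and random sampling, are easy to picture in code. The sketch below is a generic illustration with hypothetical function names; it does not reproduce PolyAnalyst's own handling of ties, remainders, or sampling options.

```python
import random

def split_into_ntiles(records, key, n):
    """Sort records by `key` and cut them into n nearly equal-sized intervals."""
    ordered = sorted(records, key=key)
    size, extra = divmod(len(ordered), n)
    tiles, start = [], 0
    for i in range(n):
        # The first `extra` tiles take one additional record each.
        end = start + size + (1 if i < extra else 0)
        tiles.append(ordered[start:end])
        start = end
    return tiles

def random_sample(records, fraction, seed=0):
    """Draw a reproducible random sample holding `fraction` of the records."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))
```

With `n=4` the split yields quartiles and with `n=10` deciles; fixing the seed makes a sample repeatable across runs of the same analysis.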

47 Visualization
Histograms
Line and scatter plots with zoom and drill-through capabilities
Snake charts
Interactive 3D-charts
Interactive Rule-graphs with sliders for visualizing multi-variable relations
Frequency charts for categorical, integer, or yes/no variables
Lift and Gain charts for marketing applications

48 Histograms and Frequencies
Histogram displays distribution of numerical variables
Frequencies chart displays distribution of categorical and yes/no variables

49 2D charts and Rule-graphs
Sliders help visualize effects of other variables in more than two-dimensional models
The Find Laws model (red line) for a product market share dependence on the price predicts a dramatic change in the formula when the product goes on promotion

50 Snake-charts
Quickly compare several data sets qualitatively on all their attributes
(Chart labels: “High”, “Low”, Compared data sets, All variables)

51 Interactive 3D charts
You can use the mouse to rotate the 3D-cube

52 PolyAnalyst integration features

53 Integration objectives
Use models to simply score data in various external databases
Deliver models to external applications in the format they understand: XML
Be able to analyze very large databases in their entirety
Integrate dedicated machine learning components in existing decision support systems

54 Applying models externally
PolyAnalyst can readily apply predictive models directly to data in any external source through a standard OLE DB protocol
PolyAnalyst can export models to XML (PMML) format for their incorporation in external decision support applications

55 Analyzing large databases
Traditional Data Mining In-Place Data Mining

56 PolyAnalyst COM
A kit of COM-based Data Mining components
See DMReview magazine, January 2000, p. 42 and PCAI magazine, March 99, p. 16
Benefits
- Develop new applications quickly and effortlessly
- Incorporate third party components
- Choose best components from different vendors
- Extend functionality by adding new components
- Cross-platform applications
- Integration with the simplest tools (Visual Basic)

57 PolyAnalyst COM (continued)
Offers individual machine learning engines
Integration with external applications
Hard analytical work is performed by integrated PolyAnalyst machine learning components behind the scenes
The main program instructs PolyAnalyst on how to access the stored data
Users see only the familiar interface enhanced by a few new buttons

58 PolyAnalyst platforms
Standalone system
PolyAnalyst: Windows 9x/NT/2000/XP
PolyAnalyst Pro: Windows NT/2000P/XP Pro
PolyAnalyst XL: Add-ins for MS Excel
Client/Server system
PolyAnalyst Knowledge Server: Windows NT
Client: Windows 9x/NT/2000 or OS/2

59 Customer quotes

60 PolyAnalyst supports medical projects at 3M
“Analytical engines do an excellent job of finding relations amongst many fields without overfitting.” Timothy Nagle Consulting Scientist 3M Corporation St. Paul, MN, USA

61 PolyAnalyst helps improve flight control system at Boeing
“PolyAnalyst provides quick and easy access for inexperienced users to powerful modeling tools.” James Farkas Senior Navigation Engineer The Boeing Company Kent, WA, USA

62 PolyAnalyst facilitates marketing research at Indiana University
Raymond Burke E.W. Kelley Professor of BA Kelley Business School Indiana University Bloomington, IN, USA “PolyAnalyst provides a unique and powerful set of tools for data mining applications, including promotion response analysis, customer segmentation and profiling, and cross-selling analysis.”

63 PolyAnalyst helps medical research at the University of Wisconsin-Madison
Prof. Roger L. Brown Director of RDSU University of Wisconsin Madison, WI, USA “PolyAnalyst suite enabled our researchers to search their data for rules and structure while providing a symbolic knowledge of the structure, the detail they needed.”

64 PolyAnalyst provides efficient machine learning algorithms
Mario Apicella Technology Analyst InfoWorld Test Center “PolyAnalyst focuses more effectively on data discovery than its competition.”

65 PolyAnalyst future developments

66 Future developments
Further support for OLE DB for DM: nested tables
New machine learning algorithms: time series analysis, Kohonen maps
Enhanced data import and manipulation
Visual development of workflow scripts
New push-button vertical applications

67 PolyAnalyst -- WebAnalyst
PolyAnalyst supports visual project development when used on top of a new Megaputer web-enabled enterprise server, WebAnalyst

68 PolyAnalyst evaluation
Download a FREE evaluation copy of PolyAnalyst from and enjoy using it hands-on following the provided step-by-step lessons, or exploring your own data.

69 Any Questions?
Call Megaputer at (812) or write 120 W Seventh Street, Suite 310 Bloomington, IN USA

70 Case 1: Asea Skandia (Sweden)

71 Asea Skandia
Established 1907
Largest Swedish distributor of electrical equipment
About 1,400 employees and a turnover of SEK 5.1 billion
About ten thousand product names
Not good at CRM and DB marketing yet
Had only transactional data in a database

72 Groups of products offered
Home Appliances: 90 Cookers, cooker fans, microwave ovens; 91 Fridges/Chillers/Freezers; 92 Washing machines, dishwashers, dryers; 93 Sauna unit, fans; 94 Small appliances
Lighting: 17 IR, RF and Bus control systems; 19 Light reg., timers, plugs, CCE-con., car heaters; 70 Interior light fittings; 72 Industrial light fittings; 73 Emergency luminaires; 74 Spotlights and downlights, lighting tracks; 75 Decorative interior light fittings; 77 Exterior light fittings; 79 Accessories and spare parts; 80 Fluorescent lamps and other discharge lamps; 81 Incandescent filament and halogen lamps; 82 Special lamps
Ventilation and sheet metal: 15 Fastening and fixings, protective equipment; 16 Tools, implements, protective equipment & clothin; 66 Ventilation; 67 Sheet Metal for Buildings
Telecommunications: 48 Low current cable; 49 Data and optical fiber cable; 50 Network material; 51 Local data networks; 52 Power Supply; 53 Signalling equipment; 55 Distress signal systems; 57 Telephony; 58 Internal communication systems; 60 Aerial equipment; 62 Sound and time distribution systems; 63 Safety and Security Systems; 64 Service Alarm Systems
Electrical Equipment: 1 Power and control cables; 2 Electrical installation, wiring and flexible cable; 6 Material kits, cable protection, lightning equipment; 7 Terminations, joints, cabinets and electrical tape; 8 Contact crimping; 9 Electric meters; 11 Cable ladders, trays, trunking, cable trolleys; 14 Conduit, boxes, glands, fire protection; 18 Switch systems; 20 Fuses with accessories; 21 Miniature circuit breaker systems; 22 Distribution board systems IP20-IP43; 23 Distribution board systems IP43-IP65; 25 Equipment boxes, equipment cabinets; 26 Distribution board accessories; 28 Switchgear components, capacitors, busbar trunking; 29 Connection terminals and marking materials; 31 Motor, safety, load and MCCB breakers; 32 Contactors and starters; 35 Motors; 37 Push switches; 38 Sensors, monitors and regulators; 40 Relays, time relays; 42 Metering instruments; 43 Spare parts for consumer goods; 45 Programmable control system; 85 Radiators and thermostats; 87 Fan heaters; 88 Water heaters and electric boilers; 89 Heating cable

73 Asea Skandia (continued)
Predicting cross-sell opportunities was possible
Closer cooperation with the client was necessary
Megaputer teamed with Cambridge Technology Partners (Sweden)
Data was disguised prior to the analysis
Asea Skandia: Identified new opportunity; Hired a consultant; Helped aggregating products in groups; Incorporated results in marketing activities
CTP: Identified most suitable solution provider; Worked with the client; Collected available data; Aggregated data in product categories; Presented Megaputer results to the client
Megaputer: Determined business potential of the data; Developed data exploration strategy; Carried out Market Basket Analysis; Provided actionable results to CTP

74 PolyAnalyst MBA
Works times faster than traditional
Easily finds many-to-one rules
“I would like to continue working together with Megaputer on other CTP customers’ projects (mainly Swedish and Danish Banks).” -- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB

