Data Mining Techniques Applied in Advanced Manufacturing
Presented by Wei Sun

Background
- Complex production process
- Large volume of production data
- Explore the underlying dependencies in production data
- Improve the production rate
- Reduce defective products

Application Gap
- Capacity: what the data looks like
- Capability: how the data can be utilized
- Knowledge: how to perform knowledge discovery and management

Solution: PDP-Miner
- A data analytics platform customized for process optimization
- Data mining techniques
- Big data infrastructure

Concrete Case: PDP Manufacturing
- Production line: 6,000 meters
- Production process: 75 assembling routines
- Production time: 76 hours on average
- Production equipment: 279 major pieces of equipment
- Controlling parameters: more than 10,000

Workflow

Key Questions
- Which key parameters have values that significantly differentiate qualified products from defective products?
- How do changes in parameter values affect the production rate?
- Which parameter recipes effectively ensure a high yield rate?

Achievement 1: A Data Analytics Platform with Three Functions
1. Integration of data mining algorithms written in different languages
2. Real-time monitoring of system resource consumption
3. Balancing node workloads across the cluster

Load Balancing in a Dynamic Environment
- The system can balance workloads in a cluster whose membership changes dynamically
- The system can be extended linearly with resources of different computing power
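The slides do not describe how PDP-Miner actually balances work across nodes. As a minimal sketch, assuming each node advertises a relative computing capacity and each task carries an estimated cost (both hypothetical inputs), a capacity-aware greedy scheduler places every new task on the node whose utilization would be lowest afterwards, and re-places a departing node's tasks:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # relative computing power (hypothetical unit)
    load: float = 0.0  # summed estimated cost of the tasks assigned here
    tasks: list = field(default_factory=list)

    def utilization(self) -> float:
        return self.load / self.capacity


class Balancer:
    """Greedy capacity-aware placement; an illustration, not PDP-Miner's scheduler."""

    def __init__(self):
        self.nodes = {}  # node name -> Node

    def add_node(self, name: str, capacity: float) -> None:
        # Nodes can join at any time; later tasks immediately see the extra capacity.
        self.nodes[name] = Node(name, capacity)

    def remove_node(self, name: str) -> None:
        # Tasks on a departing node are re-submitted to the remaining nodes.
        node = self.nodes.pop(name)
        for task, cost in node.tasks:
            self.submit(task, cost)

    def submit(self, task: str, cost: float) -> str:
        if not self.nodes:
            raise RuntimeError("no nodes available")
        # Choose the node whose utilization would be lowest after taking the task.
        best = min(self.nodes.values(), key=lambda n: (n.load + cost) / n.capacity)
        best.load += cost
        best.tasks.append((task, cost))
        return best.name


b = Balancer()
b.add_node("small", capacity=1.0)
b.add_node("big", capacity=3.0)
print([b.submit(f"job{i}", cost=1.0) for i in range(4)])  # ['big', 'big', 'small', 'big']
```

Normalizing by capacity is what lets machines of different computing power share work proportionally: the node with three times the capacity receives three of the four equal-cost jobs, and both nodes end at the same utilization.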

Achievement 2
- Regression modeling to describe the relationship between product quality and various parameters
- Association-based methods to identify feature combinations that can significantly improve product quality

Regression Analysis
- Linear regression models:
  1. Ridge regression
  2. Lasso regression
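Ridge and lasso both fit a linear model of yield on the process parameters but penalize the coefficients differently: ridge adds an L2 penalty that shrinks all coefficients, while lasso's L1 penalty drives the coefficients of uninformative parameters exactly to zero, which suits picking key parameters out of thousands. The pure-Python coordinate-descent sketch below assumes the parameter columns and the yield vector are already centered (so no intercept is fitted); `fit_linear` and its arguments are illustrative names, not part of PDP-Miner:

```python
def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0), the lasso shrinkage operator."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def fit_linear(X, y, lam, penalty="ridge", n_iter=100):
    """Coordinate descent on 0.5 * ||y - X b||^2 + penalty(b).

    penalty="ridge": (lam / 2) * ||b||_2^2    penalty="lasso": lam * ||b||_1
    Assumes the columns of X and the vector y are centered (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    z = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]  # column sums of squares
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:
                b[j] = 0.0  # a constant column carries no information
                continue
            # rho_j = x_j . (y minus the fit of every other coordinate)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            )
            if penalty == "ridge":
                b[j] = rho / (z[j] + lam)
            elif penalty == "lasso":
                b[j] = soft_threshold(rho, lam) / z[j]
            else:
                raise ValueError(f"unknown penalty: {penalty}")
    return b


# Toy centered data: y depends only on the first parameter.
X = [[-1.0, 1.0], [0.0, -2.0], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]
print(fit_linear(X, y, lam=1.0, penalty="lasso"))  # [1.5, 0.0]
print(fit_linear(X, y, lam=1.0, penalty="ridge"))
```

On this toy data the least-squares coefficient of the first parameter is 2.0; lasso shrinks it to 1.5 and ridge to about 1.33, while the irrelevant second parameter stays at zero.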

Regression Analysis Outcomes
1. The variation in air humidity has a positive correlation with the yield rate.
2. Air pressure has a positive correlation with the yield rate: the less the pressure changes, the higher the yield rate.

Regression Analysis Outcomes
- Temperature and humidity have significant correlations with product quality
- When the ambient temperature is below 27 °C, the number of defective products increases dramatically

Association-Based Classification
- CARs: class association rules of the form r: F → y, where F is a subset of the entire feature-value set and y is a class label

Association-Based Classification
- Each CAR has a support s and a confidence c: s is the number of records that contain F, and c is the fraction of records containing F that are labeled y
- A rule-based classifier is built by selecting the subset of CARs that yields the least error
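To make the two measures and the classifier-building step concrete, here is a sketch in which each record is a frozenset of (parameter, value) pairs plus a GOOD/SCRAP label, with support and confidence defined as on the slide. The slide does not spell out how the subset of CARs is chosen; `build_classifier` assumes a simplified CBA-style procedure (rank rules by confidence, then support; keep a rule only if it correctly classifies some still-uncovered record; return the rule prefix plus default class with the fewest training errors). All names and data are hypothetical:

```python
def rule_stats(F, y, records):
    """Support = number of records containing F; confidence = share of those labeled y."""
    covered = [label for items, label in records if F <= items]
    support = len(covered)
    confidence = covered.count(y) / support if support else 0.0
    return support, confidence


def majority(labels):
    # Most frequent label; ties broken alphabetically for determinism.
    return max(sorted(set(labels)), key=labels.count) if labels else None


def build_classifier(rules, records):
    """Return (training_errors, ordered_rules, default_label)."""
    def rank(rule):
        support, confidence = rule_stats(rule[0], rule[1], records)
        return (confidence, support)

    ranked = sorted(rules, key=rank, reverse=True)
    labels = [label for _, label in records]
    default = majority(labels)
    best = (sum(lab != default for lab in labels), [], default)  # classifier with no rules
    remaining, selected, errors = list(records), [], 0
    for F, y in ranked:
        matched = [label for items, label in remaining if F <= items]
        if y not in matched:
            continue  # the rule would not correctly classify any uncovered record
        selected.append((F, y))
        errors += sum(lab != y for lab in matched)
        remaining = [(items, label) for items, label in remaining if not F <= items]
        labels = [label for _, label in remaining]
        default = majority(labels) if labels else y
        total = errors + sum(lab != default for lab in labels)
        if total < best[0]:
            best = (total, list(selected), default)
    return best


def classify(classifier, items):
    _, rules, default = classifier
    for F, y in rules:  # the first matching rule wins
        if F <= items:
            return y
    return default


records = [
    (frozenset({("temp", "low"), ("humid", "high")}), "SCRAP"),
    (frozenset({("temp", "low"), ("humid", "low")}), "SCRAP"),
    (frozenset({("temp", "high"), ("humid", "high")}), "GOOD"),
    (frozenset({("temp", "high"), ("humid", "low")}), "GOOD"),
    (frozenset({("temp", "low"), ("humid", "low")}), "GOOD"),
]
rules = [
    (frozenset({("temp", "low")}), "SCRAP"),
    (frozenset({("temp", "high")}), "GOOD"),
    (frozenset({("humid", "high")}), "GOOD"),
]
clf = build_classifier(rules, records)
print(clf[0], clf[2])  # 1 SCRAP
```

Here the single rule "temp = high → GOOD" plus the default class SCRAP already achieves the minimum of one training error, so the lower-ranked rules are dropped.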

Association-Based Classification
- Early detection strategy: if CARs refer to features from the early stages of the manufacturing process, they can quickly identify defective semi-finished products, which prevents further waste of resources
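A minimal sketch of the early detection idea, assuming a hypothetical `step_of` mapping from each parameter to the assembling routine (1 to 75 on the PDP line) where it is measured: only SCRAP-predicting CARs whose features are all observable by a product's current step are applied to that semi-finished product. Parameter names and steps below are invented for illustration:

```python
def early_rules(rules, step_of, current_step, bad_label="SCRAP"):
    """CARs predicting bad_label whose features are all measured by current_step."""
    return [
        (F, y)
        for F, y in rules
        if y == bad_label and all(step_of[param] <= current_step for param, _ in F)
    ]


def flag_semi_finished(items, rules, step_of, current_step):
    """Return the first early rule matched by a semi-finished product, else None."""
    for F, y in early_rules(rules, step_of, current_step):
        if F <= items:
            return F, y
    return None


# Hypothetical parameters: coating temperature is set at step 3, sealing pressure at step 40.
step_of = {"coat_temp": 3, "seal_pressure": 40}
rules = [
    (frozenset({("coat_temp", "low")}), "SCRAP"),
    (frozenset({("seal_pressure", "high")}), "SCRAP"),
    (frozenset({("coat_temp", "high")}), "GOOD"),
]
print(flag_semi_finished(frozenset({("coat_temp", "low")}), rules, step_of, current_step=10))
```

At step 10 only the coating-temperature rule can fire; the sealing-pressure rule becomes usable once the product reaches step 40.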

System Architecture: PDP-Miner
- Data Analytics Platform
- Data Analysis Modules

SYSTEM ARCHITECTURE

DATA ANALYTICS PLATFORM
- Easy operation for task configuration
- Flexible support for various programs
- Effective resource management

DATA ANALYSIS MODULES
1. Data Exploration
2. Data Analysis
3. Result Management

DATA ANALYSIS MODULES: 1. Data Exploration
- Comparison Analysis
  1. Quickly identifies parameters whose values are statistically different between two datasets
  2. Extracts the top-k most significant parameters
- Data Cube
  1. Explores high-dimensional data
  2. Supports multi-level inspection of the data
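The slides do not name the statistical test behind Comparison Analysis. As one plausible sketch, assuming Welch's two-sample t statistic serves as the difference score, both functions of the module (flag parameters that differ between two datasets, return the top-k) reduce to scoring every shared parameter and sorting by absolute score. The data and function names are hypothetical:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_k_parameters(first, second, k):
    """first/second map parameter -> observed values; rank shared parameters by |t|."""
    scores = {p: welch_t(first[p], second[p]) for p in sorted(first.keys() & second.keys())}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


good = {"temp": [28, 29, 30, 29], "humid": [50, 52, 51, 49]}
scrap = {"temp": [24, 25, 26, 25], "humid": [51, 50, 52, 49]}
print(top_k_parameters(good, scrap, k=1))
```

In practice the score would be turned into a p-value and corrected for the thousands of parameters tested at once; the sketch stops at ranking.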

DATA ANALYSIS MODULES: 2. Data Analysis
- Important Parameter Selection
- Regression Analysis
- Discriminative Analysis

DATA ANALYSIS MODULES: 3. Result Management
- The important parameter list
- Parameter value combinations
- The regression model

Discriminative Analysis
- Association-based classification
- Low-support discriminative pattern mining

Low-Support Discriminative Pattern Mining
- The production process generates high-dimensional data, on which standard association-rule-based methods are time-consuming
- SMP: a low-support pattern mining method
- The SMP algorithm is integrated into PDP-Miner

Discover key parameters

Discover Key Parameters
- Step 1: Separate the dataset into two categories based on product quality: GOOD (qualified products) and SCRAP (defective products)

Discover Key Parameters
- Step 2: Generate hundreds of frequent parameter-value combinations for each of the two datasets

Discover Key Parameters
- Step 3: Extract the combinations that are frequent in SCRAP but not in GOOD, which yields the value combinations that lead to defective products
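The three steps can be sketched with brute-force counting, assuming each record is a frozenset of (parameter, value) pairs with a GOOD/SCRAP label. This only illustrates the contrast idea: it enumerates every combination up to a small `max_size` and would not scale to 10,000 parameters, which is why production-scale data calls for a dedicated low-support miner such as the SMP algorithm mentioned above. Thresholds, names, and data are hypothetical:

```python
from collections import Counter
from itertools import combinations


def split_by_quality(labeled):
    """Step 1: separate records into GOOD and SCRAP."""
    good = [items for items, label in labeled if label == "GOOD"]
    scrap = [items for items, label in labeled if label == "SCRAP"]
    return good, scrap


def frequent_combinations(records, min_support, max_size=2):
    """Step 2: parameter-value combinations present in >= min_support of records."""
    counts = Counter()
    for items in records:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    n = len(records)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


def scrap_only_combinations(labeled, min_support=0.6, max_size=2):
    """Step 3: combinations frequent in SCRAP but not frequent in GOOD."""
    good, scrap = split_by_quality(labeled)
    freq_good = frequent_combinations(good, min_support, max_size)
    freq_scrap = frequent_combinations(scrap, min_support, max_size)
    return {combo: s for combo, s in freq_scrap.items() if combo not in freq_good}


labeled = [
    (frozenset({("temp", "low"), ("humid", "high"), ("line", "A")}), "SCRAP"),
    (frozenset({("temp", "low"), ("humid", "high"), ("line", "B")}), "SCRAP"),
    (frozenset({("temp", "low"), ("humid", "low"), ("line", "A")}), "GOOD"),
    (frozenset({("temp", "high"), ("humid", "high"), ("line", "A")}), "GOOD"),
    (frozenset({("temp", "high"), ("humid", "low"), ("line", "B")}), "GOOD"),
]
for combo, support in scrap_only_combinations(labeled).items():
    print(sorted(combo), support)
```

On the toy data the pair {temp = low, humid = high} appears in every SCRAP record but is not frequent among GOOD ones, which is the kind of combination the next slide marks as one to avoid.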

Important Parameters Discovered
- Big red crosses indicate values that occur densely in SCRAP products
- Such parameter-value combinations should be avoided in production practice

Deployment Practice
1. Control product quality and reduce cost by reducing defective products
2. Quickly diagnose parameter values
3. Verify and validate analytic results
4. The PDP yield rate increased from 91% to 95%
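The yield figure is easier to interpret as a defect rate: going from 91% to 95% yield cuts the defective share from 9% to 5%, that is, about 44% fewer defective products for the same output. The arithmetic, as a small check:

```python
def relative_defect_reduction(yield_before, yield_after):
    """Relative drop in the defective rate when the yield rate rises (fractions, not %)."""
    defect_before = 1.0 - yield_before
    defect_after = 1.0 - yield_after
    return (defect_before - defect_after) / defect_before


print(f"{relative_defect_reduction(0.91, 0.95):.1%}")  # 44.4%
```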

Thank You ANY QUESTIONS?