Published by Jeffry Shaw. Modified over 9 years ago.
1
Mining Large Data at SDSC
Natasha Balac, Ph.D.
2
A Deluge of Data
Astronomy, Life Sciences, Modeling and Simulation, Data Management and Mining, Geosciences, Preservation and Archiving
Today, data comes from everywhere
– Scientific instruments
– Experiments
– Sensors and sensor nets
– New devices
And is used by everyone
– Scientists
– Consumers
– Educators
– General public
IT environments must support unprecedented diversity, globalization, integration, scale, and use
Turning the deluge of data into usable information requires an unprecedented level of integration, globalization, scale, and access
3
Why DATA MINING?
Necessity is the mother of invention
Huge amounts of data
Electronic records of our decisions
– Choices in the supermarket
– Financial records
– Our comings and goings
We swipe our way through the world – every swipe is a record in a database
Data rich – but information poor
Lying hidden in all this data is information!
4
What is DATA MINING?
Extracting or “mining” knowledge from large amounts of data
Data-driven discovery and modeling of hidden patterns (that we never knew existed) in large volumes of data
Extraction of implicit, previously unknown and unexpected, potentially extremely useful information from data
Fundamental idea: learn rules/patterns/relationships automatically from the data
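To make the fundamental idea concrete, here is a minimal sketch of learning a rule automatically from records. It uses the classic 1R ("one rule") approach, chosen here only for illustration; the slides do not name it. The toy records and the attribute names (`day`, `loyalty_card`, `buys_organic`) are hypothetical and were invented for this example.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
rows = [
    {"day": "weekday", "loyalty_card": "yes", "buys_organic": "yes"},
    {"day": "weekend", "loyalty_card": "yes", "buys_organic": "yes"},
    {"day": "weekday", "loyalty_card": "no", "buys_organic": "no"},
    {"day": "weekend", "loyalty_card": "no", "buys_organic": "no"},
    {"day": "weekday", "loyalty_card": "yes", "buys_organic": "yes"},
    {"day": "weekend", "loyalty_card": "no", "buys_organic": "no"},
]


def learn_one_rule(rows, target):
    """1R: pick the single attribute whose value -> majority-class rule
    makes the fewest errors on the given records."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attr in attributes:
        # Count how often each target class occurs for each attribute value.
        counts = {}
        for row in rows:
            counts.setdefault(row[attr], Counter())[row[target]] += 1
        # Predict the majority class for every attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


attribute, rule, errors = learn_one_rule(rows, "buys_organic")
print(attribute, rule, errors)
```

On these records the learner selects `loyalty_card`, because that attribute predicts the target with zero errors while `day` misclassifies two records. Nobody wrote that rule by hand; it was recovered from the data.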
5
Terminology
Gold Mining vs. Sand Mining
Knowledge mining from databases
Knowledge extraction
Data/pattern analysis
Knowledge Discovery in Databases (KDD)
Predictive Modeling
Machine Learning
Business Intelligence
6
CRISP-DM (Cross-Industry Standard Process for Data Mining)
[Figure: CRISP-DM Process Model]
7
Data Mining Driven Engineering Product Design
Incorporate parallel computing and data mining capabilities into engineering and optimizing product design models
Complex challenges in new product design
– Accurate acquisition/interpretation of raw customer data
– Integrating newly found knowledge into the engineering design process
– Developing analytical techniques that help reduce the computational time required to generate product portfolios
Mining paid search on-line customer preference data
8
A Java-based Data Driven Product Design (DDPD) platform has been developed that integrates the supercomputing resources at SDSC with complex engineering design simulation platforms such as Matlab, in an effort to streamline the product design and development process
10
Tools in the GUI
Data Mining algorithms: Weka, Parallel Weka, Parallel C4.5, and Parallel K-means
The Data Driven Product Design Platform utilizes Matlab’s powerful computation engine directly from the GUI
Optimization choices available from the user interface include Matlab, Tomlab, Excel Solver, Star-P, Parallel Matlab, Parallel CPLEX, etc.
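For readers unfamiliar with K-means, the following is a minimal serial sketch of the standard algorithm (Lloyd's iteration). It is not the parallel implementation used in the platform, which the slides do not describe. The sample points and starting centroids are assumptions made up for this example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps
    until the centroids stop moving or max_iter is reached."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, final_clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(final_centroids)
```

In the assignment step each point is handled independently of the others. That independence is what parallel variants commonly exploit, by splitting the points across processors and combining the partial sums for the update step.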
11
Visual Representation of Data Mining results linking with serial optimization models
12
Thank You