
- Sachin Singh

Data Mining - Concepts
Extracting meaningful knowledge from huge chunks of ‘raw’ data.
Types
– Association
– Classification
– Temporal

Classification Method
Prediction model
The C4.5 Tree algorithm

Trans_Id | Age | Student | Credit_rating | Buys_Computer
         |     | no      | excellent     | no
         |     | yes     | excellent     | no
         |     | yes     | fair          | yes
         |     | no      | excellent     | yes
         |     | no      | fair          | yes
         |     | yes     | excellent     | yes
         |     | no      | fair          | no
         |     | no      | fair          | no
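
As a rough illustration of the split criterion behind C4.5-style trees, the sketch below computes information gain and gain ratio for the Student and Credit_rating columns, using the eight rows that survive in the table above. The function names are illustrative assumptions, and this covers only the attribute-selection step, not the full C4.5 algorithm (no pruning, no continuous attributes).

```python
from collections import Counter
from math import log2

# (Student, Credit_rating, Buys_Computer) rows as they survive in the slide.
# The Trans_Id and Age values are not available, so those columns are left out.
ROWS = [
    ("no", "excellent", "no"),
    ("yes", "excellent", "no"),
    ("yes", "fair", "yes"),
    ("no", "excellent", "yes"),
    ("no", "fair", "yes"),
    ("yes", "excellent", "yes"),
    ("no", "fair", "no"),
    ("no", "fair", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Reduction in class entropy obtained by partitioning on one attribute."""
    labels = [r[class_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, attr_index, class_index=-1):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy([r[attr_index] for r in rows])
    if split_info == 0:
        return 0.0
    return information_gain(rows, attr_index, class_index) / split_info


if __name__ == "__main__":
    for name, idx in (("Student", 0), ("Credit_rating", 1)):
        print(name, round(gain_ratio(ROWS, idx), 4))
```

On these rows, Credit_rating splits the classes evenly in both partitions and therefore has zero gain, so of the two attributes a gain-ratio criterion would prefer Student.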

Classification Tree

Analysis of Trees
Current work focuses largely on generation of trees
– Efficient algorithms
– Disk Resident gigantic data sources
– Improving accuracy of the generated models
Motivation
– Current research area - need for analysis

Areas of Analysis
Two Sub Problems
– Filtering Sub Problem
– Comparison Sub Problem

Filtering Sub Problem
Typical data warehouses are huge!
Generation of “bushy” trees
Not all outcomes are significant
Need to filter trees based on the required outcomes
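
One way to picture the filtering step is the sketch below, which keeps only the root-to-leaf paths that end in a required outcome. The nested-dict tree representation (`test`, `branches`, `outcome`) and the function name `filter_tree` are assumptions made for illustration; the slides do not give the actual algorithm.

```python
def filter_tree(node, wanted):
    """Return a pruned copy of `node` with only paths ending in a wanted outcome.

    A leaf is {"outcome": label}; an internal node is
    {"test": attribute, "branches": {value: subtree}}.
    Returns None when no path under `node` reaches a wanted outcome.
    """
    if "outcome" in node:
        return dict(node) if node["outcome"] in wanted else None

    kept = {}
    for value, child in node["branches"].items():
        sub = filter_tree(child, wanted)
        if sub is not None:
            kept[value] = sub

    if not kept:
        return None
    return {"test": node["test"], "branches": kept}


if __name__ == "__main__":
    # Hypothetical tree, not taken from the slides.
    full = {
        "test": "Student",
        "branches": {
            "yes": {"outcome": "yes"},
            "no": {
                "test": "Credit_rating",
                "branches": {
                    "excellent": {"outcome": "no"},
                    "fair": {"outcome": "yes"},
                },
            },
        },
    }
    print(filter_tree(full, {"no"}))
```

The filtered tree is never larger than the full one, which is what makes later querying and comparison cheaper.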

Filtering Sub Problem
[Figures: Full Classification Tree; Filtered Classification Tree]

Filtering Sub Problem
Advantages
– Efficient querying, faster results
– Easily managed
– Useful for the comparison sub problem

Comparison Sub Problem
Need to monitor changes in data trends by comparing the classification trees
Levels of changes identified
– Change in test (partition) value
– Change in the partitions
– Change in node levels
– Change in outcome (leaves)
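
The levels of change listed above can be sketched as a recursive walk over two trees. This is one possible reading of those levels, not the presenter's algorithm: the nested-dict tree layout, the change labels (`test`, `partitions`, `level`, `outcome`), and the function name `compare_trees` are all assumptions.

```python
def compare_trees(old, new, path="root"):
    """List (path, kind, old_value, new_value) differences between two trees.

    A leaf is {"outcome": label}; an internal node is
    {"test": attribute, "branches": {value: subtree}}.
    """
    changes = []
    old_leaf, new_leaf = "outcome" in old, "outcome" in new

    if old_leaf and new_leaf:
        if old["outcome"] != new["outcome"]:
            changes.append((path, "outcome", old["outcome"], new["outcome"]))
        return changes

    if old_leaf != new_leaf:
        # A leaf became a subtree (or the reverse): the depth on this path changed.
        changes.append((path, "level",
                        "leaf" if old_leaf else "subtree",
                        "leaf" if new_leaf else "subtree"))
        return changes

    if old["test"] != new["test"]:
        # Different test attribute: the subtrees below are not comparable branch by branch.
        changes.append((path, "test", old["test"], new["test"]))
        return changes

    old_vals, new_vals = set(old["branches"]), set(new["branches"])
    if old_vals != new_vals:
        changes.append((path, "partitions", sorted(old_vals), sorted(new_vals)))

    # Recurse only into partitions present in both trees.
    for value in sorted(old_vals & new_vals):
        child_path = f"{path}/{old['test']}={value}"
        changes.extend(
            compare_trees(old["branches"][value], new["branches"][value], child_path)
        )
    return changes
```

Stopping at the first differing test is a simplification; a fuller comparison would need some way to match subtrees whose structure has shifted.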

Comparison Sub Problem
Issues
– Structure of trees unpredictable
– Comparing two trees with no standard structure

Solution
XML Trees
– Convert the tree structure into XML files
– XML is inherently tree structured
– Take advantage of existing XML related technologies
– Standard specs
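
A minimal sketch of the conversion, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`node`, `branch`, `leaf`, `test`, `value`, `outcome`) are hypothetical and are not the file format proposed in the presentation; the nested-dict input layout is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def tree_to_xml(node):
    """Convert a nested-dict classification tree into an XML element.

    A leaf is {"outcome": label}; an internal node is
    {"test": attribute, "branches": {value: subtree}}.
    """
    if "outcome" in node:
        return ET.Element("leaf", {"outcome": node["outcome"]})

    elem = ET.Element("node", {"test": node["test"]})
    for value, child in node["branches"].items():
        branch = ET.SubElement(elem, "branch", {"value": value})
        branch.append(tree_to_xml(child))
    return elem


if __name__ == "__main__":
    # Hypothetical tree, not taken from the slides.
    tree = {
        "test": "Student",
        "branches": {"yes": {"outcome": "yes"}, "no": {"outcome": "no"}},
    }
    root = tree_to_xml(tree)
    print(ET.tostring(root, encoding="unicode"))
    # Existing XML tooling then applies directly, e.g. a path query:
    print(root.find("./branch[@value='no']/leaf").get("outcome"))
```

Once the tree is XML, path queries of this kind give a standard way to address nodes, which is the practical benefit of reusing existing XML technologies.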

Solution – Proposed File format

Approach
Devise algorithms to solve filtering and comparison problems
Analyzing results of comparison in logical terms
Measuring efficiency of the algorithms through time and space complexities

Progress