Department of Computer Science Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING Submitted By.

- What is data mining?
- Data mining consists of five major elements
- Why mine data?
  - Commercial viewpoint
  - Scientific viewpoint
- Some of the techniques used for data mining

What is data mining?
- Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns.
- It is the process of extracting knowledge from large datasets:
  - the datasets are extremely large;
  - the knowledge is useful and can improve processes.

Data mining consists of five major elements:
- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and information technology professionals.
- Analyze the data with application software.
- Present the data in a useful format, such as a graph or table.

Why mine data? Commercial viewpoint
- Lots of data is being collected and warehoused:
  - web data, e-commerce
  - purchases at department and grocery stores
  - bank and credit card transactions
- Computers have become cheaper and more powerful.
- Competitive pressure is strong:
  - provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why mine data? Scientific viewpoint
- Data is collected and stored at enormous speeds (GB/hour):
  - remote sensors on a satellite
  - telescopes scanning the skies
  - microarrays generating gene expression data
  - scientific simulations generating terabytes of data
- Traditional techniques are infeasible for raw data.
- Data mining may help scientists:
  - in classifying and segmenting data

Some of the techniques used for data mining
- Artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure. Through this learning process they become useful for pattern recognition and data classification.

- Neural networks map a set of input nodes to a set of output nodes.
- The number of inputs and outputs is variable.
- The network itself is composed of an arbitrary number of nodes with an arbitrary topology.
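The mapping from input nodes to output nodes can be sketched as a forward pass. This is a minimal illustration, assuming a layered feed-forward topology with sigmoid nodes, which is only one of the arbitrary topologies the slide allows; the function names and the weights are hypothetical and not taken from the presentation.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate input values through the network, layer by layer.

    Each layer is a list of nodes; each node is a (weights, bias) pair
    with one weight per value coming from the previous layer.
    """
    activations = inputs
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# Hypothetical network: 2 input nodes, 2 hidden nodes, 1 output node.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_layer = [([1.0, -1.0], 0.0)]

result = forward([1.0, 0.0], [hidden_layer, output_layer])
print(result)  # a list with one value between 0 and 1
```

Training, the step in which the weights are learned from examples, is left out here; the sketch only shows how fixed weights turn inputs into outputs.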

- Decision trees: tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.

height  hair   eyes   class
short   blond  blue   A
tall    blond  brown  B
tall    red    blue   A
short   dark   blue   B
tall    dark   blue   B
tall    blond  blue   A
tall    dark   brown  B
short   blond  brown  B

hair
  dark  -> B
  red   -> A
  blond -> eyes
             blue  -> A
             brown -> B
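The rules generated by such a tree can be written as nested conditions. The sketch below assumes the tree reads as: dark hair gives class B, red hair gives class A, and blond hair is split further on eyes (blue gives A, brown gives B). That reading is reconstructed from the eight records above, and the function name `classify` is hypothetical.

```python
# The eight records from the table: (height, hair, eyes, class).
RECORDS = [
    ("short", "blond", "blue", "A"),
    ("tall", "blond", "brown", "B"),
    ("tall", "red", "blue", "A"),
    ("short", "dark", "blue", "B"),
    ("tall", "dark", "blue", "B"),
    ("tall", "blond", "blue", "A"),
    ("tall", "dark", "brown", "B"),
    ("short", "blond", "brown", "B"),
]

def classify(height, hair, eyes):
    """Apply the tree's rules; height is never tested by this tree."""
    if hair == "dark":
        return "B"
    if hair == "red":
        return "A"
    if hair == "blond":
        # Blond hair alone is ambiguous, so the tree splits again on eyes.
        return "A" if eyes == "blue" else "B"
    raise ValueError(f"no rule for hair value {hair!r}")

correct = sum(classify(h, hr, e) == c for h, hr, e, c in RECORDS)
print(f"{correct} of {len(RECORDS)} records classified correctly")
# prints "8 of 8 records classified correctly"
```

Height does not appear in any rule: hair and eyes are enough to separate the two classes in this dataset.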

- Nearest neighbor method: a classification technique that classifies each record based on the records most similar to it in a historical database.
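A minimal sketch of this idea follows, using the eight records from the decision tree example as the historical database. The similarity measure (count of differing attributes), the majority vote over the k most similar records, and the names `distance` and `knn_classify` are assumptions made for illustration; the slide does not specify them.

```python
from collections import Counter

# Historical database: ((height, hair, eyes), class).
HISTORY = [
    (("short", "blond", "blue"), "A"),
    (("tall", "blond", "brown"), "B"),
    (("tall", "red", "blue"), "A"),
    (("short", "dark", "blue"), "B"),
    (("tall", "dark", "blue"), "B"),
    (("tall", "blond", "blue"), "A"),
    (("tall", "dark", "brown"), "B"),
    (("short", "blond", "brown"), "B"),
]

def distance(a, b):
    """Number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_classify(record, history, k=3):
    """Return the majority class among the k most similar records."""
    ranked = sorted(history, key=lambda item: distance(record, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# A record that does not appear in the historical database.
print(knn_classify(("short", "red", "blue"), HISTORY, k=3))  # prints "A"
```

Unlike a decision tree, nothing is learned in advance here: every new record is compared against the stored records at classification time.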

CLUSTERING

Clustering can be considered the most important unsupervised learning technique. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.

The greater the similarity (or homogeneity) within a group, and the greater the difference between groups, the “better” or more distinct the clustering.
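One rough way to put numbers on this is to compare the average distance between points in the same group with the average distance between points in different groups. The sketch below assumes Euclidean distance on 2-D points; the helper names and the sample points are hypothetical, and this is only one of many possible quality measures.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def mean_within(cluster):
    """Average distance between pairs of points inside one cluster.

    Needs at least two points in the cluster.
    """
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(c1, c2):
    """Average distance between points taken from two different clusters."""
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

# Two hypothetical, well-separated groups of points.
group_a = [(0, 0), (0, 1), (1, 0)]
group_b = [(10, 10), (10, 11), (11, 10)]

within = (mean_within(group_a) + mean_within(group_b)) / 2
between = mean_between(group_a, group_b)
print(f"within: {within:.2f}, between: {between:.2f}")
```

A small within-group distance together with a large between-group distance indicates a distinct clustering in the sense described above.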

A few good reasons for clustering:
- Simplifications
- Pattern detection

Basic K-means algorithm for finding K clusters:
1. Select K points as the initial centroids.
2. Assign all points to the closest centroid.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until the centroids don’t change.
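The four steps can be sketched in plain Python as follows. Several choices here are assumptions, since the slide does not fix them: the initial centroids are picked at random from the data, closeness is Euclidean distance, an empty cluster keeps its old centroid, and `max_iter` caps the number of passes. The function name `kmeans` and the sample points are hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: select K points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep old centroid
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Two hypothetical, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On data this well separated the two groups are recovered regardless of the starting centroids; in general the result of K-means depends on the initial choice, so it is common to run it several times with different seeds.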
