Data Mining
Vincent Lendoiro, Zach Mitchell, Tyler Hall
April 21st, 2016

Table of Contents
- Part 1: Introduction and definition of Data Mining
  - History of Data Mining and the people who contributed to its creation
  - Regression Analysis in comparison to Data Mining
  - Neural Networks and their relation to Data Mining
  - More history of Data Mining, through the 1990s to the present
- Part 2: Review of database management techniques, including Knowledge Discovery in Databases
  - Step 1: Selection
  - Step 2: Pre-Processing
  - Step 3: Transformation
  - Step 4: Data Mining

Table of Contents (cont.)
  - Step 5: Interpretation/Evaluation
  - The different methods involved with KDD
    - Method 1: Classification
    - Method 2: Regression
    - Method 3: Clustering
    - Method 4: Summarization
- Part 3: Who uses Data Mining and how it is used in the business world
  - Companies that develop the software to perform Data Mining
  - Predictive analysis with IBM Analytics
  - Who uses Data Mining
  - How Data Mining is used with Bitcoin
- Sources

What Is Data Mining?
- Data mining is an analytic process designed to explore data (usually large amounts of data, typically business or market related, also known as "big data") in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.
- The ultimate goal of data mining is prediction. Predictive data mining is the most common type of data mining and the one with the most direct business applications.

History of Data Mining
- 1763: A theorem by Thomas Bayes, an English statistician, relating current probability to prior probability is published. It allows complex realities to be understood on the basis of estimated probabilities.
- Data mining is the act of searching for unusual patterns in data. Probability rules, for example in decision trees, are used to describe these patterns, and the patterns in turn give useful information about the stored data.
- 1805: Adrien-Marie Legendre, a French mathematician, publishes the least-squares method of regression; Carl Friedrich Gauss, a German mathematician, publishes his own account in 1809. Both applied it to determining the orbits of bodies around the sun. The objective of regression analysis is to estimate the relationships among variables.

Regression Analysis
- Regression is a data mining function that predicts a number.
- Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression methods.
- A regression model could be used to predict the value of a house based on location, number of rooms, lot size, and other factors.
- A regression task begins with a data set in which the target values are known.
- For example, a regression model that predicts house values could be developed from observed data for many houses over a period of time.
- In addition to the value, the data might track other characteristics of the house, such as its age and number of rooms. House value would be the target and the other characteristics would be the predictors.
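
As a minimal sketch of the house-value example above, the following fits a regression line with a single predictor by ordinary least squares. The data values and the names `fit_line` and `predict` are made up for illustration; a realistic model would use several predictors (location, lot size, age) and a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted target value for predictor value x."""
    return a + b * x


# Hypothetical training data with known target values:
# number of rooms (predictor) and sale price in $1000s (target).
rooms = [2, 3, 4, 5, 6]
prices = [150, 200, 250, 300, 350]

a, b = fit_line(rooms, prices)
print(f"intercept={a:.1f}, slope={b:.1f}")
print(f"predicted price for 7 rooms: {predict(a, b, 7):.1f}")
```

The known targets (`prices`) are what make this a regression task: the model is fitted on houses whose values are already observed and then applied to a house whose value is not.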

History of Data Mining
- 1936: Alan Turing, a pioneering computer scientist, introduced the idea of a Universal Machine capable of performing computations like our modern-day computers. The computers we use today are built on the concepts created by Turing.
- 1943: Warren McCulloch, a cybernetician, and Walter Pitts, a logician in computational neuroscience, created the first conceptual model of a neural network. Each neuron can do three things: receive inputs, process inputs, and generate outputs.
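
The three neuron functions named above (receive inputs, process inputs, generate outputs) can be sketched as a simple threshold unit. This is a simplified illustration in the spirit of the McCulloch-Pitts model, not a reproduction of the 1943 formulation; the function names, weights, and thresholds are chosen here for the example.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 (fire) when the weighted sum of binary inputs reaches the threshold."""
    # Receive and process inputs: combine them into one weighted sum.
    total = sum(w * x for w, x in zip(weights, inputs))
    # Generate output: fire or stay silent.
    return 1 if total >= threshold else 0


def logical_and(x1, x2):
    """Both inputs must be active for the neuron to fire."""
    return threshold_neuron([x1, x2], [1, 1], threshold=2)


def logical_or(x1, x2):
    """One active input is enough for the neuron to fire."""
    return threshold_neuron([x1, x2], [1, 1], threshold=1)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", logical_and(x1, x2), "OR:", logical_or(x1, x2))
```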

Neural Networks and Data Mining
- Neural networks are an advanced data mining tool, often used when other techniques have failed to produce a successful predictive model.
- They have a biologically inspired modeling capability but are mainly used as statistical tools.
- For example, Coors Brewers Ltd used a neural network on accumulated data about the quality and taste of its beers to learn the relationship between the inputs to the beer and the outputs. The result was a list of flavors that could be predicted from the information gathered about what customers want.
- Coors can now decide strategically which flavors to produce for its customers.

History of Data Mining cont.
- 1970s: With the creation of database management systems (DBMSs), thanks to Edgar F. Codd, it became possible to store and query large volumes of data, eventually terabytes and petabytes. Data warehouses allowed users to move from a transactional way of thinking to an analytical way of seeing data.
- Edgar Codd was an English computer scientist who invented the relational model for database management while working for IBM as a mathematical programmer.
- 1980s: HNC trademarks the phrase “database mining” to protect a product called DataBase Mining Workstation, a general-purpose tool for building neural network models. Sophisticated algorithms could also finally “learn” relationships from data, allowing subject-matter professionals to understand what a relationship means.

More Facts on the History of Data Mining
- 1989: The term “Knowledge Discovery in Databases” is coined by Gregory Piatetsky-Shapiro.
- Piatetsky-Shapiro is a Russian mathematician who was so strong in mathematics that he skipped middle school to be admitted into the School of Physics and Mathematics in Moscow.
- He then went on to Tel Aviv University in Israel, where he wrote a program in APL to play the game Battleship.
- After graduating from NYU with his PhD, he joined GTE Laboratories in 1985, where he worked on intelligent interfaces to databases and later coined the term “Knowledge Discovery in Databases”.
- Piatetsky-Shapiro is now the CEO of KDnuggets, a company that covers news in the fields of Data Analytics, Data Mining, and Data Science.

And More Facts on the History of Data Mining
- 1990s: Retail and financial companies began using “data mining” to analyze data and trends in order to expand their client base and to predict customer demand, future trends, and fluctuations in interest rates and stock prices.

5 Steps of the Knowledge Discovery in Databases (aka Data Mining) Process
1. Selection
2. Preprocessing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation

Step 1: Selection
- Analyzing entire data set
- Selecting target data set
- Developing variables

Step 2: Pre-processing
- Data cleaning
- Handling noise
  - Noise: extraneous data
  - Options: cleaning it, removing it completely, or using an algorithm that deals with noise
- Dealing with missing data fields
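
A minimal sketch of two of the pre-processing tasks listed above: filling missing data fields and removing noisy values. Mean imputation and a fixed distance-from-median cutoff are assumptions made for this example, as are the function names and sample data; the slides do not prescribe a particular technique.

```python
from statistics import mean, median


def fill_missing(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def drop_noise(values, max_dev):
    """Keep only values within max_dev of the median; the rest are treated as noise."""
    mid = median(values)
    return [v for v in values if abs(v - mid) <= max_dev]


# Hypothetical customer ages with two missing fields.
ages = [34, None, 29, 41, None, 36]
print(fill_missing(ages))

# Hypothetical sensor readings with one extraneous value.
readings = [10.1, 9.8, 10.3, 55.0, 10.0]
print(drop_noise(readings, max_dev=2.0))
```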

Step 3: Transformation
- Data reduction
- Projection
- Exploratory analysis
- Model selection

Step 4: Data Mining
- Repeated, iterative application of data mining methods
- Algorithm “teaches” computer
- Two approaches:
  - Statistical: allows for non-deterministic effects
  - Logical: purely deterministic
- Two high-level goals:
  - Description
  - Prediction

Step 5: Interpretation/Evaluation
- Visualization
- Verification
- Action

Selecting a method
- Model representation: How are we describing the data?
- Model evaluation: How does this method fit our goals?
- Search method: How are we sifting through the data?
  - Parameter search
  - Model search

Method 1: Classification
- Putting data into one or more broad categories
  - Decision making
  - Not perfect
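
One simple way to put a data point into a category is to give it the label of the most similar labeled example (a 1-nearest-neighbor classifier). This is only one of many classification techniques and is used here as an assumption for illustration; the customer data, segment labels, and the name `classify` are hypothetical.

```python
import math


def classify(point, labeled_examples):
    """Return the label of the labeled example closest to point (1-nearest-neighbor)."""
    nearest = min(labeled_examples, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]


# Hypothetical customers: (store visits per year, average spend in $) -> segment.
examples = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((40, 60.0), "frequent"),
    ((52, 75.0), "frequent"),
]

print(classify((45, 70.0), examples))
print(classify((20, 40.0), examples))  # a borderline customer still gets a label
```

The second call shows why classification is "not perfect": a point that sits between the groups is still forced into one category.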

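Method 2: Regression
- Estimates relationships among variables (dependent and independent)
- Model form: Y = a + bX + e, where Y is the dependent variable, X is the independent variable, a is the intercept, b is the slope, and e is the error term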

Method 3: Clustering
- Organizes data into clusters of common traits
  - Similar to classification but not as cut and dried
  - Can be mutually exclusive or overlapping
  - Useful for pattern matching
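
A minimal sketch of clustering, assuming k-means on one-dimensional data as the technique (the slides do not name a specific algorithm). K-means produces mutually exclusive clusters; overlapping clusters would need a different method. The function name, starting centers, and data are made up for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """K-means on 1-D data; returns (final centers, clusters from the last assignment)."""
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical purchase amounts that fall into two natural groups.
amounts = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(amounts, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

Unlike classification, no labels are supplied: the groups emerge from the data alone.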

Method 4: Summarization
- Applying mathematical equations to data
  - Mean
  - Standard deviation
- Automated report generation
- Visualization
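
Summarization can be sketched as computing the mean and standard deviation and emitting them as a one-line report. The sample data and the name `summarize` are hypothetical, and the sample (n - 1) standard deviation is an assumption; the slides do not say which variant is meant.

```python
from statistics import mean, stdev


def summarize(name, values):
    """Return a one-line text report: count, mean, and sample standard deviation."""
    return (
        f"{name}: n={len(values)}, "
        f"mean={mean(values):.2f}, stdev={stdev(values):.2f}"
    )


# Hypothetical daily sales figures.
daily_sales = [120, 135, 128, 142, 150]
print(summarize("daily_sales", daily_sales))
```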

Who Develops Data Mining Software?

IBM Analytics
What Is Predictive Analytics?
- Software developed by IBM that brings together advanced analytics capabilities: statistical analysis, predictive modeling, data mining, text analytics, entity analytics, optimization, real-time scoring, and machine learning.
- Example: IBM Predictive Analytics (IBM SPSS Modeler, IBM SPSS Statistics); see the IBM link under Sources.

Who Uses Data Mining?
Businesses
- For example, a grocery chain used the data mining capacity of Oracle software to analyze local buying patterns (see the UCLA and Oracle links under Sources).
- Businesses benefit from data mining in a variety of ways. It allows them to analyze patterns in almost any kind of data and use this information to increase profitability and fix problems within the business.

Who Uses Data Mining? (cont.)
Bitcoin Miners
- Bitcoin is a digital asset and payment system in which users “mine” for Bitcoin by using complex algorithms to extract the Bitcoin electronically.
- For how Bitcoin mining works, see the CoinDesk link under Sources.

Sources
- https://www.google.com/imghp?hl=en&tab=wi&ei=mxwQV761JMfCmwH3o4b4BA&ved=0EKouCBIoAQ
- http://www.ibm.com/analytics/us/en/technology/predictive-analytics/
- http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
- http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/overview/index.html
- http://www.coindesk.com/information/how-bitcoin-mining-works/
- http://rayli.net/blog/data/history-of-data-mining/
