Published by Horatio Robertson. Modified over 9 years ago.
1
An Introduction to Predictive Analytics with Big Data and Open Source Tools
Joe Heary, CTO & VP of Technical Operations
Zimmerman Associates, Inc. (ZAI)
November 5, 2015
2
What is Predictive Analytics?
“A variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.” - Wikipedia
11/5/2015 Leveraging Data to Lead
3
Predicting the Future
Not really about “predicting the future”
About using data, statistical models, and machine learning to identify the likelihood of future outcomes, which then inform our decisions
Produces new insights that lead to better actions
4
Machine Learning
Evolved from pattern recognition and computational learning theory in artificial intelligence
Construction of algorithms that can learn from data
Algorithms build models from example inputs to make data-driven predictions, rather than following static program instructions
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley.
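To make the contrast between a model built from example inputs and static program instructions concrete, here is a minimal sketch. The data, the age cutoff, and all function names are hypothetical illustrations and do not come from the presentation.

```python
# Hypothetical toy data: (customer_age, bought) pairs. Not from the presentation.
examples = [(22, 0), (25, 0), (31, 0), (38, 1), (45, 1), (52, 1)]

def static_rule(age):
    """Static program instruction: the cutoff is fixed by the programmer."""
    return 1 if age >= 30 else 0

def learn_threshold(data):
    """Learn the cutoff that misclassifies the fewest training examples."""
    candidates = sorted({age for age, _ in data})

    def errors(cutoff):
        return sum((1 if age >= cutoff else 0) != label for age, label in data)

    return min(candidates, key=errors)

# The "model" here is a single number derived from the examples.
threshold = learn_threshold(examples)

def learned_rule(age):
    """Data-driven prediction: uses the cutoff learned from the examples."""
    return 1 if age >= threshold else 0

print("learned cutoff:", threshold)
print("age 31 -> static:", static_rule(31), "learned:", learned_rule(31))
```

If the examples change, the learned cutoff changes with them, while the static rule stays wherever the programmer left it.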
5
What is Big Data?
“Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.” -- Lisa Arthur, Forbes / CMO Network
Refers to the AMOUNT of data in terms of:
VOLUME: the amount of data being generated
VARIETY: the type of data (pictures, videos, text, audio, etc.)
VELOCITY: the speed at which data is created or changes
VERACITY: the truthfulness or adherence to the truth
VALUE: the relative value of data to an organization
6
Big Data due to convergence of…
Moore’s Law
Mobile Computing
Social Networking
Cloud Computing
7
Data Growth
Atlantic Ocean = (est.) 100 billion billion gallons of water
As of 2010, we create 2.5 quintillion (10^18) bytes of data daily
If 1 gallon = 1 byte…
- Ken Gabriel, Director of DARPA, March 2012
The Atlantic Ocean could only contain the data created in 2010
- Eric Schmidt, CEO of Google, 2010
Approx. 80% of all data is “unstructured”
8
Social Media’s Impact on Data Growth
2010: Eric Schmidt, then CEO of Google, estimates that we now create as much data every 2 days as we did from the dawn of time through 2003
Source: Skloog Blog
9
Data Processing before Big Data
10
NoSQL and Hadoop
Hadoop: Big Data software framework for storing data and running applications on clusters of commodity hardware. Has the ability to handle virtually limitless concurrent tasks or jobs.
NoSQL: Non-relational database in which data is stored and accessed using a model other than the tabular relationships typical of Relational Database Management Systems (RDBMS)
11
SQL vs. NoSQL
Vaes, Karim. "Database Variants Explained: SQL or NoSQL? Is That Really the Question?" Random Thoughts on Various Topics by an Information Technology Architect. Karim Vaes, 21 Jan. 2015. Web. 3 Nov. 2015.
12
NoSQL DBs Classified by Data Model
Column: Accumulo, Cassandra, Druid, HBase, Vertica
Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
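As a rough illustration of how these data models differ in shape, the sketch below stores one hypothetical customer record four ways using plain Python structures. The record, field names, and the helper function are assumptions made for illustration; none of this is the API of any product listed above.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ada", "city": "Springfield"}'}

# Document: a nested, self-describing record; fields may vary per document.
document = {"_id": 42, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}

# Column (wide-column): row key -> column family -> column -> value.
column = {"42": {"profile": {"name": "Ada"}, "orders": {"A1": 2}}}

# Graph: nodes plus labeled edges between them.
nodes = {42: {"name": "Ada"}, "A1": {"type": "product"}}
edges = [(42, "BOUGHT", "A1")]

def bought_by(customer_id):
    """Follow BOUGHT edges out of one customer node."""
    return [dst for src, label, dst in edges
            if src == customer_id and label == "BOUGHT"]

# The key-value store only hands back the blob; the application parses it.
print(json.loads(key_value["customer:42"])["city"])
print(document["orders"][0]["qty"])
print(column["42"]["orders"]["A1"])
print(bought_by(42))
```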
13
Hadoop Distributed Filesystem (HDFS)
Brings compute resources to the data
Implements MapReduce to aggregate data into usable summaries
14
Hadoop Distributed Filesystem (HDFS)
[Diagram: a Client and a Name Node connected over a TCP/IP network to Data Nodes A, B, C, and D; each of blocks 1-5 appears three times across the Data Nodes]
Name Node metadata: Data X -> 1,2,3; Data Y -> 4,5
Name Node contains metadata and location of the data
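The Name Node's role can be sketched as a lookup from a file to its blocks, and from each block to the Data Nodes that hold a copy. The file-to-block mapping below is the one on the slide; the placement of blocks on Data Nodes A-D is a hypothetical assumption (three copies of each block), and the function names are illustrative, not Hadoop's API.

```python
# File -> ordered block IDs, as listed in the slide's Name Node metadata.
FILE_BLOCKS = {"Data X": [1, 2, 3], "Data Y": [4, 5]}

# Hypothetical block placement (assumption): three replicas of every block.
NODE_BLOCKS = {
    "A": {1, 3, 5},
    "B": {2, 3, 4, 5},
    "C": {1, 2, 4, 5},
    "D": {1, 2, 3, 4},
}

def locate(filename):
    """Return [(block_id, [nodes holding a replica])] in file order."""
    return [
        (block, sorted(n for n, blocks in NODE_BLOCKS.items() if block in blocks))
        for block in FILE_BLOCKS[filename]
    ]

def readable_after_failure(filename, failed_node):
    """True if every block still has a replica on a surviving node."""
    return all(
        any(node != failed_node for node in nodes)
        for _, nodes in locate(filename)
    )

print(locate("Data Y"))
print(readable_after_failure("Data X", "A"))
```

In this sketch the client asks the Name Node only where the blocks are; the data itself would then be read from the Data Nodes.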
15
MapReduce in Hadoop Filesystem
[Diagram: Big Data -> Input Data -> Map -> Shuffle/Sort -> Reduce -> Aggregate Output]
No rows of data like RDBMS, only key-value pairs
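The Map, Shuffle/Sort, and Reduce stages can be sketched in a single process with a word count, the customary first MapReduce example. This is a minimal illustration of the key-value flow under assumed input records; it is not Hadoop code, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (key, value) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle_sort(pairs):
    """Shuffle/Sort: group values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single summary."""
    return key, sum(values)

# Hypothetical input records; on a cluster these would be file splits.
records = ["big data big insights", "data leads"]

mapped = [pair for record in records for pair in map_phase(record)]
result = dict(reduce_phase(key, values) for key, values in shuffle_sort(mapped))
print(result)
```

On a real cluster the map calls run in parallel on the nodes that hold the data, and the shuffle moves pairs across the network so that all values for a key reach the same reducer.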
16
17
Marketing Campaign
1,000,000 prospects
$2 each to mail ($2M)
1% (1 out of 100) will buy (10,000)
$220 revenue per sale
Revenue: $220 x 10,000 = $2,200,000
Cost: $2 x 1,000,000 = $2,000,000
Profit = $200,000
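The campaign arithmetic can be reproduced in a few lines. The numbers are the slide's; the function name and parameter names are illustrative assumptions.

```python
def campaign_profit(prospects, cost_per_mail, response_rate, revenue_per_sale):
    """Return (buyers, revenue, cost, profit) for a direct-mail campaign."""
    buyers = round(prospects * response_rate)
    revenue = buyers * revenue_per_sale
    cost = prospects * cost_per_mail
    return buyers, revenue, cost, revenue - cost

# Untargeted mailing: every prospect on the list gets a mailer.
buyers, revenue, cost, profit = campaign_profit(1_000_000, 2, 0.01, 220)
print(f"{buyers:,} buyers, revenue ${revenue:,}, cost ${cost:,}, profit ${profit:,}")
# prints "10,000 buyers, revenue $2,200,000, cost $2,000,000, profit $200,000"
```

With a $2 mailing cost and $220 per sale, the campaign breaks even at a response rate of 2/220, about 0.91%, so a 1% response leaves only a thin margin.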
18
Assigning a Predictive Score
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley.
19
Targeted Marketing with PA
PA results tell us which prospects are likely to respond
Identify the 25% of prospects on the list who are 3x more likely to respond
1M reduced to 250,000 with a 3% response rate (7,500)
$220 revenue per sale
Revenue: $220 x 7,500 = $1,650,000
Cost: $2 x 250,000 = $500,000
Profit = $1,150,000 (a 475% increase over the $200,000 profit of the untargeted campaign)
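The comparison between the untargeted and the targeted campaign can be checked with a short calculation. The inputs are the slide's figures; the function and variable names are illustrative assumptions. Under these inputs the targeted profit is 5.75 times the untargeted profit, which works out to a 475% increase.

```python
def profit(prospects, cost_per_mail, response_rate, revenue_per_sale):
    """Profit of a mailing: sales revenue minus mailing cost."""
    buyers = round(prospects * response_rate)
    return buyers * revenue_per_sale - prospects * cost_per_mail

# Untargeted: mail the whole list at a 1% response rate.
baseline = profit(1_000_000, 2, 0.01, 220)

# Targeted: mail only the top-scored 25%, who respond at 3%.
targeted = profit(250_000, 2, 0.03, 220)

ratio = targeted / baseline
increase_pct = (targeted - baseline) / baseline * 100

print(f"baseline ${baseline:,}, targeted ${targeted:,}")
print(f"{ratio:.2f}x the baseline profit, a {increase_pct:.0f}% increase")
```

The targeted mailing makes fewer sales (7,500 versus 10,000) but avoids $1.5M in mailing cost, which is where the gain comes from.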
20
Recommendations: Similar to Others
21
Recommendations: Closer to Home
22
Top 20 Open Source PA Software
There are several Open Source and Freeware products available to perform Predictive Analytics
“R” is one of the most popular, but the link below will provide plenty to choose from
http://www.predictiveanalyticstoday.com/top-predictive-analytics-freeware-software/
23
Wrap-up and bring it home
Convergence of technology leads to Big Data
Your best bet is listening to what the data tells you, rather than asking for an answer to a question that you already know the answer to
The real benefit of Predictive Analytics is the ability to find patterns in data that you were not aware of before
Creating new markets and new opportunities based on data analysis
Using Predictive Analytics with Big Data is truly using data to lead!
24
Question & Answer