An Introduction to Predictive Analytics with Big Data and Open Source tools Joe Heary CTO & VP of Technical Operations Zimmerman Associates, Inc. (ZAI)


November 5, 2015

What is Predictive Analytics?
“A variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.” - Wikipedia
11/5/2015 Leveraging Data to Lead 2

Predicting the Future
• Not really about “predicting the future”
• About using data, statistical models, and machine learning to estimate the likelihood of future outcomes, on which we base decisions
• Produces new insights that lead to better actions

Machine Learning
• Evolved from pattern recognition and computational learning theory in artificial intelligence
• Construction of algorithms that can learn from data
• Algorithms build models from example inputs and use them to make data-driven predictions, rather than following static program instructions
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley.
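To make “learning from data” concrete, here is a minimal sketch that assumes a toy set of hypothetical (x, y) examples. The program never hard-codes the rule. It estimates a line's slope and intercept from the examples using ordinary least squares, then uses the fitted model to predict a new value. The function names `fit_line` and `predict` are illustrative only.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept from example data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples. The rule y = 2x + 1 appears nowhere in the
# code; the model recovers it from the data.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
model = fit_line(xs, ys)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```

Real predictive models use many input variables and more flexible algorithms. The principle is the same, though: the parameters come from examples, not from hand-written instructions.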

What is Big Data?
“Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.” -- Lisa Arthur, Forbes / CMO Network
Big Data is characterized by more than the amount of data. It is commonly described in terms of five V's:
• VOLUME: the amount of data being generated
• VARIETY: the types of data (pictures, videos, text, audio, etc.)
• VELOCITY: the speed at which data is created or changes
• VERACITY: the trustworthiness and accuracy of the data
• VALUE: the relative value of the data to an organization

Big Data is due to the convergence of…
• Moore's Law
• Mobile Computing
• Social Networking
• Cloud Computing

Data Growth
• The Atlantic Ocean holds an estimated 100 billion billion (10^20) gallons of water
• As of 2010, we create about 2.5 quintillion (2.5 × 10^18) bytes of data every day
• If 1 gallon = 1 byte… the Atlantic Ocean could only contain the data created in …
• Approx. 80% of all data is “unstructured”
Sources cited on the slide: Ken Gabriel, Director of DARPA, March 2012; Eric Schmidt, CEO of Google, 2010
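The transcript cuts off before the Atlantic Ocean comparison is completed. The following back-of-envelope check uses only the two estimates quoted on the slide, about 10^20 gallons for the Atlantic and 2.5 × 10^18 bytes created per day, together with the slide's 1 gallon = 1 byte mapping. Under those assumptions, the ocean would be “full” after roughly 40 days of data. That result is our arithmetic, not a figure stated in the transcript.

```python
# Slide estimates (assumed exact for this back-of-envelope check).
ATLANTIC_GALLONS = 100 * 10**9 * 10**9   # "100 billion, billion" gallons = 10**20
BYTES_PER_DAY = 25 * 10**17              # 2.5 quintillion bytes = 2.5 * 10**18

# With 1 gallon = 1 byte, days of data needed to "fill" the Atlantic.
days = ATLANTIC_GALLONS / BYTES_PER_DAY
print(days)  # 40.0
```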

Social Media’s Impact on Data Growth
2010: Eric Schmidt, then CEO of Google, estimated that we now create as much data every 2 days as we did from the dawn of time through 2003
Source: Skloog Blog

Data Processing before Big Data

NoSQL and Hadoop
• Hadoop: a Big Data software framework for storing data and running applications on clusters of commodity hardware. It scales out by adding nodes, so it can handle a very large number of concurrent tasks or jobs.
• NoSQL: a non-relational database in which data is stored and accessed using a model other than the tabular relations typical of Relational Database Management Systems (RDBMS)

SQL vs. NoSQL
Vaes, Karim. “Database Variants Explained: SQL or NoSQL? Is That Really the Question?” Random Thoughts on Various Topics by an Information Technology Architect. Karim Vaes, 21 Jan. Web. 3 Nov.

NoSQL DBs Classified by Data Model
• Column: Accumulo, Cassandra, Druid, HBase, Vertica
• Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
• Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
• Graph: AllegroGraph, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog
• Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
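To show how these data models differ, the following sketch stores one hypothetical customer record in each of the four shapes. Plain Python dicts and lists stand in for the databases. No product's API is used, and the field names and values are made up for illustration.

```python
# Key-value: an opaque value looked up by a single key (Redis/Riak style).
key_value = {"customer:42": '{"name": "Ada", "city": "Boston"}'}

# Document: a nested, self-describing record queried by its fields (MongoDB style).
document = {"_id": 42, "name": "Ada", "address": {"city": "Boston"}, "orders": [1001, 1002]}

# Column family: row key -> column family -> column -> value (HBase/Cassandra style).
column_family = {
    "42": {
        "profile": {"name": "Ada", "city": "Boston"},
        "orders": {"1001": "2015-10-01", "1002": "2015-10-20"},
    }
}

# Graph: nodes plus labeled edges between them (Neo4j style).
nodes = {42: {"type": "Customer", "name": "Ada"}, 7: {"type": "Product", "name": "Book"}}
edges = [(42, "BOUGHT", 7)]


def products_bought(customer_id):
    """Follow BOUGHT edges out of a customer node: a typical graph traversal."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == customer_id and rel == "BOUGHT"]


print(document["address"]["city"])  # Boston
print(products_bought(42))          # ['Book']
```

The trade-off is in how the data is accessed. A key-value store can only fetch by key. A document store can query inside the record. A column family groups related columns for fast scans. A graph makes relationships first-class.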

Hadoop Distributed Filesystem (HDFS)
• Brings compute resources to the data, so processing runs on the nodes that already store it
• Hadoop's MapReduce runs over data stored in HDFS to aggregate it into usable summary data

Hadoop Distributed Filesystem (HDFS)
[Diagram: a Client and the Name Node connected over a TCP/IP network to Data Nodes A, B, C, and D. Name Node metadata: Data X -> 1,2,3; Data Y -> 4,5]
The Name Node contains the metadata and location of the data.
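The diagram's metadata can be modeled as two lookups. This sketch reads the slide's “Data X -> 1,2,3” as block IDs, which is an assumption. It also assumes a hypothetical placement of two replicas per block across Data Nodes A to D. Real HDFS replicates each block, with a default replication factor of 3. The sketch shows the client asking where a file's blocks live, and a scheduler picking a node that already holds each block, which is the “bring compute to the data” idea. It does not use Hadoop's actual APIs.

```python
# Hypothetical Name Node metadata, mirroring the slide: file -> block IDs.
file_to_blocks = {"Data X": [1, 2, 3], "Data Y": [4, 5]}

# Hypothetical block placement: block ID -> Data Nodes holding a replica.
block_locations = {
    1: ["A", "B"], 2: ["B", "C"], 3: ["C", "D"],
    4: ["A", "D"], 5: ["B", "D"],
}


def locate(filename):
    """Answer a client's metadata query: which nodes hold each block of the file."""
    return {block: block_locations[block] for block in file_to_blocks[filename]}


def schedule_local(filename, busy=frozenset()):
    """For each block, pick a non-busy node that already stores it (data locality)."""
    plan = {}
    for block, holders in locate(filename).items():
        free = [node for node in holders if node not in busy]
        # None means no local replica is free; the task would have to read remotely.
        plan[block] = free[0] if free else None
    return plan


print(locate("Data Y"))                      # {4: ['A', 'D'], 5: ['B', 'D']}
print(schedule_local("Data X", busy={"B"}))  # {1: 'A', 2: 'C', 3: 'C'}
```

The Name Node only answers “where.” The data itself flows directly between the client and the Data Nodes.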

MapReduce in Hadoop
[Diagram: Big Data input → Map → Shuffle/Sort → Reduce → aggregated output]
• No rows of data as in an RDBMS, only key-value pairs
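The Map → Shuffle/Sort → Reduce flow can be sketched in a single process with the classic word-count example. This sketch is a simulation, not Hadoop's Java API. In a real cluster, map and reduce tasks run in parallel on many nodes, and the shuffle moves intermediate key-value pairs across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_sort(pairs):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: aggregate the values for one key into a summary."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_sort(mapped)]


lines = ["big data big insights", "data leads"]
print(map_reduce(lines))
# [('big', 2), ('data', 2), ('insights', 1), ('leads', 1)]
```

Every stage works only with key-value pairs, which is why the slide contrasts MapReduce with the rows of an RDBMS.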


Marketing Campaign
• 1,000,000 prospects
• $2 each to mail ($2M)
• 1% (1 out of 100) will buy (10,000)
• $220 revenue per sale
Revenue = $220 × 10,000 = $2,200,000
Cost = $2 × 1,000,000 = $2,000,000
Profit = $2,200,000 - $2,000,000 = $200,000
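The following sketch recomputes the campaign arithmetic. It also adds one derived figure that the slide does not state: the break-even response rate, which is the cost per piece divided by the revenue per sale. At $2 per piece and $220 per sale, the mailing breaks even at about 0.91%, so a 1% response leaves only a thin margin. `Fraction` keeps the money arithmetic exact. The function names are illustrative.

```python
from fractions import Fraction


def campaign_profit(prospects, cost_per_piece, response_rate, revenue_per_sale):
    """Profit = revenue from responders minus the cost of mailing every prospect."""
    sales = prospects * response_rate
    return sales * revenue_per_sale - prospects * cost_per_piece


def break_even_rate(cost_per_piece, revenue_per_sale):
    """Response rate at which sales revenue exactly covers the mailing cost."""
    return Fraction(cost_per_piece, revenue_per_sale)


profit = campaign_profit(1_000_000, 2, Fraction(1, 100), 220)
rate = break_even_rate(2, 220)
print(profit)                # 200000
print(f"{float(rate):.2%}")  # 0.91%
```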

Assigning a Predictive Score
Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken: Wiley.

Targeted Marketing with PA
• PA results tell us which prospects are likely to respond
• PA identifies the 25% of prospects on the list who are 3× more likely to respond
• The mailing shrinks from 1,000,000 to 250,000 prospects, with a 3% response rate (7,500 sales)
• $220 revenue per sale
• $1,150,000 in profit, a 475% increase over the $200,000 of the untargeted campaign
Revenue = $220 × 7,500 = $1,650,000
Cost = $2 × 250,000 = $500,000
Profit = $1,650,000 - $500,000 = $1,150,000
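The comparison between mailing everyone and mailing only the top-scoring 25% can be checked directly. The following sketch uses only the figures on the slides. The relative increase in profit, (1,150,000 - 200,000) / 200,000, works out to 475%.

```python
from fractions import Fraction


def profit(prospects, cost_per_piece, response_rate, revenue_per_sale):
    """Revenue from responders minus the cost of mailing every selected prospect."""
    sales = prospects * response_rate
    return sales * revenue_per_sale - prospects * cost_per_piece


mass = profit(1_000_000, 2, Fraction(1, 100), 220)    # mail the whole list
targeted = profit(250_000, 2, Fraction(3, 100), 220)  # mail the top-scoring 25% only
increase = (targeted - mass) / mass                    # relative change in profit

print(mass, targeted, f"{float(increase):.1%}")
# 200000 1150000 475.0%
```

Targeting reduces revenue from $2.2M to $1.65M, but it cuts mailing cost by $1.5M, so profit rises.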

Recommendations: Similar to Others
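The slide's title points to recommending what similar customers liked. The transcript does not say which algorithm is behind it, so the following is a minimal sketch of one common approach: user-based collaborative filtering with cosine similarity. The ratings are hypothetical. For simplicity, similarity is computed only over items that both users rated.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"book": 5, "film": 3},
    "bob": {"book": 5, "film": 3, "song": 4, "game": 1},
    "cat": {"book": 1, "film": 5, "song": 1, "game": 5},
}


def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def recommend(user):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))  # ['song', 'game']
```

Bob rated the shared items exactly as Ann did, so his high rating for “song” outweighs Cat's high rating for “game.”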

Recommendations: Closer to Home

Top 20 Open Source PA Software
• There are several Open Source and freeware products available for performing Predictive Analytics
• “R” is one of the most popular, but the link below lists plenty of others to choose from
predictive-analytics-freeware-software/

Wrap-up and bring it home
• The convergence of technology has led to Big Data
• Your best bet is to listen to what the data tells you, rather than ask it questions you already know the answers to
• The real benefit of Predictive Analytics is the ability to find patterns in data that you were not aware of before
• Creating new markets and new opportunities based on data analysis
Using Predictive Analytics with Big Data is truly using data to lead!

Question & Answer