Charles Tappert Seidenberg School of CSIS, Pace University


1 Charles Tappert Seidenberg School of CSIS, Pace University
Data Science and Big Data Analytics
Chap 9: Advanced Analytical Theory and Methods: Text Analysis
Charles Tappert, Seidenberg School of CSIS, Pace University

2 Data Analytics Lifecycle
Data Analytics Lifecycle Overview
Phase 1: Discovery
Phase 2: Data Preparation
Phase 3: Model Planning
Phase 4: Model Building
Phase 5: Communicate Results
Phase 6: Operationalize
Case Study: GINA

3 2.1 Data Analytics Lifecycle Overview
Huge volume of data: not just thousands or millions, but billions of items
Complexity of data types and structures: variety of sources, formats, structures
Speed of new data creation and growth: high velocity, rapid ingestion, fast analysis

4 2.2 Phase 1: Discovery
Mobile sensors
Social media: 700 Facebook updates/sec in 2012
Video surveillance
Video rendering
Smart grids
Geophysical exploration
Medical imaging
Gene sequencing: more prevalent, less expensive

5 2.3 Phase 2: Data Preparation
image

6 2.4 Phase 3: Model Planning
image

7 2.6 Phase 5: Communicate Results
Structured: defined data type, format, structure (transactional data, OLAP cubes, RDBMS, CSV files, spreadsheets)
Semi-structured: text data with discernible patterns, e.g., XML data
Quasi-structured: text data with erratic data formats, e.g., clickstream data
Unstructured: data with no inherent structure, e.g., text docs, PDFs, images, video
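As a rough illustration of the four categories above, the following Python sketch handles one small sample per category using only the standard library. The sample strings and variable names are hypothetical and do not come from the slides; they only suggest how much parsing effort each category tends to need.

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Structured: a fixed schema, here a tiny CSV table (hypothetical sample).
csv_text = "id,amount\n1,9.50\n2,12.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing tags, here a small XML fragment.
xml_text = "<order id='1'><item>book</item><item>pen</item></order>"
items = [e.text for e in ET.fromstring(xml_text).findall("item")]

# Quasi-structured: erratic but parseable with effort, here a clickstream URL.
click = "http://example.com/search?q=big+data&page=2"
query = parse_qs(urlparse(click).query)

# Unstructured: free text with no inherent structure; without further
# modeling, only crude processing such as whitespace tokenization applies.
doc = "Big Data is changing how analytics is practiced."
tokens = doc.lower().split()

print(total, items, query, tokens)
```

The structured and semi-structured samples are read by off-the-shelf parsers, while the unstructured sample yields nothing beyond raw tokens.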

8 2.7 Phase 6: Operationalize
image

9 2.8 Case Study: Global Innovation Network and Analysis (GINA)
image

10 1.2 State of the Practice in Analytics
Business Intelligence (BI) versus Data Science
Current Analytical Architecture
Drivers of Big Data
Emerging Big Data Ecosystem and a New Approach to Analytics

11 Business Intelligence (BI) versus Data Science
image

12 Business Intelligence (BI) versus Data Science
image

13 Current Analytical Architecture
image

14 Current Analytical Architecture
image

15 Drivers of Big Data
image

16 Emerging Big Data Ecosystem and a New Approach to Analytics
Four main groups of players:
Data devices: games, smartphones, computers, etc.
Data collectors: phone and TV companies, Internet, Gov’t, etc.
Data aggregators (make sense of data): websites, credit bureaus, media archives, etc.
Data users and buyers: banks, law enforcement, marketers, employers, etc.

17 Emerging Big Data Ecosystem and a New Approach to Analytics
image

18 1.3 Key Roles for the New Big Data Ecosystem
image

19 Three Key Roles of the New Big Data Ecosystem
Deep analytical talent: advanced training in quantitative disciplines, e.g., math, statistics, machine learning
Data savvy professionals: savvy but less technical than group 1
Technology and data enablers: support people, e.g., DB admins, programmers, etc.

20 Three Recurring Data Scientist Activities
Reframe business challenges as analytics challenges
Design, implement, and deploy statistical models and data mining techniques on Big Data
Develop insights that lead to actionable recommendations

21 Profile of Data Scientist Five Main Sets of Skills
image

22 Profile of Data Scientist Five Main Sets of Skills
Quantitative skill: e.g., math, statistics
Technical aptitude: e.g., software engineering, programming
Skeptical mindset and critical thinking: ability to examine work critically
Curious and creative: passionate about data and finding creative solutions
Communicative and collaborative: can articulate ideas, can work with others

23 1.4 Examples of Big Data Analytics
Retailer Target: uses life events (marriage, divorce, pregnancy)
Apache Hadoop: open source Big Data infrastructure innovation; MapReduce paradigm, ideal for many projects
Social media company LinkedIn: social network for working professionals; can graph a user’s professional network; 250 million users in 2014
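The MapReduce paradigm mentioned above can be sketched in a few lines of plain Python. This is a single-process word count for illustration only, not the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input: each string stands in for one input split.
documents = ["big data big insights", "data drives insights"]

mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle between them; here all three steps run sequentially in one process.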

24 Focus of Course
Focus on quantitative disciplines, e.g., math, statistics, machine learning
Provide overview of Big Data analytics
In-depth study of several key algorithms

