CSCE 587: Big Data Analytics


CSCE 587: Big Data Analytics
John R. Rose
Computer Science and Engineering
University of South Carolina
CSCE 587 8/24/17

Overview
Big Data Analytics: What is it?
- First, what do we mean by data?
- Second, what do we mean by analytics?
- Third, what do we mean by big?

What do we mean by “data”?
In principle, any kind of data:
- Corporate sales
- Email
- Tweets
- Sensor output
- Video
- Photos
- Omics
- Website click streams

What do we mean by “data”?
What is the structure of this data?
- Corporate sales: structured (tables in a DB)
- Email: unstructured (free text)
- Tweets: unstructured (free text)
- Sensor output: structured (DB or stream)
- Video: unstructured
- Photos: unstructured
- Omics: semi-structured (XML-like DB)
- Website click streams: quasi-structured
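The categories above can be illustrated with a minimal Python sketch. All sample records, field names, and values here are hypothetical and exist only for illustration; they are not course data. The point is that the more structure the data carries, the more directly its fields can be addressed.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: rows with a fixed schema, as in a database table
# (hypothetical sales rows).
sales_csv = "region,amount\nEast,100\nWest,250\n"
rows = list(csv.DictReader(io.StringIO(sales_csv)))
total_sales = sum(int(r["amount"]) for r in rows)

# Semi-structured: self-describing tags, but no rigid table schema
# (hypothetical XML-like record).
omics_xml = "<gene id='G1'><name>example</name><length>1200</length></gene>"
gene = ET.fromstring(omics_xml)
gene_length = int(gene.find("length").text)

# Quasi-structured: a regular but ad hoc text format, such as a
# click-stream log line that must be parsed with a pattern.
click_line = "2017-08-24 10:15:02 GET /courses/csce587 200"
match = re.match(r"(\S+) (\S+) (\S+) (\S+) (\d+)", click_line)
clicked_path = match.group(4)

# Unstructured: free text with no fields; only generic operations
# such as splitting into words apply directly.
email_body = "Hi all, the first lecture covers SQL. See you Thursday."
word_count = len(email_body.split())

print(total_sales, gene_length, clicked_path, word_count)
```

Structured and semi-structured records can be queried by field name, the quasi-structured line needs a hand-written pattern, and the free text offers no fields at all.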

What do we mean by “analytics”?
- Broadly refers to the method of analysis.
- Depends on what we want to learn from the data.
- The method or model used to make sense of the data.
- Depends on the nature of the data.

What do we mean by “analytics”?
Example: When will Social Security go broke?
- Data: historical data over 50 years
  - Yearly balance
  - Payments in
  - Payments out
  - Size of working population
  - Size of retired population
  - Life expectancy
- Analytical method: ?

What do we mean by “big”?
Any thoughts on what we might mean by “big”?

What do we mean by “big”?
Examples:
- Genomics data: the human genome is 3 billion base pairs; the “mapping” of the human genome exceeds 8 petabytes.
- The New York Times public archive consists of millions of PDF files.
- Chemical reaction databases contain millions of reactions.
- Library of Congress collection of tweets: 170 billion tweets (as of January 8, 2013).
- Others?
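As a rough sense of scale for the genome example, the following Python sketch estimates the storage needed for one raw sequence. The base-pair count comes from the slide; the 2-bit packing (four possible bases) and the one-character-per-base text encoding are assumptions made here for the arithmetic, not figures from the course.

```python
# Back-of-the-envelope storage for one human genome sequence.
BASE_PAIRS = 3_000_000_000  # from the slide: about 3 billion base pairs
BITS_PER_BASE = 2           # assumption: A, C, G, T packed into 2 bits each


def genome_bytes(base_pairs: int, bits_per_base: int) -> int:
    """Bytes needed to store the sequence at the given packing density."""
    return base_pairs * bits_per_base // 8


packed = genome_bytes(BASE_PAIRS, BITS_PER_BASE)  # 2-bit packed encoding
as_text = genome_bytes(BASE_PAIRS, 8)             # one ASCII character per base

print(f"packed: {packed / 1e6:.0f} MB, text: {as_text / 1e9:.0f} GB")
```

Under these assumptions a single raw sequence fits comfortably on a laptop, so the petabyte-scale figure quoted above evidently counts far more than one packed sequence.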

What are we actually going to do?
Depends on your background.
- What do you know about SQL?
- What about NoSQL?
- What do you know about statistical analysis?
  - Regression
  - Clustering
  - Association rules
  - Decision trees
  - Neural networks
  - Support vector machines
  - Hidden Markov models

Plan 0
- Introduce SQL or refresh your SQL memory
- Introduce R in the context of RStudio
- Review basic statistical methods
- Investigate advanced data mining techniques
- Investigate “Big Data” techniques

Plan 0
- Introduce SQL or refresh your SQL memory
  - The next lecture or two will cover SQL
  - Will do more if needed
- Introduce R in the context of RStudio
  - Instructions for downloading R and RStudio are on the class webpage
  - Install on your own machine to work at home
- Review basic statistical methods
  - Will use RStudio for hands-on work in class

Plan 0
- Investigate advanced data mining techniques
  - Will use methods implemented in R packages
  - No need to rewrite existing tools
  - More important to understand the uses and limitations of the tools
- Investigate “Big Data” techniques
  - Hadoop
  - HDFS
  - Pig
  - Hive

VMs
- Each student will have a unique VM.
- VM names are of the following form: vm-Hadoop-XX.cse.sc.edu, where XX is each student’s unique 2-digit number.
- Special URLs:
  - RStudio: http://vm-hadoop-XX.cse.sc.edu:8787
  - Ambari: http://vm-hadoop-XX.cse.sc.edu:8080
CSCE 587 1/15/13
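The naming scheme above can be sketched as a small Python helper that fills in XX. The function name `vm_urls` is hypothetical, and zero-padding single-digit numbers (for example 7 becoming 07) is an assumption based on the "2-digit" wording; the host pattern and ports are taken from the slide.

```python
def vm_urls(student_number: int) -> dict:
    """Build the per-student host name and service URLs from the 2-digit number."""
    if not 0 <= student_number <= 99:
        raise ValueError("student number must fit in two digits")
    # Assumption: single-digit numbers are zero-padded to two digits.
    xx = f"{student_number:02d}"
    host = f"vm-hadoop-{xx}.cse.sc.edu"
    return {
        "host": host,
        "rstudio": f"http://{host}:8787",  # RStudio port from the slide
        "ambari": f"http://{host}:8080",   # Ambari port from the slide
    }


urls = vm_urls(7)
print(urls["rstudio"])
print(urls["ambari"])
```

Host names are case-insensitive, so the lowercase form used in the URLs refers to the same machine as the vm-Hadoop-XX form of the name.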

VMs
Special accounts:
- VM account: student
- VM initial password: qwertyCSCE587
- You must change the VM password. Obviously, make sure you can remember it.

VMs
Special accounts:
- Ambari account: admin
- Ambari password: admin