Where do we need it? Why do we need it? What is Data Analytics?

Presentation transcript:

What is Data Analytics? It is the process of transforming processed information into knowledge.
Why do we need it? To predict something relevant to our goal, to increase profit, and to use our resources efficiently.
Where do we need it? Everywhere:
• Daily life: before buying groceries, before filling up on fuel
• Business: to increase sales
• Computer systems: various algorithms (LRU, MRU, command queuing algorithms)

Have you ever noticed that after you select or view a product on Amazon, Snapdeal, or Myntra, an advertisement for the same product shows up on the other websites and webpages you visit?

How is Domino's able to offer you a free pizza on Thursday, and maybe a garlic bread with cheesy dip absolutely free? Why do supermarkets hold sales, and how can they give a product away for free? If you have such questions in mind, then dive into the ocean of analysis.

DATA ANALYTICS (Play with data)

Analysis needs data, so we need to understand the various types of data that companies work with:
• Structured data
• Semi-structured data
• Unstructured data
• Big Data

Structured Data: This covers all data that can be stored in an SQL database, in tables with rows and columns. The tables have relational keys, and the data can easily be mapped into pre-designed fields. Today, this is the most heavily processed kind of data in software development and the simplest way to manage information. But structured data represents only 5 to 10% of all data.
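
To make "rows, columns and relational keys" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are made up for illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # relational key
    " amount REAL NOT NULL)"
)

# Every row fits the pre-designed fields of its table.
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 100.0), (2, 75.5)],
)

# The key lets us join the two tables and total the spend per customer.
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.id"
).fetchall()
print(rows)
```

Because every record has the same fields, a single query can aggregate across the whole table, which is what makes structured data the easiest kind to analyze.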

Semi-Structured Data: Semi-structured data is information that doesn't reside in a relational database but does have some organizational properties that make it easier to analyze. With some processing you can store it in a relational database (this can be very hard for some kinds of semi-structured data), but the semi-structured form exists to save space, improve clarity, or simplify computation. Examples of semi-structured data: CSV and XML documents. Like structured data, semi-structured data represents only a small part of all data (5 to 10%).
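
As a rough illustration of "some organizational properties", the sketch below reads a small CSV document and a small XML document with Python's standard library. The sample records and field names are invented; note that the second XML record carries a field the first one lacks, which a fixed table schema would not allow without changes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample documents.
csv_text = "name,city\nAsha,Pune\nRavi,Delhi\n"
xml_text = """<people>
  <person name="Asha"><city>Pune</city></person>
  <person name="Ravi"><city>Delhi</city><phone>555-0100</phone></person>
</people>"""

# CSV: the header row supplies the field names.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# XML: tags and attributes describe each value, and records may differ in shape.
xml_rows = []
for person in ET.fromstring(xml_text).findall("person"):
    record = {"name": person.get("name")}
    for child in person:
        record[child.tag] = child.text
    xml_rows.append(record)

print(csv_rows)
print(xml_rows)
```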

Unstructured Data: Unstructured data represents around 80% of all data. It often includes text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data. Just as with structured data, unstructured data is either machine generated or human generated.

Unstructured Data: Here are some examples of machine-generated unstructured data:
• Satellite images: This includes weather data or the data that the government captures in its satellite surveillance imagery. Just think about Google Earth, and you get the picture.
• Scientific data: This includes seismic imagery, atmospheric data, and high energy physics.
• Photographs and video: This includes security, surveillance, and traffic video.
• Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic profiles.
The following list shows a few examples of human-generated unstructured data:
• Social media data: This data is generated from social media platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr.
• Mobile data: This includes data such as text messages and location information.
• Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.

BIG DATA

BIG DATA Big Data may well be the Next Big Thing in the IT world. Big data burst upon the scene in the first decade of the 21st century. The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.

What is BIG DATA? ‘Big Data’ is similar to ‘small data’, but bigger in size. Because the data is bigger, it requires different approaches: techniques, tools and architecture. The aim is to solve new problems, or old problems in a better way. Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

What is BIG DATA?
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

1st Characteristic of Big Data: Volume. A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day. A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US. Smartphones, the data they create and consume, and sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.

2nd Characteristic of Big Data: Velocity
• Clickstreams and ad impressions capture user behavior at millions of events per second.
• High-frequency stock trading algorithms reflect market changes within microseconds.
• Machine-to-machine processes exchange data between billions of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users, each producing multiple inputs per second.

3rd Characteristic of Big Data: Variety. Big Data isn't just numbers, dates, and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media. Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure. Big Data analysis includes many different types of data.

Storing Big Data
• Analyzing your data characteristics
• Selecting data sources for analysis
• Eliminating redundant data
• Establishing the role of NoSQL
• Overview of Big Data stores
• Data models: key-value, graph, document, column-family
• Hadoop Distributed File System
• HBase
• Hive
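
The four data models named above can be hard to tell apart from their names alone. The sketch below models one made-up user record in each style, using plain Python dictionaries and lists as stand-ins for real stores. It only illustrates the shape of the data; it is not the API of any particular NoSQL product.

```python
import json

# Key-value: an opaque value looked up by a single key.
kv_store = {"user:1": json.dumps({"name": "Asha", "city": "Pune"})}

# Document: nested, self-describing records that can be queried by field.
doc_store = [
    {"_id": 1, "name": "Asha", "address": {"city": "Pune"}, "tags": ["prime"]},
]

# Column-family: row key -> column family -> column -> value.
cf_store = {
    "user1": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2016-11-01"},
    }
}

# Graph: nodes plus labelled edges between them.
graph_nodes = {1: {"label": "User", "name": "Asha"},
               2: {"label": "Product", "name": "Pizza"}}
graph_edges = [(1, "VIEWED", 2)]

# The same question ("what do we know about user 1?") asked of each model.
kv_city = json.loads(kv_store["user:1"])["city"]
doc_city = next(d for d in doc_store if d["_id"] == 1)["address"]["city"]
cf_city = cf_store["user1"]["profile"]["city"]
viewed = [graph_nodes[dst]["name"]
          for src, rel, dst in graph_edges
          if src == 1 and rel == "VIEWED"]
print(kv_city, doc_city, cf_city, viewed)
```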

HADOOP: Hadoop is not a type of database, but rather a software ecosystem that allows for massively parallel computing. It is an enabler of certain types of NoSQL distributed databases (such as HBase), which can allow data to be spread across thousands of servers with little reduction in performance. Hadoop is an Apache Software Foundation project that provides two important things:
• A distributed filesystem called HDFS (Hadoop Distributed File System)
• A framework and API for building and running MapReduce jobs
HDFS is structured similarly to a regular Unix filesystem, except that data storage is distributed across several machines.
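
To give a feel for what "distributed across several machines" means, here is a toy sketch that splits a file into fixed-size blocks and assigns each block to several nodes. The 128 MB block size and replication factor of 3 are the defaults commonly cited for recent Hadoop versions; the node names, the plan_blocks helper, and the round-robin placement are assumptions made for this example and are much simpler than HDFS's real placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # commonly cited HDFS default, in bytes
REPLICATION = 3                 # commonly cited HDFS default


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, [nodes holding a copy]) for one file."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for i in range(n_blocks):
        # Toy round-robin placement; real HDFS also considers racks and load.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, replicas))
    return plan


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machines
plan = plan_blocks(300 * 1024 * 1024, nodes)      # a 300 MB file
for block, replicas in plan:
    print(block, replicas)
```

A 300 MB file becomes three blocks here, and each block lives on three different nodes, so losing any single machine does not lose data.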

HADOOP

HADOOP: Google solved the problem of processing such large datasets using a programming model called MapReduce. It divides the task into small parts, assigns those parts to many computers connected over the network, and collects the results to form the final result dataset. Doug Cutting, Mike Cafarella and team took the solution provided by Google and started an open source project called Hadoop in 2005. Doug named it after his son's toy elephant. Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
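
The divide, assign, and collect steps described above can be sketched on a single machine with the classic word-count example. This is plain Python, not the Hadoop API; the function names and sample documents are invented for illustration, and in a real cluster each map or reduce call could run on a different computer.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input split into (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data needs analysis"]  # two input splits

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each document is mapped independently, which is what lets the work be spread over many machines; only the shuffle step needs to bring matching keys together.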

So far we have studied the various types of data:
• Structured data -> maintained by RDBMS software
• Unstructured data -> managed by NoSQL databases
• Big Data -> handled by Hadoop (MapReduce framework)
But to increase a company's or organisation's profit, managing this data is not enough: it also needs to be thoroughly analyzed.