Published by Kerry Poole. Modified over 9 years ago.
2
Table of Contents
Introduction
Why Data Analytics
Data Analytics Terminology
Predictive Analytics
Data Analytics Challenges
Data Analytics Platform
Data Analytics Tools
Hadoop
Data Analytics Application
Recommendations
3
Introduction
What is data?
What is big data?
Analysis vs. Analytics
4
WHAT IS DATA?
A collection of facts and statistics
5
WHAT IS DATA? (contd.)
CLASSIFICATION OF DATA
Structured: a high degree of organization, such as a relational database
Unstructured: information that is difficult to organize using traditional mechanisms, e.g. content from Facebook, WhatsApp, Gmail
6
WHAT IS BIG DATA
Complex and dynamic
The 3 Vs: volume, velocity, variety
90% of the world's data was produced in the last two years (IBM)
7
ANALYTICS vs. ANALYSIS
Analytics: extensive use of mathematics and statistics, and of descriptive techniques and predictive models, to gain valuable knowledge
Analysis asks: why did something happen?
Analytics asks: what is likely to happen?
8
WHY DATA ANALYTICS?
From a reactive strategy to a proactive strategy
Example: data analytics helped in determining the President of America
9
DATA ANALYTICS IN THE REAL WORLD
Walmart: uses predictive analytics to better identify customer preferences on a regional basis and to stock its branch locations accordingly
10
REAL WORLD APPLICATIONS (contd.)
A medical diagnostics company developed the first non-invasive test for predicting coronary artery disease:
Researchers analyzed over 100 million gene samples
They identified the 23 primary predictive genes for coronary artery disease
The resulting test, known as the "Corus CAD Test", was recognized as one of the "Top Ten Medical Breakthroughs of 2010" by TIME Magazine
11
Data Analytics Terminology
Data mining
Data warehousing
OLAP
Big data analytics
Business analytics
Descriptive analytics
Predictive analytics
12
PREDICTIVE ANALYTICS
Extracting information from existing data sets in order to determine patterns and predict future outcomes and trends
Predictive analytics is an enabler of big data
Driven by faster, cheaper computers and easier-to-use software
13
PREDICTIVE ANALYTICS (contd.)
14
What Is Machine Learning
A type of artificial intelligence that gives computers the ability to learn without being explicitly programmed.
Some applications of ML:
Spam filtering
Topic spotting
Weather prediction
Medical diagnosis
Fraud detection
15
Types Of Machine Learning
Supervised learning: learning from labelled examples, where each input is paired with a known output
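As a rough illustration of supervised learning, the sketch below fits a straight line to labelled examples by ordinary least squares and then predicts a value for an unseen input. The function name `fit_line` and the hours/wear figures are invented for this example and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled training data: hours in service -> measured wear.
hours = [1.0, 2.0, 3.0, 4.0]
wear = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(hours, wear)

# Predict the label for an input that was not in the training data.
predicted_wear = slope * 5.0 + intercept
print(slope, intercept, predicted_wear)
```

The training data here lies exactly on a line, so the fit recovers it exactly; real data would leave residual error that should be measured on held-out examples.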
16
Types Of Machine Learning
Unsupervised learning: finding structure, such as clusters, in data that has no labels
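A minimal sketch of unsupervised learning, assuming one-dimensional data and hand-picked starting centroids: k-means alternates between assigning each value to its nearest centroid and moving each centroid to the mean of its assigned values. The function name `kmeans_1d` and the sample values are illustrative only.

```python
def kmeans_1d(values, initial_centroids, iterations=10):
    """Cluster 1-D values with a fixed number of k-means iterations."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Unlabelled readings that visibly fall into two groups.
readings = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(readings, initial_centroids=[1.0, 12.0])
print(centroids)
print(clusters)
```

No labels are supplied at any point; the two groups emerge from the distances alone. In practice the result depends on the starting centroids, so several random starts are commonly tried.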
17
Some Algorithms Used For ML
Linear regression
Decision tree
Naïve Bayes
K-means algorithm
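To make one of the listed algorithms concrete, here is a small sketch of a Naïve Bayes text classifier applied to spam filtering. It assumes whitespace tokenisation and add-one (Laplace) smoothing; the function names `train` and `classify` and the four training messages are invented for illustration.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (label, text). Returns the counts the classifier needs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for label, text in docs:
        label_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(text, label_counts, word_counts, vocab):
    """Return the label with the highest log-posterior score."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero the product.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled messages.
model = train([
    ("spam", "win cash prize now"),
    ("spam", "cheap prize win"),
    ("ham", "meeting schedule for monday"),
    ("ham", "project schedule update"),
])

print(classify("win a prize", *model))
print(classify("schedule update monday", *model))
```

The "naïve" part is the assumption that words are independent given the label, which is why the per-word log probabilities are simply added.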
18
SOME DATA ANALYTICS TOOLS
19
R
R is a programming language
Open-source environment
High availability
An interpreted language
Good data handling capability
Most advanced graphical capability
R supports procedural and object-oriented programming
Get better results faster
20
SAS
SAS is commercial software developed by SAS Institute
It is expensive
Easy to learn
Good data handling capability
SAS releases updates in a controlled environment
SAS provides dedicated customer support
21
DATA ANALYTICS IN CANADIAN RAILWAY
22
IBM PURE DATA ANALYTICS TOOLS
Fast and easy setup
Petascale user data capacity
Better access to information
Customized analytics
Integrated third-party software
3x faster scan rate
128 GB/sec scan rate per rack
50% greater data capacity per rack
23
DATA ANALYTICS PLATFORM
24
DATA ANALYTICS PLATFORMS (contd.)
Cloudera
Cloudera Inc. was founded in 2008 by big data experts from Facebook, Google, Oracle and Yahoo.
First company to develop and distribute Apache Hadoop-based software.
Uses the Cloudera management suite to automate the installation process
Uses the HDFS component for file system access
Centralized metadata architecture
25
DATA ANALYTICS PLATFORMS (contd.)
Hortonworks
Hortonworks, founded in 2011, has quickly emerged as one of the leading vendors of Hadoop
It is a completely open-source platform based on Apache Hadoop for analysing, storing and managing big data
It goes beyond MapReduce in that it enables the inclusion of more data processing frameworks
It uses the HDFS component for file system access
Centralized metadata architecture
26
HADOOP
Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware
27
HDFS
A file system specially designed for storing huge data sets on a cluster of commodity hardware, with a streaming access pattern
28
MAP REDUCE
Apache Hadoop MapReduce is a framework for processing large data sets in parallel across a Hadoop cluster.
Data analysis uses a two-step map and reduce process
MapReduce is a programming model that Google has used successfully in processing its "big data" sets (~20 petabytes, i.e. about 20,000 terabytes, per day)
Users specify the computation in terms of a map function and a reduce function
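The map and reduce steps can be imitated in a single Python process to show the shape of the programming model. This is only a sketch: it does not use the Hadoop API, and the helper names `map_phase`, `shuffle` and `reduce_phase` are assumptions made for the example. On a real cluster the framework runs many map and reduce tasks in parallel and performs the grouping between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data on rails", "big data analytics"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values)
              for key, values in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both steps can be spread across machines without sharing state.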
30
EXISTING CHALLENGES IN INDIAN RAIL SYSTEM
Delays
Signaling problems
Broken-down trains
Congestion
QoS
One solution to these problems is the analysis of big data through predictive maintenance
Big data in the rail industry can be used in predictive analysis to predict faults before they happen, thus improving services
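One very simple way to picture "predicting faults before they happen" is to watch a sensor reading and raise an alert when its trailing average drifts above a limit. The sketch below assumes this moving-average rule purely for illustration; the slides do not specify a method, and the function names, temperatures and threshold are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; the first result covers values[0:window]."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def first_alert(values, window, threshold):
    """Index of the first reading whose trailing average exceeds threshold.

    Returns None if the average never exceeds the threshold.
    """
    for offset, avg in enumerate(moving_average(values, window)):
        if avg > threshold:
            return offset + window - 1
    return None


# Hypothetical axle-bearing temperatures, one reading per inspection.
temperatures = [60, 61, 60, 62, 70, 78, 85]

alert_index = first_alert(temperatures, window=3, threshold=65)
print(alert_index)
```

Averaging over a window trades a little delay for fewer false alarms from a single noisy reading; a production system would typically use richer models and domain-specific limits.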
31
PREDICTIVE MAINTENANCE: BIG DATA ON RAILS
32
PREDICTIVE MAINTENANCE (contd.)
Choose the right system or subsystem for prediction: the prediction possibility zone and the prediction effectiveness zone
Identify the required data sets as early as possible
Identify the value-add of PM for maintenance strategies
Complement your data science team with rail expertise
Look for the right skills when hiring data scientists
33
CHOOSING THE RIGHT SYSTEM OR SUBSYSTEM FOR PREDICTION
The prediction possibility zone
The prediction effectiveness zone
34
APPLICATION OF DATA ANALYTICS IN INDIAN RAILWAYS
35
Automatic vehicle location
36
PASSENGER INFORMATION SYSTEM
37
AUTOMATED FARE COLLECTION
Using ticket vending machines
Using a smart card that provides access to all types of transit services across multiple operating agencies
AFC analytics provides details of how passengers are using the system, identifies trends and helps improve services
38
AUTOMATED PASSENGER COUNTING
Number of passengers boarding and de-boarding each vehicle at a particular station
The rate of increase in passengers over the years can be predicted from the recorded data
Peak hours in a day and peak months in a year can be identified
These data can be used to provide better services and project evolving ridership trends
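A short sketch of how recorded counts might be turned into the peak-hour and growth figures mentioned above. The record layout (hour of day, boardings), the function names and all numbers are assumptions made for this example, not data from the presentation.

```python
from collections import defaultdict


def peak_hour(records):
    """records: iterable of (hour, boardings). Returns the busiest hour."""
    totals = defaultdict(int)
    for hour, boardings in records:
        totals[hour] += boardings
    return max(totals, key=totals.get)


def yearly_growth(ridership_by_year):
    """Fractional change in ridership from each year to the next."""
    years = sorted(ridership_by_year)
    return {
        year: (ridership_by_year[year] - ridership_by_year[prev])
        / ridership_by_year[prev]
        for prev, year in zip(years, years[1:])
    }


# Hypothetical counter readings: (hour of day, passengers boarding).
records = [(8, 120), (9, 200), (18, 180), (9, 150), (18, 90)]
busiest = peak_hour(records)

# Hypothetical annual totals for one station.
growth = yearly_growth({2013: 1000, 2014: 1100, 2015: 1210})
print(busiest, growth)
```

The same grouping idea extends to peak months by keying on the month instead of the hour, and a steady growth rate like the one here could feed a simple projection of future ridership.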