Calculation of stock volatility using Hadoop and map-reduce

Calculation of stock volatility using Hadoop and map-reduce Ravi Teja Nekkanti U00291556

Stock volatility
Volatility measures how much a stock's price swings over a short period of time.
It is a measure of the relative performance (riskiness) of a stock.
High-volatility stocks show larger swings; low-volatility stocks show smaller ones.
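The slides do not spell out the exact formula; a common choice is to take volatility as the standard deviation of the stock's periodic returns computed from closing prices. A minimal plain-Java sketch under that assumption:

static double volatility(double[] closingPrices) {
    // Periodic returns: (p[i] - p[i-1]) / p[i-1]
    double[] returns = new double[closingPrices.length - 1];
    for (int i = 1; i < closingPrices.length; i++) {
        returns[i - 1] = (closingPrices[i] - closingPrices[i - 1]) / closingPrices[i - 1];
    }
    double mean = 0;
    for (double r : returns) mean += r;
    mean /= returns.length;
    double variance = 0;
    for (double r : returns) variance += (r - mean) * (r - mean);
    variance /= (returns.length - 1);   // sample variance
    return Math.sqrt(variance);         // volatility = standard deviation of returns
}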

Whether the price plummets or rises, the fluctuations vary on a day-to-day basis.

The Hadoop framework
Hadoop is an open-source Java framework.
It has two parts: HDFS and Hadoop MapReduce.
HDFS splits files into small chunks and provides redundancy through replication, keeping computing costs low on commodity hardware.
MapReduce is a programming framework on top of HDFS that performs parallel computation.

Hadoop Distributed File System (HDFS)
Two types of file system: the local file system and HDFS.
Local file system: 4 KB block size.
HDFS: 64 MB block size.
Replication factor: 3.

HDFS services (daemons)
Hadoop provides five services:
NameNode
Secondary NameNode
DataNode
JobTracker
TaskTracker

Master services and slave services
Master services (master daemons) assign tasks to the slave services.
Master daemons: NameNode, Secondary NameNode, JobTracker.
Slave daemons: DataNode, TaskTracker.
Master services talk to each other, and slave services talk to each other.

Daemons
Master daemons: NameNode, Secondary NameNode, JobTracker
Slave daemons: DataNode, TaskTracker

NameNode
To store data in HDFS, the client's first contact is the NameNode.
It tells the client where the data is present and stores the metadata of the data.
It talks to the Secondary NameNode, the DataNodes, and the JobTracker.
It receives heartbeats from the DataNodes (every 3 seconds).

Secondary NameNode
Stores metadata replicated from the NameNode.
The NameNode keeps edit logs and an fsimage file.
The Secondary NameNode periodically pulls the fsimage (and edit logs) from the NameNode; a reasonable pull interval is one hour.

JobTracker
The JobTracker assigns jobs to the TaskTrackers.
It asks the NameNode for metadata to find where the data is stored.
It sends the job's jar file to the TaskTrackers on the nodes where the data is stored.

DataNode
The DataNode stores the actual data and holds the replicas.
It sends heartbeats (every 3 seconds) and block reports to the NameNode.
The TaskTracker talks to the DataNode to perform the job on the local data.

Typical layout of an HDFS cluster
Master node: NameNode (NN), JobTracker (JT), Secondary NameNode (SNN).
Worker nodes 1-10: each runs a DataNode (DN) and a TaskTracker (TT).

MapReduce design
This framework allows problems to be solved in parallel.
It takes a jar file from the client and performs the computation.
Input data is divided into input splits.
The jar is shipped to each TaskTracker.
Mapper: one mapper per input split; it executes the map logic over the data.
Reducer: aggregates the data produced by the mappers.

Basic design of MapReduce framework

Configuration files of the Hadoop framework
core-site.xml – configures/starts the NameNode and Secondary NameNode services.
hdfs-site.xml – configures/starts the DataNode service.
mapred-site.xml – configures/starts the JobTracker and TaskTracker services.

Basic Hadoop commands
List files:    $ hadoop fs -ls /
Make a dir:    $ hadoop fs -mkdir /golib
Put data:      $ hadoop fs -put input /golib
Run a job:     $ hadoop jar jarfile.jar DriverClass input output

Mapper and reducer input and output
The mapper reads the input data line by line.
It processes each line and emits its output as key-value pairs.
The mapper generates intermediate data, which is shuffled and sorted.
The reducer takes the intermediate data and aggregates it into key-value pairs.
The number of output files depends on the number of reducers.
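A minimal sketch in Hadoop's Java MapReduce API of how such a mapper and reducer could look. The record format ("symbol,date,return"), the class names, and computing volatility as the standard deviation of the returns are assumptions for illustration, not the author's actual code; each class would live in its own .java file.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: reads one line at a time and emits (symbol, return) pairs.
public class VolatilityMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    private final Text symbol = new Text();
    private final DoubleWritable ret = new DoubleWritable();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");   // assumed format: symbol,date,return
        symbol.set(fields[0]);                           // stock ticker as the key
        ret.set(Double.parseDouble(fields[2]));          // period return as the value
        context.write(symbol, ret);
    }
}

// Reducer: aggregates all returns for one symbol and emits (symbol, volatility),
// where volatility is taken as the standard deviation of those returns.
public class VolatilityReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text symbol, Iterable<DoubleWritable> returns, Context context)
            throws IOException, InterruptedException {
        double sum = 0, sumSq = 0;
        long n = 0;
        for (DoubleWritable r : returns) {
            double v = r.get();
            sum += v;
            sumSq += v * v;
            n++;
        }
        double mean = sum / n;
        double variance = (sumSq / n) - (mean * mean);   // population variance
        context.write(symbol, new DoubleWritable(Math.sqrt(variance)));
    }
}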

MapReduce flow chart
Input file -> input splits -> record readers -> mappers -> intermediate data (shuffle and sort) -> reducers -> record writers -> output

Java programming
Input is read using TextInputFormat.
Java wrapper classes, primitive types, and their Hadoop box classes:

Wrapper class   Primitive type   Box class
Integer         int              IntWritable
Float           float            FloatWritable
Double          double           DoubleWritable
String          String           Text
Long            long             LongWritable
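For illustration (the ticker symbol and value below are made up), plain Java values are wrapped in box classes before being written out and unwrapped when read back:

Text symbol = new Text("AAPL");                    // wraps a Java String
DoubleWritable vol = new DoubleWritable(0.023);    // wraps a Java double
String s = symbol.toString();                      // unwrap back to a String
double d = vol.get();                              // unwrap back to a double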

The program consists of 3 files: the mapper class, the reducer class, and the main (driver) class.
1000 NYSE stocks are taken and the calculation is performed on them.
The output is given as key-value pairs: the volatility calculated for each stock.
Since the output is sorted, the top 10 values can be read off directly.
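A sketch of how the main (driver) class could wire the mapper and reducer together with Hadoop's Job API; the class names, job name, and argument handling are assumptions, not the author's actual code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class VolatilityDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "stock volatility");
        job.setJarByClass(VolatilityDriver.class);
        job.setMapperClass(VolatilityMapper.class);
        job.setReducerClass(VolatilityReducer.class);
        job.setInputFormatClass(TextInputFormat.class);  // input is read as TextInputFormat
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, this would be launched with the "hadoop jar jarfile.jar DriverClass input output" command shown earlier.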

Demo

Questions ???

Thank You