Calculation of stock volatility using Hadoop and map-reduce

1 Calculation of stock volatility using Hadoop and map-reduce
Ravi Teja Nekkanti U

2 Stock volatility
Volatility = swings (the price varies over a short period of time)
A measure of the relative performance of a stock
High-volatility stocks = more swings (and vice versa)

3 Whether the value plummets or rises, the fluctuations occur on a day-to-day basis
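One common way to turn these day-to-day swings into a single number is the standard deviation of daily returns. The slides do not state which formula the project used, so the sketch below is an assumption; the class name `VolatilitySketch` and the sample prices are hypothetical.

```java
import java.util.Arrays;

public class VolatilitySketch {

    // Daily simple returns: (p[i] - p[i-1]) / p[i-1].
    static double[] dailyReturns(double[] closes) {
        if (closes.length < 2) {
            throw new IllegalArgumentException("need at least 2 closing prices");
        }
        double[] r = new double[closes.length - 1];
        for (int i = 1; i < closes.length; i++) {
            r[i - 1] = (closes[i] - closes[i - 1]) / closes[i - 1];
        }
        return r;
    }

    // Assumed volatility measure: sample standard deviation of daily returns
    // (divides by n - 1, so at least two returns are required).
    static double volatility(double[] closes) {
        double[] r = dailyReturns(closes);
        if (r.length < 2) {
            throw new IllegalArgumentException("need at least 3 closing prices");
        }
        double mean = Arrays.stream(r).average().orElse(0.0);
        double sumSq = 0.0;
        for (double x : r) {
            sumSq += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSq / (r.length - 1));
    }

    public static void main(String[] args) {
        double[] calm = {100, 101, 100, 101, 100}; // hypothetical prices
        double[] wild = {100, 120, 90, 125, 85};   // hypothetical prices
        System.out.println("calm: " + volatility(calm));
        System.out.println("wild: " + volatility(wild));
    }
}
```

With this measure, the second series scores higher than the first, which matches the "more swings" reading of high volatility.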

4 The Hadoop framework
Hadoop is an open-source Java framework
Two parts: HDFS and Hadoop MapReduce
HDFS splits files into small chunks
Redundancy through replication
Computing cost
Programming framework on top of HDFS
Parallel computation

5 HADOOP DISTRIBUTED FILE SYSTEM
Two types of file system: the local file system and HDFS
Local file system: 4 KB block size
HDFS: 64 MB block size (default)
Replication factor: 3 (default)
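The block size and replication figures above translate into simple arithmetic. The sketch below shows it for a hypothetical 200 MB file; the class name `BlockMath` and the file size are assumptions made for illustration.

```java
public class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file: ceiling of fileBytes / blockBytes.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes stored across the cluster once every block is replicated.
    static long rawStorage(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 200 * MB; // hypothetical file size
        System.out.println(blockCount(file, 64 * MB)); // 4 (three full blocks plus one 8 MB block)
        System.out.println(rawStorage(file, 3) / MB);  // 600
    }
}
```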

6 HADOOP SERVICES OR DAEMONS
HADOOP PROVIDES FIVE SERVICES (THE FIRST THREE BELONG TO HDFS, THE LAST TWO TO MAPREDUCE)
NAME NODE
SECONDARY NAME NODE
DATA NODE
JOB TRACKER
TASK TRACKER

7 MASTER SERVICES AND SLAVE SERVICES
MASTER SERVICES (MASTER DAEMONS) ASSIGN TASKS TO SLAVE SERVICES
MASTER DAEMONS: NAME NODE, SECONDARY NAME NODE, JOB TRACKER
SLAVE DAEMONS: DATA NODE, TASK TRACKER
MASTER SERVICES TALK TO EACH OTHER, AND SLAVE SERVICES TALK TO EACH OTHER

8 DAEMONS
MASTER DAEMONS: NAME NODE, SECONDARY NN, JOB TRACKER
SLAVE DAEMONS: DATA NODE, TASK TRACKER

9 NAME NODE
TO STORE DATA IN HDFS, THE FIRST CONTACT IS THE NAMENODE
TELLS THE CLIENT WHERE THE DATA IS PRESENT
STORES METADATA ABOUT THE DATA
TALKS TO SNN, DATA NODE, JOB TRACKER
Receives heartbeats from data nodes (every 3 seconds)

10 Secondary name node
STORES METADATA WHICH IS REPLICATED FROM THE NAMENODE
THE NAME NODE KEEPS THE EDIT LOGS AND THE FS IMAGE
THE SECONDARY NAME NODE PULLS THE FS IMAGE FROM THE NAMENODE
A REASONABLE PULL INTERVAL IS ONE HOUR

11 JOB TRACKER
JOB TRACKER ASSIGNS TASKS TO TASK TRACKERS
JOB TRACKER ASKS THE NAME NODE FOR METADATA
JOB TRACKER SENDS THE JAR FILE TO THE TASK TRACKERS WHERE THE DATA IS STORED

12 DATA NODE
DATA NODE STORES THE DATA
SENDS A HEARTBEAT EVERY 3 SECONDS, AND PERIODIC BLOCK REPORTS, TO THE NAMENODE
REPLICATION OF DATA
TASK TRACKER TALKS TO DATA NODE TO PERFORM JOB ON DATA

13 TYPICAL LAYOUT OF HDFS
MASTER NODE: NN, JT, SNN
NODE 1 TO NODE 10: DN AND TT ON EACH NODE

14 MapReduce design
This framework solves problems by parallelizing the computation
Takes the jar file from the client and performs the calculations
Takes data as input splits
Gives the jar to each task tracker
Mapper: the number of mappers depends on the number of input splits; mappers process the data
Reducer: aggregates the data from the mappers

15 Basic design of MapReduce framework

16 Configuration files of the Hadoop framework
core-site.xml: settings for the name node and secondary name node services
hdfs-site.xml: settings for the data node
mapred-site.xml: settings for the job tracker and task tracker services

17 Basic Hadoop commands
List files: $ hadoop fs -ls /
Make dir: $ hadoop fs -mkdir /golib
Put data: $ hadoop fs -put input /golib
Perform job: $ hadoop jar jarfile.jar DriverClass input output

18 Mapper and reducer input and output
The mapper reads input data line by line
The mapper performs operations and emits output as key-value pairs
The mapper generates intermediate data
Intermediate data is shuffled and sorted
The reducer takes the intermediate data and aggregates it as key-value pairs
The number of output files depends on the number of reducers
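The map, shuffle-and-sort, and reduce steps can be imitated in plain Java on a single machine. The sketch below is not Hadoop code and does not reproduce the program from the slides. The `SYMBOL,price` line format, the class name `MiniMapReduce`, and the max-minus-min aggregation are all assumptions chosen to keep the example short.

```java
import java.util.*;

public class MiniMapReduce {

    // Map step: one input line "SYMBOL,price" becomes one (key, value) pair.
    static Map.Entry<String, Double> map(String line) {
        String[] parts = line.split(",");
        return new AbstractMap.SimpleEntry<>(parts[0].trim(),
                Double.parseDouble(parts[1].trim()));
    }

    // Shuffle and sort: group values by key; TreeMap keeps the keys sorted.
    static SortedMap<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        SortedMap<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce step: aggregate all values of one key (here: max minus min).
    static double reduce(List<Double> values) {
        return Collections.max(values) - Collections.min(values);
    }

    // Run the three steps in sequence and return one output pair per key.
    static SortedMap<String, Double> run(List<String> lines) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        SortedMap<String, Double> out = new TreeMap<>();
        shuffle(pairs).forEach((key, values) -> out.put(key, reduce(values)));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "IBM,100.0", "AAPL,50.0", "IBM,104.0", "AAPL,47.5"); // hypothetical data
        System.out.println(run(lines)); // {AAPL=2.5, IBM=4.0}
    }
}
```

In a real cluster each mapper would process its own input split and each reducer would receive a subset of the keys; here everything runs in one process for clarity.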

19 Map reduce flow chart
Input file -> input splits -> record readers -> mappers -> intermediate data -> reducers -> record writers -> output

20 Java programming
Input is taken as TextInputFormat
Wrapper class   Primitive data type   Box class
Integer         int                   IntWritable
Float           float                 FloatWritable
Double          double                DoubleWritable
String          (none)                Text
Long            long                  LongWritable
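Hadoop's box classes implement the `Writable` interface, which has two methods, `write(DataOutput)` and `readFields(DataInput)`, so that keys and values can be serialized between mappers and reducers. The sketch below mimics that pattern with JDK classes only. `SimpleIntWritable` is a hypothetical stand-in, not the real `IntWritable`.

```java
import java.io.*;

public class SimpleIntWritable {
    private int value;

    public SimpleIntWritable() { }

    public SimpleIntWritable(int value) { this.value = value; }

    public int get() { return value; }

    public void set(int value) { this.value = value; }

    // Serialize the wrapped int to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Overwrite this object's state from a binary stream,
    // so that a single instance can be reused for many records.
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    // Demo helper: serialize src, then deserialize into a fresh object.
    static SimpleIntWritable roundTrip(SimpleIntWritable src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            SimpleIntWritable copy = new SimpleIntWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new SimpleIntWritable(42)).get()); // 42
    }
}
```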

21 The program consists of 3 files: mapper class, reducer class, main class
Calculations are performed on 1000 NYSE stocks
Output is given as key-value pairs
Stock volatility is calculated
As the output is sorted, we can get the top 10 values
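MapReduce sorts its output by key. If the key is the stock symbol (an assumption, since the slides do not say which field is the key), picking the top 10 needs one more sort by value. A minimal sketch of that last step follows; the class name `TopVolatility` and the sample figures are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

public class TopVolatility {

    // Return the n symbols with the highest volatility, highest first.
    static List<String> topN(Map<String, Double> volatilityBySymbol, int n) {
        return volatilityBySymbol.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> vol = new HashMap<>();
        vol.put("AAA", 0.10); // hypothetical symbols and values
        vol.put("BBB", 0.50);
        vol.put("CCC", 0.30);
        System.out.println(topN(vol, 2)); // [BBB, CCC]
    }
}
```

For the slides' case, `topN(result, 10)` would be called on the reducer output after it has been read back into a map.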

22 demo

23 Questions ???

24 Thank You

