EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.

1 EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES

2
• INTRODUCTION
• MOTIVATION
• IMPLEMENTATION
  • Core Logic (Map-Reduce Framework)
  • Job Scheduling
  • Load Balancing
• HADOOP & GOOGLE APP ENGINE
• CHALLENGES & ISSUES
• PERFORMANCE ANALYSIS & RESULTS
• QUESTIONS

3 GOOGLE APP ENGINE?
• PaaS (Platform as a Service)
• A platform for hosting web applications
• Virtualizes applications across multiple servers and Google-managed data centers

4 Project Description
• Distribute computation across multiple servers and share the load across them
• Use multiple accounts on App Engine
• A Task Tracker runs on each account
• The Job Tracker runs on a stand-alone machine
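A rough sketch of how a stand-alone job tracker might address one task tracker per App Engine account. It assumes the task trackers are reachable over HTTP (they are hosted web applications); the endpoint URLs, the JSON payload shape, and the round-robin assignment are all hypothetical choices made for illustration. The requests are only built here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical task-tracker endpoints, one per App Engine account.
TASK_TRACKER_URLS = [
    "https://tracker-one.example.com/map",
    "https://tracker-two.example.com/map",
    "https://tracker-three.example.com/map",
]


def build_map_requests(chunks, tracker_urls):
    """Assign chunks to trackers round-robin and build one POST request per chunk."""
    requests = []
    for index, chunk in enumerate(chunks):
        url = tracker_urls[index % len(tracker_urls)]
        # Assumed payload format: a chunk id plus the raw chunk text.
        body = json.dumps({"chunk_id": index, "data": chunk}).encode("utf-8")
        requests.append(
            Request(url, data=body, method="POST",
                    headers={"Content-Type": "application/json"})
        )
    return requests


map_requests = build_map_requests(["a b", "c d", "e f", "g h"], TASK_TRACKER_URLS)
for req in map_requests:
    print(req.get_method(), req.full_url, len(req.data), "bytes")
```

Passing each request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a network.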

5 WHY GOOGLE APP ENGINE?
• WRITE THE CORE LOGIC OF THE APP AND DEPLOY IT
• NO NEED TO WORRY ABOUT DATA CENTERS
• AUTOMATIC SCALING
• FREE UP TO A CERTAIN LIMIT
• PAY AS YOU GO BEYOND THAT

6 WHAT WE DID
• BUILT APPLICATIONS (INVERTED INDEX, WORDCOUNT, MOVIE RATINGS)
• BUILT MAP-REDUCE FUNCTIONS FOR THESE APPLICATIONS
• DEPLOYED THESE MAP/REDUCE FUNCTIONS ON TASK TRACKERS
• A JOB TRACKER, ACTING AS A MASTER, DISTRIBUTES DATA THROUGH URLFETCH
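To make the map/reduce functions concrete, here is a minimal sketch of what the WordCount and Inverted Index pairs could look like. This is illustrative Python, not the code deployed on the task trackers; the function names and the whitespace tokenization are assumptions.

```python
from collections import defaultdict


def wordcount_map(chunk):
    """Emit a (word, 1) pair for every whitespace-separated word in a chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def wordcount_reduce(pairs):
    """Sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def inverted_index_map(doc_id, text):
    """Emit a (word, doc_id) pair for every word in a document."""
    return [(word.lower(), doc_id) for word in text.split()]


def inverted_index_reduce(pairs):
    """Collect, for each word, the sorted list of documents that contain it."""
    index = defaultdict(set)
    for word, doc_id in pairs:
        index[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in index.items()}


# WordCount: each chunk would be mapped on a different task tracker.
chunks = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for chunk in chunks for pair in wordcount_map(chunk)]
counts = wordcount_reduce(pairs)
print(counts)

# Inverted index over two small documents.
docs = {"d1": "hadoop app engine", "d2": "app engine quota"}
index_pairs = [p for doc_id, text in docs.items()
               for p in inverted_index_map(doc_id, text)]
index = inverted_index_reduce(index_pairs)
print(index)
```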

7
• PROVIDED A UI THAT LETS THE USER UPLOAD INPUT DATA TO GOOGLE’S PERSISTENT STORAGE (BIGTABLE)
• LIBRARIES USED TO CONNECT TO THE PERSISTENT STORAGE: JDO/JPA
• THE USER CAN CHOOSE THE APPLICATION TO BE RUN

8
• JOB IS SUBMITTED TO THE JOB TRACKER
• JOB TRACKER MAINTAINS A QUEUE OF JOBS
• SCHEDULER
  • PRIORITY SCHEDULER
    • THE USER CAN SPECIFY THE PRIORITY FOR THE JOB; BASED ON IT, THE JOB IS INSERTED INTO THE QUEUE
    • USED WHEN THE USER SPECIFIES A PRIORITY
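A priority-ordered job queue can be sketched with a binary heap. The class name, the convention that a lower number means higher priority, and the tie-breaking by submission order are assumptions; the slides do not say how equal priorities are resolved.

```python
import heapq
import itertools


class PriorityJobQueue:
    """Job queue ordered by priority; a lower number runs earlier (assumed convention)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter so jobs with equal priority keep submission order.
        self._counter = itertools.count()

    def submit(self, job_name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), job_name))

    def next_job(self):
        """Return the most urgent job, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name


priority_queue = PriorityJobQueue()
priority_queue.submit("wordcount", priority=5)
priority_queue.submit("inverted-index", priority=1)
priority_queue.submit("movie-ratings", priority=5)
priority_order = [priority_queue.next_job() for _ in range(3)]
print(priority_order)
```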

9
• FIFO SCHEDULER
  • THE SUBMITTED JOB IS INSERTED AT THE BACK OF THE QUEUE
  • A JOB IS PICKED FROM THE FRONT, THUS RUNNING IN FIFO FASHION
  • DEFAULT SCHEDULER
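The FIFO behaviour described here maps directly onto a double-ended queue. The sketch below is illustrative only; the class and method names are hypothetical.

```python
from collections import deque


class FifoJobQueue:
    """Jobs are appended at the back and taken from the front."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, job_name):
        self._jobs.append(job_name)  # insert at the back of the queue

    def next_job(self):
        """Return the oldest job, or None if the queue is empty."""
        return self._jobs.popleft() if self._jobs else None


fifo_queue = FifoJobQueue()
for name in ["wordcount", "inverted-index", "movie-ratings"]:
    fifo_queue.submit(name)
fifo_order = [fifo_queue.next_job() for _ in range(3)]
print(fifo_order)
```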

10
RESOURCE            | DAILY LIMIT (FREE)  | MAX RATE (FREE)     | DAILY LIMIT (BILLED)          | MAX RATE (BILLED)
REQUESTS            | 1,300,000 REQUESTS  | 7,400 REQUESTS/MIN  | 43,000,000 REQUESTS           | 30,000 REQUESTS/MIN
OUTGOING BANDWIDTH  | 1 GB                | 56 MB/MIN           | 1 GB FREE; 1046 GB MAX        | 740 MB/MIN
INCOMING BANDWIDTH  | 1 GB                | 56 MB/MIN           | 1 GB FREE; 1046 GB MAX        | 740 MB/MIN
CPU TIME            | 6.5 CPU HOURS       | 15 CPU-MIN/MIN      | 6.5 CPU HOURS FREE; 1729 MAX  | 72 CPU-MIN/MIN
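As a back-of-the-envelope illustration of why these quotas matter, the sketch below estimates how many free accounts an input of a given size would need. It assumes incoming bandwidth is the only binding limit, that 1 GB means 1024 MB, and that data is spread evenly across accounts; the function names are hypothetical.

```python
import math

# Free-tier figures taken from the quota figures above.
FREE_DAILY_INCOMING_MB = 1024      # 1 GB per account per day (assuming 1 GB = 1024 MB)
FREE_INCOMING_RATE_MB_PER_MIN = 56  # maximum incoming rate per account


def accounts_needed(input_size_mb):
    """Smallest number of free accounts whose combined daily quota covers the input."""
    return max(1, math.ceil(input_size_mb / FREE_DAILY_INCOMING_MB))


def min_transfer_minutes(input_size_mb, accounts):
    """Lower bound on upload time when all accounts receive data in parallel."""
    per_account_mb = input_size_mb / accounts
    return per_account_mb / FREE_INCOMING_RATE_MB_PER_MIN


needed = accounts_needed(5000)
print(needed, "accounts,", round(min_transfer_minutes(5000, needed), 2), "minutes at best")
```

CPU time and request counts would add further constraints that this estimate ignores.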

11 WHY?
• EVERY ACCOUNT HAS A FIXED QUOTA
• DATA IS DISTRIBUTED ACROSS MULTIPLE TASK TRACKERS TO STAY WITHIN THE QUOTA
• COST MODEL FOR LOAD BALANCING
  • COST IS PROPORTIONAL TO THE AMOUNT OF DATA PROCESSED BY A TASK TRACKER
  • DATA IS DIVIDED INTO EQUAL-SIZED CHUNKS AND SENT TO EACH TASK TRACKER’S MAP FUNCTION
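The equal-sized split and the proportional cost model can be sketched as follows. Measuring cost as the character count of a chunk (a proportionality constant of 1) is an assumption, as are the function names.

```python
def split_into_chunks(records, num_trackers):
    """Split records into num_trackers chunks whose lengths differ by at most one."""
    base, extra = divmod(len(records), num_trackers)
    chunks, start = [], 0
    for i in range(num_trackers):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more record
        chunks.append(records[start:start + size])
        start += size
    return chunks


def tracker_costs(chunks):
    """Cost per tracker, taken here as the total characters in its chunk (assumed unit)."""
    return [sum(len(record) for record in chunk) for chunk in chunks]


records = [f"w{i}" for i in range(10)]
balanced = split_into_chunks(records, 3)
print([len(chunk) for chunk in balanced])
print(tracker_costs(balanced))
```

With equal-sized chunks the per-tracker costs stay close to one another, which is the point of the load-balancing step.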

12 HANDLING HUGE DATA SETS
• DATA IS DIVIDED INTO CHUNKS
• WHAT IF THE CHUNK SIZE IS HUGE?
  • AT LEAST ONE OF THE TASK TRACKERS WILL FAIL, NO MATTER WHICH LOAD BALANCING ALGORITHM IS USED
  • SOLUTION: DYNAMICALLY INCREASE THE NUMBER OF TASK TRACKERS IF ONE OF THEM FAILS AFTER A FIXED NUMBER OF TRIALS
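The retry-then-grow idea can be simulated locally. In this sketch a task tracker "fails" whenever its chunk exceeds a fixed capacity, which stands in for hitting a quota; the capacity model, the step of adding one tracker at a time, and the `max_trackers` cap are assumptions, not details from the slides.

```python
class ChunkTooLarge(Exception):
    """Raised by the simulated task tracker when a chunk exceeds its capacity."""


def run_map(chunk, capacity):
    """Simulated map call: fails if the chunk is larger than the tracker can handle."""
    if len(chunk) > capacity:
        raise ChunkTooLarge(len(chunk))
    return len(chunk)  # stand-in for real map output


def split_evenly(records, parts):
    """Split records into `parts` chunks whose lengths differ by at most one."""
    base, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


def run_job(records, num_trackers, capacity, max_trials=3, max_trackers=50):
    """Return the number of task trackers that was finally sufficient."""
    while num_trackers <= max_trackers:
        any_failed = False
        for chunk in split_evenly(records, num_trackers):
            for _ in range(max_trials):
                try:
                    run_map(chunk, capacity)
                    break  # this chunk succeeded
                except ChunkTooLarge:
                    continue  # retry up to max_trials times
            else:
                any_failed = True  # every trial failed for this chunk
                break
        if not any_failed:
            return num_trackers
        num_trackers += 1  # add one tracker and re-split the data
    raise RuntimeError("job does not fit within max_trackers")


print(run_job(list(range(100)), num_trackers=3, capacity=20))
```

Because the simulated failure is deterministic, the retries never change the outcome here; against a real service they would absorb transient errors before more trackers are added.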

13 LIMITED CONTROL ON GOOGLE APP ENGINE
• NO SPAWNING OF THREADS
• INABILITY TO WRITE TO THE FILESYSTEM OF GOOGLE’S SERVERS
• NO CONTROL OVER DATA LOCALITY
• THE MACHINE ON WHICH DATA IS STORED IS DYNAMICALLY ALLOCATED BY GOOGLE
• IN HADOOP, THREADS AND FILE I/O ARE AVAILABLE
• IMPLEMENTING HADOOP ON GOOGLE APP ENGINE IS DIFFICULT

14
• DATA IS NOT RETRIEVED IN THE SAME ORDER IN WHICH IT WAS STORED, BECAUSE OF GOOGLE’S STORAGE ARCHITECTURE
• NO CONTROL OVER THE USE OF NETWORK BANDWIDTH BETWEEN THE JOB TRACKER AND TASK TRACKERS
• JOIN AND UNION OPERATIONS ARE EXPENSIVE WHEN THE NUMBER OF TABLES INVOLVED IS LARGE

15
• RESULTS ARE THE SAME AS WHEN RUNNING THE APPLICATION ON HADOOP
• TESTED THE WORDCOUNT APPLICATION ON A DATA SET OF 10,000 WORDS USING 3 TASK TRACKERS
• NETWORK BANDWIDTH IS A BOTTLENECK IN THE RUNTIME OF THE APPLICATION, AS DATA HAS TO BE TRANSFERRED FROM THE TASK TRACKERS TO THE JOB TRACKER AND VICE VERSA
