Published by Joel Jacobs. Modified over 9 years ago.
1
Performance Comparison of Clustered Systems
Yugandhar Maram, #91527748
Anjana Vadivel, #78563168
Stuthi Balaji, #34682837
2
OUTLINE
Motivation/Goals
System architecture/Tools used/Software integrated
Related work and efforts
Validation/Evaluation
Results
3
Motivation and Goals
To study the architecture of widely used distributed systems and familiarize ourselves with Hadoop, Spark, and the Google File System.
To analyze the performance of these distributed systems under heavy workloads.
Hive and SparkSQL serve as the SQL layers for the comparison.
4
System Architecture
Hadoop cluster with the database distributed across nodes.
Spark cluster using HDFS.
Hive (issues SQL queries to the Hadoop distributed system).
SparkSQL (issues SQL queries to the Spark distributed system).
5
Tools Used/Software Integrated
Hadoop and Spark, with Hive and SparkSQL running atop those systems, respectively.
TPC-H benchmark data for load generation, produced with DBGen.
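As a rough illustration of the load-generation step, the sketch below builds a DBGen command line for a chosen scale factor. It assumes the standard TPC-H `dbgen` binary, whose `-s` option sets the scale factor (roughly 1 GB of raw data per unit). The binary path `./dbgen` and the helper names `dbgen_command` and `expected_files` are hypothetical and do not come from the slides, which also do not say which scale factor was used.

```python
from pathlib import Path

# The eight TPC-H base tables; dbgen writes each one as <name>.tbl.
TPCH_TABLES = ["customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier"]


def dbgen_command(scale_factor, dbgen_path="./dbgen"):
    """Return the argv list for generating TPC-H data at a given scale factor.

    dbgen_path is an assumed location for the binary, not one from the slides.
    """
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    # -s sets the scale factor (about 1 GB of raw data per unit).
    return [dbgen_path, "-s", str(scale_factor)]


def expected_files(output_dir="."):
    """List the .tbl files a dbgen run is expected to leave in output_dir."""
    return [Path(output_dir) / f"{table}.tbl" for table in TPCH_TABLES]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where `dbgen` has been built; the resulting `.tbl` files would then be copied into HDFS before defining the tables.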
6
Related work and efforts (cont.)
Set up the Hadoop and Spark environments, along with Hive and SparkSQL databases of size 30 GB, on the cluster.
Issued TPC-H benchmark SQL queries to the Hive and SparkSQL databases; each query runs against data spread across the nodes of the systems.
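One way to time the same queries on both engines is sketched below. This is not the authors' harness: `time_queries` and `compare` are hypothetical helpers, and `run_query` stands for any callable that submits a SQL string to one engine and blocks until the result is complete.

```python
import statistics
import time


def time_queries(run_query, queries, repeats=3):
    """Run each named query `repeats` times; return median wall-clock seconds.

    run_query: callable taking a SQL string and blocking until it finishes
               (assumed to be supplied by the caller for Hive or SparkSQL).
    queries:   dict mapping a query name to its SQL text.
    """
    results = {}
    for name, sql in queries.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


def compare(hive_times, spark_times):
    """Return the Hive/SparkSQL time ratio for queries timed on both engines."""
    return {
        name: hive_times[name] / spark_times[name]
        for name in hive_times
        if name in spark_times and spark_times[name] > 0
    }
```

With PySpark, for example, `run_query` could be `lambda sql: spark.sql(sql).collect()`, assuming an existing `SparkSession` named `spark`; the `collect()` call forces the query to execute. Taking the median over a few repeats reduces the effect of a single slow run caused by caching or cluster noise.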
7
Hive Query Results
8
THANK YOU!!