Performance Comparison of Clustered Systems. Yugandhar Maram, Anjana Vadivel, Stuthi Balaji
OUTLINE: Motivation/Goals; System Architecture/Tools Used/Software Integrated; Related Work and Efforts; Validation/Evaluation; Results
Motivation and Goals: Study the architecture of widely used distributed systems and familiarize ourselves with Hadoop, Spark, and the Google File System. Analyze the performance of these distributed systems under high workloads. Query layers studied: Hive and SparkSQL.
System Architecture: Hadoop cluster with the database distributed across nodes. Spark cluster using HDFS. Hive (issues SQL queries to the Hadoop distributed system). SparkSQL (issues SQL queries to the Spark distributed system).
Tools Used/Software Integrated: Hadoop and Spark, with Hive and SparkSQL atop those systems, respectively. TPC-H benchmark data for load generation, produced with DBGen.
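DBGen writes each TPC-H table as a pipe-delimited text file in which every row ends with a trailing `|`. A minimal sketch of reading one such row before loading it into Hive or SparkSQL might look like the following. The helper name `parse_tbl_line` and the sample row are illustrative assumptions, not taken from the project.

```python
from typing import List


def parse_tbl_line(line: str) -> List[str]:
    """Split one DBGen row into its fields.

    DBGen rows are pipe-delimited and end with a trailing '|',
    so the final delimiter is dropped before splitting.
    """
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


# Illustrative row shaped like a line from the small `region` table
# (key, name, comment); the comment text is made up for this example.
sample_row = "0|AFRICA|illustrative comment|\n"
fields = parse_tbl_line(sample_row)
```

The same delimiter has to be declared when the tables are defined in Hive or SparkSQL, so that both engines read identical data.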
Related Work and Efforts (cont.): Set up the Hadoop and Spark environments, along with 30 GB Hive and SparkSQL databases on the cluster. Issued TPC-H benchmark SQL queries to the Hive and SparkSQL databases; each query runs against data spread across the nodes of the respective system.
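One way to compare the two engines on the same queries is to time each query several times and keep the median wall-clock time. The sketch below assumes each engine is wrapped in a hypothetical callable that takes a SQL string and blocks until the query finishes. The names `time_query` and `compare_engines` are illustrative, and this is not necessarily how the measurements in this project were taken.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[str], object], sql: str, repeats: int = 3) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)  # assumed to block until the query completes
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_engines(
    engines: Dict[str, Callable[[str], object]],
    queries: Dict[str, str],
    repeats: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Time every query on every engine.

    Returns {query_name: {engine_name: median_seconds}}.
    """
    return {
        query_name: {
            engine_name: time_query(runner, sql, repeats)
            for engine_name, runner in engines.items()
        }
        for query_name, sql in queries.items()
    }
```

In use, `engines` would map names such as "hive" and "sparksql" to functions that submit SQL to each system, and `queries` would hold the TPC-H query texts. Taking the median rather than the mean reduces the influence of a single slow run, for example a cold cache on the first execution.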
Hive Query Results
THANK YOU!!