Big Data - in Performance Engineering


1 Big Data - in Performance Engineering
Rachna Trivedi, Performance Test Analyst, Quintiles IMS

2 Abstract Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Big data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, variety, and velocity. Any big data project involves processing huge volumes of structured and unstructured data, with the work distributed across multiple nodes to complete the job in less time. At times performance degrades because of poor design and architecture. Areas where performance issues can occur include imbalanced input splits, redundant shuffles and sorts, and moving most of the aggregation computation into the reduce phase. Performance testing is conducted by setting up a huge volume of data in a test environment. This whitepaper focuses on performance engineering concepts in Big Data: how performance engineering can be achieved, the solutions available, and the tools and technologies involved.

3 Presentation Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Big data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, variety, and velocity.
Big Data Testing can be broadly divided into three steps:
Data Staging Validation
- Data from various sources such as RDBMS, weblogs, social media, etc. should be validated to make sure that the correct data is pulled into the system
- Compare the source data with the data pushed into the Hadoop system to make sure they match
- Verify that the right data is extracted and loaded into the correct HDFS location
"MapReduce" Validation
- Check that the MapReduce process works correctly
- Check that data aggregation or segregation rules are implemented on the data
- Check that key-value pairs are generated
- Validate the data after the MapReduce process
Output Validation Phase
- Check that the transformation rules are correctly applied
- Check data integrity and that the data loads successfully into the target system
- Check that there is no data corruption by comparing the target data with the HDFS file system data
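The staging check above, comparing source data with the data loaded into Hadoop, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are assumed to be available as in-memory tuples, and the helper names `record_fingerprint` and `compare_datasets` are hypothetical, not part of Hadoop or any tool named in the slides. On real volumes the same idea would be applied to per-partition counts and checksums rather than to every record in one process.

```python
import hashlib
from collections import Counter


def record_fingerprint(record):
    """Return a stable SHA-256 digest for one record (a tuple of field values)."""
    # Join fields with a separator that is unlikely to occur in the data.
    joined = "\x1f".join(str(field) for field in record)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_datasets(source_records, target_records):
    """Compare two record collections, ignoring record order.

    Hypothetical helper: counts records on both sides and reports how many
    source records are missing from the target and how many target records
    have no counterpart in the source.
    """
    source_counts = Counter(record_fingerprint(r) for r in source_records)
    target_counts = Counter(record_fingerprint(r) for r in target_records)
    missing = source_counts - target_counts      # in source, not in target
    unexpected = target_counts - source_counts   # in target, not in source
    return {
        "source_count": sum(source_counts.values()),
        "target_count": sum(target_counts.values()),
        "missing_in_target": sum(missing.values()),
        "unexpected_in_target": sum(unexpected.values()),
        "match": not missing and not unexpected,
    }


if __name__ == "__main__":
    source = [(1, "alice"), (2, "bob"), (3, "carol")]
    loaded = [(3, "carol"), (1, "alice"), (2, "bob")]
    print(compare_datasets(source, loaded))
```

Using a multiset of fingerprints rather than a plain set means duplicated or dropped rows are detected even when every distinct value is still present.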

4 Approach Performance Testing - In Big Data
Performance testing covers the following measures:
- Job completion time
- Memory utilization
- Data throughput
- Failures of data nodes
It is carried out in three areas:
- Data ingestion and throughput: In this stage, the tester verifies how fast the system can consume data from various data sources. It also covers how quickly data can be inserted into the underlying data store, for example the insertion rate into a Mongo or Cassandra database.
- Data processing: This involves verifying the speed with which queries or MapReduce jobs are executed. It also includes testing the data processing in isolation once the underlying data store is populated with the data sets, for example running MapReduce jobs on the underlying HDFS.
- Sub-component performance: These systems are made up of multiple components, and it is essential to test each of these components in isolation, for example how quickly a message is indexed and consumed, MapReduce jobs, query performance, search, etc.
Approach
- Understand the applications
- Set up monitoring tools in the clustered environment
- Identify and design the corresponding workloads
- Execute the test and analyse the results
- Create and update scripts
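The ingestion measurement described above (how quickly data can be inserted into the underlying store) can be sketched as a small timing harness. This is a minimal sketch, not a tool from the slides: `measure_ingestion` is a hypothetical helper, and `insert_batch` stands in for whatever client call a real test would make against Mongo, Cassandra, or another store. The `clock` parameter is an assumption added so the timing can be replaced in tests.

```python
import time


def measure_ingestion(insert_batch, batches, clock=time.perf_counter):
    """Time a sequence of batch inserts and report records per second.

    insert_batch: callable that writes one batch (hypothetical stand-in for
                  a real data-store client call).
    batches:      iterable of lists of records.
    clock:        zero-argument callable returning seconds as a float.
    """
    total_records = 0
    start = clock()
    for batch in batches:
        insert_batch(batch)
        total_records += len(batch)
    elapsed = clock() - start
    # Guard against a zero interval on very fast runs.
    rate = total_records / elapsed if elapsed > 0 else float("inf")
    return {
        "records": total_records,
        "seconds": elapsed,
        "records_per_second": rate,
    }


if __name__ == "__main__":
    store = []  # in-memory list used in place of a real data store
    batches = [[{"id": i + j} for j in range(1000)] for i in range(0, 10000, 1000)]
    print(measure_ingestion(store.extend, batches))
```

The same harness gives a job completion time if a single "batch" is one job submission, which is why elapsed seconds are returned alongside the rate.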

5 Performance Engineering Tools :-
Jmeter
- Hadoop Set
- Hbase Connection Config
- Hbase Operations
- Hadoop Job Tracker Sampler
- Hbase CRUD Sampler
- Hbase RowKey Sampler
Sandstorm
- Script development for NoSQL, Messaging, MapReduce technologies
- User interface for Mongo, Cassandra, Hbase, Kafka, RabbitMQ, ActiveMQ, Hadoop
- Benchmarks API for MapReduce and other Big Data components
- Integrated resource monitoring: Hadoop, Cassandra, Mongo, Oracle NoSQL, Hbase, Apache Kafka, Apache ActiveMQ, RabbitMQ, JMX, SNMP
- Cloud platform for scalability, high volume testing
- Powerful analytics
Dynatrace
- Analyze your Hadoop components
- Get enhanced insights for HDFS and MapReduce
- Pinpoint problems at the code level

6 Performance Testing Challenges
- Diverse set of technologies: As seen above, each sub-component involved in a big data application belongs to a different technology. While each needs to be tested in isolation, no single tool supports all of the technologies. This is unlike web applications, where the technology may differ but the underlying communication is over HTTP. Here, the communication mechanism varies from component to component.
- Unavailability of specific tools: No single tool can cater to every technology. For example, database testing tools for NoSQL might not be a fit for message queues. Similarly, custom tools and utilities will be required to test MapReduce jobs.
- Test scripting: There are no record and playback mechanisms for such systems. A high degree of scripting is required to design test cases and scenarios.
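One way to cope with the scripting effort described above is to wrap each component's scripted operation behind the same callable interface and drive them all from one runner. The sketch below is an assumption about how such a harness might look, not a feature of JMeter, Sandstorm, or Dynatrace; `run_workload` and the component names are hypothetical, and each operation would in practice call the component's own client library.

```python
import time
from statistics import mean


def run_workload(operations, iterations):
    """Run each named operation `iterations` times; collect latency and failures.

    operations: dict mapping a component name to a zero-argument callable
                (hypothetical wrappers around each component's client).
    """
    report = {}
    for name, operation in operations.items():
        latencies = []
        failures = 0
        for _ in range(iterations):
            start = time.perf_counter()
            try:
                operation()
            except Exception:
                # A failed call is counted but contributes no latency sample.
                failures += 1
                continue
            latencies.append(time.perf_counter() - start)
        report[name] = {
            "successes": len(latencies),
            "failures": failures,
            "mean_latency_s": mean(latencies) if latencies else None,
        }
    return report


if __name__ == "__main__":
    def publish_message():
        time.sleep(0.001)  # placeholder for a message-queue publish

    def query_store():
        time.sleep(0.002)  # placeholder for a NoSQL query

    print(run_workload({"queue": publish_message, "store": query_store}, 5))
```

Because the runner only sees callables, adding a new technology means writing one more wrapper rather than adopting another tool.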

7 References & Appendix BigData Technologies BigData in Performance Engineering BigData - Performance Engineering Approach BigData - Performance Engineering Tools BigData - Performance Testing Challenges

8 Author Biography Rachna Trivedi is an experienced, delivery-focused technology specialist in performance testing solutions with close to 12 years of testing experience. She is currently working as a Performance Lead at Quintiles IMS Health Technologies. She brings diverse technology solution experience in performance engineering for software products, bottleneck identification and diagnostics, and profiling and tuning of application and database servers. Her recent focus has been on cloud computing and big data. She delivered more than 300 projects in 3 years at IMS Health, and spent 4 years on focused delivery for the FIFA client at Tech Mahindra at different client locations. She participated in STC in 2015 and 2016 and is an active contributor to performance engineering forums.

9 Thank You!!!

