Apache Bigtop Week 10, Testing

Unit Testing
- Programming in the small vs. programming in the large.
- Parlante's link: codingbat.com, unit tests for programming in the small.
- Apache rule: before submitting a patch to a Hadoop component, run all of that component's unit tests and verify that they pass.
dougc [at] gmail25 dot com 2012 All Rights Reserved

Unit Testing
- Hadoop unit tests installed in Bigtop.
- Great reference: hadoop/ hadoop/
- Run tests on downloaded hadoop : ant test
- Where are the Bigtop shims for hadoop /1.0/.22? For Hive/Pig?
- Other shims are available but don't work; you have to pick one at build time.
- In latest release. Hive Pig in

Hadoop Unit Testing

Bigtop Hadoop Unit Testing

Bigtop Unit Test Symlink
- Symlink src, run from /usr/lib/hadoop:
  sudo ln -s /usr/src/hadoop /usr/lib/hadoop/src

Bigtop Unit Test Permission
- chmod takes a space-separated list of paths, not a comma-separated one:
  sudo chmod 757 /usr/lib/hadoop /usr/lib/hadoop/bin /usr/lib/hadoop/sbin

Bigtop Unit Tests
- If running in AWS, set up screen:
  - sudo apt-get install screen screen-profiles screen-profiles-extras
  - Type screen and you will see a clear terminal window. Start ant test, press ctrl-a ctrl-d to detach, log out, log in again, then type screen -r to reattach.
- Ron's fix: modify /etc/hostname

Standalone/Bigtop Hadoop Unit Tests Results
- Standalone: logs for each test are under ~/hadoop /build/test
- Bigtop: /usr/lib/hadoop/build/test

Hadoop Mods to get Integration Tests to run (repeat)
- Copy testConf.xml:
  sudo cp /usr/src/hadoop/test/org/apache/hadoop/cli/testConf.xml /home/ubuntu/bigtop incubating/bigtop-tests/test-execution/smokes/hadoop/target/clitest_data/
- Add the Jackson dependency to pom.xml: groupId org.codehaus.jackson, artifactId jackson-mapper-asl

Bigtop Hadoop Integration Tests
- Running a single integration test:
  mvn -Dit.test=org.apache.bigtop.itest.hadooptests.CLASS verify
- Example:
  mvn -Dit.test=org.apache.bigtop.itest.hadooptests.TestCLI verify

Standalone HBase Unit Tests
- From ~/hbase-0.9.2/:
  mvn -P localTest
- Running a single unit test:
  mvn test -Dtest=org.apache.hadoop.hbase.TestHServerAddress

Bigtop HBase Unit Tests
- Don't exist? Put the source in /usr/src/hbase, as for Hadoop, and use the Groovy shell to run them?
- Project idea: get the HBase unit tests working in Bigtop.
- Partition the HBase unit tests into 2 categories: tests that issue requests and then look at internal state to verify, and tests that use only public APIs to read and write to HBase.
- MiniHBase mock objects in a single JVM process can be used in Bigtop.
- Distributed mode exposes different bugs than MiniMR/DFSCluster. Write this up as a project.
- Pig uses the same test artifact from its unit tests for Bigtop.
- Missing pom goals.
- Use:
  - org.apache.bigtop.itest.JUnitUtils.groovy, for annotation support in JUnit 4/Groovy.
  - org.apache.bigtop.itest.junit.OrderedParameterized.java, an extension of JUnit. JUnit assumes all tests are stateless, so order doesn't matter. Tests are not stateless in Bigtop, so ordering requires run stages: each test specifies its run stage as a simple int. Tests are in run stage 0 by default; a test case annotated with run stage -1 executes first.
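The run-stage idea can be illustrated with a minimal plain-Java sketch. It is not Bigtop's OrderedParameterized implementation; the class RunStageOrdering, the StagedTest type, and the test names are hypothetical and exist only to show that sorting on an int run stage yields the order described above (stage -1 before the default stage 0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of run stages; not part of Bigtop or JUnit.
final class RunStageOrdering {

    /** A test name paired with its run stage (0 is the default stage). */
    static final class StagedTest {
        final String name;
        final int runStage;

        StagedTest(String name, int runStage) {
            this.name = name;
            this.runStage = runStage;
        }
    }

    /** Returns test names in execution order: lowest run stage first. */
    static List<String> executionOrder(List<StagedTest> tests) {
        List<StagedTest> sorted = new ArrayList<>(tests);
        // List.sort is stable, so tests in the same stage keep their declared order.
        sorted.sort(Comparator.comparingInt((StagedTest t) -> t.runStage));
        List<String> names = new ArrayList<>();
        for (StagedTest t : sorted) {
            names.add(t.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<StagedTest> tests = Arrays.asList(
                new StagedTest("verifyRows", 1),
                new StagedTest("loadRows", 0),
                new StagedTest("createTable", -1));
        System.out.println(executionOrder(tests));
    }
}
```

A stable sort matters here: tests that share a run stage stay in declaration order, so only the stage number carries ordering intent.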

org.apache.bigtop.itest.pmanager questions
- PackageManager is an abstract class.
- What are DEBPackage.groovy, ManagedPackage.groovy, and RPMPackage.groovy for?
- Does AptCmdLinePackageManager.groovy allow apt-get commands in Groovy? See also YumCmdLinePackageManager, RPMPackage, ZypperCmdLinePackageManager.
- Bigtop spends time on packaging steps such as apt-get install. There are no existing Java APIs for this, so these classes install packages through a Java API.
- Used internally for Jenkins testing; the tests are in test-artifacts/package.
- Manifest driven: XML files state what is expected from a package (files with xxx permissions), and the tests check and verify paths and permissions.
- If you are introducing a new package, you are responsible for testing against this abstract class.
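A manifest-driven check of paths and permissions might look roughly like the sketch below. It is a hedged illustration only: the manifest is a hypothetical in-memory map from path to POSIX permission string rather than Bigtop's XML format, the class ManifestCheck and its methods are invented for this example, and it assumes a POSIX file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Bigtop's real package tests read XML manifests.
final class ManifestCheck {

    /** Returns one message per path that is missing or has unexpected permissions. */
    static List<String> verify(Map<String, String> expectedPermissions) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : expectedPermissions.entrySet()) {
            Path path = Paths.get(entry.getKey());
            if (!Files.exists(path)) {
                problems.add("missing: " + path);
                continue;
            }
            try {
                String actual = PosixFilePermissions.toString(Files.getPosixFilePermissions(path));
                if (!actual.equals(entry.getValue())) {
                    problems.add("permissions " + actual + " != " + entry.getValue() + ": " + path);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return problems;
    }

    /** Creates a temporary file with the given permissions, standing in for an installed file. */
    static String createSampleFile(String permissions) {
        try {
            Path file = Files.createTempFile("manifest-demo", ".txt");
            Files.setPosixFilePermissions(file, PosixFilePermissions.fromString(permissions));
            file.toFile().deleteOnExit();
            return file.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> manifest = new LinkedHashMap<>();
        manifest.put(createSampleFile("rwxr-xr-x"), "rwxr-xr-x"); // matches
        manifest.put(createSampleFile("rw-r--r--"), "rwxr-xr-x"); // wrong permissions
        manifest.put("/no/such/path/for-this-demo", "rw-r--r--"); // missing
        for (String problem : verify(manifest)) {
            System.out.println(problem);
        }
    }
}
```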

Bigtop HBase Integration Tests
- From Bigtop-2.0-incubating/bigtop-tests/test-execution/smokes/hbase/:
  mvn verify
- Reduced workload constants in /home/ubuntu/bigtop incubating/system/TestLoadAndVerify.java:
  // private static final long NUM_TO_WRITE_DEFAULT = 100*1000;
  private static final long NUM_TO_WRITE_DEFAULT = 10;
  // private static final int NUM_TASKS = 200;
  // private static final int NUM_REDUCE_TASKS = 35;
  private static final int NUM_TASKS = 2;
  private static final int NUM_REDUCE_TASKS = 2;

Bigtop HBase Integration Results

Pig/Hive/Mahout/Oozie/Flume Unit Tests
- Run with ant test or mvn test.
- Project: mavenize Hive (hard). Project: mavenize Pig (easier?).
- Hive: ~/hive-0.7.1/src/build.xml
- Pig: ~/pig-0.9.2/build.xml
- Mahout: mahout-0.6-src, ~/mahout-distribution-0.6; mvn test. Installing core and src gives 2 subdirectories with the same name: ~/mahout-distribution-0.6/mahout-distribution-0.6/pom.xml
- Oozie: git clone https://github.com/yahoo/oozie.git, then mvn test
- Flume: git clone https://github.com/cloudera/flume.git, then mvn test

Pig Unit Test Results

Bigtop Pig Integration Tests
- Problem with mvn artifact…

Hive Notes
- The Hive unit tests install their own version of Hadoop under ~/hive-0.7.1/src/build/hadoopcore/
- Remove the test TestHadoopThriftAuthBridge20S.java: it can't connect to the Thrift server (socket timeout > 6x).

Hive Unit Tests

Bigtop Hive Integration Tests
- Follow the Pig format; work with the Hive unit test authors. Hive integration project suggestion.
- Hive and Pig sit on top of M/R, each with a custom language. As with a compiler, the unit tests have input and expected output.
- Hive unit tests are SQL code with verification afterwards. This was hard to retrofit against a real cluster.
- Bigtop took the *.SQL files from Hive and runs them, comparing actual vs. expected output.
- Can you reuse the same test artifacts for unit tests and Bigtop integration tests? Convert the Hive unit tests.
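The actual-vs-expected comparison can be sketched in plain Java as below. This is an assumption-laden illustration, not the Bigtop Hive smoke test: the class ExpectedOutputCheck and its differences method are hypothetical, and query output is modeled simply as a list of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an expected-output comparison for script-driven tests.
final class ExpectedOutputCheck {

    /** Compares output line by line and returns a description of each mismatch. */
    static List<String> differences(List<String> expected, List<String> actual) {
        List<String> diffs = new ArrayList<>();
        int lines = Math.max(expected.size(), actual.size());
        for (int i = 0; i < lines; i++) {
            String want = i < expected.size() ? expected.get(i) : "<no line>";
            String got = i < actual.size() ? actual.get(i) : "<no line>";
            if (!want.equals(got)) {
                diffs.add("line " + (i + 1) + ": expected [" + want + "] but got [" + got + "]");
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("1\talice", "2\tbob");
        List<String> actual = Arrays.asList("1\talice", "2\tbobby", "3\tcarol");
        for (String diff : differences(expected, actual)) {
            System.out.println(diff);
        }
    }
}
```

Because the comparison only needs a script and an expected-output file, the same pair of artifacts could in principle serve both a unit test and a cluster-level integration test.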

Mahout Results

Flume Unit Test Results

Bigtop Flume Integration Tests
- Transition to FlumeNG. NG lost features from Flume. Too early.

Oozie Unit Test

Bigtop Oozie Integration Test
- What to set oozie_url to? Start the Oozie service.
- Runs only the Oozie examples.jar.
- Project: create a workflow for Oozie. Integration testing on a cluster is needed here.
- Actions, to broaden data interfaces. actions, sqoop action.
- Good project for J2EE developers.

Bigtop Scripts Mod
- Where to modify the Bigtop install scripts to fix this?

Command Line vs. Eclipse
- Create a Java project and test programs that use HDFS/Hive/Pig, etc.
- There are 2 ways to run the files: from the command line or in Eclipse.

Command Line vs. Eclipse
- This may be important when debugging in cluster and pseudo-distributed mode.
- The cluster loads the 3 conf/ files: core-site.xml, hdfs-site.xml, mapred-site.xml.
- Some of the parameters are embedded. Java code may not properly initialize these parameters for cluster operation.
- Sometimes hard to debug.

Hadoop CLI
- The command line uses: bin/hadoop jar filename.jar ClassName args
- We did this when running Pi from hadoop-xxx-examples.jar; test programs under jar.

Hadoop CLI
- Set an absolute path for log4j.properties:
  PropertyConfigurator.configure("/Users/dc/Documents/workspace/log4j.properties");
- Properties files are outside of the jar.
- Web search results on adding log4j.properties to the jar are incorrect.
- Web search results on setting the class path are incorrect.

Eclipse Console Output 1/2
14:03:08,263 INFO TestHDFS:34 - Yo I am logger!!!
created new jobconf
finished setting jobconf parameters
generateSampeInpuIf inputDirectory:file:/tmp/MapReduceIntroInput exists:true
14:03:08,588 INFO TestHDFS:59 - isEmptyDirectory
14:03:08,595 INFO TestHDFS:65 - num file status:4
14:03:08,596 INFO TestHDFS:75 - file:///tmp/MapReduceIntroInput is not empty
14:03:08,596 INFO TestHDFS:80 - A non empty file file:///tmp/MapReduceIntroInput/asdf.txt was found
14:03:08,597 INFO TestHDFS:46 - The inputDirectory file:/tmp/MapReduceIntroInput exists and is either a file or a non empty directory
14:03:08,598 INFO TestHDFS:111 - Generating 3 input files of random data, each record is a random number TAB the input file name

Eclipse Console Output 2/2
14:03:25,076 INFO JobClient:589 - Map input records=15
14:03:25,076 INFO JobClient:589 - Reduce shuffle bytes=0
14:03:25,076 INFO JobClient:589 - Spilled Records=30
14:03:25,077 INFO JobClient:589 - Map output bytes=303
14:03:25,077 INFO JobClient:589 - Total committed heap usage (bytes)= :03:25,077 INFO JobClient:589 - Map input bytes=302
14:03:25,077 INFO JobClient:589 - SPLIT_RAW_BYTES=358
14:03:25,078 INFO JobClient:589 - Combine input records=0
14:03:25,078 INFO JobClient:589 - Reduce input records=15
14:03:25,078 INFO JobClient:589 - Reduce input groups=15
14:03:25,078 INFO JobClient:589 - Combine output records=0
14:03:25,079 INFO JobClient:589 - Reduce output records=15
14:03:25,079 INFO JobClient:589 - Map output records=15
14:03:25,079 INFO TestHDFS:235 - The job has completed.
14:03:25,079 INFO TestHDFS:241 - The job completed successfully.

Command Line Output 1/2
21:44:36,765 INFO TestHDFS:35 - Yo I am logger!!!
created new jobconf
finished setting jobconf parameters
generateSampeInpuIf inputDirectory:file:/tmp/MapReduceIntroInput exists:true
21:44:36,983 INFO TestHDFS:60 - isEmptyDirectory
21:44:36,990 INFO TestHDFS:66 - num file status:4
21:44:36,991 INFO TestHDFS:76 - file:///tmp/MapReduceIntroInput is not empty
21:44:36,991 INFO TestHDFS:81 - A non empty file file:///tmp/MapReduceIntroInput/asdf.txt was found
21:44:36,992 INFO TestHDFS:47 - The inputDirectory file:/tmp/MapReduceIntroInput exists and is either a file or a non empty directory
21:44:36,992 INFO TestHDFS:112 - Generating 3 input files of random data, each record is a random number TAB the input file name
21:44:36,999 WARN NativeCodeLoader:52 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21:44:37,007 INFO TestHDFS:169 - The job output directory file:/tmp/MapReduceIntroOutput exists and is not a directory and will be removed
21:44:37,022 INFO TestHDFS:235 - Launching the job.

Command Line Output 2/2
21:44:53,523 INFO JobClient:589 - SPLIT_RAW_BYTES=358
21:44:53,524 INFO JobClient:589 - Combine input records=0
21:44:53,524 INFO JobClient:589 - Reduce input records=19
21:44:53,531 INFO JobClient:589 - Reduce input groups=19
21:44:53,531 INFO JobClient:589 - Combine output records=0
21:44:53,531 INFO JobClient:589 - Reduce output records=19
21:44:53,531 INFO JobClient:589 - Map output records=19
21:44:53,532 INFO TestHDFS:237 - The job has completed.
21:44:53,532 INFO TestHDFS:243 - The job completed successfully.

M/R Idioms
- What you get for free in M/R: sorting, duplicate detection.
- Design pattern notes: object churn; thread safety in mappers (ThreadLocal vs. atomic ivars vs. locks).
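The three thread-safety options named above can be compared in a small plain-Java sketch. It does not use Hadoop's Mapper API; the class CounterStrategies and all of its members are hypothetical. It shows an AtomicLong counter, a lock-guarded counter, and a ThreadLocal scratch buffer that is reused per thread, which also reduces object churn.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; a real mapper would extend Hadoop's Mapper class.
final class CounterStrategies {
    // Atomic ivar: lock-free shared counter.
    private final AtomicLong atomicCount = new AtomicLong();

    // Lock: shared counter guarded by a monitor.
    private final Object lock = new Object();
    private long lockedCount = 0;

    // ThreadLocal: one reusable buffer per thread, so nothing is shared
    // and no new StringBuilder is allocated per record.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);

    void recordAtomic() {
        atomicCount.incrementAndGet();
    }

    void recordLocked() {
        synchronized (lock) {
            lockedCount++;
        }
    }

    long atomicTotal() {
        return atomicCount.get();
    }

    long lockedTotal() {
        synchronized (lock) {
            return lockedCount;
        }
    }

    /** Formats a key/value record using the calling thread's private buffer. */
    static String format(String key, long value) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // reuse the buffer instead of allocating a new one
        return sb.append(key).append('\t').append(value).toString();
    }

    /** Runs the given number of threads, each incrementing both counters. */
    static CounterStrategies runConcurrently(int threads, int incrementsPerThread) {
        CounterStrategies counters = new CounterStrategies();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counters.recordAtomic();
                    counters.recordLocked();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        CounterStrategies counters = runConcurrently(4, 1000);
        System.out.println(format("atomic", counters.atomicTotal()));
        System.out.println(format("locked", counters.lockedTotal()));
    }
}
```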

M/R Idioms
- Hadoop Partitioner: multiple output files or 1 output file.
  - job.setNumReduceTasks(1) gives 1 output file, the same as a merge sort.
  - The default is HashPartitioner.
  - Create your own for filtering, e.g. sending all keys which start with a common prefix to one specific file.
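The partitioning arithmetic can be sketched without the Hadoop classes. In the block below, hashPartition mirrors the formula used by Hadoop's HashPartitioner (hash code masked to non-negative, modulo the number of reducers). The prefix rule in prefixPartition is a hypothetical example of the filtering idea above, and the class name PartitionSketch is invented; a real job would put this logic in a subclass of Partitioner.

```java
// Hypothetical sketch of partition-number arithmetic; not a Hadoop Partitioner subclass.
final class PartitionSketch {

    /** Same arithmetic as the default hash partitioning: non-negative hash modulo partitions. */
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Sends every key starting with the prefix to partition 0 and spreads
     * all other keys over the remaining partitions by hash.
     */
    static int prefixPartition(String key, String prefix, int numPartitions) {
        if (numPartitions == 1 || key.startsWith(prefix)) {
            return 0;
        }
        return 1 + hashPartition(key, numPartitions - 1);
    }

    public static void main(String[] args) {
        String[] keys = {"error-42", "error-7", "info-1", "warn-3"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + prefixPartition(key, "error-", 4));
        }
    }
}
```

With this rule, reducer 0 writes a file containing only the prefixed keys, while the other reducers share the rest.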

HDFS Idioms
- Serialization: the Writable interface gives an order of magnitude performance improvement.
- HDFS block R/W.
- JobTrackers/TaskTrackers/NameNodes. Each file operation goes directly to the NN.
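The shape of a Writable can be shown with the JDK alone. The sketch below does not implement Hadoop's Writable interface, but its write(DataOutput) and readFields(DataInput) methods have the same signatures that interface declares. The PageHits type and its fields are hypothetical. The compactness comes from writing raw fields with no class metadata: an int plus a long is 12 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type following the Writable pattern, without depending on Hadoop.
final class PageHits {
    int pageId;
    long hits;

    PageHits() {
    }

    PageHits(int pageId, long hits) {
        this.pageId = pageId;
        this.hits = hits;
    }

    /** Serializes the fields in a fixed order with no type metadata. */
    void write(DataOutput out) throws IOException {
        out.writeInt(pageId);
        out.writeLong(hits);
    }

    /** Reads the fields back in the same order, reusing this object. */
    void readFields(DataInput in) throws IOException {
        pageId = in.readInt();
        hits = in.readLong();
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PageHits fromBytes(byte[] bytes) {
        try {
            PageHits value = new PageHits();
            value.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = new PageHits(7, 123456789L).toBytes();
        PageHits copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, pageId=" + copy.pageId + ", hits=" + copy.hits);
    }
}
```

Because readFields fills an existing instance, a framework can reuse one object across many records instead of allocating one per record.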

Integration Testing Smoke
Hadoop Component | /usr/lib | Tests exist under bigtop-tests
HBase | Yes | IncrementalPELoad.java, TestHBaseCompression.java, TestHBasePigSmoke.groovy, TestHBaseSmoke.java, TestHFileOutputFormat.java, TestLoadIncrementalHFiles.java
Hive | Yes | HiveBulkScriptExecutor.java, IntegrationTestHiveSmokeBulk.groovy, TestHiveSmokeBulk.groovy, TestJdbcDriver.java
Pig | No | Yes, in HBase
Zookeeper | No | Part of components
Mahout | No |

Integration Testing Smoke
Hadoop Component | Test code exists | Program name
Whirr | No, not needed? |
Sqoop | Yes | IntegrationTestSqoopHive.groovy, IntegrationTestSqoopHbase.groovy
Flume | Yes | TestFlumeSmoke.groovy
Hadoop | Yes | TestCLI.groovy, TestHadoopSmoke, TestHadoopExamples
Package Test | Yes | PackageTestCommon.groovy,
Hue | Yes, part of package test |
Oozie | Yes | TestOozieSmoke.groovy, StateVerifierZookeeper.groovy

Lab #4
- Create an integration test.
- The Groovy runtime allows shell commands, so you can use the scripts inside the components, which saves time debugging classpaths and environment files.
- Alternatively, use the Java libraries (DFSCluster, MiniMRCluster) and reverse engineer the env. var settings and the sequence of commands to run.
- Start from an HDFS file system, then work your way up to a Bigtop component.

From Lab #3
- Take your working MapReduce programs and run them using mvn verify.
- Make sure HDFS/Hadoop is running first.

Create a Mahout artifact dir and child pom

Groovy Test Code
- Assumes HDFS is running.
  import org.apache.hadoop.conf.Configuration
  import org.apache.bigtop.itest.shell.Shell

  Configuration conf = new Configuration();
  conf.addResource('mapred-site.xml');
  Shell sh = new Shell("/bin/bash -s");
  // Each string is run as one shell command, in order.
  sh.exec("hadoop fs -mkdir /tmp/test",
          "hadoop fs -copyFromLocal one /tmp/test/one",
          "hadoop fs -cat /tmp/test/one");

Future Labs
- Integrate unit testing into Bigtop
- More integration testing
- Integrate different versions of Hadoop components (HBase, Hive, etc.) into Bigtop
- Mavenize an ant-centric Hadoop component: Pig, Hive
- Puppet lab: Bigtop puppet code used in CDH4 to deploy/test
- Deploying and testing in a cluster