1 Exposing Behavioral Differences in Cross-Language API Mapping Relations
Hao Zhong (Institute of Software, CAS, China)
Suresh Thummalapenta (IBM Research, India)
Tao Xie (NC State University, USA)
2 Motivation
Many programming languages have been introduced over the decades.
Business requirements force companies to release applications in multiple languages, e.g., Lucene and WordNet have both Java and C# variants.
Three major reasons for developing variants in multiple languages:
▪ For API libraries, to attract a large number of programmers
▪ For stand-alone applications, to acquire specific features of the underlying languages
▪ For mobile applications, to support multiple platforms
3 Trends in Developing Variants
Develop in one language and translate to other languages.
▪ Example applications: Lucene.Net and Db4o
▪ Advantage: significant reduction of effort
Many translation tools already exist, e.g., Java2CSharp and Net2Java.
Key idea: replace the APIs of one language with their corresponding APIs in another language via API mapping relations.
4 What Are API Mapping Relations?
API mapping relations associate the APIs of one language with the APIs of the other language.
They help translate code from one language to the other.
5 Problem
Mapped APIs can have behavioral differences, i.e., differences in their outputs or in the exceptions they throw.
Such differences lead to defects in translated code.
An example from the Lucene project, the substring API:
▪ Java: the 2nd parameter represents the end index
▪ C#: the 2nd parameter represents the number of characters
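The substring difference can be made concrete with a minimal Java sketch. The class and method names below are hypothetical and used only for illustration; the second method emulates the C# convention (start index plus character count) on top of the real `String.substring` call, so that the same pair of arguments can be compared under both readings.

```java
// Hypothetical demo class; String.substring is the only library call used.
final class SubstringMappingDemo {
    // Java convention: the second argument is the (exclusive) end index.
    static String javaSubstring(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    // Emulation of the C# convention in Java (an assumption for illustration):
    // the second argument is the number of characters to take.
    static String csharpStyleSubstring(String s, int start, int length) {
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        String text = "lucene";
        // The same arguments (2, 4) select different text under each convention.
        System.out.println(javaSubstring(text, 2, 4));        // prints "ce"
        System.out.println(csharpStyleSubstring(text, 2, 4)); // prints "cene"
    }
}
```

A translation tool that maps the call one-to-one without adjusting the second argument would silently change which characters are returned.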
6 Goals of Our Study
▪ Are such behavioral differences pervasive?
▪ What types of behavioral differences are there?
▪ Which types of differences are more common than others?
▪ Are these differences easy to resolve?
7 Challenges
Mapping relations are not available explicitly and take a long time to write manually.
▪ Extraction from tools: translation tools use different formats for specifying API mapping relations
▪ Extraction from translated code: applications under translation may not cover the APIs of interest
▪ Extraction from translated code: translated code typically has compilation errors, so it cannot be tested
8 Major Contributions
A tool chain, called TeMAPI, that detects behavioral differences among API mapping relations.
Empirical results showing that:
▪ Behavioral differences are pervasive
▪ 8 findings on exposed behavioral differences, with implications for API-library implementers and users
▪ Some behavioral differences indicate defects in translation tools; 4 defects were confirmed by developers
9 Outline
▪ Motivation
▪ Study Setup
▪ Empirical Results
▪ Conclusion
10 Study Setup
Subject libraries
Includes two major steps
11 Step 1: Extract Mapping Relations
▪ Create a wrapper for each API method in one language
▪ Apply translation tools to the wrapper
▪ Extract the mapping relation from the original and translated wrappers
▪ Ignore a mapping relation if the translated wrapper does not compile
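As a rough illustration of the first bullet, a wrapper can be as small as a method that forwards its parameters to exactly one API method. The sketch below is an assumption about what such wrappers might look like, not the tool chain's actual output; the class and method names are hypothetical.

```java
// Hypothetical wrappers: each one forwards to exactly one API method, so the
// translated version of the wrapper reveals which target API the tool chose.
final class ApiWrappers {
    // Wraps the static method java.lang.Integer.parseInt(String, int).
    static int wrapParseInt(String s, int radix) {
        return Integer.parseInt(s, radix);
    }

    // Wraps the instance method java.lang.String.substring(int, int);
    // the receiver object is passed in as an extra parameter.
    static String wrapSubstring(String receiver, int begin, int end) {
        return receiver.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(wrapParseInt("ff", 16));          // prints 255
        System.out.println(wrapSubstring("wrapper", 0, 4));  // prints "wrap"
    }
}
```

Keeping one API call per wrapper means that comparing the original and translated wrapper bodies yields a single mapping relation, and a wrapper that fails to compile after translation can be discarded in isolation.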
12 Step 2: Generate Test Cases
Workflow:
▪ Apply the translation tool to the original wrapper to obtain the translated wrapper
▪ Generate an original test case on the original wrapper
▪ Apply the translation tool to the original test case to obtain the translated test case
▪ Execute the translated test case on the translated wrapper
Two existing state-of-the-art test generation tools:
▪ Pex: a dynamic-symbolic-execution-based test generation tool
▪ Randoop: a feedback-guided random test generation tool
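One way to think about comparing an original test with its translated counterpart is to reduce each execution to an outcome, either a returned value or the type of the exception thrown, and then compare the two outcomes. The Java sketch below only illustrates that idea under this assumption; it does not invoke Pex or Randoop, and `OutcomeRecorder`, `outcomeOf`, and `differs` are hypothetical names.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: reduces one call to a comparable outcome string.
final class OutcomeRecorder {
    // Runs the call and records either its value or the exception type.
    static <T> String outcomeOf(Callable<T> call) {
        try {
            return "value:" + call.call();
        } catch (Exception e) {
            return "exception:" + e.getClass().getSimpleName();
        }
    }

    // A behavioral difference is flagged when the two outcomes do not match.
    static boolean differs(String originalOutcome, String translatedOutcome) {
        return !originalOutcome.equals(translatedOutcome);
    }

    public static void main(String[] args) {
        // Valid indices produce a value; reversed indices produce an exception.
        System.out.println(outcomeOf(() -> "lucene".substring(2, 4)));
        System.out.println(outcomeOf(() -> "lucene".substring(4, 2)));
    }
}
```

In a real setting the second outcome would come from running the translated test in the target language; here both outcomes are plain strings, so the comparison itself stays language-neutral.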
14 Research Questions
We address the following research questions:
▪ Are behavioral differences pervasive in cross-language API mapping relations?
▪ What are the characteristics of behavioral differences concerning inputs and outputs?
▪ What are the characteristics of behavioral differences concerning method sequences?
15 RQ1: Pervasiveness
Column E-Tests: number of exception-causing test cases
Column A-Tests: number of assertion-failing test cases
About 50% of the generated test cases fail: behavioral differences are pervasive in API mapping relations between Java and C#.
16 RQ2: Findings and Implications
Finding % - handling of null inputs.
▪ java.lang.Integer.parseInt(null, 10) -> NumberFormatException
▪ System.Convert.ToInt32(null, 10) -> 0
Implication
▪ API-library implementers should clearly define the behavior for null inputs.
▪ Programmers should handle null inputs carefully.
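The Java side of this finding can be checked directly, and the advice to handle null inputs carefully can be sketched as an explicit guard. In the following minimal example, `parseIntOrDefault` is a hypothetical helper (not part of any library) that makes the null policy a visible decision instead of leaving it to the underlying API.

```java
// Hypothetical demo class; Integer.parseInt is the only library call used.
final class NullInputDemo {
    // Returns true if Integer.parseInt rejects a null string with
    // NumberFormatException, as reported on the slide.
    static boolean parseIntRejectsNull() {
        try {
            Integer.parseInt(null, 10);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Hypothetical guard: the caller chooses the value used for null input,
    // so the behavior no longer depends on the API's own null handling.
    static int parseIntOrDefault(String s, int radix, int defaultForNull) {
        return (s == null) ? defaultForNull : Integer.parseInt(s, radix);
    }

    public static void main(String[] args) {
        System.out.println(parseIntRejectsNull());            // prints true
        System.out.println(parseIntOrDefault(null, 10, 0));   // prints 0
        System.out.println(parseIntOrDefault("42", 10, 0));   // prints 42
    }
}
```

With such a guard in place, translated code would behave the same for null inputs regardless of whether the target API throws or returns a default.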
17 RQ2: Findings and Implications
Finding % - returned string values.
▪ ToString vs toString
▪ GetName vs getName
Implication
▪ A method in Java and a method in C# typically return different string values even if they have the same functionality.
▪ Programmers should be cautious when using these values.
18 RQ2: Findings and Implications
Finding % - input domains.
▪ java.lang.Boolean.parseBoolean("test") -> false
▪ System.Boolean.Parse("test") -> FormatException
Implication
▪ Programmers should be cautious when calling methods with unusual input values.
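The difference in input domains can be illustrated in Java alone by placing the lenient JDK behavior next to a stricter parser. The `strictParse` method below is a hypothetical helper that rejects anything other than "true" or "false" (ignoring case); it only approximates the rejecting behavior reported for the C# side and is not a reimplementation of it.

```java
// Hypothetical demo class; Boolean.parseBoolean is the only library call used.
final class BooleanParsingDemo {
    // JDK behavior: any string other than "true" (ignoring case) yields false.
    static boolean lenientParse(String s) {
        return Boolean.parseBoolean(s);
    }

    // Hypothetical strict variant: inputs outside {"true", "false"} are rejected
    // instead of being silently mapped to false.
    static boolean strictParse(String s) {
        if ("true".equalsIgnoreCase(s)) {
            return true;
        }
        if ("false".equalsIgnoreCase(s)) {
            return false;
        }
        throw new IllegalArgumentException("Not a boolean literal: " + s);
    }

    public static void main(String[] args) {
        System.out.println(lenientParse("test")); // prints false
        try {
            strictParse("test");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Code that relies on the lenient result for malformed input would start throwing after translation to an API with the stricter input domain.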
19 RQ2: Findings and Implications
Finding % - implementations.
▪ java.lang.Character.isJavaIdentifierPart("\0") -> true
▪ ILOG.J2CsMapping.Util.Character.IsCSharpIdentifierPart("\0") -> false
Implication
▪ Some differences reflect the different natures of the languages, and others indicate defects in translation tools.
▪ Programmers should learn the natures of the different programming languages to understand such differences, e.g., different definitions of paths and files.
20 RQ2: Findings and Implications
Finding % - handling of exceptions.
▪ java.lang.StringBuffer.insert(int, char) -> ArrayIndexOutOfBoundsException
▪ System.Text.StringBuilder.Insert(int, char) -> ArgumentOutOfRangeException IndexOutOfRangeException
Implication
▪ API-library implementers may design different exception-handling mechanisms.
▪ If programmers do not notice these differences, they may introduce dead or defective code.
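The risk of dead or defective handlers can be sketched on the Java side alone. The example below, with hypothetical class and method names, catches the common supertype `IndexOutOfBoundsException` on purpose: the concrete subclass thrown by `StringBuffer.insert` for an invalid offset may vary between JDK versions, so a handler written for one specific subclass might never run.

```java
// Hypothetical demo class; StringBuffer.insert(int, char) is the API under test.
final class InsertExceptionDemo {
    // Reports how an insert at the given offset ends. Catching the supertype
    // keeps the handler live whichever index-related subclass is thrown.
    static String insertOutcome(StringBuffer buffer, int offset, char c) {
        try {
            buffer.insert(offset, c);
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return "out-of-bounds";
        }
    }

    public static void main(String[] args) {
        System.out.println(insertOutcome(new StringBuffer("abc"), 1, 'x'));  // prints "ok"
        System.out.println(insertOutcome(new StringBuffer("abc"), 10, 'x')); // prints "out-of-bounds"
    }
}
```

After translation, the mapped API throws an exception from a different hierarchy, so even a well-chosen Java handler needs to be revisited in the target language.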
21 RQ2: Findings and Implications
Finding % - constants.
▪ java.lang.Double.MAX_VALUE -> E+308
▪ System.Double.MaxValue -> E+308
Implication
▪ API-library implementers may store different values in constants, even if two constants have the same name.
▪ Programmers should use constants with care.
22 RQ3: Findings and Implications
Finding 7. Different inheritance hierarchies that can lead to compilation errors.
Java:
StringBufferInputStream var4 =...;
InputStreamReader var10 = new InputStreamReader((InputStream)var4, var8);
C#:
StringReader var4 =...;
StreamReader var10 = new StreamReader((Stream)var4, var8);
▪ StringBufferInputStream is a subclass of InputStream
▪ StringReader is NOT a subclass of Stream
Implication
▪ When programmers translate code (e.g., cast statements), they should be aware of such differences.
23 RQ3: Findings and Implications
Finding % - method sequences.
Java:
DateFormatSymbols var0 = new DateFormatSymbols();
String[] var16 = new String[]...;
var0.setShortMonths(var16);
C#:
DateTimeFormatInfo var0 = System.Globalization.DateTimeFormatInfo.CurrentInfo;
String[] var16 = new String[]...;
var0.AbbreviatedMonthNames = var16; -> InvalidOperationException
Implication
▪ Legal method sequences can become illegal after translation, due to factors such as constraints in the target programming language and field accessibility.
24 Conclusion
▪ Tool chain + empirical study of exposing behavioral differences of API mapping relations
▪ Behavioral differences are pervasive and dangerous
▪ 8 findings with valuable implications for API-library implementers and users + 4 defects confirmed
Figure: workflow from the original wrapper and test case to the translated wrapper and test case via the translation tool.
25 Acknowledgment: NSF of China No , NSF of China No , NSF grants CCF , CCF , CNF , CNS , and an NSA Science of Security Lablet Grant