Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features
Robert Dyer

Presentation transcript:

Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features
Robert Dyer, Tien N. Nguyen, Hridesh Rajan, Hoan Anh Nguyen
The research activities described in this talk were supported in part by several US National Science Foundation (NSF) grants (CCF and TWC).

2 Previous Language Studies
What languages do programmers choose? [Meyerovich&Rabkin SPLASH'13]
Reflection: [Livshits et al. APLAS'05] [Callaú et al. MSR'11]
JavaScript / eval: [Yue&Wang WWW'09] [Richards et al. PLDI'10] [Ratanaworabhan et al. WEBAPPS'10] [Richards et al. ECOOP'11]
Generics: [Basit et al. SEKE'05] [Parnin et al. MSR'11] [Hoppe&Hanenberg SPLASH'13]
Object-oriented Features: [Tempero et al. ECOOP'08] [Muschevici et al. OOPSLA'08] [Tempero ASWEC'09] [Grechanik et al. ESEM'10] [Gorschek et al. ICSE'10]

What is this study about?
How have new Java language features been adopted over time?
Assume Java
Corpus of 30k+ projects
Study 18 new features from 3 language editions
Over 10 years of history

4 Research Questions
RQ1: Are language features used before release?
RQ2: How frequently is each feature used?
RQ3: How did committers/teams adopt features?
RQ4: Could features have been used more?
RQ5: Was old code converted to use new features?

How is Java's language defined? Java Language Specifications (JLS)

6
Edition  Java version  Released
JLS2     Java 1.4      May 2002
JLS3     Java 5        September 2004
JLS4     Java 7        July 2011
JLS5     Java 8        March 2014

7 JLS2: New Language Features Assert assert i > 0; assert n != null;

8 JLS3: New Language Features
Enhanced-For Loop: for (T v : items) ...
Annotation (declaration): @interface Test {}
Annotation (use): @Test void m()
Enums: enum E { N1, .. }
Generic Variables: List<T> l; Map<K, V> m;
Varargs: void m(T... arg) { .. }
Generic Types: interface List<T> {}
Generic Methods: <T> void m(T a) { .. }
Generic Wildcards: Class<?> c;
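
The JLS3 features listed on this slide can be seen together in one small program. The following is a minimal sketch, not code from the talk; every class, method, and variable name in it is hypothetical and chosen only to show each construct once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical; only the language constructs matter.
class Jls3Features {
    @interface Marker {}                  // annotation declaration

    enum Level { LOW, HIGH }              // enum

    interface Box<T> { T get(); }         // generic type

    static <T> T first(List<T> items) {   // generic method
        return items.get(0);
    }

    static int count(List<?> items) {     // generic wildcard
        return items.size();
    }

    @Marker                               // annotation use
    static int sum(int... values) {       // varargs
        int total = 0;
        for (int v : values) {            // enhanced-for loop
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Generic variables, written without the later diamond operator.
        List<String> names = new ArrayList<String>();
        names.add("a");
        names.add("b");
        Map<String, Level> levels = new HashMap<String, Level>();
        levels.put(first(names), Level.HIGH);

        Box<String> box = new Box<String>() {
            public String get() { return "boxed"; }
        };
        System.out.println(sum(1, 2, 3) + " " + count(names) + " " + levels + " " + box.get());
    }
}
```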

9 JLS4: New Language Features
Diamond: Map<K, V> m = new HashMap<>();
Binary Literals: int ONE = 0b001; int TWO = 0b010; int FOUR = 0b100;
Underscore Literals: int MILLION = 1_000_000; int MASK = 0xFF_FF_00;
SafeVarargs: @SafeVarargs static <T> List<T> asList(T... elems) { .. }
Multi-catch: try { .. } catch (E1 | E2 e) { .. }
Try with Resources: try (File f = new ..) { .. }
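
As with JLS3, the JLS4 features can be exercised in a few lines. This is an illustrative sketch under assumed names (`Jls4Features`, `firstLine`, and so on are not from the talk); it uses a `BufferedReader` over a `StringReader` so that it runs without touching the file system.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; each member demonstrates one JLS4 feature.
class Jls4Features {
    static final int FOUR = 0b100;            // binary literal
    static final int MILLION = 1_000_000;     // underscore literal

    @SafeVarargs                              // SafeVarargs on a generic varargs method
    static <T> List<T> asList(T... elems) {
        return Arrays.asList(elems);
    }

    static String firstLine(String text) {
        // Try-with-resources: the reader is closed automatically.
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException | RuntimeException e) {   // multi-catch
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> values = new HashMap<>();  // diamond
        values.put("four", FOUR);
        values.put("million", MILLION);
        System.out.println(values + " " + asList("x", "y") + " " + firstLine("first\nsecond"));
    }
}
```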

10 Study Tools and Dataset
Boa [ICSE'13]
[Figure: the Boa program runs as a separate process for each project in the dataset (input = project 1, project 2, project 3, ..., project n). Each process emits values to the output variable with Assert[ ] << 1; and Boa aggregates them into the final output (Assert[ ] = 5, Assert[ ] = 12, Assert[ ] = 14, Assert[ ] = 18).]
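
The emit-then-aggregate pattern in the figure can be mimicked in plain Java. The sketch below is not Boa code and does not use any Boa API; it assumes a hypothetical `Project` record holding an id and a list of statement kinds, and sums one emitted `1` per assert statement under the project's id, which is roughly what a sum-style output indexed by project would do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java emulation of "emit 1 per match, sum per key"; all names are hypothetical.
class SumAggregatorSketch {
    // Stand-in for one project: an id plus the statement kinds found in its code.
    static final class Project {
        final String id;
        final List<String> statementKinds;

        Project(String id, List<String> statementKinds) {
            this.id = id;
            this.statementKinds = statementKinds;
        }
    }

    // Each project is visited independently; every assert emits 1 under the project's key.
    static Map<String, Integer> countAsserts(List<Project> projects) {
        Map<String, Integer> output = new TreeMap<>();
        for (Project p : projects) {
            for (String kind : p.statementKinds) {
                if (kind.equals("ASSERT")) {
                    output.merge(p.id, 1, Integer::sum);  // analogous to Assert[id] << 1
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Project> projects = Arrays.asList(
            new Project("p1", Arrays.asList("ASSERT", "IF", "ASSERT")),
            new Project("p2", Arrays.asList("ASSERT")));
        System.out.println(countAsserts(projects));
    }
}
```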

11 Study Dataset
Projects             31,432
Revisions            4,298,309
Java Files           9,093,216
Java File Snapshots  28,747,948
AST Nodes            18,323,905,323

Research Question 1 Are language features used before release? Yes!

Research Question 2 How frequently was each language feature used?

14 Project Histogram: Annotation Use

15 Project Density: Annotation Use

16 Some features popular

17 Some features popular. Why?

18 Some features popular. Why?
List, ArrayList, Map, HashMap, Set, Collection, Vector, Class, Iterator, HashSet
(confirms [Parnin et al. MSR'11])

Research Question 3 How did committers adopt features? Adoption by individuals, not teams (confirms [Parnin et al. MSR'11])

Research Question 4 Could features have been used more?

21 Opportunity: Assert
Find methods that throw IllegalArgumentException.
Before: void m(..) { if (!cond) throw new IllegalArgumentException(); ... }
After: void m(..) { assert cond; ... }
Simpler. Machine-checkable. Easily disabled for production.
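
To make the rewrite concrete, here is a small sketch of the same precondition in both styles. The method names and the square-root example are assumptions for illustration. Note that the two forms are not fully interchangeable: a failed `assert` throws `AssertionError` rather than `IllegalArgumentException`, and assertions are only checked when the JVM is started with `-ea`.

```java
// Hypothetical example: one precondition, written in the old and the new style.
class AssertOpportunity {
    // Old style: explicit check that is always executed.
    static int sqrtFloorChecked(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return (int) Math.sqrt(n);
    }

    // New style: the check only runs when assertions are enabled (java -ea).
    static int sqrtFloorAsserted(int n) {
        assert n >= 0 : "n must be non-negative";
        return (int) Math.sqrt(n);
    }

    public static void main(String[] args) {
        System.out.println(sqrtFloorChecked(16));
        System.out.println(sqrtFloorAsserted(16));
    }
}
```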

22 Opportunity: Varargs
Find methods that take arrays as last argument.
Declaration before: void m(a1, a2, T[] a3) {
Declaration after: void m(a1, a2, T... a3) {
Call before: m(.., .., new T[] {t1, t2, ..})
Call after: m(.., .., t1, t2, ..)
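
A short sketch of this conversion, using hypothetical `joinArray` and `joinVarargs` methods. Changing the last parameter from `String[]` to `String...` leaves existing array-passing callers valid while letting new callers drop the explicit array creation.

```java
// Hypothetical methods showing an array parameter converted to varargs.
class VarargsOpportunity {
    // Before: callers must build the array themselves.
    static String joinArray(String separator, String[] parts) {
        return String.join(separator, parts);
    }

    // After: the compiler builds the array from the trailing arguments.
    static String joinVarargs(String separator, String... parts) {
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        System.out.println(joinArray("-", new String[] {"a", "b", "c"}));
        System.out.println(joinVarargs("-", "a", "b", "c"));
        // Passing an explicit array to the varargs version still compiles.
        System.out.println(joinVarargs("-", new String[] {"a", "b", "c"}));
    }
}
```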

23 Opportunity: Binary Literals
Find where literal 1 is shifted left.
int x = 1 << 5;
Before: short[] phases = { 0x7, 0xE, 0xD, 0xB };
After: short[] phases = { 0b0111, 0b1110, 0b1101, 0b1011 };
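
The hexadecimal and binary forms on this slide denote the same values, which a quick check confirms. The class and constant names below are hypothetical; the literals are taken from the slide.

```java
import java.util.Arrays;

// Checks that the binary spellings equal the original hex and shift expressions.
class BinaryLiteralOpportunity {
    static final short[] PHASES_HEX = { 0x7, 0xE, 0xD, 0xB };
    static final short[] PHASES_BIN = { 0b0111, 0b1110, 0b1101, 0b1011 };

    static final int SHIFTED = 1 << 5;    // bit 5 set, written as a shift
    static final int BINARY = 0b100000;   // the same value as a binary literal

    public static void main(String[] args) {
        System.out.println(Arrays.equals(PHASES_HEX, PHASES_BIN));
        System.out.println(SHIFTED == BINARY);
    }
}
```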

24 Opportunity: Underscore Literals
Find integers with 7 or more digits and no underscores.
Before: int x = 1000000;
After: int x = 1_000_000;
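
The detection rule on this slide can be approximated with a regular expression over the literal's source text. This is only a sketch: it assumes the literal is available as a string (for example from a lexer token) and handles decimal integer literals only; the method name is hypothetical and this is not the query used in the study.

```java
// Hypothetical detector for decimal integer literals that could use underscores.
class UnderscoreLiteralOpportunity {
    // Assumption: literalText is the literal exactly as written in the source.
    static boolean couldUseUnderscores(String literalText) {
        // Seven or more decimal digits, optional long suffix, no underscores.
        return literalText.matches("[0-9]{7,}[lL]?");
    }

    public static void main(String[] args) {
        System.out.println(couldUseUnderscores("1000000"));    // 7 digits, no underscores
        System.out.println(couldUseUnderscores("1_000_000"));  // already uses underscores
        System.out.println(1000000 == 1_000_000);              // both spellings are equal
    }
}
```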

25 Opportunity: Diamond
Instantiation of generics not using diamond.
Before: List<T> l = new ArrayList<T>();
After: List<T> l = new ArrayList<>();

26 Opportunity: MultiCatch
A try with multiple, identical catch blocks.
Before: try { .. } catch (T1 e) { b1 } catch (T2 e) { b1 }
After: try { .. } catch (T1 | T2 e) { b1 }
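
A runnable instance of this pattern, with hypothetical parsing methods. Both catch blocks in the old version have the same body, so they can be merged into one multi-catch clause; the two exception types are unrelated by subclassing, which multi-catch requires.

```java
// Hypothetical example: two identical catch blocks merged into one multi-catch.
class MultiCatchOpportunity {
    // Before: the same handler body is duplicated per exception type.
    static int parseOld(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return -1;
        } catch (NullPointerException e) {
            return -1;
        }
    }

    // After: one handler covers both exception types.
    static int parseNew(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOld(" 7 ") + " " + parseNew(" 7 "));
        System.out.println(parseOld(null) + " " + parseNew("x"));
    }
}
```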

27 Opportunity: Try w/ Resources
Try statements calling close() in the finally block.
Before: try { .. } finally { var.close(); }
After: try (var = ..) { .. }
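
The two forms behave the same with respect to closing, which the sketch below checks with a hypothetical `Tracked` resource that records whether `close()` was called. The resource and method names are assumptions for illustration.

```java
// Hypothetical resource used to compare try/finally with try-with-resources.
class TryWithResourcesOpportunity {
    static final class Tracked implements AutoCloseable {
        boolean closed;

        void work(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Before: close() is called by hand in a finally block.
    static boolean closedWithFinally(boolean fail) {
        Tracked res = new Tracked();
        try {
            try {
                res.work(fail);
            } finally {
                res.close();
            }
        } catch (IllegalStateException e) {
            // ignored: only the closed flag matters here
        }
        return res.closed;
    }

    // After: the try-with-resources statement calls close() itself.
    static boolean closedWithResources(boolean fail) {
        Tracked tracked = new Tracked();
        try (Tracked res = tracked) {
            res.work(fail);
        } catch (IllegalStateException e) {
            // ignored: only the closed flag matters here
        }
        return tracked.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedWithFinally(true) + " " + closedWithResources(true));
    }
}
```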

28 Millions of opportunities!
Feature              Old    New
Assert               89K    291K
Varargs              612K   1.6M
Binary Literals      56K    5K
Diamond              3.3M   414K
MultiCatch           341K   24K
Try w/ Resources     489K   33K
Underscore Literals  5.3M   507K

29 Millions of opportunities!
Feature              Potential Uses (Projects)  Actual Uses (Projects)
Assert               18.18%                     12.72%
Varargs              88.78%                     15.43%
Binary Literals      5.9%                       0.02%
Diamond              59.08%                     0.4%
MultiCatch           49.75%                     0.27%
Try w/ Resources     37.27%                     0.21%
Underscore Literals  51.15%                     0.02%

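30 Impact: Potential for bugs
Without try-with-resources, if readLine() throws an IOException, close() is never reached:
BufferedReader br = ...; String s = br.readLine(); br.close();
With try-with-resources, the reader is closed even if readLine() throws:
try (BufferedReader br = ...) { String s = br.readLine(); }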
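
The leak described on this slide can be reproduced with a reader that always fails. In the sketch below, `FailingReader` is a hypothetical test double that throws on every read and records whether it was closed; the rest uses only `java.io` classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Demonstrates that a manual close() is skipped when readLine() throws.
class ResourceLeakDemo {
    // Hypothetical reader: every read fails, and close() is recorded.
    static final class FailingReader extends Reader {
        boolean closed;

        @Override
        public int read(char[] buffer, int offset, int length) throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Manual close: the exception from readLine() jumps past br.close().
    static boolean closedAfterManualClose() {
        FailingReader source = new FailingReader();
        try {
            BufferedReader br = new BufferedReader(source);
            br.readLine();
            br.close();  // never reached when readLine() throws
        } catch (IOException e) {
            // the caller would see this exception; the reader stays open
        }
        return source.closed;
    }

    // Try-with-resources: close() runs before the exception propagates.
    static boolean closedAfterTryWithResources() {
        FailingReader source = new FailingReader();
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine();
        } catch (IOException e) {
            // the reader has already been closed at this point
        }
        return source.closed;
    }

    public static void main(String[] args) {
        System.out.println("manual close closed the reader: " + closedAfterManualClose());
        System.out.println("try-with-resources closed the reader: " + closedAfterTryWithResources());
    }
}
```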

31 Impact: Potential for bugs
Mine for methods that:
1. declare they throw IOException
2. do not catch IOException in body
3. contain a call to close()
193,768 instances; sampling shows 50% accuracy
Examples:
public void close() throws IOException { f.close(); }
try { ... } finally { f1.close(); f2.close(); }
try { sock.close(); rec.close(); } catch (Exception e) { }
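
The three mining criteria amount to a simple conjunction. The sketch below expresses them over a hypothetical `MethodFacts` summary; how those three facts are extracted from an AST is outside its scope, and it is not the query used in the study.

```java
// Hypothetical per-method summary and the three-part filter from the slide.
class LeakCandidateFilter {
    static final class MethodFacts {
        final boolean declaresThrowsIOException;
        final boolean catchesIOException;
        final boolean callsClose;

        MethodFacts(boolean declaresThrowsIOException, boolean catchesIOException, boolean callsClose) {
            this.declaresThrowsIOException = declaresThrowsIOException;
            this.catchesIOException = catchesIOException;
            this.callsClose = callsClose;
        }
    }

    // A method is a candidate when all three criteria hold.
    static boolean isCandidate(MethodFacts m) {
        return m.declaresThrowsIOException   // 1. declares it throws IOException
            && !m.catchesIOException         // 2. does not catch IOException in its body
            && m.callsClose;                 // 3. contains a call to close()
    }

    public static void main(String[] args) {
        System.out.println(isCandidate(new MethodFacts(true, false, true)));
        System.out.println(isCandidate(new MethodFacts(true, true, true)));
    }
}
```

A filter this coarse also matches harmless code such as a `close()` method that simply delegates, which is consistent with the 50% sampling accuracy reported on the slide.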

Research Question 5 Was old code converted to use new features?

33 Detecting Conversions
Compare File.java (Revision N) with File.java (Revision N+1), measuring uses and potential uses of a feature at each revision.
A conversion is detected when: uses N < uses N+1 and potential N > potential N+1
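
The conversion rule compares two consecutive snapshots of the same file. The sketch below assumes a hypothetical `Snapshot` holding the two counts for one feature at one revision and applies the rule along a file's history.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the conversion check between consecutive revisions.
class ConversionDetector {
    // Counts for one feature in one revision of a file.
    static final class Snapshot {
        final int uses;
        final int potential;

        Snapshot(int uses, int potential) {
            this.uses = uses;
            this.potential = potential;
        }
    }

    // Conversion: uses went up while potential uses went down.
    static boolean isConversion(Snapshot revisionN, Snapshot revisionNPlus1) {
        return revisionN.uses < revisionNPlus1.uses
            && revisionN.potential > revisionNPlus1.potential;
    }

    // Counts conversions across a file's ordered revision history.
    static int countConversions(List<Snapshot> history) {
        int conversions = 0;
        for (int i = 0; i + 1 < history.size(); i++) {
            if (isConversion(history.get(i), history.get(i + 1))) {
                conversions++;
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        List<Snapshot> history = Arrays.asList(
            new Snapshot(0, 3),   // revision N
            new Snapshot(2, 1),   // two potential uses converted
            new Snapshot(3, 1));  // new use added, nothing converted
        System.out.println(countConversions(history));
    }
}
```

Requiring both conditions keeps plain additions of new code (more uses, same potential) and plain deletions (less potential, same uses) from being counted as conversions.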

34 Detected lots of conversions!
Manual, systematic sampling confirms: 2602 conversions, 13 not conversions.
[Table: count, files, and projects of detected conversions for Assert, Varargs, Diamond, MultiCatch, Try w/ Resources, and Underscore Literals]

35 To summarize...
Similar usage patterns
Old code converted to use new features
Only few features see high use
Despite (missed) potential for use
Feature adoption by individuals
Feature              Old    New    All    Files   Projects
Assert               89K    291K   380K   1.39%   18.18%
Varargs              612K   1.6M   2.2M   12.74%  88.78%
Binary Literals      56K    5K     61K    0.11%   5.9%
Diamond              3.3M   414K   3.7M   12.25%  59.08%
MultiCatch           341K   24K    365K   2.28%   49.75%
Try w/ Resources     489K   33K    522K   1.85%   37.27%
Underscore Literals  5.3M   507K   5.8M   5.86%   51.15%

36 Call to action!

37 Thank you!