F. Boerboom, A. Janssen, G. Lommerse, F. Nossin, L. Voinea, A. Telea The Visual Code Navigator: An Interactive Toolset For Source Code Investigation Eindhoven.


F. Boerboom, A. Janssen, G. Lommerse, F. Nossin, L. Voinea, A. Telea
The Visual Code Navigator: An Interactive Toolset For Source Code Investigation
Eindhoven University of Technology, the Netherlands

The Visual Code Navigator (VCN)
An environment for interactive visualization of industry-size source code projects:
- tuned for C/C++ code bases stored in CVS
- targets understanding code evolution and code structure
- based on three views with complementary purposes
Outline:
- How can we extract facts from source code?
- What can the VCN source code views show?

Fact extraction
A notoriously difficult problem. Requirements (roughly):
- completeness: extracts all elements and cross-references from source code; extracts correct information; complies with the latest C/C++ standard; includes preprocessor facilities
- tolerance: handles incomplete/incorrect/ambiguous code
- efficiency: memory- and speed-efficient on industry-size code bases
- availability: can be built from source code, preferably cross-platform

Existing fact extractors
Rating scale: ++ very good; + good; o could be better; - limited; -- unacceptable/missing; ? insufficiently tested
Testing:
- get the tool as binary/source; try to build it
- analyze very large systems (>0.5 MLOC)
- select extremely messy C/C++ code
- try with/without includes (incomplete)
- check output for size, correctness, completeness, throughput
- investigate the causes of limitations

Conclusions
Many surprises:
- most tools extract interface data quite well…
- …but fail badly at parsing implementation (function bodies)
- tolerance and completeness are mutually exclusive
- completeness and performance are also complementary
- GLR grammar based tools are by far the best
Overall, we found just one reasonably good tool: Columbus. However, it is:
- closed-source
- limited in some technical respects (template handling)
- quite slow (1 hr 20 min for ~ LOC)
How can we do better than the above tools?

EFES: Our own C/C++ fact extractor
We chose to build our own extractor:
- based on the Elkhound C/C++ GLR parser
- uses a modified preprocessor, for tolerance
- extends the parser, for tolerance of incomplete/incorrect code and for handling templated code
- uses compression techniques to compact and speed up output
So far:
- tests on very large projects (>200 MLOC) look good
- we are 3..7 times faster than Columbus
- we produce the 'bare' info, no metrics yet
A hard but unavoidable endeavour

EFES Architecture
- source: any C/C++ project, possibly incomplete/incorrect code
- preprocessor: libcpp, also used by GNU CPP
- parser: Elsa, which uses the Elkhound GLR parser generator
- type checker: disambiguates code with type information
- filter: limits output to a set of interest (e.g. files, scopes, …)
- output generator: efficiently writes the output information to a file

EFES Enhancements
Several enhancements to 'standard' fact extraction:
- preprocessor: enhanced CPP to produce exact location information (needed later for construct visualization & comparison)
- parser & type checker: enhanced Elsa to:
  - parse incorrect code with extra grammar rules; errors are caught at scope level
  - extended Elsa's template support
  - added checkpoints at top-form level to trap internal errors
- filter: novel element; reduces output size dramatically, e.g. by skipping standard header information
- output generator: added compact binary output; reduces output size 10 times, increases output speed 5 times
- project concept: lets users customize extraction (C++ dialect, filtering, parser strictness, what to output, etc.)
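
The filter stage described above can be pictured as a predicate applied to the stream of extracted facts. What follows is only a minimal sketch under stated assumptions, not EFES code: the `Fact` record, its fields, the `filter_facts` helper and the `/usr/include/` prefix are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence


@dataclass(frozen=True)
class Fact:
    """Hypothetical extracted fact: a named construct plus its exact location."""
    kind: str      # e.g. "function", "typedef"
    name: str
    file: str
    line: int
    column: int


def filter_facts(facts: Iterable[Fact],
                 skip_prefixes: Sequence[str]) -> Iterator[Fact]:
    """Yield only the facts whose file is not under an excluded directory."""
    prefixes = tuple(skip_prefixes)
    for fact in facts:
        if not fact.file.startswith(prefixes):
            yield fact


# Illustrative data: two facts from system headers, one from the project.
facts = [
    Fact("function", "printf", "/usr/include/stdio.h", 332, 12),
    Fact("function", "render", "src/view.cpp", 40, 6),
    Fact("typedef", "size_t", "/usr/include/stddef.h", 209, 23),
]
kept = list(filter_facts(facts, ["/usr/include/"]))
```

Dropping facts before they reach the output generator is what makes the output smaller; a real filter would presumably also select by scope, as the architecture slide mentions.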

Performance & Results
[Chart: Columbus vs EFES]
We are 3..7 times faster

Conclusions
We've built a powerful C/C++ fact extractor:
- works on large projects (>200 MLOC)
- handles incorrect/incomplete code well
- extracts virtually all raw information there is
- is 3..7 times faster than a known commercial solution
Desired additions:
- distil raw information into more interesting facts (metrics, patterns, etc.)
- add a query layer atop the basic extractor
- add an interactive visualization layer atop the query layer
An evolving project

Visualization
We now have our extracted facts:
- variables, types, functions, classes…
- cross-references between all these
- location information (file, line, column) of each construct
We would like to show them to the user and answer questions:
- how is the code structured?
- how are programming constructs distributed?
- how has the code changed in time?
- how are the typical function signatures used in a project?
- …and so on
Several visualization tools

1) Syntactic view: 1 version, N files – code view
Basic idea:
- combine a classical text editor with a pixel-based text display (e.g. SeeSoft) in a single view
- let users smoothly navigate between the two
- blend syntactic structures over code text using cushions
[Figure labels: syntax tree; source code; cushion texture (border size x, cushion profile f(x)); source code + cushion texture = result]
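
The slide names a border size x and a cushion profile f(x) but does not give the profile itself. As a sketch only, the code below assumes a parabolic ramp across the border and a flat plateau inside it; this shape, and the names `cushion_profile` and `cushion_texture`, are assumptions for illustration, not the profile used by the VCN (the paper covers the actual border design).

```python
def cushion_profile(d: float, border: float) -> float:
    """Brightness in [0, 1] at distance d from the nearest cushion edge.

    Assumed shape: parabolic ramp over the border, constant 1.0 beyond it.
    """
    if border <= 0:
        return 1.0
    t = min(max(d / border, 0.0), 1.0)
    return 1.0 - (1.0 - t) ** 2


def cushion_texture(width: int, height: int, border: float) -> list[list[float]]:
    """Build a width x height luminance texture for one rectangular cushion."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance from the pixel centre to the nearest vertical/horizontal edge.
            dx = min(x + 0.5, width - x - 0.5)
            dy = min(y + 0.5, height - y - 0.5)
            row.append(cushion_profile(dx, border) * cushion_profile(dy, border))
        rows.append(row)
    return rows


texture = cushion_texture(8, 8, border=2.0)
```

Multiplying such a texture with the color of the underlying code pixels darkens each construct toward its border, which is what makes the nesting of constructs visible.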

Syntactic view: Classical code editor…

Syntactic view: Blend in structure cushions…

Syntactic view: More structure cushions…

Syntactic view: Zoom out on 10 files, ~7000 LOC

Syntactic view: Zoom out, structure cushions only

Cushion vs 'syntax highlighting'
[Figure panels: syntax highlighting; structure cushions]
- classical syntax highlighting is actually lexical highlighting
- we generalize and enhance syntax highlighting

Syntactic view: Navigation user points the mouse at some code location…

Syntactic view: Spot cursor …and brings the text in focus above the structure

Syntactic view: Structure cursor …over a whole syntactic construct, if desired.

Syntactic view - Conclusions
Two main uses:
1) Overview:
- good for showing up to LOC on one screen
- colors code by construct type
- easy to spot presence/distribution of constructs in code
2) Detail:
- good for quickly browsing a single source file
- gives structure context information
- typical question: “where was that function with that doubly-nested for?”

2) Symbol view: N files, 1 version – interface view
- Displays public symbols in source files
- Nested by scope rules (global, namespace, method, argument)
- Visualized using a cushion treemap, colored by symbol type
[Figure labels: files; 'public' symbols in files; arguments; functions; fields; typedefs; global vars]

Symbol view - Details
Treemap node size computation:
- leaves: function bodies: number of LOC in declaration; else number of LOC or sizeof()
- non-leaves: sum of children
Shading:
- hue: construct type (typedef, function, argument, …)
- saturation: construct nesting (global/class scope)
Targeted questions:
- “what kind of symbols are in a library's headers?”
- “how are namespaces used in interface headers?”
- “does a header have a simple / uniform structure or not?”
- “are there heavy functions from a parameter-passing view?”
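
The size and shading rules above can be sketched as follows. This is an illustration under assumptions rather than the VCN implementation: the `Symbol` record, the hue table and the rule that saturation drops by a fixed step per nesting level are all hypothetical; the slide only says that hue encodes construct type and saturation encodes nesting.

```python
import colorsys
from dataclasses import dataclass, field


@dataclass
class Symbol:
    """Hypothetical symbol-tree node; `loc` is only meaningful for leaves."""
    name: str
    kind: str
    loc: int = 0
    children: list["Symbol"] = field(default_factory=list)


def node_size(sym: Symbol) -> int:
    """Leaves carry their own LOC count; inner nodes sum their children."""
    if not sym.children:
        return sym.loc
    return sum(node_size(child) for child in sym.children)


# Assumed hue per construct type, in [0, 1); not taken from the slides.
HUES = {"file": 0.0, "function": 0.33, "argument": 0.6, "typedef": 0.8}


def node_color(kind: str, depth: int) -> tuple[float, float, float]:
    """Hue from construct type, saturation from nesting depth (assumed mapping)."""
    hue = HUES.get(kind, 0.5)
    saturation = max(0.2, 1.0 - 0.2 * depth)
    return colorsys.hsv_to_rgb(hue, saturation, 1.0)


header = Symbol("view.h", "file", children=[
    Symbol("render", "function", children=[
        Symbol("width", "argument", loc=1),
        Symbol("height", "argument", loc=1),
    ]),
    Symbol("Color", "typedef", loc=3),
])
total = node_size(header)
```

With this sizing, a function with many arguments occupies a visibly large treemap cell, which is how the view answers the question about heavy functions from a parameter-passing point of view.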

Symbol view: Example
[Figure labels: C global namespace; C++ std namespace; brushed file; symbols in file]

3) Evolution view: M files, N versions
Basic idea: CVSscan tool [Voinea & Telea, ACM SoftVis’05]
[Figure labels: time (version) axis; file axis; source code details]

Evolution view: M files, N versions
- extends the CVSscan tool [Voinea & Telea, ACM SoftVis’05]
- stacks several stripped-out file evolution views above each other
- line color = construct type
- helps spotting cross-file correlations (e.g. large changes)
[Figure labels: time (version) axis; file axis; comments; function bodies; strings; function headers]

Evolution view - Results
We look for:
- Large size jumps = large code changes
- Size jumps correlating across more files at the same version = cross-system changes
- Less 'wavy' patterns = stable(r) files
- Horizontal patterns = unchanged code
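
In the tool these patterns are read off the picture by eye. Purely to make the first two rules concrete, here is a small sketch that finds them numerically; the function names, the threshold and the per-file size histories are invented for illustration and are not part of the VCN.

```python
def size_jumps(sizes: list[int], threshold: int) -> list[int]:
    """Version indices v where the file size changed by at least `threshold` LOC."""
    return [v for v in range(1, len(sizes))
            if abs(sizes[v] - sizes[v - 1]) >= threshold]


def cross_file_changes(history: dict[str, list[int]],
                       threshold: int,
                       min_files: int = 2) -> dict[int, list[str]]:
    """Versions at which at least `min_files` files jump together."""
    by_version: dict[int, list[str]] = {}
    for name, sizes in history.items():
        for v in size_jumps(sizes, threshold):
            by_version.setdefault(v, []).append(name)
    return {v: sorted(names)
            for v, names in by_version.items() if len(names) >= min_files}


# Hypothetical LOC per version for three files.
history = {
    "a.cpp": [100, 102, 180, 181],
    "b.cpp": [50, 50, 120, 121],
    "c.h": [30, 30, 31, 90],
}
correlated = cross_file_changes(history, threshold=50)
```

Here `a.cpp` and `b.cpp` both jump at version 2, which the slide would call a cross-system change, while the jump of `c.h` at version 3 is local to one file.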

Evaluation
Method & materials:
- VTK C++ library (1 MLOC, 100 versions)
- 3 users with C++ but no VTK knowledge
- 1 user with C++ and VTK knowledge (evaluator)
- quantitative and qualitative questions to be answered on VTK with and without VCN
Questions (table columns: Stx, Sym, Evo; legend: preferred/first tool, optional/second tool):
- are files fine/coarse grained?
- what is the typical class interface structure?
- what is the typical class implem. structure?
- find & describe a few large evolution changes
- what is the typical macro usage/frequency?
- what is the typical comment usage/frequency?

Evaluation
Results:
- VCN allowed getting answers (much) faster than pure classical source code browsing
- views are complementary and serve different tasks in different ways
- a single view is usually not enough
- a fine-tuned, fast, integrated system is essential!
- users are reluctant to work with lame/suboptimal tools
[Diagram labels: start; symbol view; syntax view; evolution view; text editor; interface?; implementation?; fine insight]

Implementation
Syntactic view:
- cushions: OpenGL textures, superimposed, not blended
- careful cushion border design (see paper)
Symbol view:
- cushion treemap: OpenGL fragment programs
- essential for interactive, fast navigation!
Evolution view:
- column cushions: OpenGL textures
- several LOC / pixel: solve by software antialiasing
Efficient tool design is essential for smooth navigation in large code bases and important for user acceptance
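
When several lines of code fall on the same pixel, some rule has to decide that pixel's color. The slides only say this is solved by software antialiasing; the sketch below assumes the simplest variant, a box filter that averages the RGB colors of all lines mapped to a pixel. The function name and the sample colors are illustrative, and the filter actually used by the VCN may differ.

```python
Color = tuple[float, float, float]


def downsample_lines(line_colors: list[Color], pixels: int) -> list[Color]:
    """Average the colors of the lines that fall on each of `pixels` pixels."""
    if not line_colors or pixels <= 0:
        raise ValueError("need at least one line and one pixel")
    n = len(line_colors)
    out = []
    for p in range(pixels):
        start = p * n // pixels
        # Always cover at least one line, even when there are fewer lines than pixels.
        end = max(start + 1, (p + 1) * n // pixels)
        bucket = line_colors[start:end]
        out.append(tuple(sum(c[i] for c in bucket) / len(bucket) for i in range(3)))
    return out


# Four lines (one comment, one string, two function-body lines) drawn on two pixels.
lines = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
column = downsample_lines(lines, pixels=2)
```

Averaging keeps a lone line of a rare construct from vanishing entirely when zooming out, at the cost of producing mixed colors that match no single construct type.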

Conclusions
VCN: multi-view visual environment for understanding source code and its evolution
- Syntax view: 1 version, N files (compiler)
- Symbol view: 1 version, N files (linker)
- Evolution view: M versions, N files
Dense pixel displays are essential for viewing large datasets
Cushion techniques are effective for visualizing various kinds of visual nesting (syntax, symbol, file, …)
Working to extend & generalize the VCN
What to do when M, N exceed a few hundred?
Check it out: