Moonzoo Kim School of Computing KAIST

Slides:



Advertisements
Similar presentations
ANALYSIS OF PROG. LANG. PROGRAM ANALYSIS Instructors: Crista Lopes Copyright © Instructors. 1.
Advertisements

Programming Languages and Paradigms The C Programming Language.
UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Chapter 6: Type Systems Fall 2009.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 6 Type Systems I was eventually persuaded.
An Introduction to C Programming Geb Thomas. Learning Objectives Learn how to write and compile a C program Learn what C libraries are Understand the.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Initialization of Clang
 Value, Variable and Data Type  Type Conversion  Arithmetic Expression Evaluation  Scope of variable.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
A Look at the MIRV Compiler System EECS 483 Supplementary Information for Project 2 October 9, 2006.
/21 Clang Tutorial CS453 Automated Software Testing.
Lambda Expressions Version 1.0
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
BEGINNING PROGRAMMING.  Literally – giving instructions to a computer so that it does what you want  Practically – using a programming language (such.
School of Computer Science & Information Technology G6DICP - Lecture 4 Variables, data types & decision making.
TREES K. Birman’s and G. Bebis’s Slides. Tree Overview 2  Tree: recursive data structure (similar to list)  Each cell may have zero or more successors.
1 CSC103: Introduction to Computer and Programming Lecture No 16.
1 C++ Programming C++ Basics ● C++ Program ● Variables, objects, types ● Functions ● Namespace ● Tests ● Loops ● Pointers, references.
Automating Fortran - C Interoperability Prepared by: Sisi Liu (Garnet) Mentor: Dan Nagle Co-Mentor: Davide del Vento 1.
Lecture 9 Symbol Table and Attributed Grammars
Concepts of Programming Languages
Types Type Errors Static and Dynamic Typing Basic Types NonBasic Types
CS1010 Discussion Group 11 Week 5 – Functions, Selection, Repetition.
Code Analysis using Compiler Front-ends (Clang)
Chapter 9: Value-Returning Functions
01 – Overview of Compilers
Lambda Expressions Version 1.1
Chapter 7: Expressions and Assignment Statements
The mechanics of recursion
Lesson #6 Modular Programming and Functions.
A Simple Syntax-Directed Translator
Constructing Precedence Table
Parsing & Context-Free Grammars
Introduction to Parsing (adapted from CS 164 at Berkeley)
Introduction to C++ Systems Programming.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Lesson #6 Modular Programming and Functions.
Semantic Analysis with Emphasis on Name Analysis
Programming Languages and Paradigms
Object Oriented Programming
Chapter 7: Expressions and Assignment Statements
Programming Paradigms
CS 536 / Fall 2017 Introduction to programming languages and compilers
Prepared By: G.UshaRani B.Pranalini A.S.Lalitha
CMPE 152: Compiler Design December 5 Class Meeting
Functions.
LLVM Pass and Code Instrumentation
Chapter 5 - Functions Outline 5.1 Introduction
Lesson #6 Modular Programming and Functions.
Recursive descent parsing
Syntax-Directed Translation
Introduction to C++ Programming
CSE 3302 Programming Languages
How to build a program analysis tool using Clang/LLVM 4.0.1
Trees Lecture 9 CS2110 – Fall 2009.
Moonzoo Kim School of Computing KAIST
Lesson #6 Modular Programming and Functions.
Announcements Prelim 1 on Tuesday! A4 will be posted today
Function.
Course Overview PART I: overview material PART II: inside a compiler
How to build a program analysis tool using Clang/LLVM 4.0.1
CMPE 152: Compiler Design April 9 Class Meeting
Chapter 11 Programming in C
Software Testing and Verificaton (SWTV) group
Programming Languages and Paradigms
Semantic Analysis Pu Youer
Function.
CPS125.
C Pointers Another ref:
Chapter 11 Introduction to Programming in C
Presentation transcript:

Moonzoo Kim School of Computing KAIST Clang Tutorial Moonzoo Kim School of Computing KAIST The original slides were written by Yongbae Park, yongbae2@gmail.com

Content Motivation of learning code analysis technique Overview of Clang AST structure of Clang Decl class Stmt class Traversing Clang AST

Motivation for Learning Code Analysis Technique Biologists know how to analyze laboratory mice. In addition, they know how to modify the mice by applying new medicine or artificial organ Mechanical engineers know how to analyze and modify mechanical products using CAD tools. Software engineers also have to know how to analyze and modify software code which is far more complex than any engineering product. Thus, software analysis/modification requires automated analysis tools. Using source level analysis framework (e.g., Clang, C Intermediate Language (CIL), EDG parser) Using low-level intermediate representation (IR) analysis framework (e.g., LLVM IR) 2018-12-28

Overview There are frequent chances to analyze/modify program code mechanically/automatically Ex1. Refactoring code for various purposes Ex2. Generate test driver automatically Ex3. Insert probes to monitor target program behavior Clang is a library to convert a C program into an abstract syntax tree (AST) and manipulate the AST Ex) finding branches, renaming variables, pointer alias analysis, etc Clang is particularly useful to simply modify C/C++ code Ex1. Add printf(“Branch Id:%d\n”,bid)at each branch Ex2. Add assert(pt != null)right before referencing pt

Example C code 2 functions are declared: myPrint and main main function calls myPrint and returns 0 myPrint function calls printf myPrint contains if and for statements 1 global variable is declared: global //Example.c #include <stdio.h> int global; void myPrint(int param) { if (param == 1) printf("param is 1"); for (int i = 0 ; i < 10 ; i++ ) { global += i; } int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0;

Example AST Clang generates 3 ASTs for myPrint(), main(), and global A function declaration has a function body and parameters ASTs for myPrint() AST for global ASTs for main()

Structure of AST Each node in AST is an instance of either Decl or Stmt class Decl represents declarations and there are sub-classes of Decl for different declaration types Ex) FunctionDecl class for function declaration and ParmVarDecl class for function parameter declaration Stmt represents statements and there are sub-classes of Stmt for different statement types Ex) IfStmt for if and ReturnStmt class for function return Comments (i.e., /* */, // ) are not built into an AST

Decl (1/4) A root of the function AST is a Decl node A root of function AST is an instance of FunctionDecl which is a sub-class of Decl Function declaration int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0;} 14 15 16 17 18 Legend

Decl (2/4) FunctionDecl can have an instance of ParmVarDecl for a function parameter and a function body ParmVarDecl is a child class of Decl Function body is an instance of Stmt In the example, the function body is an instance of CompoundStmt which is a sub-class of Stmt Function parameter declarations 14 15 16 17 18 int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0; } Function body Legend

Decl (3/4) VarDecl is for a local and global variable declaration VarDecl has a child if a variable has a initial value In the example, VarDecl has IntegerLiteral Global variable declaration Initial value 14 15 16 17 18 int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0; } Local variable declaration Legend

Decl (4/4) FunctionDecl, ParmVarDecl and VarDecl have a name and a type of declaration Ex) FunctionDecl has a name ‘main’ and a type ‘void (int, char**)’ Types 14 15 16 17 18 int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0; } Names Legend Types

Stmt (1/9) Stmt represents a statement Subclasses of Stmt CompoundStmt class for code block DeclStmt class for local variable declaration ReturnStmt class for function return 14 15 16 17 18 int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0; } Statements Legend

Stmt (2/9) Expr represents an expression (a subclass of Stmt) Subclasses of Expr CallExpr for function call ImplicitCastExpr for implicit type casts DeclRefExpr for referencing declared variables and functions IntegerLiteral for integer literals 14 15 16 17 18 int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0; } Expressions (also statements) Legend

Stmt (3/9) Stmt may have a child containing additional information CompoundStmt has statements in a code block of braces (“{}”) int param = 1; 14 15 16 17 18 int main(int argc, char *argv[]) { int param = 1; myPrint(param); return 0; } myPrint(param); Legend return 0;

Stmt (4/9) Stmt may have a child containing additional information (cont’) The first child of CallExpr is for a function pointer and the others are for function parameters Legend Declarations for DeclStmt Function pointer for CallExpr Function parameter for CallExpr Return value for ReturnStmt

Stmt (5/9) Expr has a type of an expression Ex) a node of CallExpr has a type ‘void’ Some sub-classes of Expr can have a value Ex) a node of IntegerLiteral has a value ‘1’ Legend Values Types Types Value

Stmt (6/9) myPrint function contains IfStmt and ForStmt in its function body void myPrint(int param) { if (param == 1) printf("param is 1"); for (int i=0;i<10;i++) { global += i; } 6 7 8 9 10 11 12

Stmt (7/9) IfStmt has 4 children A condition variable in VarDecl In C++, you can declare a variable in condition (not in C) A condition in Expr Then block in Stmt Else block in Stmt Condition 6 7 8 9 10 11 12 void myPrint(int param) { if (param == 1) printf("param is 1"); for (int i = 0 ; i < 10 ; i++ ) { global += i; } Then block Else block

Stmt (8/9) ForStmt has 5 children Initialization in Stmt Condition variable ForStmt has 5 children Initialization in Stmt A condition variable in VarDecl A condition in Expr Increment in Expr A loop block in Stmt Condition Increment 6 7 8 9 10 11 12 void myPrint(int param) { if (param == 1) printf("param is 1"); for (int i = 0 ; i < 10 ; i++ ) { global += i; } Loop block

Stmt (9/9) BinaryOperator has 2 children for operands UnaryOperator has a child for operand Two operands for BinaryOperator 6 7 8 9 10 11 12 void myPrint(int param) { if (param == 1) printf("param is 1"); for (int i = 0 ; i < 10 ; i++ ) { global += i; } A operand for UnaryOperator

Traversing Clang AST (1/3) Clang provides a visitor design pattern for user to access AST ParseAST() starts building and traversal of an AST: void clang::ParseAST (Preprocessor &pp, ASTConsumer *C, ASTContext &Ctx, …) The callback function HandleTopLevelDecl() in ASTConsumer is called for each top-level declaration HandleTopLevelDecl() receives a list of function and global variable declarations as a parameter A user has to customize ASTConsumer to build his/her own program analyzer class MyASTConsumer : public ASTConsumer { public: MyASTConsumer(Rewriter &R) {} virtual bool HandleTopLevelDecl(DeclGroupRef DR) { for(DeclGroupRef::iterator b=DR.begin(), e=DR.end(); b!=e;++b){ … // variable b has each decleration in DR } return true; }; 1 2 3 4 5 6 7 8 9 10 11 12

Traversing Clang AST (2/3) HandleTopLevelDecl() calls TraverseDecl() which recursively travel a target AST from the top-level declaration by calling VisitStmt (), VisitFunctionDecl(), etc. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor> { bool VisitStmt(Stmt *s) { printf("\t%s \n", s->getStmtClassName() ); return true; } bool VisitFunctionDecl(FunctionDecl *f) { if (f->hasBody()) { Stmt *FuncBody = f->getBody(); printf("%s\n", f->getName()); }; class MyASTConsumer : public ASTConsumer { virtual bool HandleTopLevelDecl(DeclGroupRef DR) { for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) { MyASTVisitor Visitor; Visitor.TraverseDecl(*b); … VisitStmt is called when Stmt is encountered VisitFunctionDecl is called when FunctionDecl is encountered

Traversing Clang AST (3/3) VisitStmt() in RecursiveASTVisitor is called for every Stmt object in the AST RecursiveASTVisitor visits each Stmt in a depth-first search order If the return value of VisitStmt is false, recursive traversal halts Example: main function of the previous example RecursiveASTVisitor will visit all nodes in this box (the numbers are the order of traversal) 1 2 3 4 5 6 7 8 9 10 11