Initialization of Clang

Slides:



Advertisements
Similar presentations
CPSC 441 TUTORIAL – JANUARY 16, 2012 TA: MARYAM ELAHI INTRODUCTION TO C.
Advertisements

IT 325 OPERATING SYSTEM C programming language. Why use C instead of Java Intermediate-level language:  Low-level features like bit operations  High-level.
Christo Wilson Project 2: User Programs in Pintos
Templated Functions. Overloading vs Templating  Overloaded functions allow multiple functions with the same name.
C++ Basics Variables, Identifiers, Assignments, Input/Output.
 2005 Pearson Education, Inc. All rights reserved Introduction.
1 Chapter 2 Introduction to Java Applications Introduction Java application programming Display ____________________ Obtain information from the.
Strings and Dynamic Memory Allocation CS-2301, B-Term Programming Assignment #6 Strings and Dynamic Memory Allocation CS-2301, System Programming.
Primitive Variable types Basic types –char (single character or ASCII integer value) –int (integer) –short (not longer than int) –long (longer than int)
1 Chapter 2 C++ Syntax and Semantics, and the Program Development Process Dale/Weems/Headington.
Basic Elements of C++ Chapter 2.
Program A computer program (also software, or just a program) is a sequence of instructions written in a sequence to perform a specified task with a computer.
CS 31 Discussion, Week 6 Faisal Alquaddoomi, Office Hours: BH 2432, W 4:30-6:30pm, F 12:30-1:30pm.
CS140: Intro to CS An Overview of Programming in C by Erin Chambers.
CS 192 Lecture 3 Winter 2003 December 5, 2003 Dr. Shafay Shamail.
COMPUTER PROGRAMMING. Data Types “Hello world” program Does it do a useful work? Writing several lines of code. Compiling the program. Executing the program.
 200 Total Points ◦ 74 Points Writing Programs ◦ 60 Points Tracing Algorithms and determining results ◦ 36 Points Short Answer ◦ 30 Points Multiple Choice.
Program A computer program (also software, or just a program) is a sequence of instructions written in a sequence to perform a specified task with a computer.
CS 11 java track: lecture 1 Administrivia need a CS cluster account cgi-bin/sysadmin/account_request.cgi need to know UNIX
1 Programs Composed of Several Functions Syntax Templates Legal C++ Identifiers Assigning Values to Variables Declaring Named Constants String Concatenation.
1 C++ Syntax and Semantics, and the Program Development Process.
Copyright  Hannu Laine C++-programming Part 1 Hannu Laine.
 Pearson Education, Inc. All rights reserved Introduction to Java Applications.
Introduction to Programming
/21 Clang Tutorial CS453 Automated Software Testing.
CPS4200 Unix Systems Programming Chapter 2. Programs, Processes and Threads A program is a prepared sequence of instructions to accomplish a defined task.
Lecture 3 Classes, Structs, Enums Passing by reference and value Arrays.
CS-1030 Dr. Mark L. Hornick 1 Java Review Interactive.
C++ Programming: Program Design Including Data Structures, Fourth Edition Chapter 8: User-Defined Simple Data Types, Namespaces, and the string Type.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
1 Chapter 2 C++ Syntax and Semantics, and the Program Development Process.
CSE 332: C++ IO We’ve Looked at Basic Input and Output Already How to move data into and out of a program –Using argc and argv to pass command line args.
COP INTERMEDIATE JAVA Designing Classes. Class Template or blueprint for creating objects. Their definition includes the list of properties (fields)
A brief introduction to javadoc and doxygen. What’s in a program file? 1. Comments 2. Code.
Classes, Interfaces and Packages
Prof. amr Goneid, AUC1 CSCE 110 PROGRAMMING FUNDAMENTALS WITH C++ Prof. Amr Goneid AUC Part 15. Dictionaries (1): A Key Table Class.
CS426Fall 2010/Lecture 141 Computer Security CS 426 Lecture 14 Software Vulnerabilities: Format String and Integer Overflow Vulnerabilities.
C++ Namespaces, Exceptions CSci 588: Data Structures, Algorithms and Software Design All material not from online sources copyright © Travis Desell, 2011.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
C Programming Day 2. 2 Copyright © 2005, Infosys Technologies Ltd ER/CORP/CRS/LA07/003 Version No. 1.0 Union –mechanism to create user defined data types.
C++ Lesson 1.
Code Analysis using Compiler Front-ends (Clang)
Chapter 1.2 Introduction to C++ Programming
Functions Students should understand the concept and basic mechanics of the function call/return pattern from CS 1114/2114, but some will not. A function.
Chapter Topics The Basics of a C++ Program Data Types
Chapter 1.2 Introduction to C++ Programming
User-Written Functions
Chapter 1.2 Introduction to C++ Programming
Chapter 1.2 Introduction to C++ Programming
Programming Fundamentals
Basic Elements of C++.
Debugging with gdb gdb is the GNU debugger on our CS machines.
LLVM Pass and Code Instrumentation
Basic Elements of C++ Chapter 2.
Introduction to javadoc
C Stuff CS 2308.
CS 240 – Lecture 18 Command-line Arguments, Typedef, Union, Bit Fields, Pointers to Functions.
How to build a program analysis tool using Clang/LLVM 4.0.1
Moonzoo Kim School of Computing KAIST
Variables, Identifiers, Assignments, Input/Output
Introduction C is a general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell.
Moonzoo Kim School of Computing KAIST
Introduction to javadoc
EE 312 Exam I Review.
2. Second Step for Learning C++ Programming • Data Type • Char • Float
How to build a program analysis tool using Clang/LLVM 4.0.1
Class.
C++ Programming Basics
EE 312 Exam I Review.
EE 312 Exam I Review.
Presentation transcript:

How to build a program analysis tool using Clang Clang Tutorial, CS453 Automated Software Testing How to build a program analysis tool using Clang Initialization of Clang Useful functions to print AST Line number information of Stmt Code modification using Rewriter Converting Stmt into String Obtaining SourceLocation

Initialization of Clang Clang Tutorial, CS453 Automated Software Testing Initialization of Clang Initialization of Clang is complicated To use Clang, many classes should be created and many functions should be called to initialize Clang environment Ex) ComplierInstance, TargetOptions, FileManager, etc. It is recommended to use the initialization part of the sample source code from the course homepage as is, and implement your own ASTConsumer and RecursiveASTVisitor classes

Useful functions to print AST Clang Tutorial, CS453 Automated Software Testing Useful functions to print AST dump() and dumpColor() in Stmt and FunctionDecl to print AST dump() shows AST rooted at Stmt or FunctionDecl object dumpColor() is similar to dump() but shows AST with syntax highlight Example: dumpColor() of myPrint FunctionDecl 0x368a1e0 <line:6:1> myPrint 'void (int)' |-ParmVarDecl 0x368a120 <line:3:14, col:18> param 'int' `-CompoundStmt 0x36a1828 <col:25, line:6:1> `-IfStmt 0x36a17f8 <line:4:3, line:5:24> |-<<<NULL>>> |-BinaryOperator 0x368a2e8 <line:4:7, col:16> 'int' '==' | |-ImplicitCastExpr 0x368a2d0 <col:7> 'int' <LValueToRValue> | | `-DeclRefExpr 0x368a288 <col:7> 'int' lvalue ParmVar 0x368a120 'param' 'int' | `-IntegerLiteral 0x368a2b0 <col:16> 'int' 1 |-CallExpr 0x368a4e0 <line:5:5, col:24> 'int' | |-ImplicitCastExpr 0x368a4c8 <col:5> 'int (*)()' <FunctionToPointerDecay> | | `-DeclRefExpr 0x368a400 <col:5> 'int ()' Function 0x368a360 'printf' 'int ()' | `-ImplicitCastExpr 0x36a17e0 <col:12> 'char *' <ArrayToPointerDecay> | `-StringLiteral 0x368a468 <col:12> 'char [11]' lvalue "param is 1" `-<<<NULL>>>

Line number information of Stmt Clang Tutorial, CS453 Automated Software Testing Line number information of Stmt A SourceLocation object from getLocStart() of Stmt has a line information SourceManager is used to get line and column information from SourceLocation In the initialization step, SourceManager object is created getExpansionLineNumber() and getExpansionColumnNumber() in SourceManager give line and column information, respectively bool VisitStmt(Stmt *s) { SourceLocation startLocation = s->getLocStart(); SourceManager &srcmgr=m_srcmgr;//you can get SourceManager from the initialization part unsigned int lineNum = srcmgr.getExpansionLineNumber(startLocation); unsigned int colNum = srcmgr.getExpansionColumnNumber(startLocation); … }

Code Modification using Rewriter Clang Tutorial, CS453 Automated Software Testing Code Modification using Rewriter You can modify code using Rewriter class Rewriter has functions to insert, remove and replace code InsertTextAfter(loc,str), InsertTextBefore(loc,str), RemoveText(loc,size), ReplaceText(…) , etc. where loc, str, size are a location (SourceLocation), a string, and a size of statement to remove, respectively Example: inserting a text before a condition in IfStmt using InsertTextAfter() 1 2 3 4 5 6 7 bool MyASTVisitor::VisitStmt(Stmt *s) { if (isa<IfStmt>(s)) { IfStmt *ifStmt = cast<IfStmt>(s); condition = ifStmt->getCond(); m_rewriter.InsertTextAfter(condition->getLocStart(), "/*start of cond*/"); } if( param == 1 ) if( /*start of cond*/param == 1 )

Clang Tutorial, CS453 Automated Software Testing Output of Rewriter Modified code is obtained from a RewriterBuffer of Rewriter through getRewriteBufferFor() Example code which writes modified code in output.txt ParseAST() modifies a target code as explained in the previous slides TheConsumer contains a Rewriter instance TheRewriter 1 2 3 4 5 6 7 8 int main(int argc, char *argv[]) { … ParseAST(TheCompInst.getPreprocessor(), &TheConsumer, TheCompInst.getASTContext()); const RewriteBuffer *RewriteBuf = TheRewriter.getRewriteBufferFor(SourceMgr.getMainFileID()); ofstream output(“output.txt”); output << string(RewriteBuf->begin(), RewriteBuf->end()); output.close(); }

Converting Stmt into String Clang Tutorial, CS453 Automated Software Testing Converting Stmt into String ConvertToString(stmt) of Rewriter returns a string corresponding to Stmt The returned string may not be exactly same to the original statement since ConvertToString() prints a string using the Clang pretty printer For example, ConvertToString() will insert a space between an operand and an operator ParstAST ConvertToString a<100 a < 100

SourceLocation To change code, you need to specify where to change Clang Tutorial, CS453 Automated Software Testing SourceLocation To change code, you need to specify where to change Rewriter class requires a SourceLocation class instance which contains location information You can get a SourceLocation instance by: getLocStart() and getLocEnd() of Stmt which return a start and an end locations of Stmt instance respectively findLocationAfterToken(loc, tok,… ) of Lexer which returns the location of the first token tok occurring right after loc Lexer tokenizes a target code SourceLocation.getLocWithOffset(offset,…) which returns location adjusted by the given offset

getLocStart() and getLocEnd() Clang Tutorial, CS453 Automated Software Testing getLocStart() and getLocEnd() getLocStart() returns the exact starting location of Stmt getLocEnd() returns the location of Stmt that corresponds to the last-1 th token’s ending location of Stmt To get correct end location, you need to use Lexer class in addition Example: getLocStart() and getLocEnd() results of IfStmt condition The last token of IfStmt condition getLocStart() points to if (param == 1) getLocEnd() points to the end of “==“ not “1”

findLocationAfterToken (1/2) Clang Tutorial, CS453 Automated Software Testing findLocationAfterToken (1/2) Static function findLocationAfterToken(loc,Tkind,…) of Lexer returns the ending location of the first token of Tkind type after loc Use findLocationAfterToken to get a correct end location of Stmt Example: finding a location of ‘)’ (tok::r_paren) using findLocationAfterToken() to find the end of if condition static SourceLocation findLocationAfterToken (SourceLocation loc, tok::TokenKind TKind, const SourceManager &SM, const LangOptions &LangOpts, bool SkipTrailingWhitespaceAndNewLine) 1 2 3 4 5 6 7 8 bool MyASTVisitor::VisitStmt(Stmt *s) { if (isa<IfStmt>(s)) { IfStmt *ifStmt = cast<IfStmt>(s); condition = ifStmt->getCond(); SourceLocation endOfCond = clang::Lexer::findLocationAfterToken(condition-> getLocEnd(), tok::r_paren, m_sourceManager, m_langOptions, false); // endOfCond points ‘)’ } if ( a + x > 3 ) ifStmt->getCond()->getLocEnd() findLocationAfterToken ( , tok::r_paran)

findLocationAfterToken (2/2) Clang Tutorial, CS453 Automated Software Testing findLocationAfterToken (2/2) You may find a location of other tokens by changing TKind parameter List of useful enums for HW #3 The fourth parameter LangOptions instance is obtained from getLangOpts() of CompilerInstance (see line 99 and line 106 of the appendix) You can find CompilerInstance instance in the initialization part of Clang Enum name Token character tok::semi ; tok::r_paren ) tok::question ? tok::r_brace }

References Clang, http://clang.llvm.org/ Clang Tutorial, CS453 Automated Software Testing References Clang, http://clang.llvm.org/ Clang API Documentation, http://clang.llvm.org/doxygen/ How to parse C programs with clang: A tutorial in 9 parts, http://amnoid.de/tmp/clangtut/tut.html

Appendix: Example Source Code (1/4) Clang Tutorial, CS453 Automated Software Testing Appendix: Example Source Code (1/4) This program prints the name of declared functions and the class name of each Stmt in function bodies 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 PrintFunctions.c #include <cstdio> #include <string> #include <iostream> #include <sstream> #include <map> #include <utility> #include "clang/AST/ASTConsumer.h" #include "clang/AST/RecursiveASTVisitor.h" #include "clang/Basic/Diagnostic.h" #include "clang/Basic/FileManager.h" #include "clang/Basic/SourceManager.h" #include "clang/Basic/TargetOptions.h" #include "clang/Basic/TargetInfo.h" #include "clang/Frontend/CompilerInstance.h" #include "clang/Lex/Preprocessor.h" #include "clang/Parse/ParseAST.h" #include "clang/Rewrite/Core/Rewriter.h" #include "clang/Rewrite/Frontend/Rewriters.h" #include "llvm/Support/Host.h" #include "llvm/Support/raw_ostream.h" using namespace clang; using namespace std; class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor> { public:

Appendix: Example Source Code (2/4) Clang Tutorial, CS453 Automated Software Testing Appendix: Example Source Code (2/4) 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 bool VisitStmt(Stmt *s) { // Print name of sub-class of s printf("\t%s \n", s->getStmtClassName() ); return true; } bool VisitFunctionDecl(FunctionDecl *f) { // Print function name printf("%s\n", f->getName()); }; class MyASTConsumer : public ASTConsumer { public: MyASTConsumer() : Visitor() //initialize MyASTVisitor {} virtual bool HandleTopLevelDecl(DeclGroupRef DR) { for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) { // Travel each function declaration using MyASTVisitor Visitor.TraverseDecl(*b); private: MyASTVisitor Visitor; int main(int argc, char *argv[])

Appendix: Example Source Code (3/4) Clang Tutorial, CS453 Automated Software Testing Appendix: Example Source Code (3/4) 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 if (argc != 2) { llvm::errs() << "Usage: PrintFunctions <filename>\n"; return 1; } // CompilerInstance will hold the instance of the Clang compiler for us, // managing the various objects needed to run the compiler. CompilerInstance TheCompInst; // Diagnostics manage problems and issues in compile TheCompInst.createDiagnostics(NULL, false); // Set target platform options // Initialize target info with the default triple for our platform. TargetOptions *TO = new TargetOptions(); TO->Triple = llvm::sys::getDefaultTargetTriple(); TargetInfo *TI = TargetInfo::CreateTargetInfo(TheCompInst.getDiagnostics(), TO); TheCompInst.setTarget(TI); // FileManager supports for file system lookup, file system caching, and directory search management. TheCompInst.createFileManager(); FileManager &FileMgr = TheCompInst.getFileManager(); // SourceManager handles loading and caching of source files into memory. TheCompInst.createSourceManager(FileMgr); SourceManager &SourceMgr = TheCompInst.getSourceManager(); // Prreprocessor runs within a single source file TheCompInst.createPreprocessor(); // ASTContext holds long-lived AST nodes (such as types and decls) . TheCompInst.createASTContext(); // A Rewriter helps us manage the code rewriting task. Rewriter TheRewriter;

Appendix: Example Source Code (4/4) Clang Tutorial, CS453 Automated Software Testing Appendix: Example Source Code (4/4) 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 TheRewriter.setSourceMgr(SourceMgr, TheCompInst.getLangOpts()); // Set the main file handled by the source manager to the input file. const FileEntry *FileIn = FileMgr.getFile(argv[1]); SourceMgr.createMainFileID(FileIn); // Inform Diagnostics that processing of a source file is beginning. TheCompInst.getDiagnosticClient().BeginSourceFile(TheCompInst.getLangOpts(),&TheCompInst.getPreprocessor()); // Create an AST consumer instance which is going to get called by ParseAST. MyASTConsumer TheConsumer; // Parse the file to AST, registering our consumer as the AST consumer. ParseAST(TheCompInst.getPreprocessor(), &TheConsumer, TheCompInst.getASTContext()); return 0; }