How to build a program analysis tool using Clang/LLVM 4.0.1

Slides:



Advertisements
Similar presentations
Factorial Preparatory Exercise #include using namespace std; double fact(double); int fact(int); int main(void) { const int n=20; ofstream my_file("results.txt");
Advertisements

1 Lecture 16:User-Definded function I Introduction to Computer Science Spring 2006.
Templated Functions. Overloading vs Templating  Overloaded functions allow multiple functions with the same name.
CS 1620 File I/O. So far this semester all input has been from keyboard all output has been to computer screen these are just two examples of where to.
Prof. amr Goneid, AUC1 CSCE 110 PROGRAMMING FUNDAMENTALS WITH C++ Prof. Amr Goneid AUC Part 5. Functions.
1 11/3/08CS150 Introduction to Computer Science 1 Reading from and Writing to Files Section 3.12 & 13.1 & 13.5.
Primitive Variable types Basic types –char (single character or ASCII integer value) –int (integer) –short (not longer than int) –long (longer than int)
1 10/29/07CS150 Introduction to Computer Science 1 Reading from and Writing to Files Section 3.12 & 13.1 & 13.5.
Basic Elements of C++ Chapter 2.
C++ Programming Language Day 1. What this course covers Day 1 – Structure of C++ program – Basic data types – Standard input, output streams – Selection.
Ping Zhang 10/08/2010.  You can get data from the user (input) and display information to the user (output).  However, you must include the library.
1 Operator Overloading in C++ Copyright Kip Irvine, All rights reserved. Only students enrolled in COP 4338 at Florida International University may.
REFACTORING Lecture 4. Definition Refactoring is a process of changing the internal structure of the program, not affecting its external behavior and.
Initialization of Clang
C for Java Programmers Tomasz Müldner Copyright:  Addison-Wesley Publishing Company, 2000 Introduction to C Muldner, Chapters 1, 2.
CIS 3301 C# Lesson 5 Methods. CIS 3302 Objectives Understand the structure of a method. Know the difference between static and instance methods. Learn.
CS 11 java track: lecture 1 Administrivia need a CS cluster account cgi-bin/sysadmin/account_request.cgi need to know UNIX
Chapter 6: User-Defined Functions
C ++ Programming Languages Omid Jafarinezhad Lecturer: Omid Jafarinezhad Fall 2013 Lecture 2 Department of Computer Engineering 1.
C#.Net Development Version 1.0. Overview Nullable Datatype Description ? HasValue Lifted Conversions null coalescing operator ?? Partial Classes Copyright.
File IO and command line input CSE 2451 Rong Shi.
Chapter 8 Scope of variables Name reuse. Scope The region of program code where it is legal to reference (use) a variable The scope of a variable depends.
C ++ Basics by Bindra Shrestha sce.uhcl.edu/shresthab CSCI 3333 Data Structures.
1 + 1 becomes 11 what does our software promise?.
/21 Clang Tutorial CS453 Automated Software Testing.
Creational Pattern: Factory Method At times, a framework is needed to standardize the behavior of objects that are used in a range of applications, while.
File I/O in C++ II. Open() function Open() is a member function in each classes ( fstream, ifstream, ofstream) Void fstream :: open ( const char *filename,
CSC1201: Programming Language 2 Lecture 1 Level 2 Course Nouf Aljaffan (C) CSC 1201 Course at KSU1.
COP INTERMEDIATE JAVA Designing Classes. Class Template or blueprint for creating objects. Their definition includes the list of properties (fields)
A brief introduction to javadoc and doxygen. What’s in a program file? 1. Comments 2. Code.
Exceptions in C++. Exceptions  Exceptions provide a way to handle the errors generated by our programs by transferring control to functions called handlers.
Functions Math library functions Function definition Function invocation Argument passing Scope of an variable Programming 1 DCT 1033.
Introduction to Objective-C Spring Goals An introduction to Objective-C As implemented by the Apple LLVM Compiler 4.0 (a.k.a. Clang) Only the basics…
EIToolkit stub doc. main.cpp file int main(int argc, char* argv[]) { EIProperties::UseStandards(); // create the stub ExampleStub stub; stub.Start();
Singleton Pattern. Problem Want to ensure a single instance of a class, shared by all uses throughout a program Context Need to address initialization.
File I/O in C++. Using Input/Output Files A computer file  is stored on a secondary storage device (e.g., disk);  is permanent;  can be used to provide.
C++ Namespaces, Exceptions CSci 588: Data Structures, Algorithms and Software Design All material not from online sources copyright © Travis Desell, 2011.
File I/O in C++ I. Using Input/Output Files A computer file is stored on a secondary storage device (e.g., disk); is permanent; can be used to provide.
1 Huffman Codes Using Binary Files. 2 Getting Started Last class we extended a program to create a Huffman code and permit the user to encode and decode.
Intro Programming in C++ Computer Science Dept Va Tech August, 2001 © Barnette ND & McQuain WD 1 Pass-by-Value - default passing mechanism except.
CS212: Object Oriented Analysis and Design
C++ Lesson 1.
Code Analysis using Compiler Front-ends (Clang)
Functions Students should understand the concept and basic mechanics of the function call/return pattern from CS 1114/2114, but some will not. A function.
Review 1.
Chapter Topics The Basics of a C++ Program Data Types
Basic Elements of C++.
Command Line Arguments
Multi-dimensional Array
Friday, January 26, 2018 Announcements… For Today… For Next Time…
LLVM Pass and Code Instrumentation
Basic Elements of C++ Chapter 2.
Introduction to javadoc
Chapter 9 Scope, Lifetime, and More on Functions
3.3 Abstract Classes, Assignment, and Casting in a Hierarchy
C Stuff CS 2308.
CS 240 – Lecture 18 Command-line Arguments, Typedef, Union, Bit Fields, Pointers to Functions.
How to build a program analysis tool using Clang/LLVM 4.0.1
Programming with Data Files
Refactoring Support Tool: Cancer
Moonzoo Kim School of Computing KAIST
Moonzoo Kim School of Computing KAIST
Introduction to javadoc
CHAPTER 4 File Processing.
Class.
CMPE 152: Compiler Design March 26 – April 16 Labs
File I/O in C++ II.
An Introduction to STL.
Functions Students should understand the concept and basic mechanics of the function call/return pattern from CS 1114/2114, but some will not. A function.
File I/O in C++ I.
Presentation transcript:

How to build a program analysis tool using Clang/LLVM 4.0.1 Initialization of Clang Useful functions to print AST Line number information of Stmt Code modification using Rewriter Converting Stmt into String Obtaining SourceLocation

Initialization of Clang Initialization of Clang is complicated To use Clang, many classes should be created and many functions should be called to initialize Clang environment Ex) ComplierInstance, TargetOptions, FileManager, etc. It is recommended to use the initialization part of the sample source code from the course homepage as is, and implement your own ASTConsumer and RecursiveASTVisitor classes

Useful functions to print AST dump() and dumpColor() in Stmt and FunctionDecl to print AST dump() shows AST rooted at Stmt or FunctionDecl object dumpColor() is similar to dump() but shows AST with syntax highlight Example: dumpColor() of myPrint FunctionDecl 0x368a1e0 <line:6:1> myPrint 'void (int)' |-ParmVarDecl 0x368a120 <line:3:14, col:18> param 'int' `-CompoundStmt 0x36a1828 <col:25, line:6:1> `-IfStmt 0x36a17f8 <line:4:3, line:5:24> |-<<<NULL>>> |-BinaryOperator 0x368a2e8 <line:4:7, col:16> 'int' '==' | |-ImplicitCastExpr 0x368a2d0 <col:7> 'int' <LValueToRValue> | | `-DeclRefExpr 0x368a288 <col:7> 'int' lvalue ParmVar 0x368a120 'param' 'int' | `-IntegerLiteral 0x368a2b0 <col:16> 'int' 1 |-CallExpr 0x368a4e0 <line:5:5, col:24> 'int' | |-ImplicitCastExpr 0x368a4c8 <col:5> 'int (*)()' <FunctionToPointerDecay> | | `-DeclRefExpr 0x368a400 <col:5> 'int ()' Function 0x368a360 'printf' 'int ()' | `-ImplicitCastExpr 0x36a17e0 <col:12> 'char *' <ArrayToPointerDecay> | `-StringLiteral 0x368a468 <col:12> 'char [11]' lvalue "param is 1" `-<<<NULL>>>

Line number information of Stmt A SourceLocation object from getLocStart() of Stmt has a line information SourceManager is used to get line and column information from SourceLocation In the initialization step, SourceManager object is created getExpansionLineNumber() and getExpansionColumnNumber() in SourceManager give line and column information, respectively bool VisitStmt(Stmt *s) { SourceLocation startLocation = s->getLocStart(); SourceManager &srcmgr=m_srcmgr;//you can get SourceManager from the initialization part unsigned int lineNum = srcmgr.getExpansionLineNumber(startLocation); unsigned int colNum = srcmgr.getExpansionColumnNumber(startLocation); … }

Code Modification using Rewriter You can modify code using Rewriter class Rewriter object can be created with SourceManager and LangOptions objects which can be obtained from CompilerInstance object CompilerInstance object provides Clang’s initialization configurations see the example in the last page Rewriter has functions to insert, remove and replace code InsertTextAfter(loc,str), InsertTextBefore(loc,str), RemoveText(loc,size), ReplaceText(…) , etc. where loc, str, size are a location (SourceLocation), a string, and a size of statement to remove, respectively Example: inserting a text before a condition in IfStmt using InsertTextAfter() 1 2 3 4 5 6 7 bool MyASTVisitor::VisitStmt(Stmt *s) { if (isa<IfStmt>(s)) { IfStmt *ifStmt = cast<IfStmt>(s); condition = ifStmt->getCond(); MyRewriter.InsertTextAfter(condition->getLocStart(), "/*start of cond*/"); } if( param == 1 ) if( /*start of cond*/param == 1 )

Output of Rewriter Modified code is obtained from a RewriterBuffer of Rewriter through getRewriteBufferFor() Example code which writes modified code in output.txt 1 2 3 4 5 6 7 8 int main(int argc, char *argv[]) { … const RewriteBuffer *RewriteBuf = MyRewriter.getRewriteBufferFor(SourceMgr.getMainFileID()); ofstream output("output.txt"); output << string(RewriteBuf->begin(), RewriteBuf->end()); output.close(); }

Converting Stmt into String printPretty(raw_ostream&, PrinterHelper*, PrintingPolicy&) writes a string corresponding to Stmt to raw_ostream Example code shows VisitStmt function which gets string from given Stmt Check https://stackoverflow.com/a/9639239 for additional information 1 2 3 4 5 6 7 8 9 10 11 bool VisitStmt(Stmt *s) { // MyASTVisitor should receive LangOptions from main as LangOpts clang::PrintingPolicy Policy(LangOpts); std::string str1; llvm::raw_string_ostream os(str1); s->printPretty(os, NULL, Policy); llvm::outs() << os.str() << "\n"; return true; }

SourceLocation To change code, you need to specify where to change Rewriter class requires a SourceLocation class instance which contains location information You can get a SourceLocation instance by: getLocStart() and getLocEnd() of Stmt which return a start and an end locations of Stmt instance respectively findLocationAfterToken(loc, tok,… ) of Lexer which returns the location of the first token tok occurring right after loc Lexer tokenizes a target code SourceLocation.getLocWithOffset(offset,…) which returns location adjusted by the given offset

getLocStart() and getLocEnd() getLocStart() returns the exact starting location of Stmt getLocEnd() returns the location of Stmt that corresponds to the last-1 th token’s ending location of Stmt To get correct end location, you need to use Lexer class in addition Example: getLocStart() and getLocEnd() results of IfStmt condition The last token of IfStmt condition getLocStart() points to if (param == 1) getLocEnd() points to the end of “==“ not “1”

SourceLocation getLocWithOffset( int offset ) // Ex. Logging Caller -> Callee function calls bool insertProbe(const CallExpr *ce, std::string calleeName) { // x = f(a,b); is modified as follows: // x = (printf(“%s,%s\n”,CallerFuncName,calleeName)? f(a,b):0); std::string probeFront = "(printf(\"%s,%s\\n\",\"“ +CallerFuncName + "\",\"" + calleeName + "\")?"; std::string probeBack= ": 0)"; MyRewriter.InsertTextAfter(ce->getLocStart(), probeFront); SourceLocation funEndLoc = ce->getLocEnd().getLocWithOffset(1); MyRewriter.InsertTextAfter(funEndLoc, probeBack); return true; }

References Clang, http://clang.llvm.org/ Clang API Documentation, http://clang.llvm.org/doxygen/ How to parse C programs with clang: A tutorial in 9 parts, http://amnoid.de/tmp/clangtut/tut.html

Appendix: Example Source Code (1/5) This program prints the name of declared functions, statements and the class name of each Stmt in function bodies 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 // PrintFunctions.cpp #include <cstdio> #include <string> #include <iostream> #include <sstream> #include <fstream> #include "clang/AST/AST.h" #include "clang/AST/ASTConsumer.h" #include "clang/AST/RecursiveASTVisitor.h" #include "clang/Frontend/ASTConsumers.h" #include "clang/Frontend/CompilerInstance.h" #include "clang/Frontend/FrontendActions.h" #include "clang/Rewrite/Core/Rewriter.h" #include "clang/Tooling/CommonOptionsParser.h" #include "clang/Tooling/Tooling.h" #include "llvm/Support/raw_ostream.h" using namespace clang; using namespace clang::driver; using namespace clang::tooling; using namespace std; static llvm::cl::OptionCategory MyOptionCategory("MyOptions"); static llvm::cl::opt<std::string> OutputFilename("o", llvm::cl::desc("Specify output filename that contains stmt:type"), llvm::cl::value_desc("output_filename"), llvm::cl::cat(MyOptionCategory)); LangOptions MyLangOpts; SourceManager *ptrMySourceMgr; Rewriter MyRewriter;

Appendix: Example Source Code (2/5) 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor> { public: MyASTVisitor() {} bool VisitStmt(Stmt *s) { // Print a current statement and its type std::string str1; llvm::raw_string_ostream os(str1); s->printPretty(os, NULL, MyLangOpts); puts("-----------------"); printf("%s\n", os.str().c_str()); printf("TYPE:%s\n", s->getStmtClassName()); fflush(stdout); return true; } bool VisitFunctionDecl(FunctionDecl *f) { // Print function name llvm::outs() << "*********************************\n"; llvm::outs() << "*** FUNCTION NAME:" << f->getName() << '\n'; };

Appendix: Example Source Code (3/5) 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 class MyASTConsumer : public ASTConsumer { public: MyASTConsumer(): Visitor() {} //initialize MyASTVisitor virtual bool HandleTopLevelDecl(DeclGroupRef DR) { for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) { // Travel each function declaration using MyASTVisitor Visitor.TraverseDecl(*b); } return true; private: MyASTVisitor Visitor; };

Appendix: Example Source Code (4/5) 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 class MyFrontendAction : public ASTFrontendAction { public: MyFrontendAction() {} void EndSourceFileAction() override { // Fill out if necessary } std::unique_ptr<ASTConsumer> CreateASTConsumer( CompilerInstance &CI, StringRef file) override { MyLangOpts = CI.getLangOpts(); ptrMySourceMgr= &(CI.getSourceManager()); MyRewriter= Rewriter(*ptrMySourceMgr, MyLangOpts); return llvm::make_unique<MyASTConsumer>(); };

Appendix: Example Source Code (5/5) 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 int main(int argc, const char **argv) { CommonOptionsParser op(argc, argv, MyOptionCategory); ClangTool Tool(op.getCompilations(), op.getSourcePathList()); int rtn_flag; // ClangTool::run accepts a FrontendActionFactory, which is then used to // create new objects implementing the FrontendAction interface. Here we use // the helper newFrontendActionFactory to create a default factory that will // return a new MyFrontendAction object every time. // To further customize this, we could create our own factory class. // AST Parsing rtn_flag= Tool.run(newFrontendActionFactory<MyFrontendAction>().get()); /* // // Rewriter sample. Save changed target code into output.txt if any const RewriteBuffer *RewriteBuf = MyRewriter.getRewriteBufferFor ((*ptrMySourceMgr).getMainFileID()); ofstream out_file ("output.txt"); out_file << string(RewriteBuf->begin(), RewriteBuf->end()); out_file.close(); */ return rtn_flag; }

Appendix: Output on example.c ********************************* *** FUNCTION NAME:myPrint ----------------- { if (param == 1) printf("param is 1"); for (int i = 0; i < 10; i++) { global += i; } TYPE:CompoundStmt TYPE:IfStmt param == 1 TYPE:BinaryOperator param TYPE:ImplicitCastExpr TYPE:DeclRefExpr 1 TYPE:IntegerLiteral ********************************* *** FUNCTION NAME:main ----------------- { int param = 1; myPrint(param); return 0; } TYPE:CompoundStmt TYPE:DeclStmt 1 TYPE:IntegerLiteral myPrint(param) TYPE:CallExpr myPrint TYPE:ImplicitCastExpr TYPE:DeclRefExpr param ----------------- printf("param is 1") TYPE:CallExpr printf TYPE:ImplicitCastExpr TYPE:DeclRefExpr "param is 1" TYPE:StringLiteral for (int i = 0; i < 10; i++) { global += i; } TYPE:ForStmt ----------------- int i = 0; TYPE:DeclStmt TYPE:IntegerLiteral i < 10 TYPE:BinaryOperator i TYPE:ImplicitCastExpr TYPE:DeclRefExpr 10 i++ TYPE:UnaryOperator …