Presentation is loading. Please wait.

Presentation is loading. Please wait.

IBM Software Group ® CDT DOM Proposal Title slide Doug Schaefer IBM Tech Lead, Eclipse CDT December 2004.

Similar presentations


Presentation on theme: "IBM Software Group ® CDT DOM Proposal Title slide Doug Schaefer IBM Tech Lead, Eclipse CDT December 2004."— Presentation transcript:

1 IBM Software Group ® CDT DOM Proposal Title slide Doug Schaefer IBM Tech Lead, Eclipse CDT December 2004

2 IBM Software Group | Rational software © 2004 IBM Corporation 2 Where are we (IBM) coming from?  We’re part of the Model-Driven Development (MDD) team in the Rational Software Division of the IBM Software Group  CDT is the core C++ component of Rational Software Architect  The C++ component includes visualization of code as UML as well as transformation of UML to C++  We need an accurate, complete DOM that allows programmatic code change  Since do both Java and C++ we’d like the architectures of the JDT and CDT to be “similar”.

3 IBM Software Group | Rational software © 2004 IBM Corporation 3 What have we done until now?  Started work on parser in Dec. 2002  Considered a number of options but settled on a handwritten parser  Needed smart control over ambiguities in C/C++  Needed to support a number of clients with different needs  Similar in approach to newer versions of gcc.  First contributed in 1.1 for CModel replacing previous JavaCC-based parser  Added Indexing and Search as client in 1.2  Replaced previous ctags-based index (not without issue)  Numerous clients in 2.0, including content assist, F3, type browsers

4 IBM Software Group | Rational software © 2004 IBM Corporation 4 Parsing Architecture 1.0  We had concern over scalability of AST  Settled on a callback-based architecture  Clients created only the data that they needed  Client would pass in a requestor to the parser and the parser would pass in parse info as the parse progressed  Despite the names, the IAST* parse info does not form a proper AST  Client would also be responsible for passing in ScannerInfo  Compiler command line arguments that affect parsing Include paths, macro defs  Needed to properly parse

5 IBM Software Group | Rational software © 2004 IBM Corporation 5 Parsing Architecture 1.0 – What worked well  Pretty accurate parse information (usually)  Content assist worked really well for C and C++  Accurate out line view  Accurate search results, F3  Ability to generate type info for class and hierarchy browsers  Flexibility for clients  Enabled us to pile on the clients in 2.0

6 IBM Software Group | Rational software © 2004 IBM Corporation 6 Parser Architecture 1.0 – What didn’t work well  Performance  Assumed parse times would be quick, < 1 sec Finding parse times usually in 2-4 second range  Index takes a lot of time, memory, and CPU power to generate  Content assist times out regularly  Hard to provide accurate results consistently  F3 and content assist often produce no results  Need for accurate ScannerInfo often hard to satisfy  Scalability to large projects  Problems only worsen when project grows  Index times longer than already long build times  ScannerInfo different for different files in project

7 IBM Software Group | Rational software © 2004 IBM Corporation 7 Why a DOM?  Although flexible, the callback mechanism was hard to define  Unable to provide all information that every possible client would require  Often unable to determine parser context for certain constructs  Clients had added complexity to manage their own data  Parser had added complexity to provide enough data Parse mode proliferation  Need to address scalability  Cut down on the amount of parsing we need to do  Can we reuse parse results?  Need data structures to help programmatically make changes to the source code

8 IBM Software Group | Rational software © 2004 IBM Corporation 8 DOM Architecture  The DOM is composed of the parser/scanner and three levels of data  1) Physical AST with mapping back through macro expansions to source  2) Logical Scope/Binding tree with cross translation unit indexing  3) AST Rewriter  Firm up interfaces for DOM creation  Navigate from Core Model to DOM  Hide parser from clients (use core model)  Allows us to play with how the DOM is created to improve scalability

9 IBM Software Group | Rational software © 2004 IBM Corporation 9 Goals for 3.0  To reduce the fixed cost of a parse (preprocess, scan & syntax matching) and to allow for lazy semantic evaluation  Improve performance & reduce memory footprint of navigation features  To provide a “complete” physical AST which can make our clients aware of preprocessor macro expansions in source code  To provide better support for C  Link-time resolution cross references  Tailored implementation of parser/semantic bindings

10 IBM Software Group | Rational software © 2004 IBM Corporation 10 Physical AST - IASTNode  Given a file to parse, the AST Framework shall return an IASTTranslationUnit which then can be traversed or visited  The IASTNode interface hierarchy represent constructs in the C/C++ grammar. long x; /* IASTSimpleDeclaration with 1 IASTDeclarator */ long y(); /* IASTSimpleDeclaration with 1 IASTFunctionDeclarator */ int f() { return sizeof( int ); } /* IASTFunctionDefinition */  Physical tree is unaware of any semantic knowledge  Declaration before use (C++)  Scoping  Type compatibility  lValue vs. rValue  Allows for quick generation of syntax tree  Only slight overhead to cost of scan & preprocess

11 IBM Software Group | Rational software © 2004 IBM Corporation 11 Logical Tree - IBinding  Logical elements are higher-level constructs that map onto a physical IASTNode  For 3.0 : IASTName#resolveBinding() long x; /* IASTName for x resolves to an IVariable */ long y(); /* IASTName for y resolves to an IFunction */ int f() { return sizeof( int ); } /* IASTName for f resolves to an IFunction*/  Beyond 3.0 – IASTBinaryExpression could bind to an user defined operator  Semantic errors return an IProblemBinding describing the error  IASTTranslationUnit can be asked for all declarations or references of a particular IBinding  Bindings can be resolved completely lazily on request or in a full traversal of the AST  Indexer would require full binding resolution  Most other clients do not require all bindings to be resolved

12 IBM Software Group | Rational software © 2004 IBM Corporation 12 Macro Awareness in the Physical AST  Nearly all parser clients in the CDT are concerned with offsets  Selection  Search Markers  Navigation  Compare  Within a preprocessor macro expansion, our IScanner implementation massages the offsets as tokens arrive  However, the CDT 2.x AST Nodes are unaware as to whether or not they are a result of a macro expansion  This deficiency affects different clients differently.

13 IBM Software Group | Rational software © 2004 IBM Corporation 13 The Good – Outline View

14 IBM Software Group | Rational software © 2004 IBM Corporation 14 The Bad – Search Markers Slightly Off

15 IBM Software Group | Rational software © 2004 IBM Corporation 15 The Ugly – A Refactoring Gone Wrong

16 IBM Software Group | Rational software © 2004 IBM Corporation 16 Introducing IASTNodeLocation  Every node in the AST derives from IASTNode  Every IASTNode provides a mechanism for resolving its location in the physical AST  External Header Files  Resources  Working Copies  Macro Expansions  getNodeLocations() returns an array of IASTNodeLocation, as a node may span multiple locations (when macros are involved)  Interpretation of macro expansion locations are up to the client

17 IBM Software Group | Rational software © 2004 IBM Corporation 17 Better support for C  Support for link-time bindings  resolving function references in C Will greatly aid navigation/search features for this style of C code Refactoring/programmatic edit support will be difficult −Definite candidate for “Preview” pane in refactoring −Without full build-model to reference, resolution is heuristic-based  Stricter emphasis upon providing custom implementation for C & GCC  AST & supporting data structures will be more compact memory wise  Syntax parsing will be more accurate  Algorithms for semantic analysis in C are far less rigorous Should yield better performance for nearly all clients

18 IBM Software Group | Rational software © 2004 IBM Corporation 18 Parser Language Variants  Interfaces  org.eclipse.cdt.core.dom.ast – Base interfaces that apply to both ANSI C & ISO C++  org.eclipse.cdt.core.dom.ast.c – C99 specific sub-interfaces  org.eclipse.cdt.core.dom.ast.cpp – ISO C++ 98 specific sub-interfaces  org.eclipse.cdt.core.dom.ast.gnu – GNU extensions that apply to both C & C++  org.eclipse.cdt.core.dom.ast.gnu.c – GNU extensions that apply to C  org.eclipse.cdt.core.dom.ast.gnu.cpp – GNU extensions that apply to C++  Implementations  org.eclipse.cdt.internal.core.parser2 – supporting infrastructure  org.eclipse.cdt.internal.core.parser2.c – C99 & GCC support  org.eclipse.cdt.internal.core.parser2.cpp – ISO C++ 98 & G++ support  Other variants may subclass C or C++ Source Code Parser and choose which GNU extensions they wish to enable through a configuration interface

19 IBM Software Group | Rational software © 2004 IBM Corporation 19 Physical Tree Hierarchy  C++ extends the common AST  GNU C++ extends the C++  C++ and C are unrelated outside of the common AST package

20 IBM Software Group | Rational software © 2004 IBM Corporation 20 Logical Tree Hierarchy  C extends common  C++ extends common  Theoretically, new bindings could be added for future variants  e.g. GCC Signatures

21 IBM Software Group | Rational software © 2004 IBM Corporation 21 AST Rewriting  AST Rewriter accepts requests to change AST  Add (Insert/Set), Remove, Replace  The new code can be a new AST Node or a String  AST Rewrite gathers change requests and then executes  Analyzes the change requests for validity  Produces text edits that can be applied to documents to affect the change  The analysis can decide not to do a change because it is too hard  E.g. when macros are involved

22 IBM Software Group | Rational software © 2004 IBM Corporation 22 DOM as a Service  We need to take some effort to minimize parsing within Eclipse  C and C++ are mature languages : hence, larger source code bases Multiple clients parsing on resource-change events can cripple the system  A complex web of #include’s throughout the code base is difficult to optimize per-parse without having knowledge of previous parses Same with templates …  The Indexer is already parsing continually, we should be able to leverage that information for all other clients that require saved-file parse trees  Since parsing can be a processor and memory-intensive operation, it is difficult for the indexer to co-ordinate its priority vs. other parser clients for system resources users requests an Open Declaration which competes against a running indexer for memory & CPU  Eventual Goal AST Service w/Index Incremental Parse


Download ppt "IBM Software Group ® CDT DOM Proposal Title slide Doug Schaefer IBM Tech Lead, Eclipse CDT December 2004."

Similar presentations


Ads by Google