Secrets from the Monster: Extracting Mozilla’s Software Architecture

Presentation transcript:

Secrets from the Monster: Extracting Mozilla’s Software Architecture
Michael W. Godfrey, Eric H. S. Lee
Software Architecture Group, Dept of Comp Sci, Univ of Waterloo

Background
- Reverse engineering tools can aid in recovering from “architectural drift”
- RE tool: fact extractor, manipulator, visualizer
  - Examples: PBS, Acacia, Rigi, TKSee, SHriMP, …
- Fact extractors vary in quality, detail, robustness, languages supported, …
- Extractor interoperability has proven to be a huge headache
- RE subtools are often tightly coupled

Architectural drift is the distance between the conceptual (as-designed) and concrete (as-built) architectures. It gets worse over time unless the architecture is actively maintained.
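One simple way to picture the "distance" between the as-designed and as-built architectures is to compare their dependency edges as sets. The sketch below is only an illustration of that idea, not the method used in this work; the subsystem names and the `drift` function are hypothetical.

```python
def drift(conceptual, concrete):
    """Compare two collections of (from_subsystem, to_subsystem) edges."""
    conceptual, concrete = set(conceptual), set(concrete)
    return {
        "unexpected": concrete - conceptual,  # found in the code, absent from the design
        "missing": conceptual - concrete,     # in the design, not found in the code
        "agreed": conceptual & concrete,      # present in both views
    }


# Hypothetical subsystem names, for illustration only.
as_designed = {("ui", "layout"), ("layout", "network")}
as_built = {("ui", "layout"), ("ui", "network")}

report = drift(as_designed, as_built)
```

Under this reading, drift grows as the "unexpected" and "missing" sets grow, which matches the observation that it worsens when the architecture is not actively maintained.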

Architectural Reconstruction
[Figure: reconstruction pipeline. Labels: System Artifacts (Source Code, Executing System, Control System); Extractors (Scanning, Parsing, Profiling, Change Reporting); Extracted Facts; Repository; View Generation, Visualization, Manipulation; Architecture]

Motivation
- Want to create architecture models of C++ systems, esp. Mozilla
  - Options: DIY … or … Gen++, Datrix, Acacia
- Is there a “better” C extractor?
  - How do extractors compare qualitatively and quantitatively?
- Want to investigate data exchange between RE tools
  - WoSEF to be held tomorrow
- (Later) want to build BEAGLE: a tool for exploring program evolution

Extractor interoperability
- … just like western civilization
- Researchers want this to work, though
- Need to agree on:
  - Syntax (TA, XML, SQL)
  - Semantic models (AST, CFG/DFG, SwArch)
- CoSET-99 paper: TAXForm suggested
  - Exploration of problems: unique naming, entity resolution, entity location (line numbers)
  - Preliminary case studies

Exchange Format Reqs [CASCON ’98]
- Support multiple source languages
- Scale to MLOC systems
- Provide mapping to source code
- Support static & dynamic dependencies
- Incremental approach
- Extensible, allowing new schemes to be defined as needed

TAXForm Utopia

Transforming Between Schemas
[Figure: schema hierarchy. Labels: Universal; High-Level; Procedural (PL/I, C); Object-Oriented (C++, Java); Dali C; Rigi C; PBS C]

TAXForm: high-level schema

TAXForm: procedural schema

“Facts”: PBS vs. Acacia
- PBS produces output in TA
  - Tuples describe attributes of program entities/relationships:
      funcdcl read.h fileClose
      funcdef read.c fileClose
      linkcall fileClose getFileSize
- Acacia produces two “;”-delimited plain-text DBs: entity.db and relationship.db
  - Use SQL-like queries to get raw text output:
      cdef -u func - def=dec
      cref -u - - m - file2='*.h'
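The three-field tuples shown on this slide are easy to load into a lookup table keyed by relation name. The following is a minimal sketch that handles only lines of the form "relation source target" as in the example above; it does not attempt the rest of the TA language (scheme sections, attributes), and `parse_tuples` is a hypothetical helper name.

```python
from collections import defaultdict


def parse_tuples(text):
    """Group 'relation source target' lines by relation name."""
    facts = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines in this sketch
        relation, source, target = parts
        facts[relation].append((source, target))
    return dict(facts)


sample = """funcdcl read.h fileClose
funcdef read.c fileClose
linkcall fileClose getFileSize"""

facts = parse_tuples(sample)
```

With the facts grouped this way, a question such as "which file defines fileClose?" becomes a scan over `facts["funcdef"]`.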

Translation Nuts and Bolts
- Acacia C model close to PBS’s
  - 1:1 relationship between most kinds of facts; translation via awk and ksh scripts
- … but “linkcall” is harder, as Acacia already does resolution of “f calls g” to the function defs; cfx does resolution at a later stage
- No transitive closure for “includes”
  - Solution: simple grok program
- Ccia problems:
  - Less robust on some C systems
  - Sometimes generates multiple UIDs
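The missing transitive closure over "includes" was handled with a small grok program. The grok script itself is not shown in the slides, so the sketch below is a Python stand-in for the same relational operation, not grok syntax; the file names and the `transitive_closure` function are hypothetical.

```python
def transitive_closure(edges):
    """Return every (a, b) pair such that b is reachable from a via edges."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    closure = set()
    for start in successors:
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already reached from this start; also guards against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# Hypothetical direct "includes" facts.
includes = {("main.c", "read.h"), ("read.h", "types.h")}
closed = transitive_closure(includes)
```

Here the closure adds the indirect pair ("main.c", "types.h") to the two direct facts, which is the kind of derived relation the extractor output lacked.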

Guinea Pig #1: VIM text editor
- Examined VIM version 5.6: 149 source files (.c, .h, .pro), over 160 KLOC of K&R C
- Extraction results:

                      Time (min:sec)   # facts
    gcc compile       6:29             -
    cfx extraction    4:27             43,000
    cia extraction    1:52 + 3:20      51,000

- Differences due to macro expansion, library variable references, and missed function calls

VIM’s architecture

Guinea Pig #2: Mozilla browser
- “Open source” cousin of Netscape
- Examined Milestone 9 (M9): over 7400 files, 2 MLOC of C and C++
- Extraction results:

    Full compile                0:35 hrs
    Fact extraction (Ccia)      3:30 hrs
    Fact manipulation (grok)    3:00 hrs
    # of facts extracted        990,000
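The timing figures from the two case studies can be turned into rough extraction rates. This is a back-of-the-envelope sketch only: it assumes the two cia times for VIM (1:52 + 3:20) are phases to be summed, it uses the rounded fact counts as given on the slides, and it compares different extractors on different languages, so the ratio is indicative at best. The helper `parse_pair` is a hypothetical name.

```python
def parse_pair(text):
    """Convert 'major:minor' (min:sec or hrs:min) into a count of the minor unit."""
    major, minor = text.split(":")
    return int(major) * 60 + int(minor)


# VIM times are min:sec, so parse_pair yields seconds.
vim_cfx_seconds = parse_pair("4:27")
vim_cia_seconds = parse_pair("1:52") + parse_pair("3:20")  # assumed: two phases, summed

# Mozilla times are hrs:min, so parse_pair yields minutes.
mozilla_ccia_seconds = parse_pair("3:30") * 60

facts_per_second = {
    "vim/cfx": 43_000 / vim_cfx_seconds,
    "vim/cia": 51_000 / vim_cia_seconds,
    "mozilla/ccia": 990_000 / mozilla_ccia_seconds,
}
```

On these assumptions the two VIM extractions run at just over 160 facts per second each, while the Mozilla extraction runs at a little under 80.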

Mozilla extraction details
- Much extra work required:
  - Reconfigured PBS to understand OOPL schema
  - Complete rewrite of translation scripts (into Perl) for efficiency
  - Some source code tweaking
  - More complex name mangling needed

Mozilla’s architecture

Summary
- Created automated mechanisms for using the Acacia fact extractors within the PBS reverse engineering system
- Tested on two large guinea pigs
- This work serves as an initial step towards data exchange between reverse engineering tools
- See proc. of WoSEF-00 for more discussion of this general topic