Knowledge Extraction from Technical Documents (with first-class support for feature modeling)
Rehan Rauf, Michal Antkiewicz, and Krzysztof Czarnecki
Generative Software Technologies Corp., Waterloo, Canada
© Generative Software Technologies Corp.

The Idea

Specification Documents
Physical structures: sections, tables, paragraphs
Logical structures (specification elements): functional reqs, business rules, use cases

Recognize and extract specification elements based on physical document structure

ET – Extraction Tool searches for template instances
ET scans a spec doc for instances of a use case (UC) template, finding UC 1, UC 2, and so on.

Precondition: documents have been authored with some template in mind

Application scenarios

Import to Requirements Management Tools
ET extracts functional reqs, business rules, and use cases from a spec doc so they can be imported into tools such as DOORS, HP Quality Center, RequisitePro, …

Structured Query
QT runs structured queries over the functional reqs, business rules, and use cases extracted from several spec docs, e.g., "All use cases with actor = customer", and returns the matching use cases.
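A minimal Python sketch of such a query, assuming the extracted use cases are held as simple records with a name and a list of actors. `UseCase` and `query_by_actor` are hypothetical names chosen for illustration, not QT's actual interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCase:
    # Hypothetical record for one extracted use-case instance.
    name: str
    actors: List[str] = field(default_factory=list)


def query_by_actor(use_cases: List[UseCase], actor: str) -> List[UseCase]:
    """Return the use cases whose actor list contains `actor` (case-insensitive)."""
    wanted = actor.casefold()
    return [uc for uc in use_cases if any(a.casefold() == wanted for a in uc.actors)]


extracted = [
    UseCase("Place order", ["Customer"]),
    UseCase("Restock item", ["Clerk"]),
    UseCase("Track order", ["Customer", "Courier"]),
]
print([uc.name for uc in query_by_actor(extracted, "customer")])
# ['Place order', 'Track order']
```

Once elements are typed records rather than free text, a query like this reduces to filtering on a named component.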

Tracing
Elements extracted from different spec docs can be traced to one another, e.g., business rules to use cases.

Template Conformance Checking
A spec doc can be checked for conformance with a template, e.g., a use case template.

Main Challenge: Logical and Physical Variation

Challenge – Variation: Instances of Use Case

Challenge – Variation: Instances of Use Case
Logical components; component identifiers

Variation Types
Two dimensions: designed vs. accidental, and logical vs. physical.

Designed Logical Variation: optional component

Designed Logical Alternatives: deeper decomposition; different methodologies lead to logical variation

Designed Physical Variation: different formatting

Accidental Variation
Logical: missing components, e.g., actor
Physical: spelling mistakes, e.g., "Actar"; style inconsistency, e.g., italics instead of bold
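Accidental physical variation such as the misspelling "Actar" can be absorbed by matching component identifiers against the template with a similarity threshold instead of exact string equality. The sketch below assumes difflib's `SequenceMatcher.ratio()` as the similarity measure, a threshold of 0.75, and a hypothetical label set; ET's actual measure and threshold are not given in the slides.

```python
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical component identifiers declared by a use-case template.
TEMPLATE_LABELS = ["Actor", "Precondition", "Main Flow", "Extensions"]


def match_label(text: str, labels: List[str] = TEMPLATE_LABELS,
                threshold: float = 0.75) -> Optional[str]:
    """Return the most similar template label, or None if none reaches `threshold`."""
    best_label, best_score = None, 0.0
    for label in labels:
        # ratio() = 2 * matched characters / total characters, in [0, 1]
        score = SequenceMatcher(None, text.strip().casefold(), label.casefold()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


print(match_label("Actar"))      # misspelling still matches -> Actor
print(match_label("Extension"))  # -> Extensions
print(match_label("Notes"))      # unrelated text -> None
```

Raising the threshold trades recall for precision: fewer misspellings are tolerated, but fewer unrelated headings are mistaken for components.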

Solution

ET – Extraction Tool
Docs → PSE → physical components (sections, lists, table cells) → LSE with UC template → logical components (actor, flow, extensions)
Accidental variation is handled via a match threshold; designed variation is handled via the template.

UC Template Metamodel
Logical elements: UC (Name : String), Flow, Action : String (multiplicities *, 1, 1 in the diagram)
Physical elements: Section, Heading, List, Paragraph, linked to the logical elements by a mapping
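A minimal sketch of how such a mapping could turn physical components into logical ones. The specific correspondence used here (Section to UC, Heading to Name, first List to Flow, Paragraph to Action) is an assumption inferred from the order of the elements on the slide, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    # Physical component (hypothetical): a heading plus the lists found under it,
    # each list given as its item paragraphs.
    heading: str
    lists: List[List[str]] = field(default_factory=list)


@dataclass
class Flow:
    # Logical component: an ordered sequence of actions.
    actions: List[str]


@dataclass
class UC:
    # Logical component: a use case with a name and one flow.
    name: str
    flow: Flow


def map_section(section: Section) -> Optional[UC]:
    """Assumed mapping: Section -> UC, Heading -> Name, first List -> Flow, Paragraph -> Action."""
    if not section.lists:
        return None  # no list, so no Flow: the section is not a UC instance
    return UC(name=section.heading, flow=Flow(actions=list(section.lists[0])))


sec = Section("UC 1 Place order", [["Customer selects items", "System confirms order"]])
print(map_section(sec))
# UC(name='UC 1 Place order', flow=Flow(actions=['Customer selects items', 'System confirms order']))
```

Keeping the logical model separate from the physical one is what lets the same UC definition be matched against differently formatted documents.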

Example Template

Logical Structure

Mapping

Regular Expressions

Lists

Component Nesting

Optional Components

Physical Alternatives

Templates with Tables

Logical Alternatives

ET – Extraction Tool
Docs → PSE → physical components (basic: paragraph, cell, graphic; composite: sections, lists, tables, …) → LSE with UC template → logical components (actor, flow, extensions)

Physical Structure Extraction
Docs → PSE → physical components (basic: paragraph, cell, graphic; composite: sections, lists, tables, …) → LSE with UC template → logical components (actor, flow, extensions)
PSE is the only part that depends on the document format.
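The design point is that only PSE needs to know the file format; everything downstream works on format-independent physical components. A minimal sketch of that split, assuming a toy plain-text format ("# " starts a section, "- " is a list item) in place of a real word-processor format; `PhysicalExtractor`, `PlainTextExtractor`, and `extract_use_cases` are hypothetical names, and the LSE step is reduced to a heading-prefix check.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Section:
    # Format-independent physical component: heading plus list items.
    heading: str
    items: List[str] = field(default_factory=list)


class PhysicalExtractor(Protocol):
    # The only format-dependent part: raw input -> physical components.
    def extract(self, raw: str) -> List[Section]: ...


class PlainTextExtractor:
    """Hypothetical PSE for a toy format: '# ' starts a section, '- ' is a list item."""

    def extract(self, raw: str) -> List[Section]:
        sections: List[Section] = []
        for line in raw.splitlines():
            line = line.strip()
            if line.startswith("# "):
                sections.append(Section(line[2:].strip()))
            elif line.startswith("- ") and sections:
                sections[-1].items.append(line[2:].strip())
        return sections


def extract_use_cases(raw: str, pse: PhysicalExtractor) -> Dict[str, List[str]]:
    """Simplified, format-independent LSE: sections whose heading starts with 'UC' are use cases."""
    return {s.heading: s.items for s in pse.extract(raw) if s.heading.startswith("UC")}


doc = """# Introduction
# UC 1 Place order
- Customer selects items
- System confirms order
"""
print(extract_use_cases(doc, PlainTextExtractor()))
# {'UC 1 Place order': ['Customer selects items', 'System confirms order']}
```

Supporting another format would mean writing another extractor with the same `extract` method; the LSE function stays unchanged.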

Performance

Can we extract logical structures from real-world documents?

Document Set
43 documents: 24 from 3 companies, 11 from public sources, 6 student projects
2,000 to 23,000 words per document
Content: use cases, data objects, business rules, functional reqs, non-functional reqs, …

Template Development
1) Write the template manually (e.g., a UC template)
2) Verify the extraction with ET on instances such as UC1 and UC2
3) Refine the template

Results
36 logical structures: use cases, data objects, business rules, …
Template sizes from 3 to 52 LOC
Total of 942 instances
Nearly all instances perfectly recognized:
- 100% recall for 33 templates; over 80% for the remaining 3
- 100% precision for 35 templates; 87% for the remaining 1
Error causes: severe formatting problems (e.g., manual line breaks); forgotten ids
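For reference, the standard definitions: precision is the share of extracted instances that are correct, and recall is the share of expected instances that were extracted. The slides do not state how instances were matched against the ground truth; this sketch assumes each instance is identified by a hashable key, treats an empty denominator as 1.0 by convention, and uses made-up instance ids unrelated to the reported results.

```python
from typing import Hashable, Set, Tuple


def precision_recall(extracted: Set[Hashable], expected: Set[Hashable]) -> Tuple[float, float]:
    """Precision = |correct| / |extracted|, recall = |correct| / |expected|."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 1.0  # assumed convention
    recall = correct / len(expected) if expected else 1.0        # assumed convention
    return precision, recall


# Hypothetical run: one spurious instance ("X") and one missed instance ("UC5").
expected = {"UC1", "UC2", "UC3", "UC4", "UC5"}
extracted = {"UC1", "UC2", "UC3", "UC4", "X"}
print(precision_recall(extracted, expected))
# (0.8, 0.8)
```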

Other Questions
Amount and kind of template change during refinement:
- 1% to 25% of LOC affected during refinement
- 81% of changes concern optionality (adding `?` or a component)
Number of iterations: examples needed ranged from 1 instance (11 cases) to 50% of all instances (6 cases), e.g., 10 out of 20 (2 cases); mostly simple edits, such as adding `?`
Implication: start with a few examples, then edit the template based on expert knowledge (e.g., add `?`)
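Since most refinements make a component optional with `?`, the effect can be illustrated with Python regular expressions as an analogy. This is not ET's template syntax, which the slides do not show; the "Name:"/"Actor:" layout is a hypothetical plain-text rendering of a use case.

```python
import re

# Initial template: every instance must have a Name line followed by an Actor line.
strict = re.compile(r"Name: (?P<name>[^\n]+)\nActor: (?P<actor>[^\n]+)\n")
# Refined template: the Actor component is made optional with a trailing `?`.
refined = re.compile(r"Name: (?P<name>[^\n]+)\n(?:Actor: (?P<actor>[^\n]+)\n)?")

doc = "Name: Place order\nActor: Customer\nName: Run nightly batch\n"

print([m["name"] for m in strict.finditer(doc)])
# ['Place order']
print([(m["name"], m["actor"]) for m in refined.finditer(doc)])
# [('Place order', 'Customer'), ('Run nightly batch', None)]
```

The one-character edit recovers the instance that lacks an actor, which mirrors why a few examples plus expert edits can be enough.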

Related Work
- Import to requirements management tools: tools prescribe the document structure; fine-grained extraction needs manual markup
- Wrapper induction: targets machine-generated docs (web pages); induced regexes are not human-readable (no modeling language)
- Natural language processing: can benefit from structure-induced semantic tags

Future: Template by Example
1) Mark up a sample document (e.g., UC1)
2) TE extracts a UC template from the markup
3) Verify the extraction with ET (e.g., on UC2) and refine the template

Summary

ET design: PSE extracts physical components from a spec doc; LSE applies a UC template to obtain logical components (functional reqs, business rules, use cases)
Application scenarios: import, query (QT), tracing, conformance checking
Template development
Evaluation results: nearly all instances perfectly recognized in 43 real-world documents

Technology available at