Mr. JOTL: A User Friendly Matching Software. Stéphane Lhuillery, Julio Raffo & Fernando Lladós. December 2010, 2nd "NameGame" APE-INV workshop.

Presentation transcript:

Mr. JOTL: A User Friendly Matching Software
Stéphane Lhuillery, Julio Raffo & Fernando Lladós
December 2010, 2nd "NameGame" APE-INV workshop

Outline
Background
Objectives & Rationale
Results
User Friendly Software
– Concept
– Alpha test
Further steps

Background
Automated patent retrieval is becoming a necessity given the size of the data sets.
A growing literature looks at this NameGame:
– On firms’ names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al
– On inventors’ names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
Our ESF Project outcomes:
– New matching best practices
– APE-INV database

Objectives of the NameGame
Minimize false positives (= higher precision)
Minimize false negatives (= higher recall)
? Maximizing true positives
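The two objectives above can be made concrete with a small calculation. The following is a minimal sketch, not code from Mr. JOTL: it assumes matches are expressed as pairs of record identifiers and that a hand-checked benchmark of true pairs is available. The function name `precision_recall` and the identifiers are hypothetical.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Compare predicted matches against a hand-checked benchmark."""
    predicted = set(predicted_pairs)
    truth = set(true_pairs)
    true_positives = len(predicted & truth)
    false_positives = len(predicted - truth)   # matched, but wrongly
    false_negatives = len(truth - predicted)   # true matches that were missed
    # Fewer false positives means higher precision.
    precision = true_positives / (true_positives + false_positives) if predicted else 0.0
    # Fewer false negatives means higher recall.
    recall = true_positives / (true_positives + false_negatives) if truth else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
benchmark = {("inv1", "inv2"), ("inv3", "inv4"), ("inv5", "inv6")}
matched = {("inv1", "inv2"), ("inv3", "inv4"), ("inv7", "inv8"), ("inv9", "inv10")}

precision, recall = precision_recall(matched, benchmark)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With this toy data, two of the four predicted pairs are correct (precision 0.50) and two of the three true pairs are found (recall about 0.67).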

Rationale behind: a three-step game

Examples on matching (EPFL)

Examples on filtering (EPFL)

What have we learned so far?
General
– Matching algorithms are not perfect, but they improve the results considerably.
Cleaning step
– The origin of the data substantially changes the data preparation process.
Matching step
– There is a hierarchy among algorithms, although it is specific to each particular case.
Filtering step
– The availability of supplementary data enhances or constrains the disambiguation process.
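The cleaning, matching and filtering steps named above can be sketched end to end. This is an illustration under stated assumptions, not the algorithms implemented in Mr. JOTL: the matching step uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, the 0.85 threshold is arbitrary, and the record fields (`id`, `name`, `city`) are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def clean(name):
    """Cleaning step: remove accents, punctuation and case differences."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in no_accents)
    return " ".join(kept.lower().split())


def similarity(a, b):
    """Matching step: similarity score in [0, 1] between two cleaned names."""
    return SequenceMatcher(None, a, b).ratio()


def match(records, threshold=0.85):
    """Return candidate pairs of record ids whose cleaned names are similar."""
    cleaned = [(rec["id"], clean(rec["name"])) for rec in records]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            if similarity(cleaned[i][1], cleaned[j][1]) >= threshold:
                pairs.append((cleaned[i][0], cleaned[j][0]))
    return pairs


def filter_pairs(pairs, records, field="city"):
    """Filtering step: keep pairs that agree on a supplementary field."""
    by_id = {rec["id"]: rec for rec in records}
    return [(a, b) for a, b in pairs if by_id[a][field] == by_id[b][field]]


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "name": "Müller, Hans", "city": "Bern"},
    {"id": 2, "name": "MULLER HANS", "city": "Bern"},
    {"id": 3, "name": "Muller, Hans", "city": "Oslo"},
    {"id": 4, "name": "Dupont, Marie", "city": "Lyon"},
]
candidates = match(records)
confirmed = filter_pairs(candidates, records)
print(candidates, confirmed)
```

In this toy data the matching step proposes all three pairings of records 1, 2 and 3, and the filter on the supplementary `city` field keeps only the pair (1, 2). The all-pairs loop is quadratic, so a real data set would need blocking or indexing before matching.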

Why create user-friendly software?
[Diagram labels: PATSTAT / APE-INV Database; Survey; PATVAL; EU FW Program; SCOPUS; ISI Thomson]

Concept behind Mr. JOTL
Intuitive for beginner users
Flexible on inputs and their preparation
Fair variety of standard matching processes
Adaptable disambiguation filters
But soundly customizable for advanced users
Conceived and coded to be expanded in the future by multiple developers

From concept to reality (OK, for the moment just an alpha!)

Inputs IPTS, Sevilla May

Parsing

Matching

Disambiguation
SSM

LET’S TEST IT!

Technical notes
OS supported (so far):
– Windows XP, Vista, 7 (Server & x64)
Coded in C#
– Pros: free development environment; low cost of entry; large developer community
– Cons: proprietary language and libraries; less efficient memory management
Libraries needed:
– Scintilla: open source lexer, syntax highlighter
Customizable code:
– C# & VBA
Suggested environment for future development:
– Visual Studio (Express version is free to use)
– Mono in Linux

Further developments
Fully coding the existing algorithms.
Testing performance against large data sets (> 1 million records).
Pre-setting standard routines (as XML).
Drafting documentation (+ video).
Proof-testing with first-time users (at EPFL).
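As an illustration of what a standard routine preset as XML might look like, here is a minimal sketch. The XML layout, element names and attributes are entirely hypothetical, since the slides do not describe Mr. JOTL's actual preset format. The parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset: element and attribute names are assumptions.
PRESET_XML = """
<routine name="inventor-default">
  <clean lowercase="true" strip_accents="true"/>
  <match algorithm="token" threshold="0.85"/>
  <filter field="city"/>
  <filter field="ipc_class"/>
</routine>
"""


def load_routine(xml_text):
    """Read a preset routine into a plain dictionary of settings."""
    root = ET.fromstring(xml_text)
    clean_node = root.find("clean")
    match_node = root.find("match")
    return {
        "name": root.get("name"),
        "lowercase": clean_node.get("lowercase") == "true",
        "strip_accents": clean_node.get("strip_accents") == "true",
        "algorithm": match_node.get("algorithm"),
        "threshold": float(match_node.get("threshold")),
        "filters": [node.get("field") for node in root.findall("filter")],
    }


routine = load_routine(PRESET_XML)
print(routine)
```

Keeping each step (clean, match, filter) as its own element would let a beginner pick a ready-made routine while an advanced user edits individual settings, which is in line with the concept slide.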

Openness and its governance
How to share it?
– GitHub?
– Forums
How to develop a dynamic sharing community?

Thank you!