Published by Allison May. Modified over 9 years ago.
1
Mr. JOTL: A User Friendly Matching Software
Stéphane Lhuillery, Julio Raffo & Fernando Lladós
December 2010, 2nd "NameGame" APE-INV workshop
2
Outline
Background
Objectives & Rationale
Results
User Friendly Software
– Concept
– Alpha test
Further steps
3
Background
Automatic patent retrieval is becoming a necessity due to the size of data sets.
A growing literature looks at this NameGame:
– On firms' names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al., 2007.
– On inventors' names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
Our ESF project outcomes:
– New matching best practices
– APE-INV database
4
Objectives of the NameGame
Minimize false positives (= higher precision)
Minimize false negatives (= higher recall)
Maximizing true positives?
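As a minimal sketch of how these two objectives are usually measured, the snippet below computes precision and recall for a set of proposed record pairs against a hand-checked reference set. The function name `precision_recall` and the record identifiers are illustrative assumptions, not part of Mr. JOTL.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Return (precision, recall) for matched record pairs.

    Pairs are treated as unordered, so ("r1", "r2") equals ("r2", "r1").
    """
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    truth = {tuple(sorted(p)) for p in true_pairs}

    true_positives = len(predicted & truth)    # proposed and correct
    false_positives = len(predicted - truth)   # proposed but wrong
    false_negatives = len(truth - predicted)   # correct but missed

    precision = true_positives / (true_positives + false_positives) if predicted else 0.0
    recall = true_positives / (true_positives + false_negatives) if truth else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
predicted = [("r1", "r2"), ("r3", "r1"), ("r4", "r5")]
truth = [("r1", "r2"), ("r4", "r5"), ("r6", "r7")]
precision, recall = precision_recall(predicted, truth)
```

Fewer false positives raise precision and fewer false negatives raise recall; both share the same numerator, the true positives.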
5
Rationale: a three-step game
6
Examples of matching (EPFL)
7
Examples of filtering (EPFL)
8
What have we learned so far?
General
– Matching algorithms are not perfect, but they improve the results considerably.
Cleaning step
– The origin of the data substantially changes the data preparation process.
Matching step
– There is a hierarchy pattern across algorithms, although it is specific to each particular case.
Filtering step
– The availability of supplementary data enhances or constrains the disambiguation process.
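The cleaning, matching and filtering steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, the `city` filter field and the sample records are all hypothetical choices, not the algorithms used in the project.

```python
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def clean(name):
    """Cleaning step: strip accents, lowercase, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower())
    return " ".join(kept.split())


def match(records, threshold=0.85):
    """Matching step: propose pairs whose cleaned names are similar enough."""
    candidates = []
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, clean(a["name"]), clean(b["name"])).ratio()
        if score >= threshold:
            candidates.append((a["id"], b["id"]))
    return candidates


def filter_pairs(candidates, records, field="city"):
    """Filtering step: keep pairs that also agree on a supplementary field."""
    by_id = {r["id"]: r for r in records}
    return [(a, b) for a, b in candidates if by_id[a][field] == by_id[b][field]]


# Fictional records, for illustration only.
records = [
    {"id": 1, "name": "Müller, Anna", "city": "Lausanne"},
    {"id": 2, "name": "MULLER Anna", "city": "Lausanne"},
    {"id": 3, "name": "Muller, Anna", "city": "Paris"},
    {"id": 4, "name": "Rossi, Marco", "city": "Lausanne"},
]

candidates = match(records)                   # similar names only
matches = filter_pairs(candidates, records)   # similar names and same city
```

In this toy data the matching step proposes every pairing of the three "Muller" records, and the filter keeps only the pair that shares a city, which shows how supplementary data can remove false positives.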
9
Why create user-friendly software?
[Diagram labels: PATSTAT / APE-INV Database; Survey; PATVAL; EU FW Program; SCOPUS; ISI Thomson]
10
Concept behind Mr. JOTL
Intuitive for beginner users
Flexible on inputs and their preparation
Fair variety of standard matching processes
Adaptable on the disambiguation filters
But soundly customizable for advanced users
Conceived and coded to be expanded in the future by multiple developers
11
From concept to reality (OK, for the moment just an alpha!)
12
Inputs (IPTS, Sevilla, May 2010)
13
Parsing
14
Matching
15
Disambiguation SSM
16
LET'S TEST IT!
17
Technical notes
OS supported (so far):
– Windows XP, Vista, 7 (Server & x64)
Coded in C#
– Pros: free development environment; low cost of entry; large developer community
– Cons: proprietary language and libraries; less efficient memory management
Libraries needed: Scintilla (open-source lexer, syntax highlighter)
Customizable code:
– C# & VBA
Suggested environment for future development:
– Visual Studio (Express version is free to use)
– Mono on Linux
18
Further developments
Fully coding the existing algorithms.
Testing performance against large datasets (> 1 million records).
Pre-setting standard routines (as XML).
Drafting documentation (+ video).
Proof-testing with first-time users (at EPFL).
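To make the idea of pre-set routines stored as XML concrete, here is a minimal sketch of what loading such a preset could look like. The XML layout (`routine`, `step`, `kind` and the option attributes) is entirely hypothetical; the slides do not specify a schema, and the actual tool is written in C#, not Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset: element and attribute names are assumptions.
PRESET_XML = """
<routine name="inventor-default">
  <step kind="clean" lowercase="true" strip_accents="true"/>
  <step kind="match" algorithm="token" threshold="0.85"/>
  <step kind="filter" field="city"/>
</routine>
"""


def load_routine(xml_text):
    """Parse a preset into its name and an ordered list of (kind, options)."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for step in root.findall("step"):
        options = dict(step.attrib)   # copy so the parsed tree is left untouched
        kind = options.pop("kind")
        steps.append((kind, options))
    return root.get("name"), steps


name, steps = load_routine(PRESET_XML)
```

Keeping a routine as data rather than code would let beginner users pick a standard clean, match and filter sequence while advanced users edit or extend it.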
19
Openness and its governance
How to share it?
– GitHub?
– Forums
How to develop a dynamic sharing community?
20
Thank you!