Improving Automated Patent Claim Parsing: Dataset, System, and Experiment
Mengke Hu, David Cinciruk, John MacLaren Walsh (arXiv [cs.CL], submitted May 2016)
Presented by Pei Shuang, Haung, SINICA IIS 2017 Summer Intern, 9 Aug 2017
OUTLINE
01 Introduction
02 Collect Curated Tag Corpus: Dataset Collection via Amazon Mechanical Turk
03 Verb Corrector: Automatic Part-of-Speech Tag Fixer
04 Improve Claim NLP: Experimental Validation: Improved Claim NLP
05 Application: Experimental Validation: Improved Patent Subject Classification
01 Introduction
Why Patents? What Is a Patent Claim?
01 Introduction: Patent Trend Analysis
Why patents? Patent data supports trend analysis, technology forecasting, competitor analysis, infringement analysis, …
What is a patent claim? The claims are the key part of a patent: they set out the scope of the patent. (Supplementary background.)
This Paper in One Sentence
01 Introduction, Briefly
This paper in one sentence: "The Stanford NLP parser does not work well on patent claims, so the authors design a corrector that checks the tags and lets the parser build the correct parse tree."
Why does the parser fail? In short, (1) the sentences are too long, and (2) some common words do not take their usual part of speech:
1. To begin with, patent claims involve sentences that are much longer than those typically found in other sources of text.
2. Another source of mismatch, which remains after chunking, arises from the use of words as less common parts of speech in patent claim language. For instance, "said" most commonly functions as an adjective, and "claim" typically functions as a noun, when occurring in patent claims.
Why not simply re-annotate? Paying people to re-tag an entire corpus from scratch would be far too expensive.
02 Collect Curated Tag Corpus
Dataset Collection via Amazon Mechanical Turk
AMT (Amazon Mechanical Turk)
02 Collect Curated Tag Corpus: Where?
AMT is a platform where requesters post tasks, called HITs (Human Intelligence Tasks); workers get paid when they finish a task.
Curated tags (supplementary background): each HIT contains 18 real items and 2 test items. If the worker's accuracy, checked with the 2 test items, is above 0.9, the tags are curated; otherwise they are uncurated.
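The curation rule on this slide can be sketched as follows. This is a minimal sketch, assuming the accuracy is measured on the two test items inside each HIT (so "above 0.9" means both are answered correctly); the function names and the dictionary layout are hypothetical, and the toy HITs are shorter than the real 18 + 2 format.

```python
from typing import Dict, List, Tuple

Tags = Dict[str, str]  # item id -> POS tag


def hit_accuracy(answers: Tags, gold: Tags) -> float:
    """Fraction of the HIT's test items (known answers) the worker tagged correctly."""
    correct = sum(answers.get(item) == tag for item, tag in gold.items())
    return correct / len(gold)


def split_hits(hits: List[Tuple[Tags, Tags]], threshold: float = 0.9) -> Tuple[Tags, Tags]:
    """Route the real (non-test) tags of each HIT to the curated or uncurated pool.

    Each HIT is (worker answers for all items, gold tags for its test items).
    Hypothetical helper: the real HITs held 18 real + 2 test items.
    """
    curated: Tags = {}
    uncurated: Tags = {}
    for answers, gold in hits:
        real = {item: tag for item, tag in answers.items() if item not in gold}
        pool = curated if hit_accuracy(answers, gold) > threshold else uncurated
        pool.update(real)
    return curated, uncurated


# Toy example: the first worker gets both test items right, the second only one.
hits = [
    ({"r1": "VB", "r2": "JJ", "t1": "NN", "t2": "VBZ"}, {"t1": "NN", "t2": "VBZ"}),
    ({"r3": "NN", "t3": "JJ", "t4": "NN"}, {"t3": "JJ", "t4": "VB"}),
]
curated, uncurated = split_hits(hits)
```

With only two test items per HIT, a threshold of 0.9 behaves the same as requiring both test answers to be correct.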
Changes Across Campaigns
02 Collect Curated Tag Corpus: [Figure: timeline of the four AMT data-collection campaigns; Campaign 1: 18 days, Campaign 2: 29 days, Campaign 3: 32 days, Campaign 4: 6 days]
Only 69%
02 Collect Curated Tag Corpus: Result
After collecting 174,246 curated tags and using them as a benchmark, we find that the Stanford NLP parser's accuracy on verbs is only 69%.
Why? For the two reasons given in the introduction: (1) patent-claim sentences are much longer than those in other sources of text, and (2) claim language uses words as less common parts of speech; for instance, in patent claims "said" mostly functions as an adjective and "claim" as a noun.
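The benchmark figure is an ordinary accuracy over the curated items. A minimal sketch, assuming each curated item has an id and that a match means the parser's tag equals the curated tag exactly (the slides do not say whether a coarser verb / non-verb match was used); the names and the three toy items are hypothetical.

```python
from typing import Dict, List, Tuple


def tag_accuracy(curated: Dict[str, str], parser: Dict[str, str]) -> Tuple[float, List[str]]:
    """Accuracy of parser tags on the curated benchmark items, plus the mismatched ids.

    The benchmark items are mostly putative verbs, so this approximates
    the parser's accuracy on verbs.
    """
    wrong = [item for item, gold in curated.items() if parser.get(item) != gold]
    return 1 - len(wrong) / len(curated), wrong


# Toy benchmark: a general-domain tagger reads "said" (item s1) as a verb.
curated = {"s1": "JJ", "s2": "VBZ", "s3": "VBG"}
parser = {"s1": "VBD", "s2": "VBZ", "s3": "VBG"}
accuracy, wrong = tag_accuracy(curated, parser)
```

Returning the mismatched ids alongside the score makes it easy to inspect which words (such as "said") account for the errors.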
03 Verb Corrector: Automatic Part-of-Speech Tag Fixer
What Is the Verb Corrector?
03 Verb Corrector: Function
What is the Verb Corrector? A classifier trained with an SVM that outputs a part-of-speech tag (N, V, adj, …).
Why verbs? With the right verb tags, the parse tree is almost right: just by correcting the incorrect verb tags and rerunning the parser, a correct parse can often be obtained.
How is it trained? On the curated AMT tag corpus, using only "verb triplicates" (concatenated features) as training input.
This is why, in the AMT tasks described in Section 02, almost all of the 20 questions in each Human Intelligence Task focus on words that look like verbs.
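The slides do not define the verb triplicate exactly. A minimal sketch, assuming it is the concatenation of the Word2Vec vectors of the candidate word and its left and right neighbours, and using a binary verb / non-verb linear SVM trained with Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss) in place of whatever SVM package the authors used. The function names and the toy two-dimensional embeddings, which stand in for real Word2Vec vectors, are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def triplicate(words: Sequence[str], i: int, emb: Dict[str, Vector], dim: int) -> Vector:
    """Concatenate the vectors of words i-1, i, i+1 (zeros at edges or for unknown words)."""
    zero = [0.0] * dim
    feats: Vector = []
    for j in (i - 1, i, i + 1):
        feats.extend(emb.get(words[j], zero) if 0 <= j < len(words) else zero)
    return feats


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos: SGD on the L2-regularized hinge loss; labels are +1 (verb) / -1.

    A constant 1.0 feature is appended to act as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularization step
            if margin < 1:  # hinge loss active: move towards the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Vector, x: Vector) -> int:
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


# Toy embeddings: verb-like words and "said"/nouns point in different directions.
EMB = {"said": [1.0, 0.0], "comprising": [0.0, 1.0], "attached": [0.0, 1.0],
       "device": [0.5, 0.0], "lever": [0.5, 0.0], "a": [0.2, 0.0], "to": [0.1, 0.1]}
s1 = ["a", "device", "comprising", "a", "lever"]
s2 = ["said", "lever", "attached", "to", "said", "device"]
examples = [(s1, 2, 1), (s2, 0, -1), (s2, 2, 1), (s2, 4, -1)]
X = [triplicate(s, i, EMB, 2) for s, i, _ in examples]
y = [label for _, _, label in examples]
w = train_linear_svm(X, y)
```

A multi-class output (N, V, adj, …) could be obtained by training one such classifier per tag and picking the highest score (one-vs-rest).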
After Training, How Does It Work?
03 Verb Corrector: Flowchart
[Figure: seven-step flowchart (Step 1 to Step 7) with the components: input segment, Stanford parsing, parse tree(s), Word2Vec, verb triplicate(s), SVM, rule-based changing, and the output parse tree (highest score).]
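The flowchart's components can be wired together roughly as below. This is a minimal sketch under stated assumptions: the step order (tag, apply rules, classify the remaining putative verbs, re-parse, keep the highest-scoring tree) is inferred from the component names rather than given on the slide, and every component is a hypothetical stand-in (a toy lexicon tagger, one hand-written rule, a lookup in place of the SVM, and a flat "parser" with an arbitrary score), not the Stanford parser's API.

```python
from typing import Callable, List, Sequence, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank POS tag)


def correct_and_reparse(
    segment: List[str],
    tag: Callable[[List[str]], Tagged],
    rules: Sequence[Callable[[Tagged], Tagged]],
    classify: Callable[[List[str], int], str],
    parse: Callable[[Tagged], Tuple[str, float]],
) -> Tuple[str, float]:
    """Re-parse a claim segment with corrected tags; keep the best-scoring tree."""
    original = tag(segment)
    corrected = list(original)
    for rule in rules:  # rule-based changes first (assumed order)
        corrected = rule(corrected)
    for i, (word, pos) in enumerate(corrected):
        if pos.startswith("VB"):  # remaining putative verbs go to the classifier
            corrected[i] = (word, classify(segment, i))
    # Parse both tag sequences and return the (tree, score) pair with the higher score.
    return max((parse(original), parse(corrected)), key=lambda ts: ts[1])


def toy_tag(words: List[str]) -> Tagged:
    # Hypothetical general-domain tagger that misreads claim vocabulary.
    lexicon = {"said": "VBD", "of": "IN", "claim": "VBP", "1": "CD"}
    return [(w, lexicon.get(w, "NN")) for w in words]


def claim_rule(tagged: Tagged) -> Tagged:
    # Hypothetical rule: "claim" is a noun in claim language.
    return [(w, "NN") if w == "claim" else (w, p) for w, p in tagged]


def toy_classify(words: List[str], i: int) -> str:
    # Hypothetical stand-in for the trained SVM: "said" is adjectival in claims.
    return "JJ" if words[i] == "said" else "VB"


def toy_parse(tagged: Tagged) -> Tuple[str, float]:
    # Hypothetical stand-in for the parser: flat bracketing, arbitrary toy score.
    tree = "(S " + " ".join(f"({p} {w})" for w, p in tagged) + ")"
    n_verbs = sum(p.startswith("VB") for _, p in tagged)
    return tree, 1.0 / (1 + n_verbs)


segment = ["said", "widget", "of", "claim", "1"]
result = correct_and_reparse(segment, toy_tag, [claim_rule], toy_classify, toy_parse)
```

Here the corrected tag sequence wins, so `result` holds the tree in which "said" is tagged `JJ` and "claim" is tagged `NN`.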
04 Improve Claim NLP: Experimental Validation: Improved Claim NLP
What Percentage of Verb Tags Are Wrong?
04 Improve Claim NLP: Error 1
[Figure: bar chart of the verb-tag error rate (axis 0% to 40%) for Stanford POS tagging, the rule-based corrector, and the verb corrector (rule-based + SVM); one bar is labeled 9.17.]
Comparing Two Parse Trees
04 Improve Claim NLP: Error 2
Ground-truth trees can be created from the portion of the AMT corpus that was not used to train the tag corrector. Each parser's output is then compared against these trees by calculating precision and recall.
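Precision and recall between two parse trees are commonly computed over labeled brackets (PARSEVAL style). A minimal sketch, assuming that metric (the slide does not say which bracket-matching variant the authors used) and representing trees as nested tuples `(label, child, ...)` with words as string leaves; the function names and example trees are hypothetical.

```python
from collections import Counter
from typing import Tuple


def labeled_spans(tree: tuple) -> Counter:
    """Collect (label, start, end) for every phrase node; POS preterminals are skipped."""
    found: Counter = Counter()

    def walk(node: tuple, start: int) -> int:
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1  # preterminal: a POS tag over exactly one word
        end = start
        for child in children:
            end = walk(child, end)
        found[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return found


def bracket_precision_recall(predicted: tuple, gold: tuple) -> Tuple[float, float]:
    """Precision = matched / predicted brackets, recall = matched / gold brackets."""
    p, g = labeled_spans(predicted), labeled_spans(gold)
    matched = sum((p & g).values())  # multiset intersection of brackets
    return matched / sum(p.values()), matched / sum(g.values())


gold = ("S", ("NP", ("JJ", "said"), ("NN", "lever")), ("VP", ("VBZ", "pivots")))
# A parse built on the wrong tag ("said" as a verb) gets a different structure.
pred = ("S", ("VP", ("VBD", "said"), ("NP", ("NN", "lever"))), ("VP", ("VBZ", "pivots")))
p, r = bracket_precision_recall(pred, gold)
```

Using a `Counter` rather than a set keeps duplicate brackets (for example, repeated unary nodes over the same span) from being double-matched.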
Comparing Stanford Parsing and Verb Corrector
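04 Improve Claim NLP: Error 2
Left: both Stanford parsing and the Verb Corrector are compared with the ground truth. The corpus is split in half: one half trains the SVM, and the other half serves as the answer key and is fed into the SVM.
Right: the Verb Corrector's output is used as the reference.
The comparison shows that the Verb Corrector is closer to the ground truth.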
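The half-and-half protocol on the left can be sketched as a seeded random split. This is a minimal sketch, assuming the split is random at the level of curated items (the slides do not say how the halves were drawn); the function name and the toy corpus are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_half(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the corpus and cut it in two: (SVM training half, held-out half)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


corpus = [f"item{k}" for k in range(10)]  # hypothetical curated items
train_half, test_half = split_in_half(corpus)
```

The held-out half never touches SVM training, so it can serve both as the answer key and as the corrector's evaluation input.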
05 Application: Experimental Validation: Improved Patent Subject Classification
Why Patent Classification?
05 Application: Function
Why patent classification? It supports patent search and helps avoid infringement. It is a big topic!
Conclusion and What We Can Learn
06
Summary and Review
1. Experience: how to handle problems in specialized domains similar to patent claims.
2. Domain knowledge:
(1) The paper proposes an alternative way of adapting NLP software to claim language: simply correct the POS tags of putative verbs.
(2) Rule-based corrector
(3) Verb triplicate (?)
Thank you
Speaker: Pei Shuang, Haung