Improving Automated Patent Claim Parsing: Dataset, System, and Experiment

SINICA IIS 2017, Summer Intern: Pei Shuang, Haung. 9 Aug 2017.
Paper: Improving Automated Patent Claim Parsing: Dataset, System, and Experiment. Mengke Hu, David Cinciruk, John MacLaren Walsh. Subject: arXiv [cs.CL]. Submission: May 2016.

OUTLINE
01 Introduction
02 Collect Curated Tag Corpus: Dataset Collection via Amazon Mechanical Turk
03 Verb Corrector: Automatic Parts of Speech Tag Fixer
04 Improve Claim NLP: Experimental Validation: Improved Claim NLP
05 Application: Experimental Validation: Improved Patent Subject Classification

01 Introduction

01 INTRODUCTION
Why patents? What is a patent claim?
Patent analysis supports trend analysis, technology forecasting, competitor analysis, infringement analysis, ...
The claims are the key part of a patent: they state the scope of what the patent covers.
(Speaker note: add some background knowledge on patent claims.)

01 Briefly: INTRODUCTION
One sentence to describe this paper:
“Stanford NLP parsing software does not work well on patent claims, so the authors design a corrector that checks the tags and builds the correct parse tree.”
Why:
1. Sentences are too long. “To begin with, patent claims involve sentences that are much longer than those typically found in other sources of text.”
2. Some common words do not get the right part of speech. “Another source of mismatch, which remains after chunking, arises from the use of words as less common parts of speech in patent claim language. For instance “said” most commonly functions as an adjective, and “claim” typically functions as a noun, when occurring in patent claims, ...”
Paying people to re-tag everything from scratch would cost too much.

02 Collect Curated Tag Corpus
Dataset Collection via Amazon Mechanical Turk

02 Where: Collect Curated Tag Corpus
AMT (Amazon Mechanical Turk): a platform where you can post tasks, called HITs (Human Intelligent Tasks). Workers get paid when they finish a task.
Curated tags (speaker note: add some background knowledge):
Each HIT = 18 real questions + 2 test questions.
If accuracy on the test questions > 0.9: the tags are curated tags.
Else: the tags are uncurated tags.

02 Change: Collect Curated Tag Corpus
[Timeline figure: four collection campaigns, Campaign 1 to Campaign 4. Durations shown, in slide order: 18 days, 29 days, 32 days, 6 days.]

02 Result: Collect Curated Tag Corpus
After collecting 174246 curated tags and taking them as the benchmark, we find that:
“The accuracy of Stanford NLP parsing software on verbs is only 69%.”
Why:
1. Sentences are too long.
2. Some common words do not get the right part of speech.

03 Verb Corrector: Automatic Parts of Speech Tag Fixer

03 Function: Verb Corrector
What is the ‘Verb Corrector’? A classifier trained with an SVM; it outputs a part-of-speech tag (N, V, adj, ...).
Why verbs? With the right verb tags, the parse tree is almost always right: “by just correcting the incorrect verbs tags and rerunning the parser, a correct parsing can often be obtained”.
How to train? Using the curated AMT tag corpus, train only on (concatenated) ‘Verb Triplicates’.
This is why, in each Human Intelligent Task on AMT described in Section 2, almost all of the 20 questions focus on words that look like verbs.
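As a rough illustration of the “concatenated” training input, the sketch below builds one feature vector per putative verb. It rests on an assumption the slides do not confirm: that a ‘Verb Triplicate’ is the putative verb together with its left and right neighbours, each mapped to a word vector. The toy `EMBEDDINGS` table stands in for Word2Vec vectors, and `triplicate_features` is a hypothetical helper name.

```python
from __future__ import annotations

# Toy 2-dimensional vectors standing in for Word2Vec embeddings (made-up values).
EMBEDDINGS = {
    "the": [0.1, 0.2],
    "said": [0.9, 0.1],
    "device": [0.3, 0.7],
}
DIM = 2
PAD = [0.0] * DIM  # used past the sentence boundary and for unknown words


def triplicate_features(tokens: list[str], i: int) -> list[float]:
    """Concatenate the vectors of tokens[i-1], tokens[i], and tokens[i+1].

    Assumption: this neighbour-based definition of a triplicate is a guess.
    """
    def vec(j: int) -> list[float]:
        if 0 <= j < len(tokens):
            return EMBEDDINGS.get(tokens[j].lower(), PAD)
        return PAD

    return vec(i - 1) + vec(i) + vec(i + 1)
```

Each resulting vector, paired with the curated AMT tag of the middle word, would form one training example for the SVM.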

03 Flowchart: Verb Corrector
After training, how does it work?
Step 1: Input ‘Segment’
Step 2: Stanford Parsing
Step 3: Verb Triplicate(s)
Step 4: Word2Vec(s)
Step 5: SVM, Rule-Based Changing
Step 6: Parsing Tree(s)
Step 7: Parsing Tree (Highest score)
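The flow from an input segment to the highest-scoring parse tree might look roughly like the sketch below. The tagger, corrector, and parser are passed in as plain callables, so nothing here depends on a real Stanford NLP API. How the corrector proposes alternative tag sequences, and keeping the uncorrected tags as a fallback candidate, are assumptions of this sketch; `best_parse` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Tags = List[str]
Parse = Tuple[str, float]  # (tree as a bracketed string, parser score)


def best_parse(
    tokens: List[str],
    tag: Callable[[List[str]], Tags],
    correct: Callable[[List[str], Tags], List[Tags]],
    parse: Callable[[List[str], Tags], Parse],
) -> Parse:
    """Tag the segment, reparse under each corrected tagging, keep the top score."""
    initial = tag(tokens)                    # tags from the off-the-shelf tagger
    candidates = correct(tokens, initial)    # corrector proposals (rules and SVM)
    if initial not in candidates:
        candidates = [initial] + candidates  # assumption: keep original as fallback
    parses = [parse(tokens, tags) for tags in candidates]
    return max(parses, key=lambda p: p[1])   # parse tree with the highest score
```

Passing the components as functions keeps the selection step testable with stubs, independent of whichever parser is plugged in.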

04 Improve Claim NLP: Experimental Validation: Improved Claim NLP

04 Error 1: Improve Claim NLP
What percentage of verb tags are wrong?
[Bar chart, axis from 0% to 40%. Bars: Stanford POS Tagging; Rule-Based Corrector; Verb Corrector (Rule-Based + SVM). Surviving value label: 9.17.]

04 Error 2: Improve Claim NLP
Comparing two parse trees
Ground-truth trees can be created by taking the portion of the AMT corpus that was not utilized in training the tag corrector.
Then calculate precision and recall.
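One common way to compute precision and recall between two parse trees is to compare their labeled constituents. The sketch below does that; it assumes each tree has already been flattened into a set of `(label, start, end)` spans. The slides do not say which tree-matching scheme the paper uses, so this bracket-matching variant is an assumption.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (constituent label, start token index, end token index)


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Precision and recall of predicted constituents against ground-truth ones."""
    matched = len(predicted & gold)  # spans with identical label and boundaries
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Usage: two of the four predicted spans also appear in the ground truth.
gold_spans = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
pred_spans = {("S", 0, 3), ("NP", 0, 1), ("NP", 1, 2), ("VP", 2, 3)}
p, r = precision_recall(pred_spans, gold_spans)
```

Precision penalizes constituents the parser invented, and recall penalizes ground-truth constituents it missed, so a mis-tagged verb that splits a phrase lowers both.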

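04 Error 2: Improve Claim NLP
Comparing Stanford Parsing and Verb Corrector
Left: both are compared against the ground truth. The corpus is split in half: one half trains the SVM, the other half serves as the ground truth and as input to the SVM.
Right: the Verb Corrector output is taken as the reference.
The Verb Corrector is clearly closer to the ground truth.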

05 Application: Experimental Validation: Improved Patent Subject Classification

05 Function: Application
Why patent classification? Patent search and avoiding infringement. A big topic!

06 Conclusion and What We Can Learn

REVIEW
1: Experience. How to handle a similar domain-specific problem, such as patent claims.
2: Domain knowledge.
(1) The paper proposed an alternative way of adapting NLP software to claim language by simply correcting the POS tags of putative verbs.
(2) Rule-Based Corrector
(3) Verb Triplicate (?)

Thank you
Speaker: Pei Shuang, Haung
