Presentation is loading. Please wait.

Presentation is loading. Please wait.

Automated Patent Classification By Yu Hu. Class 706 Subclass 12.

Similar presentations


Presentation on theme: "Automated Patent Classification By Yu Hu. Class 706 Subclass 12."— Presentation transcript:

1 Automated Patent Classification By Yu Hu

2 Class 706 Subclass 12

3 Patent Classifier Input: descriptions of the invention(abstracts) Output: US Classification Data from USPTO Full-text database Extract abstracts and classifications

4 Abstract of US8452718 Automated determination of a number of profiles for a training data set to be used in training a machine learning system for generating target function information from modeled profile parameters. In one embodiment, a first principal component analysis (PCA) is performed on a training data set, and a second PCA is performed on a combined data set which includes the training data set and a test data set. A test data set estimate is generated based on the first PCA transform and the second PCA matrix. The size of error between the test data set and the test data set estimate is used to determine whether a number of profiles associated with the training data set is sufficiently large for training a machine learning system to generate a library of spectral information.

5 Bag of Words Automated determination of a number of profiles for a training data set to be used in training a machine learning system for generating target function information from modeled profile parameters. In one embodiment, a first principal component analysis (PCA) is performed on a training data set, and a second PCA is performed on a combined data set which includes the training data set and a test data set. A test data set estimate is generated based on the first PCA transform and the second PCA matrix. The size of error between the test data set and the test data set estimate is used to determine whether a number of profiles associated with the training data set is sufficiently large for training a machine learning system to generate a library of spectral information

6 K Nearest Neighbor

7 Data 631 most recently filed patent application of Apple Inc. Preprocessing: Remove html tags, punctuation, stopwords, (numbers) Extract abstracts and classifications

8 Document_term_matrix Training/Test Split: 70/30 # of Training documents: 110 K= 9 Confusion Matrix computer graphics v. document processing v. telecommunication

9 382 Image Analysis v. 435 Chemistry: molecular biology and microbiology Training documents: ~ 400; 80% split 382 : Precision = 86.4% Recall = 97.4% 435: Precision = 81.8% Recall = 96.4%

10 Subclasses of Image Analysis Overlap

11 Subclass classification of 382 Image Analysis 382/181: Pattern Recognition 382/232: Image Compression If-Idf: term frequency-inverse document frequency This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. 181 : Precision = 83.7% Rrecall = 75% 232: Precision = 67.5% Recall = 78.1%

12 Thank you! Questions?


Download ppt "Automated Patent Classification By Yu Hu. Class 706 Subclass 12."

Similar presentations


Ads by Google