Automated Patent Classification By Yu Hu. Class 706 Subclass 12.

Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Face Recognition: A Convolutional Neural Network Approach
Aggregating local image descriptors into compact codes
Automatic determination of skeletal age from hand radiographs of children Image Science Institute Utrecht University C.A.Maas.
An Overview of Machine Learning
MESA LAB Two papers in IFAC14 Guimei Zhang MESA LAB MESA (Mechatronics, Embedded Systems and Automation) LAB School of Engineering, University of California,
Face Recognition & Biometric Systems, 2005/2006 Face recognition process.
Multiple Criteria for Evaluating Land Cover Classification Algorithms Summary of a paper by R.S. DeFries and Jonathan Cheung-Wai Chan April, 2000 Remote.
IR & Metadata. Metadata Didn’t we already talk about this? We discussed what metadata is and its types –Data about data –Descriptive metadata is external.
Pattern Recognition Topic 1: Principle Component Analysis Shapiro chap
CS335 Principles of Multimedia Systems Content Based Media Retrieval Hao Jiang Computer Science Department Boston College Dec. 4, 2007.
Multidimensional Analysis If you are comparing more than two conditions (for example 10 types of cancer) or if you are looking at a time series (cell cycle.
Implementing a reliable neuro-classifier
Object Class Recognition Using Discriminative Local Features Gyuri Dorko and Cordelia Schmid.
1 MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING By Kaan Tariman M.S. in Computer Science CSCI 8810 Course Project.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Map-Reduce and Parallel Computing for Large-Scale Media Processing Youjie Zhou.
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
Representation of hypertext documents based on terms, links and text compressibility Julian Szymański Department of Computer Systems Architecture, Gdańsk.
Automatic Collection “Recruiter” Shuang Song. Project Goal Given a collection, automatically suggest other items to add to the collection  Design a process.
Introduction to Machine Learning Approach Lecture 5.
Chapter 5: Information Retrieval and Web Search
Introduction to machine learning
1 How to use Weka How to use Weka. 2 WEKA: the software Waikato Environment for Knowledge Analysis Collection of state-of-the-art machine learning algorithms.
SVMLight SVMLight is an implementation of Support Vector Machine (SVM) in C. Download source from :
Masquerade Detection Mark Stamp 1Masquerade Detection.
Title Extraction from Bodies of HTML Documents and its Application to Web Page Retrieval Microsoft Research Asia Yunhua Hu, Guomao Xin, Ruihua Song, Guoping.
Multimodal Interaction Dr. Mike Spann
COMMON EVALUATION FINAL PROJECT Vira Oleksyuk ECE 8110: Introduction to machine Learning and Pattern Recognition.
Some working definitions…. ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably Data mining = –the discovery of interesting,
Image Classification 영상분류
Procedures for managing workflow components Workflow components: A workflow can usually be described using formal or informal flow diagramming techniques,
Chapter 6: Information Retrieval and Web Search
1 Pattern Recognition Pattern recognition is: 1. A research area in which patterns in data are found, recognized, discovered, …whatever. 2. A catchall.
Chapter 4: Pattern Recognition. Classification is a process that assigns a label to an object according to some representation of the object’s properties.
IR Homework #3 By J. H. Wang May 4, Programming Exercise #3: Text Classification Goal: to classify each document into predefined categories Input:
1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 25 Nov 4, 2005 Nanjing University of Science & Technology.
Intelligent Control and Automation, WCICA 2008.
Clustering More than Two Million Biomedical Publications Comparing the Accuracies of Nine Text-Based Similarity Approaches Boyack et al. (2011). PLoS ONE.
Neural Text Categorizer for Exclusive Text Categorization Journal of Information Processing Systems, Vol.4, No.2, June 2008 Taeho Jo* Presenter: 林昱志.
ProjFocusedCrawler CS5604 Information Storage and Retrieval, Fall 2012 Virginia Tech December 4, 2012 Mohamed M. G. Farag Mohammed Saquib Khan Prasad Krishnamurthi.
A NOVEL METHOD FOR COLOR FACE RECOGNITION USING KNN CLASSIFIER
Linked Data Profiling Andrejs Abele National University of Ireland, Galway Supervisor: Paul Buitelaar.
Objectives: Terminology Components The Design Cycle Resources: DHS Slides – Chapter 1 Glossary Java Applet URL:.../publications/courses/ece_8443/lectures/current/lecture_02.ppt.../publications/courses/ece_8443/lectures/current/lecture_02.ppt.
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
Link Distribution on Wikipedia [0407]KwangHee Park.
An ANN Approach to Identify if Driver is Wearing Safety Belts Hanwen Chen 12/9/2013.
Virtual Examples for Text Classification with Support Vector Machines Manabu Sassano Proceedings of the 2003 Conference on Empirical Methods in Natural.
Competition II: Springleaf Sha Li (Team leader) Xiaoyan Chong, Minglu Ma, Yue Wang CAMCOS Fall 2015 San Jose State University.
DeepDive Case Study Dongfang Xu School of Information.
Computer Vision Lecture 7 Classifiers. Computer Vision, Lecture 6 Oleh Tretiak © 2005Slide 1 This Lecture Bayesian decision theory (22.1, 22.2) –General.
TEXT CLASSIFICATION AND CLASSIFIERS: A SURVEY & ROCCHIO CLASSIFICATION Kezban Demirtas
Institute of Informatics & Telecommunications NCSR “Demokritos” Spidering Tool, Corpus collection Vangelis Karkaletsis, Kostas Stamatakis, Dimitra Farmakiotou.
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Can Computer Algorithms Guess Your Age and Gender?
Final Year Project Presentation --- Magic Paint Face
Brian Whitman Paris Smaragdis MIT Media Lab
Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science
Classifying enterprises by economic activity
network of simple neuron-like computing elements
Introduction PCA (Principal Component Analysis) Characteristics:
An Improved Neural Network Algorithm for Classifying the Transmission Line Faults Slavko Vasilic Dr Mladen Kezunovic Texas A&M University.
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
Pre-classification and AI
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
An introduction to Machine Learning (ML)
Presentation transcript:

Patent Classifier. Input: descriptions of the invention (abstracts). Output: US Classification. Data: the USPTO Full-Text database, from which the abstracts and classifications are extracted.

Abstract of US Automated determination of a number of profiles for a training data set to be used in training a machine learning system for generating target function information from modeled profile parameters. In one embodiment, a first principal component analysis (PCA) is performed on a training data set, and a second PCA is performed on a combined data set which includes the training data set and a test data set. A test data set estimate is generated based on the first PCA transform and the second PCA matrix. The size of error between the test data set and the test data set estimate is used to determine whether a number of profiles associated with the training data set is sufficiently large for training a machine learning system to generate a library of spectral information.

Bag of Words: the same abstract, represented as an unordered collection of its words and their counts (word order is discarded).
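A bag-of-words representation keeps only how often each word occurs. The following is a minimal Python sketch using the standard library; the tokenization rule (lowercase, keep runs of letters) and the function name `bag_of_words` are assumptions for illustration, since the slides do not say how words were split.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return word counts for text, discarding word order."""
    # Assumed tokenization: lowercase, then keep runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

abstract = ("A first principal component analysis (PCA) is performed on a training "
            "data set, and a second PCA is performed on a combined data set.")
bow = bag_of_words(abstract)
print(bow["pca"], bow["data"], bow["a"])  # 2 2 4
```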

K Nearest Neighbor
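The k-nearest-neighbor step can be sketched as a majority vote among the k training documents most similar to the query. This is a minimal sketch, not the presenter's code: the slides do not name a similarity measure, so cosine similarity over word counts is an assumption, and `cosine`, `knn_predict` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse word-count mappings."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_predict(query, training, k):
    """Return the majority label among the k training documents most similar to query.

    training is a list of (word_counts, label) pairs. The sort is stable and
    most_common keeps first-seen order, so a vote tie goes to the label whose
    closest member ranks highest.
    """
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training set labelled with the class numbers used in the slides.
training = [
    (Counter({"image": 2, "pixel": 1}), "382"),
    (Counter({"image": 1, "compression": 1}), "382"),
    (Counter({"dna": 2, "cell": 1}), "435"),
]
print(knn_predict(Counter({"image": 1, "pixel": 1}), training, k=3))  # 382
```

With K = 9 as in the experiment, the call would be `knn_predict(query, training, k=9)`; an odd K rules out exact vote ties only when there are two classes.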

Data: the 631 most recently filed patent applications of Apple Inc. Preprocessing: remove HTML tags, punctuation and stopwords (and numbers); extract abstracts and classifications.
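The preprocessing steps listed above can be sketched in Python as follows. Assumptions: HTML tags are stripped with a simple regular expression (adequate for short abstracts, not a general HTML parser), `STOPWORDS` is a tiny hypothetical list standing in for whatever list was actually used, and number removal is optional to mirror the parenthesized "(numbers)".

```python
import re
import string

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "which"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(raw_html, drop_numbers=True):
    """Return lowercase tokens with tags, punctuation, stopwords and, optionally, numbers removed."""
    text = re.sub(r"<[^>]+>", " ", raw_html)  # crude tag stripping
    text = text.translate(_PUNCT_TO_SPACE)
    tokens = text.lower().split()
    if drop_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("<p>A method for 3 images, of the PCA.</p>"))  # ['method', 'images', 'pca']
```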

Document-term matrix. Training/test split: 70/30. Number of training documents: 110. K = 9. Confusion matrix: computer graphics vs. document processing vs. telecommunication.
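A document-term matrix has one row per document and one column per vocabulary term, with each cell holding a count. The sketch below builds one from already-tokenized documents and performs a shuffled 70/30 split. The helper names `document_term_matrix` and `shuffle_split` and the fixed random seed are hypothetical, and the slides do not say whether the split was stratified by class.

```python
import random

def document_term_matrix(docs):
    """Build a dense count matrix from tokenized documents.

    Returns (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i].
    """
    vocabulary = sorted({token for doc in docs for token in doc})
    column = {term: j for j, term in enumerate(vocabulary)}
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for token in doc:
            row[column[token]] += 1
        rows.append(row)
    return vocabulary, rows

def shuffle_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items with a fixed seed and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

vocab, matrix = document_term_matrix([["image", "pixel", "image"], ["dna", "cell"]])
print(vocab)   # ['cell', 'dna', 'image', 'pixel']
print(matrix)  # [[0, 0, 2, 1], [1, 1, 0, 0]]
```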

382 Image Analysis vs. 435 Chemistry: Molecular Biology and Microbiology. Training documents: ~400; 80% split. 382: Precision = 86.4%, Recall = 97.4%. 435: Precision = 81.8%, Recall = 96.4%.
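For a single class treated as positive, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP and FN are counted from the (true, predicted) pairs that make up the confusion matrix. The sketch below computes both on toy labels; the labels and the helper names are invented for illustration and are not the experiment's data.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class, treating every other class as negative."""
    cm = confusion_matrix(true_labels, predicted_labels)
    tp = cm[(positive, positive)]
    fp = sum(n for (t, p), n in cm.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in cm.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = ["382", "382", "382", "435", "435"]
y_pred = ["382", "382", "382", "382", "435"]
print(precision_recall(y_true, y_pred, "382"))  # (0.75, 1.0)
print(precision_recall(y_true, y_pred, "435"))  # (1.0, 0.5)
```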

Subclasses of Image Analysis Overlap

Subclass classification of 382 Image Analysis: 382/181: Pattern Recognition; 382/232: Image Compression. Tf-idf (term frequency-inverse document frequency): a statistical weight that measures how important a word is to a document in a collection or corpus. The weight increases in proportion to the number of times the word appears in the document but is offset by how many documents in the corpus contain the word. 181: Precision = 83.7%, Recall = 75%. 232: Precision = 67.5%, Recall = 78.1%.
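One common tf-idf variant weights term t in document d by tf(t, d) * log(N / df(t)), where tf is the raw count, N the number of documents and df(t) the number of documents containing t. The slides do not say which variant was used, so this formula and the helper name `tf_idf` are assumptions; the sketch shows that a word appearing in every document receives zero weight.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    # df: number of documents in which each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return weights

w = tf_idf([["image", "compression", "image"], ["image", "pattern"]])
print(w[0]["image"])        # 0.0 ("image" occurs in every document)
print(w[0]["compression"])  # about 0.693 (= log 2)
```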

Thank you! Questions?