Automatic Discovery of Technology Trends from Patent Text Youngho Kim, Yingshi Tian, Yoonjae Jeong, Ryu Jihee, Sung-Hyon Myaeng School of Engineering Information.

Presentation transcript:

Automatic Discovery of Technology Trends from Patent Text Youngho Kim, Yingshi Tian, Yoonjae Jeong, Ryu Jihee, Sung-Hyon Myaeng School of Engineering Information and Communication University, South Korea

Introduction
Motive: Patent text is a good source for discovering technological progress.
Problem: Previous approaches to patent analysis (citation analysis, network-based patent analysis) have drawbacks
– They need domain expertise
– They do not make it easy to recognize salient concepts
– These drawbacks hamper wide application of the methods

Introduction
In this paper, the authors aim to avoid the limitations mentioned previously.
Method
1. Semantic key-phrase extraction (no experts needed)
2. Technological trend discovery (unsupervised)
Semantic key-phrases are defined as
– Problem, such as “recognizing spoken language”
– Solution, such as “language model”
– Domain, such as “speech recognition”

Introduction
Application: help users explore large numbers of technical documents efficiently and see the technological trends. An example is shown below.

Overall procedure
1. Technology identification through semantic key-phrase extraction
– A probabilistic framework combined with linguistic clues
– The probabilistic framework is weighted
– The linguistic clues are weighted
– Finally, a statistical learner (LIBSVM) is trained on them
2. Discovery of technological trends by
– Selecting important technologies during a time span
– Linking them according to semantic relatedness

Problem Formulation
Definition
– Domain: a field of technology given by a user query, from which a collection for the related field is generated
– Problem: what a patent or a method attempts to solve
– Solution: a method, a model or an approach that is associated with a particular problem
– Technology: a combination of a problem, a solution, and the given domain
– Time Span:

Problem Formulation
Definition
– Technological Trend: a main stream of technologies during a time span l.
Example:

Technological Trend Discovery System
Structure of Patent Documents
Semantic Key-phrase Extraction
– Problem Extraction
– Solution Extraction
Technological Trend Discovery

Structure of Patent Documents
Database: USPTO (United States Patent and Trademark Office)
– Time span
– Citation information
– Linguistic features

Semantic Key-phrase Extraction
Step 1
– Parse a patent to get the smallest noun phrases as key-phrase candidates (e.g. signal patterns)
– Expand NP to V+NP using dependencies (e.g. recognizing signal patterns)
Step 2
– Identify problem key-phrases by classification
Step 3
– Among the remaining candidates, extract solution key-phrases
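
Step 1 can be illustrated with a minimal sketch. It assumes the patent sentence has already been POS-tagged with Penn Treebank tags by some parser, and it approximates the dependency-based V+NP expansion by simply checking the token directly before the noun phrase. The function names are hypothetical and are not taken from the paper.

```python
def noun_phrase_candidates(tagged):
    """Collect runs of adjective/noun tokens that end in a noun.

    tagged: list of (word, Penn Treebank tag) pairs (assumed input format).
    Returns a list of (start_index, [words]).
    """
    candidates, run, start = [], [], None
    # A sentinel token flushes a run that reaches the end of the sentence.
    for i, (word, tag) in enumerate(tagged + [("", "END")]):
        if tag.startswith("NN") or tag.startswith("JJ"):
            if not run:
                start = i
            run.append((word, tag))
        else:
            # Drop trailing adjectives so the phrase ends in a noun.
            while run and not run[-1][1].startswith("NN"):
                run.pop()
            if run:
                candidates.append((start, [w for w, _ in run]))
            run, start = [], None
    return candidates


def expand_with_verb(tagged, candidates):
    """Prepend the preceding verb, if any (adjacency stands in for a dependency link)."""
    expanded = []
    for start, words in candidates:
        if start > 0 and tagged[start - 1][1].startswith("VB"):
            expanded.append(" ".join([tagged[start - 1][0]] + words))
        else:
            expanded.append(" ".join(words))
    return expanded


sentence = [("a", "DT"), ("method", "NN"), ("for", "IN"),
            ("recognizing", "VBG"), ("signal", "NN"), ("patterns", "NNS")]
phrases = expand_with_verb(sentence, noun_phrase_candidates(sentence))
```

A real system would take the verb from a dependency parse, as the slide states; adjacency is only the simplest stand-in.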

Problem Extraction
Features
– Topical language model (unigram)
– Word dependency (bigram model)
– Special smoothing: relevance and background language models
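
As a rough illustration of the unigram feature with smoothing, the sketch below interpolates a topical model with a background model. The slide does not give the exact smoothing formula, so the linear (Jelinek-Mercer style) interpolation, the weight `lam`, and the function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def phrase_log_prob(phrase, topic_lm, background_lm, lam=0.7, floor=1e-9):
    """Log-probability of a candidate phrase under the interpolated model.

    lam is a hypothetical interpolation weight; floor keeps unseen words
    from producing log(0).
    """
    score = 0.0
    for w in phrase:
        p = lam * topic_lm.get(w, 0.0) + (1 - lam) * background_lm.get(w, floor)
        score += math.log(p)
    return score


topic_lm = unigram_model("speech recognition speech noise".split())
background_lm = unigram_model("speech system method device".split())
on_topic = phrase_log_prob(["speech"], topic_lm, background_lm)
off_topic = phrase_log_prob(["device"], topic_lm, background_lm)
```

Phrases made of words that are frequent in the topical collection score higher than phrases supported only by the background model.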

Problem Extraction
Issue: the probabilistic model is biased toward topicality, so another mechanism is needed to correct it
Method: linguistic clues
– Gather all distinct patterns from the annotation
– Generalize these patterns into a grammar
– E.g. (method/NN+in/PP) and (system/NN+in/PP) ==> (method|system)/NN+in/PP
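
The generalization in the example above can be sketched as follows. The sketch assumes patterns are written as `word/TAG` tokens joined by `+`, groups patterns that share a tag sequence, and merges the words at each position into an alternation. The paper's actual generalization procedure is not described on the slide, so this is only one plausible reading, and merging each position independently can over-generalize.

```python
from collections import defaultdict


def generalize(patterns):
    """Merge patterns with identical tag sequences into one alternation pattern."""
    groups = defaultdict(list)
    for pattern in patterns:
        pairs = [tok.split("/") for tok in pattern.split("+")]
        tags = tuple(tag for _, tag in pairs)
        groups[tags].append([word for word, _ in pairs])

    generalized = []
    for tags, word_rows in groups.items():
        parts = []
        for position, tag in enumerate(tags):
            words = sorted({row[position] for row in word_rows})
            slot = words[0] if len(words) == 1 else "(" + "|".join(words) + ")"
            parts.append(f"{slot}/{tag}")
        generalized.append("+".join(parts))
    return generalized


merged = generalize(["method/NN+in/PP", "system/NN+in/PP"])
```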

Problem Extraction Feature 342 generalized patterns

Problem Extraction
– The generalized patterns need a confidence value
– A statistical machine learner (LIBSVM) is applied to the linguistic clues and the language models
– LIBSVM classifies each candidate as problem or non-problem using the above features

Solution Extraction
The probabilistic features would not be useful here
– Solution phrases are rarely shared among cited documents
Add a “head word” feature (e.g. model, approach, method, methodology)
The other feature categories are the same as in Problem Extraction
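
A minimal sketch of the head word feature, assuming the head of an English noun phrase is its last word and using only the four head words listed on the slide (the full list is not given):

```python
# Head words named on the slide; the paper's complete list is not shown.
HEAD_WORDS = {"model", "approach", "method", "methodology"}


def has_solution_head(phrase):
    """Return True if the last word of the phrase is a known solution head word."""
    words = phrase.lower().split()
    return bool(words) and words[-1] in HEAD_WORDS
```

For example, `has_solution_head("language model")` holds while `has_solution_head("recognizing spoken language")` does not. Plural forms would need normalization first.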

Technology Trend Discovery
Reduction: select several salient technologies and establish semantic relations between them
How to find a good time span in which effective technological trends can be discovered
– Use KL-divergence to compare two language models
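
For reference, the KL-divergence between two unigram language models can be computed as in the sketch below. The `epsilon` floor for words missing from the second model is an assumption; the slide does not say how zero probabilities are handled.

```python
import math


def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q) for two word -> probability dictionaries."""
    total = 0.0
    for w in set(p) | set(q):
        pw = p.get(w, 0.0)
        if pw > 0:
            # epsilon avoids division by zero for words unseen in q.
            total += pw * math.log(pw / max(q.get(w, 0.0), epsilon))
    return total


earlier = {"hmm": 0.5, "model": 0.5}
later = {"hmm": 0.9, "model": 0.1}
shift = kl_divergence(earlier, later)
```

A larger value suggests that the vocabulary of the two time spans differs more.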

Technology Trend Discovery
How to find salient technologies within time spans
– If a technology is important, many patents will refer to it
– Mutual information concept
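
The slide names mutual information without giving a formula. One possible reading, shown here purely as an assumption, is pointwise mutual information between a technology and a time span, estimated from how often patents in that span mention the technology. The input format and names are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(observations):
    """Pointwise mutual information for each (technology, span) pair.

    observations: list of (technology, span) pairs, one per patent mention.
    """
    joint = Counter(observations)
    tech = Counter(t for t, _ in observations)
    span = Counter(s for _, s in observations)
    n = len(observations)
    return {
        (t, s): math.log((c / n) / ((tech[t] / n) * (span[s] / n)))
        for (t, s), c in joint.items()
    }


mentions = ([("hmm", "1990s")] * 3 + [("hmm", "2000s")]
            + [("dnn", "2000s")] * 2)
scores = pmi_scores(mentions)
```

A positive score means the technology is mentioned in that span more often than its overall frequency would suggest.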

Technology Trend Discovery Algorithm
Step 1
– Define an initial time span (by the density of the data)
Step 2
– Generate all possible combinations of time spans (e.g. )
Step 3
– Calculate the KL-divergences of all pairs from step 2 and rank them
Step 4
– Select the most important technology among the top n pairs
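
Steps 2 and 3 can be sketched as below. The sketch assumes that "combinations" means merging neighbouring initial spans into contiguous spans, and that only pairs in which one span directly follows the other are compared; both points are interpretations, since the slide gives no detail. Step 4 (choosing the technology for each top pair) is left out.

```python
import math
from collections import Counter
from itertools import combinations


def language_model(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl(p, q, eps=1e-9):
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)


def ranked_span_pairs(tokens_by_span):
    """Rank adjacent span pairs by the KL-divergence of the later span from the earlier.

    tokens_by_span: one token list per initial span, in chronological order.
    Spans are (start, end) index ranges over the initial spans, end exclusive.
    """
    n = len(tokens_by_span)
    spans = list(combinations(range(n + 1), 2))  # every contiguous merged span

    def model(span):
        start, end = span
        return language_model([t for chunk in tokens_by_span[start:end] for t in chunk])

    scored = [(kl(model(b), model(a)), a, b)
              for a in spans for b in spans if a[1] == b[0]]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


corpus = [["hmm", "hmm", "model"],
          ["hmm", "model", "model"],
          ["neural", "network", "neural"]]
ranking = ranked_span_pairs(corpus)
```

In this toy corpus the top-ranked pairs end in the last span, where the vocabulary changes completely.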

Experiment
Database: USPTO
Domain: Speech recognition
Data: 1,420 US patent documents
Time:
Annotators: three computer science graduate students
Annotated: 400 documents (selected uniformly over the time span)

Experiment
Annotation work
– Acronyms are resolved (using Wiki and simple parenthetical patterns)
– WordNet is used to normalize nouns and verbs
Technology phrases (the answers) form a gold standard produced by majority vote
Annotators agreed on 78% of the sample (about 300)
Technology Trend Discovery has no gold standard because building one is too hard (too many time spans) ==> no good evaluation is available
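
The majority-vote step for three annotators can be sketched as follows. The label names are illustrative, and returning `None` when no label wins a strict majority is an assumption about how disagreements are treated.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None
```

With three annotators, `majority_label(["problem", "problem", "solution"])` gives `"problem"`, while three different labels give `None`.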

Experiment
– Set the background language model
– Used LIBSVM as the machine learner, with 5-fold cross-validation
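
As a reminder of what 5-fold cross-validation involves, here is a minimal sketch of the index splitting only. It does not call LIBSVM, and the round-robin assignment of items to folds is an assumption; in practice the items would usually be shuffled first.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(400))
```

With the 400 annotated documents, each fold trains on 320 and tests on 80.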

Experiment All features proved effective

Experiment
From the above steps, many meaningful problems and solutions can be discovered
Open issue: synonymy (even when synonyms from WordNet are used)

Experiment Discover technological trends by the Technology Trend Discovery Algorithm

Conclusion & future work
– Discovering such trends can reveal latent technologies
– It can also assist exploration by alleviating the information overload caused by search results
Future work
– The synonymy issue in semantic key-phrase extraction
– A standardized evaluation for Technology Trend Discovery (TTD) needs to be investigated