Examination Software «E-patent examiner» World Wide United Patent Space WW UPS.

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

Relevance Feedback Limitations –Must yield result within at most 3-4 iterations –Users will likely terminate the process sooner –User may get irritated.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Chapter 5: Introduction to Information Retrieval
A Blended Curriculum for Bermuda Public Primary Schools
Help communities share knowledge more effectively across the language barrier Automated Community Content Editing PorTal.
New Technologies Supporting Technical Intelligence Anthony Trippe, 221 st ACS National Meeting.
Stefania Bergamasco, Cecilia Colasanti An integrated approach to turn statistics into knowledge combining data warehouse, controlled vocabularies and advanced.
Introduction to NLP What is Natural Language Processing?
Information Retrieval in Practice
Search Engines and Information Retrieval
1 Adaptive Management Portal April
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
Physical design. Stage 6 - Physical Design Retrieve the target physical environment Create physical data design Create function component implementation.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
Database Design IST 7-10 Presented by Miss Egan and Miss Richards.
Siemens Big Data Analysis GROUP 3: MARIO MASSAD, MATTHEW TOSCHI, TYLER TRUONG.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Data Management Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
Search Engines and Information Retrieval Chapter 1.
CIG Conference Norwich September 2006 AUTINDEX 1 AUTINDEX: Automatic Indexing and Classification of Texts Catherine Pease & Paul Schmidt IAI, Saarbrücken.
Information Need Question Understanding Selecting Sources Information Retrieval and Extraction Answer Determina tion Answer Presentation This work is supported.
M ETADATA OF NATIONAL STATISTICAL OFFICES B ELARUS, R USSIA AND K AZAKHSTAN Miroslava Brchanova, Moscow, October, 2014.
Survey of Semantic Annotation Platforms
Learning Object Metadata Mining Masoud Makrehchi Supervisor: Prof. Mohamed Kamel.
ICS-FORTH January 11, Thesaurus Mapping Martin Doerr Foundation for Research and Technology - Hellas Institute of Computer Science Bath, UK, January.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
Introduction to NLP ch1 What is Natural Language Processing?
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.
19/10/20151 Semantic WEB Scientific Data Integration Vladimir Serebryakov Computing Centre of the Russian Academy of Science Proposal: SkTech.RC/IT/Madnick.
Chapter 3 DECISION SUPPORT SYSTEMS CONCEPTS, METHODOLOGIES, AND TECHNOLOGIES: AN OVERVIEW Study sub-sections: , 3.12(p )
Edinburg March 2001CROSSMARC Kick-off meetingICDC ICDC background and know-how and expectations from CROSSMARC CROSSMARC Project IST Kick-off.
Self Organization of a Massive Document Collection Advisor : Dr. Hsu Graduate : Sheng-Hsuan Wang Author : Teuvo Kohonen et al.
Chapter 6: Information Retrieval and Web Search
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
Alexey Kolosoff, Michael Bogatyrev 1 Tula State University Faculty of Cybernetics Laboratory of Information Systems.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
LOGO A comparison of two web-based document management systems ShaoxinYu Columbia University March 31, 2009.
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
Who Needs All Those Indexes ? One is Enough Bruce Lindsay IBM Almaden Research Center
Research Methodology Class.   Your report must contains,  Abstract  Chapter 1 - Introduction  Chapter 2 - Literature Review  Chapter 3 - System.
Principals of Research Writing. What is Research Writing? Process of communicating your research  Before the fact  Research proposal  After the fact.
Link Distribution on Wikipedia [0407]KwangHee Park.
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
CISC 849 : Applications in Fintech Namami Shukla Dept of Computer & Information Sciences University of Delaware iCARE : A Framework for Big Data Based.
Jean-Yves Le Meur - CERN Geneva Switzerland - GL'99 Conference 1.
Concept-Based Analysis of Scientific Literature Chen-Tse Tsai, Gourab Kundu, Dan Roth UIUC.
Developing GRID Applications GRACE Project
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
PATENT SEARCHES Istituto Nazionale Fisica Nucleare Rossella Osella.
Information Retrieval in Practice
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Proposal for Term Project
RECENT TRENDS IN METADATA GENERATION
Introduction to NLP What is Natural Language Processing?
Social Knowledge Mining
CSE 635 Multimedia Information Retrieval
Extracting Why Text Segment from Web Based on Grammar-gram
Presentation transcript:

Examination Software «E-patent examiner» World Wide United Patent Space WW UPS

2 Content 1.Introduction. Big data and how handle them. 2.Machine learning and natural language processing. 3.Statistics and/or semantics. Successful collaboration. 4.Patent Information Space structure. Evaluation of novelty and industrial applicability 5.«E-patent examiner»: aims, scope and procedure 6.Multidimensional Patent Information Space 7.Patent Information Portrait 8.Unified Patent Information Space: distributed base of knowledge 9.Experiment description: one language, one class 10.Experiment description: one language, patents and open sources 11.Pilot project: bilingual, “cloud”-deployed. (Examination from mobile phone) 12.Conclusions and Future.

3

4 Types of tools typically used in Big Data Scenario Where is the processing hosted? – Distributed server/cloud Where data is stored? – Distributed Storage (eg: Amazon s3) Where is the programming model? – Distributed processing (Map Reduce) How data is stored and indexed? – High performance schema free database What operations are performed on the data? – Analytic/Semantic Processing (Eg. RDF/OWL)

Natural Language Processing Question answering (QA) Part-of-speech (POS) tagging Named entity recognition (NER) Parsing Summarization Information extraction (IE) Machine translation (MT) Dialog Sentiment analysis Spam detection Let’s go to Agra! Buy V1AGRA … ✓ ✗ Colorless green ideas sleep furiously. ADJ ADJ NOUN VERB ADV Einstein met with UN officials in Princeton PERSON ORG LOC You’re invited to our dinner party, Friday May 27 at 8:30 Party May 27 add Best roast chicken in San Francisco! The waiter ignored us for 20 minutes. The 13 th Shanghai International Film Festival… 第 13 届上海国际电影节开幕 … The Dow Jones is up Housing prices rose Economy is good Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness? I can see Alcatraz from the window! Where is Citizen Kane playing in SF? Castro Theatre at 7:30. Do you want a ticket? The S&P500 jumped

6 Statistics and/or semantics. Successful collaboration

7 Patent Information Space three-layer structure FUNDAMENTAL KNOWLEDGE PATENTS AND PATENT APPLICATIONS OPEN SOURCES New Patent Application 1.Superlarge volumes of unstructured information 2.Incomplete information 3.The subjectivity of the examiner NOVELTY ZONE «E-patent examiner»: automatic statistical and semantic analysis Industrial applicability Novelty OR Inventive step Industrial applicability Novelty Inventive step Industrial applicability Novelty Inventive step A pioneering invention Multidimensional Patent Information Space

8 «E-patent examiner» procedure New Patent Application (NPA) input Automatic topics of NPA statistical profile positioning at the Patent Information Space Establishing the sub-network of relevant documents by statistical profiles Sub-network semantic analysis to complete the visualization and to make conclusions about: – Novelty; – Industrial applicability; – Inventive step; – A pioneering invention.

9 The processing algorithm for the existing patent base Latent Dirichlet allocation (LDA) Patent base Membership vector to latent clusters for each patent, sentence, word Principal Component Analysis(PCA) Singular Value Decomposition(SVD) - Reduced belonging vector - Term-document matrix Setting up a model for semantic network construction The customized model for new patent semantic network construction

10 NPA processing algorithm LDA + PCA/SVD Incoming application -Membership vector of application and its proposals to the latent clusters - Key terms The proximity function calculation with the existing patents on the n- dimensional vector space Relevant patents, ranked by the value of the proximity function Semantic networks intersection analysis YES/NO decision. Visualization Semantically close sectors in the other patents The output of the relevant patents ranked list with semantically related concepts highlighted

11 Multidimensional Patent Information Space Node: statistic profile  semantic profile  source meta-data Relation: statistic measure of proximity  semantic networks intersection proportion International patent classification Network core of «E-patent examiner» Automatic topics classification

12 The Source Information portrait NPA Patent 3 Patent 6 Semantic profile of the patent application Matching nodes Key concept nodes that are absent in other patents Clearly different nodes

13 Implementation. Architecture. 1. Array of data extracted from the patent 2. Dictionaries, grammar, morphology domains 3. Preprocessed data for analysis 4. List of relevant patents The original text of the application Data for visualization Data for decision making Parallel processes Data exchange Subject areas knowledge base Latent clustering subsystem Preprocessing subsystem Multi-agent subsystem for information search and retrieval Semantic analysis subsystem Semantic visualization subsystem Subsystem for construction and visualization of the patent landscape Application Patent knowledge base External sources

14 Patent base of knowledge Semantic profile (network) Statistical profile (n –dimensional thematic vector) The source information portrait Sources Gallery (patents, articles and etc.) The examination decision making support system: distributed base of knowledge Examination tables Subject fields base of knowledge Meta dictionary of distributed database … … РБД патентов … The Russian Federation Patent Database EU Patent database USA Patent database Patents distributed database Digital portrait: database format independence “Cloud” and distributed architecture: No high technical requirements Statistical processing: language independence Is it necessary to use unified database format? Is it necessary to use common language? Is it necessary to use common patent classification? Automatic topics: patent classifiers independence Are any special technical requirements?

15 Visualized tips for expert for finding the intersections with other patents

16 Experiment description: one language, one class Russian foodstuffs patents 240 topics 1000 iterations all patents are preprocessed titles, abstracts and claims are used as input for LDA

17 Original patent: PRODUCTION METHOD OF CANNED “Heart stewed in tomato sauce” RU C1 Invention formula A method of producing canned "Heart stewed in tomato sauce ", providing prescription components preparation, cutting and saute in bone fat onion and mix it with the bone fat, tomato paste, sugar, salt, red hot pepper and bay leaf from the sauce, cut the heart, packaging of the heart and sauce sealing and sterilization, characterized in that the sauce additionally introduced sunflower flour before mixing onions milled sunflower flour poured water and allowed to swell, and components used in the following proportions costs... Experiment description: one language, one class. Semantic analysis of patent descriptions Relevant patent: PRODUCTION METHOD OF CANNED "HEART IN RED SAUCE MAINLY with sauerkraut" SPECIAL PURPOSE (OPTIONS) RU C1 Invention formula … production method for canned "Heart in red sauce with mostly cabbage " special purpose provides for the preparation of prescription components, cutting, frying in ghee and grinding on grinder heart, shredder, freezing and grinding on grinder with fresh cabbage, cut, saute in ghee grinder and grinding on carrots, parsley root and onion, rubbing garlic saute wheat flour, mixing these components with the bone broth, tomato paste, sugar, table salt, citric acid and extracts of biomass micromicetes, bitter black pepper and bay leaf to give the sauce, filling the mixture into the aluminum tube next flow components... Semantic analysis for the relevant patent description

18 Experiment description: one language, one class. Patent application semantic web Coincident vertices Key concepts that are not in another patent Clear differences

19 Experiment description: one language, patents and open sources ● Amount of documents: ● Evaluation — a way close to expert assessment: – Allocate a list of references for each patent. – Select ones, which refer at least once to the patents from base, m - the number of such references. – For each of them to find similar patents: n. – Search quality for a patent: n / m, if the first 20 found n similar patents. – 100 topic, 1000 iterations. ● Result: – Porter Stemmer: 72.4%, – AOT.ru Stemmer : 78.3%. Building of Patent Information Space Experimental database fragment: patents Statistical profiles building time: 7 hours 100 topics, 1000 iterations Office PC documents per 1 minute

20 Experiment description: one language, patents and open sources 30 issued patents as examples of NPA 64 top links to patents from Russian Federation patent database, patft.uspto.gov, findpatent.ru 50 topics, 10 iterations ● Result – 100% experts found links – 12 additional relevant links Patent RU C2 Examiner«E-PATENT EXAMINER» US A1, US , US , WO 2004/ A2, FR A, US A1 US A, RU A US A1, US , FR A, WO 2004/ A2, US A1 US , US A, US A1 Cited patents search module

21 EP A2 Invention-title: Antireflective porogens. Applicants: SEIKO EPSON CORP. Claim: The porous organo polysilica dielectric matrix materials of the present invention are particularly suitable for use electronic device manufacture, such as in integrated circuit manufacture. Thus, the present invention provides a method of manufacturing an electronic device including the steps of: a) disposing on the substrate a B-staged organo polysilica dielectric material including porogen; b) curing the B-staged organo polysilica dielectric material to form an organo polysilica dielectric matrix material without substantially degrading the porogen; c) thereafter subjecting the organo polysilica dielectric matrix material to conditions which at least partially remove the porogen to form a porous organo polysilica dielectric material without substantially degrading the organo polysilica dielectric material, wherein the porogen includes one or more chromophores. Pilot project: bilingual, “cloud”-deployed

22 EP A1 Invention-title: Porous materials. Applicants: SHIPLEY CO LLC. A method of manufacturing a porous organo polysilica dielectric material suitable for use in electronic device manufacture comprising the steps of: a)dispersing a plurality of removable polymeric porogen particles in a B- staged organo polysilica dielectric material; b)curing the B-staged organo polysilica dielectric material to form a dielectric matrix material without substantially degrading the porogen particles; c) subjecting the organo polysilica dielectric matrix material to conditions which at least partially remove the porogen to form a porous dielectric material without substantially degrading the organo polysilica dielectric material, wherein the porogen is substantially compatible with the B-staged organo polysilica dielectric material, wherein the porogen comprises as polymerized units at least one compound selected from silyl containing monomers or poly(alkylene oxide) monomers, wherein the dielectric material is 30% porous, wherein the mean particle size of the plurality of porogen particles is selected to provide a closed cell pore structure. Pilot project: bilingual, “cloud”-deployed

23 Pilot project: bilingual, “cloud”-deployed

24 Pilot project: bilingual, “cloud”-deployed. NPA input

25 Pilot project: bilingual, “cloud”-deployed. Decision NO

26 Pilot project: bilingual, “cloud”-deployed. Explanation of the decision

27 Patent Information Space three-layer structure FUNDAMENTAL KNOWLEDGE PATENTS AND PATENT APPLICATIONS OPEN SOURCES New Patent Application NOVELTY ZONE «E-patent examiner»: automatic statistical and semantic analysis Industrial applicability Novelty OR Inventive step

28 Pilot project: bilingual, “cloud”-deployed. Decision YES

29 Pilot project: bilingual, “cloud”-deployed. Explanation of the decision

30 Patent Information Space three-layer structure FUNDAMENTAL KNOWLEDGE PATENTS AND PATENT APPLICATIONS New Patent Application NOVELTY ZONE «E-patent examiner»: automatic statistical and semantic analysis Industrial applicability Novelty Inventive step

31 Results Pilot version of «E-patent examiner» is deployed in Amazon “cloud” servers The time of patents processing was reduced to 1000 docs in 58 sec by parallel algorithms Bilingual algorithm was trained on more than patents Patents base of knowledge was created

32 Future Scaling algorithms for full patent base of knowledge Application embedded objects processing Multilingual processing Implementation of new developed statistical method “Text explosion” that performs much better than LDA and is easily scalable

33 Conclusions «E-PATENT EXAMINER» solves problems of an examiner subjectivity and time spent for examination It’s necessary to develop a fundamentally new approach to the analysis of patent space The proposed approach implements a new global paradigm of United Patent Information Space The united efforts of the international community will make the transition from local databases to a universal environment for creating new technical solutions

34 WORLD WIDE UPS «E-PATENT EXAMINER»