USPTO Patent Data Source and Data Extraction Mandy Dang MIS 580 University of Arizona 02-06-2008.


2 Outline
–Patent
–USPTO
–Search USPTO Patents
–Data Extraction: Case Study of NSE Patents

3 Patent
A "patent" usually refers to a right granted to anyone who invents or discovers any new and useful process, machine, article of manufacture, or composition of matter, or any new and useful improvement thereof.
–A patent is not a right to practice or use the invention. Rather, it provides the right to exclude others from making, using, selling, or offering for sale the invention for a limited term, usually 20 years from the filing date.
–It is a limited property right that the government offers to inventors in exchange for their agreement to share the details of their inventions with the public.
A patent is also a special type of technical document that records many important innovations and technological advances.

4 USPTO
The United States Patent and Trademark Office (USPTO) is an agency in the United States Department of Commerce that provides patent protection to inventors and businesses for their inventions, and trademark registration for product and intellectual property identification.
Each year, the USPTO issues thousands of patents to companies and individuals worldwide.
As of March 2006, the USPTO had issued over 7 million patents, with 3,500 to 4,500 newly granted patents each week.
The USPTO provides online full-text access for patents issued since
URLs:
–USPTO Official Website:
–USPTO Patent Search:

5 Search USPTO Patents

8 Data Extraction: Case Study of NSE Patents
Nanoscale Science and Engineering (NSE) field
–A fundamental technology area that is critical for a nation's technological competence.
–Expected to revolutionize a wide range of application domains.
Nanotechnology
–An applied science and technology field that is multidisciplinary and encompasses engineering and other work taking place at the nanoscale.
–Critical for a nation's technological competence.
–Its R&D status attracts interest from various communities.

9 Data Extraction Procedure
The goal is to gather all related patents from the USPTO Web site as free-text HTML pages, parse them into structured data, and store the result in a database.
Procedure for extracting NSE patents from USPTO:
1. Spider search results (summary pages)
2. Spider individual patent documents (detailed pages)
3. Noise filtering
4. Parsing

10 1. Spider search results (summary pages)
A list of keywords, provided by domain experts, is used to search for patents related to the NSE domain.
A spider program written in Perl was used to download the search result pages.

11 Example code

    use HTML::TokeParser;
    use LWP;
    use URI::Escape;
    use strict;

    sub query {
        … …
        # Get keywords
        open(f, $ARGV[0]);
        … = <f>;
        close(f);
        … …
        $query_url = " bool.html&r=0&f=S&l=50&TERM1=$kw&FIELD1=&co1=AND&TERM2=$start%3E$end&FIELD2=ISD&d=ptx";
        # Download search pages
        $response = $browser->get($query_url);
        $result = $response->content();
        open(f, "> $fpage-$pno.html");
        select(f);
        print $result;
        close(f);
    }

    # Set up time range
    query('1/1/2007', '12/31/2007');

12 Search result page example (figure; callout: Patent IDs)

13 2. Spider individual patent documents (detailed pages)
In this step, we need to:
–1st, collect all the patent IDs;
–2nd, download all the patents based on the patent IDs by using proxies.
The data set is often very large, so using proxies can save a lot of time.

14 Download detailed patent documents
Create several files, each of which contains a fixed number of patent IDs (e.g., 300 patent IDs).

Server: Send different patent ID files to different client threads.

    … …
    open(f, $ARGV[0]);
    … = <f>;
    close(f);
    my $theid;
    foreach $theid (…) {
        $new_sock = $sock->accept();
        my $buf = <…>;
        print ($new_sock $theid."\n");
        print $buf. " ". $theid."\n";
        close $new_sock;
    … …

Client: Use a proxy to download the patents whose IDs are in the file sent from the server.

    … …
    do {
        $response = $browser->get($pat_url);
        if (!$response->is_success()) {
            # On failure, report the status and pause a few seconds before retrying
            select(stdout);
            print $response->status_line, "\n\n";
            sleep(rand(7)+1);
        }
    } while (!$response->is_success());
    … …
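The batching step described on this slide can be sketched in Java as follows. This is a minimal illustration, not the original spider code: the class name `PatentIdBatcher`, the method `partition`, and the sample IDs are hypothetical, and the batch size of 300 is simply the example value given on the slide.

```java
import java.util.ArrayList;
import java.util.List;

final class PatentIdBatcher {
    /** Splits the ID list into consecutive batches of at most batchSize IDs. */
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the sublist so each batch is independent of the source list
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical IDs, generated only to exercise the method
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 650; i++) {
            ids.add("ID" + i);
        }
        for (List<String> batch : partition(ids, 300)) {
            System.out.println("batch of " + batch.size() + " IDs");
        }
    }
}
```

Each batch would then be written to its own file and handed to one client, so a failed download only affects the IDs in that batch.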

15 Patent document example


17 3. Noise filtering
Some of the gathered patents match only noisy NSE keywords (terms that contain "nano" but do not refer to nanotechnology), and some contain no NSE keywords at all.
–Such patents need to be filtered out.
Noise keywords include:
–nanosecond
–nanoliter
–nano$
–nano-second
–nano-liter
–nano.sub
–nano [space]
–nano2
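One way to apply the noise list is to keep a patent only if at least one occurrence of "nano" in its text is not part of a noise keyword. The sketch below assumes that reading of the slide; it is not the original filter. The class and method names are hypothetical, every entry of the list (including `nano$`) is treated as a literal, case-insensitive prefix, and "nano [space]" is taken to mean "nano" followed by a space.

```java
import java.util.List;
import java.util.Locale;

final class NoiseFilter {
    // Noise keywords from the slide, treated as literal prefixes (an assumption)
    private static final List<String> NOISE = List.of(
            "nanosecond", "nanoliter", "nano$", "nano-second",
            "nano-liter", "nano.sub", "nano ", "nano2");

    /** Returns true if some "nano" occurrence in the text is not a noise keyword. */
    static boolean hasGenuineNanoTerm(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        int pos = lower.indexOf("nano");
        while (pos >= 0) {
            if (!startsWithNoise(lower, pos)) {
                return true; // a non-noise occurrence: keep the patent
            }
            pos = lower.indexOf("nano", pos + 1);
        }
        return false; // only noise occurrences, or no "nano" at all
    }

    private static boolean startsWithNoise(String lower, int pos) {
        for (String noise : NOISE) {
            if (lower.startsWith(noise, pos)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasGenuineNanoTerm("A carbon nanotube array"));
        System.out.println(hasGenuineNanoTerm("Pulse width of 5 nanoseconds"));
    }
}
```

Patents for which `hasGenuineNanoTerm` returns false would be dropped before parsing.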

18 4. Parsing
Extract the individual data fields from the HTML patent documents and load them into the database.

19 Parsing example: parsing inventor data

    public static void processAssignees( ) throws IOException {
        … …
        String[] assignees = assigneeString.split(" ");
        for (int i = 0; i < assignees.length; i++) {
            currentassignee = assignees[i].trim();
            if (currentassignee.length() == 0) continue;
            currentassignee = currentassignee.replaceAll("\r\n", "");

            // Process inventor name
            name = findBetween(currentassignee, 0, " ", " ");
            currPosition = currentassignee.indexOf(" ") + " ".length();

            // Process inventor address
            address = findBetween(currentassignee, currPosition, "(", ")");
            if (address == null) {
                System.err.println("wrong address: " + patentId);
                continue; // skip this entry instead of dereferencing null below
            }
            int startIndex = 0, endIndex = 0;
            if ((endIndex = address.lastIndexOf(',')) >= 0) {
                city = address.substring(0, endIndex);
                if (city.lastIndexOf(',') >= 0) {
                    city = city.substring(city.lastIndexOf(',') + 1);
                    // Strings are immutable, so the result must be assigned back
                    city = city.replaceAll("[^a-zA-Z]", "");
                }
                startIndex = endIndex + 1;
            } else city = "-";
            address = address.substring(startIndex);
            country = findBetween(address, 0, " ", " ");
            if (country == null) {
                country = "US";
                state = address.trim();
            } else state = "-";
            name = name.trim();
            city = city.trim();
            state = state.trim();

            // Keep the ranking order of inventors
            rank++;
        }
    }
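The address handling in this parsing example can be isolated into a small, self-contained sketch. It is an illustration under assumptions, not the original parser. The class `AssigneeAddress` and the helper `between` are hypothetical, and so is the input format: the text between the parentheses is assumed to look like `City, ST` for US addresses and `City, <B>CC</B>` for foreign ones, with the country code wrapped in bold tags. Unlike the slide's regular expression, this version keeps spaces inside city names.

```java
final class AssigneeAddress {
    final String city;
    final String state;
    final String country;

    AssigneeAddress(String city, String state, String country) {
        this.city = city;
        this.state = state;
        this.country = country;
    }

    /** Parses the text found between the parentheses of an assignee entry. */
    static AssigneeAddress parse(String address) {
        String city = "-";
        String rest = address;
        int lastComma = address.lastIndexOf(',');
        if (lastComma >= 0) {
            city = address.substring(0, lastComma);
            int prevComma = city.lastIndexOf(',');
            if (prevComma >= 0) {
                city = city.substring(prevComma + 1); // keep only the last segment
            }
            city = city.replaceAll("[^a-zA-Z ]", "").trim();
            rest = address.substring(lastComma + 1);
        }
        // Assumed convention: a tagged code is a country, anything else is a US state
        String country = between(rest, "<B>", "</B>");
        String state = "-";
        if (country == null) {
            country = "US";
            state = rest.trim();
        }
        return new AssigneeAddress(city, state, country.trim());
    }

    /** Returns the text between open and close, or null if either is missing. */
    static String between(String s, String open, String close) {
        int start = s.indexOf(open);
        if (start < 0) return null;
        int end = s.indexOf(close, start + open.length());
        if (end < 0) return null;
        return s.substring(start + open.length(), end);
    }

    public static void main(String[] args) {
        AssigneeAddress a = parse("Tucson, AZ");
        System.out.println(a.city + " | " + a.state + " | " + a.country);
    }
}
```

Returning an immutable value object, rather than writing to shared fields as the slide code does, makes each address easy to check before it is inserted into the database.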

20 Data Analysis Examples
Bibliographic analysis
–Top 50 countries

    select c.countryName, count(distinct b.patentId)
    from usp_assignee a, usp_patentAssignee b, usp_countryName c
    where a.assigneeId = b.assigneeId
      and a.aCountry not in ('unknown', '')
      and a.aCountry = c.countryCode
    group by c.countryName
    order by count(distinct b.patentId) desc

Rank  Assignee Country             Number of Patents
1     United States                13,506
2     Japan                        2,653
3     Federal Republic of Germany  836
4     France                       534
5     China (Taiwan)               428
6     Republic of Korea            406
7     Canada                       333
8     Netherlands                  325
9     Australia                    276
10    United Kingdom               258
11    Switzerland                  193
12    Israel                       163
13    Sweden                       108
14    Belgium                      106
15    Italy                        82
16    Singapore                    70
17    China                        66
18    Denmark                      56
19    Finland                      51
20    India                        39
21    Hong Kong                    33
22    Bermuda                      28
23    Ireland                      26
24    Austria                      24
25    Norway                       23
26    Spain                        15
27    Liechtenstein                13
28    Barbados                     13
29    British Virgin Islands       7
30    New Zealand                  7

21 Citation Network Analysis Developing software: Graphviz
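Graphviz draws graphs from a plain-text description in the DOT language, so a citation network can be rendered by writing one directed edge per citation. The sketch below shows one possible way to produce that text. The class `CitationDot`, the method `toDot`, and the patent IDs are hypothetical, and each pair is assumed to mean "the first patent cites the second".

```java
import java.util.List;

final class CitationDot {
    /** Builds a DOT digraph with one edge per {citing, cited} pair. */
    static String toDot(List<String[]> citations) {
        StringBuilder sb = new StringBuilder("digraph citations {\n");
        for (String[] edge : citations) {
            // Quote the node names so any ID is a valid DOT identifier
            sb.append("  \"").append(edge[0])
              .append("\" -> \"").append(edge[1]).append("\";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical patent IDs, used only to show the output format
        List<String[]> citations = List.of(
                new String[] {"7000001", "6000002"},
                new String[] {"7000001", "6000003"});
        System.out.print(toDot(citations));
    }
}
```

Saving the output to a file such as `citations.dot` and running `dot -Tpng citations.dot -o citations.png` renders the network as an image.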

22 Content Map Analysis
Developing software: multi-level self-organizing map algorithm developed by the AI Lab at the University of Arizona

23 Thanks!