BUILDING NANOBANK: Data Structure and Selection Criteria
Jason Fong and Emre Uyar, University of California, Los Angeles



What is Nanobank?
Nanobank is a collection of observations from various sources (scientific articles, patents and government grants) determined to be related to the field of nanotechnology, either by probabilistic information retrieval (IR) methods or by being declared nano by a source authority.

Data Sources – Articles
580,711 scientific articles from peer-reviewed journals.
Source: the Science Citation Index, Arts & Humanities Citation Index and Social Sciences Citation Index of the Institute for Scientific Information Inc. (ISI®).
Together, these indexes contain more than 24,250,000 entries from over 8,700 peer-reviewed scientific journals.

Data Sources – Patents and Grants
240,437 patents from the U.S. Patent and Trademark Office's online database of more than 4,000,000 patents, granted by USPTO from 1976 to […]
[…],831 grants from NIH and NSF databases.

Data Contents: Articles
◦ Titles
◦ Journal volume and issue numbers
◦ Publication years
◦ Author names
◦ Names and addresses of organizations affiliated with authors

Data Contents: Patents
◦ Titles and abstracts
◦ Application and grant dates
◦ Names and addresses of inventors and assignees
◦ U.S. and international patent classifications

Data Contents: Grants
◦ Titles and abstracts
◦ Receiving organization names and addresses
◦ PI and co-PI names
◦ Grant amounts

Nanobank Data Structure
Internal database
– Stored in a relational database
– Separate tables for the various data items
– ID numbers for each item provide the links between tables
Version posted on Nanobank.org
– Denormalized form of the internal database
– Storing redundant data is less space-efficient, but lessens the need to join multiple tables
– The Nanobank Codebook contains detailed information on the tables and fields available in each
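As a rough illustration of the two forms described on this slide, the sketch below builds a tiny normalized database and derives a denormalized table from it. The table and column names (`article`, `organization`, `article_org`, `article_flat`) and the sample rows are hypothetical; they are not taken from the Nanobank Codebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (internal) form: one table per item type, linked by IDs.
    CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT, pub_year INTEGER);
    CREATE TABLE organization (org_id INTEGER PRIMARY KEY, org_name TEXT);
    CREATE TABLE article_org (article_id INTEGER, org_id INTEGER);

    INSERT INTO article VALUES (1, 'Carbon nanotube growth', 2001);
    INSERT INTO article VALUES (2, 'Quantum dot synthesis', 2003);
    INSERT INTO organization VALUES (10, 'Univ Calif Los Angeles');
    INSERT INTO article_org VALUES (1, 10);
    INSERT INTO article_org VALUES (2, 10);

    -- Denormalized (published) form: the join is done once and stored,
    -- so the organization name is repeated on every row that uses it.
    CREATE TABLE article_flat AS
        SELECT a.article_id, a.title, a.pub_year, o.org_name
        FROM article a
        JOIN article_org ao ON ao.article_id = a.article_id
        JOIN organization o ON o.org_id = ao.org_id;
""")

# Users of the flat table can read everything without writing a join.
flat_rows = conn.execute(
    "SELECT article_id, title, pub_year, org_name "
    "FROM article_flat ORDER BY article_id"
).fetchall()
```

The organization name is stored twice in `article_flat`, which is the space cost the slide mentions; in exchange, a query against the flat table needs no joins.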

Document Selection
Document selection methods:
◦ Keywords
◦ Probabilistic
◦ Authority-selected
Tables include fields to indicate the selection method:
◦ “nanobank_flag” = 1 if selected by Keywords or Probabilistic; 0 otherwise
◦ “authority_flag” = 1 if Authority-selected; 0 otherwise
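A minimal sketch of how the two flag fields might be read, assuming records are available as dictionaries and that a record may have either flag or both set. Only the field names `nanobank_flag` and `authority_flag` come from the slide; `doc_id` and the sample rows are hypothetical.

```python
def selection_methods(record):
    """Describe how a record was selected, based on its two flag fields."""
    methods = []
    if record["nanobank_flag"] == 1:
        methods.append("keywords_or_probabilistic")
    if record["authority_flag"] == 1:
        methods.append("authority")
    return methods

# Hypothetical sample rows for illustration.
records = [
    {"doc_id": "A1", "nanobank_flag": 1, "authority_flag": 0},
    {"doc_id": "A2", "nanobank_flag": 1, "authority_flag": 1},
    {"doc_id": "A3", "nanobank_flag": 0, "authority_flag": 1},
]

# Documents that only the source authority declared nano.
authority_only = [r["doc_id"] for r in records
                  if selection_methods(r) == ["authority"]]
```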

Document Selection: Keywords
Search for text patterns matching words or phrases related to nanotechnology
Words and phrases chosen by subject specialists
Less effective for identifying very early or recent documents
– Early documents were written before the terms were in common usage
– Recent documents have terms that are too new to be included in the search patterns
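Keyword selection of this kind can be sketched with regular expressions. The patterns below are hypothetical stand-ins chosen for illustration; the actual Nanobank list was chosen by subject specialists and is not reproduced here.

```python
import re

# Hypothetical patterns for illustration only (not the Nanobank list).
KEYWORD_PATTERNS = [
    r"\bnano\w*",                     # nanotube, nanoparticle, nanoscale, ...
    r"\bquantum dots?\b",
    r"\batomic force microscop\w*",   # microscope, microscopy
]
KEYWORD_RE = re.compile("|".join(KEYWORD_PATTERNS), re.IGNORECASE)

def matches_keywords(text):
    """Return True if any keyword pattern occurs in the text."""
    return KEYWORD_RE.search(text) is not None
```

A broad prefix pattern such as `nano\w*` also matches words like "nanosecond", so a practical list would likely need exclusions as well as inclusions.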

Document Selection: Probabilistic
Incorporates new terms as they come into common usage
Uses the Xapian search engine library to perform ranking calculations
Analyzes document text and ranks documents against a set of query terms

Document Selection: Probabilistic
Initial query terms from the Virtual Journal of Nanoscale Science & Technology (VJNano):
◦ All articles in VJNano assumed to be relevant
◦ Select highest ranked terms
Document selection process:
◦ Use initial query terms to select relevant documents from all journal articles
◦ Select additional terms from those relevant documents and add to query
◦ Repeat selection with expanded query terms
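The iterative process on this slide can be sketched in plain Python. This is a simplified stand-in, not the Nanobank implementation: it ranks terms by raw document frequency and selects documents by term overlap, whereas the slides say the ranking calculations were performed with Xapian. All function names, parameters and sample texts are assumptions made for illustration.

```python
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase words longer than three characters.
    return [w for w in text.lower().split() if len(w) > 3]

def top_terms(docs, k, exclude=()):
    """Return the k terms found in the most docs, skipping excluded terms."""
    counts = Counter(t for d in docs for t in set(tokenize(d)))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked if t not in exclude][:k]

def select(corpus, query, min_hits=1):
    """Keep documents containing at least min_hits distinct query terms."""
    return [d for d in corpus
            if len(set(tokenize(d)) & set(query)) >= min_hits]

def expand_and_select(seed_docs, corpus, k=1, rounds=2):
    # Step 1: initial query terms come from the seed (assumed-relevant) set.
    query = top_terms(seed_docs, k)
    selected = select(corpus, query)
    for _ in range(rounds):
        # Step 2: take additional terms from the documents selected so far.
        new_terms = top_terms(selected, k, exclude=query)
        if not new_terms:
            break
        # Step 3: repeat the selection with the expanded query.
        query = query + new_terms
        selected = select(corpus, query)
    return query, selected

seed_docs = ["nanotube growth", "nanotube arrays"]
corpus = [
    "nanotube fullerene synthesis",
    "fullerene cage chemistry",
    "yeast protein folding",
]
query, selected = expand_and_select(seed_docs, corpus)
```

In this toy run the second document contains none of the seed terms and is picked up only after "fullerene" is added to the query, which is the effect the expansion step is meant to have. Unchecked expansion can also drift toward unrelated terms, so a real pipeline would need a relevance threshold.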

Document Selection: Authority Set
Articles – Listed in the Virtual Journal of Nanoscale Science & Technology
Patents – Listed under United States Patent Classification Class 977 (Nanotechnology)
NSF Grants – Program name contains “nano”
NIH Grants – NIH descriptive tag contains “nano”
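The four authority rules can be written as one predicate. This is a sketch under stated assumptions: the record layout and the field names (`kind`, `in_vjnano`, `us_classes`, `program_name`, `descriptive_tag`) are hypothetical, and the case-insensitive match on "nano" is an assumption, since the slide does not say how case is handled.

```python
def is_authority_selected(record):
    """Apply the per-source authority rule to one record (hypothetical layout)."""
    kind = record["kind"]
    if kind == "article":
        # Listed in the Virtual Journal of Nanoscale Science & Technology.
        return bool(record.get("in_vjnano", False))
    if kind == "patent":
        # Classified under US Patent Classification Class 977.
        return "977" in record.get("us_classes", [])
    if kind == "nsf_grant":
        return "nano" in record.get("program_name", "").lower()
    if kind == "nih_grant":
        return "nano" in record.get("descriptive_tag", "").lower()
    return False
```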

GEOCODING
Standardizing across the differing naming conventions used in different sources.
Reconciling non-uniformity in how observations are recorded.
Correcting common mistakes.
For US observations: providing grouping units (other than city and state) that are not available in the original data sources, such as counties and BEA areas.

COUNTRY GEOCODING
Country names in all observations are cleaned, standardized and assigned an ISO code (2-letter alphabetic).
The current ISO list of countries is taken as the basis; historical entries are assigned to the closest current country to the extent possible.
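Country cleaning of this kind is often done with a lookup table over normalized strings. The sketch below assumes that approach; the variant spellings are hypothetical examples, and mapping "USSR" to "RU" is only one possible reading of "closest current country", not a documented Nanobank rule.

```python
# Hypothetical variant table; keys are lowercased with punctuation removed.
ISO_BY_VARIANT = {
    "usa": "US", "u s a": "US", "united states": "US",
    "england": "GB", "united kingdom": "GB",
    "china": "CN", "peoples r china": "CN",
    "germany": "DE", "west germany": "DE",
    "ussr": "RU",  # assumption: historical entry mapped to a current country
}

def country_code(raw):
    """Normalize a raw country string and return its 2-letter code, or None."""
    key = " ".join(raw.lower().replace(".", " ").split())
    return ISO_BY_VARIANT.get(key)
```

Returning `None` for unknown names keeps unmatched observations visible for hand cleaning instead of silently guessing.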

US GEOCODING
US observations are those in the 50 US states, DC and 7 US associated areas.
Cities, states, counties and BEA economic areas are coded using “Populated Places” data obtained from the FIPS 55 database and from BEA.
The basis is the city-state combination.
City names are standardized and matched to the names in the FIPS database on a state-by-state basis.
In articles, 99.98% of US observations have been assigned a definite city-state code.

US GEOCODING: Variables Created
1. Standard_city_name: Standardized name as it appears in the FIPS database (corrected for misspellings, abbreviations, etc.)
2. State_code: 2-digit numeric code.
3. City_code: 5-digit numeric code, unique by state.
4. County_code: 5-digit numeric code.
5. County_name
City code + state code uniquely determine a populated place.
Numeric codes are the same as the codes used by FIPS.
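Because a city code is unique only within a state, a lookup needs the (state code, city code) pair as its key. The sketch below illustrates this; every code and name in it is a made-up placeholder, not an actual FIPS value. Codes are kept as strings so that leading zeros survive.

```python
# (state_code, city_code) -> (standard_city_name, county_code, county_name)
# All values are made-up placeholders, not real FIPS codes.
PLACES = {
    ("91", "00100"): ("SPRINGFIELD", "91001", "EXAMPLE COUNTY A"),
    ("92", "00100"): ("SPRINGFIELD", "92005", "EXAMPLE COUNTY B"),
}

def lookup_place(state_code, city_code):
    """Return place details for a state/city code pair, or None if unknown."""
    key = (str(state_code).zfill(2), str(city_code).zfill(5))
    return PLACES.get(key)
```

The same city code and city name appear under two states here, and only the combined key tells the two places apart.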

GEOCODING – US BEA Areas
The Bureau of Economic Analysis (BEA) created 179 Economic Areas in the US by assigning each county to a unique economic area.
BEA_code: 3-digit numeric code that identifies the BEA Economic Area associated with each observation.
"BEA's economic areas define the relevant regional markets surrounding metropolitan or micropolitan statistical areas. They consist of one or more economic nodes - metropolitan or micropolitan statistical areas that serve as regional centers of economic activity and the surrounding counties that are economically related to the nodes. The economic areas were redefined on November 17, 2004, and are based on commuting data from the 2000 decennial population census, on redefined statistical areas from OMB (February 2004), and on newspaper circulation data from the Audit Bureau of Circulations for 2001."
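Since each county belongs to exactly one economic area, the BEA code can be derived from the county code with a single lookup. A minimal sketch, assuming a county-to-area table is available; the codes below are made-up placeholders, not actual county or BEA codes.

```python
from collections import Counter

# county_code -> BEA_code; made-up placeholder values for illustration.
BEA_BY_COUNTY = {
    "91001": "901",
    "91003": "901",
    "92005": "902",
}

def bea_code(county_code):
    """Return the BEA economic area code for a county, or None if unknown."""
    return BEA_BY_COUNTY.get(county_code)

def count_by_bea(county_codes):
    """Count observations per BEA area, skipping counties with no match."""
    codes = (bea_code(c) for c in county_codes)
    return Counter(code for code in codes if code is not None)

counts = count_by_bea(["91001", "91003", "92005", "99999"])
```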

ORGANIZATION CODES
Each observation is assigned an alphanumeric code.
The 2-letter alphabetic part determines the organization type.
The numeric part groups names that are the same up to standardization and hand cleaning.

First 2 characters   Organization type
FI                   Firm
UN                   University
NL                   National Lab
RI                   Research Inst
UG                   US Government
HO                   Hospital
AS                   Academy of Sciences
NO                   No Organization
SC                   School
OT                   Other
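A code of this shape can be split into its type and group parts as sketched below. The type table comes from the slide; the example code "UN000123" and the assumption that the numeric part directly follows the two letters are hypothetical, since the slides do not show an actual code.

```python
ORG_TYPES = {
    "FI": "Firm",
    "UN": "University",
    "NL": "National Lab",
    "RI": "Research Inst",
    "UG": "US Government",
    "HO": "Hospital",
    "AS": "Academy of Sciences",
    "NO": "No Organization",
    "SC": "School",
    "OT": "Other",
}

def parse_org_code(code):
    """Split a code such as 'UN000123' into (organization type, group number)."""
    prefix, number = code[:2].upper(), code[2:]
    if prefix not in ORG_TYPES or not number.isdigit():
        raise ValueError(f"not a valid organization code: {code!r}")
    return ORG_TYPES[prefix], int(number)
```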

Organization Codes: Types of Cleaning
1. Standardization of common identifiers:
◦ IBM = IBM Corp. = IBM Corporation
◦ Univ = University = University of = Universidade = Universidad = Univerzitet = Universita = Universitat = Universiti = Universite = Universitet = Universiteit
2. Using look-up tables and hand cleaning to identify common variants (and misspellings) of names used by the same organization:
◦ IBM = Int Buisness Machines = International Business Machines Corporation = Int Business Machines Operation
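The two cleaning steps can be sketched as a token-level standardization followed by a variant lookup. This is an illustration only: the token lists and lookup entries are taken from the examples on this slide, while the function name and the exact normalization rules (uppercasing, dropping punctuation, dropping "of" after a university token) are assumptions.

```python
import re

UNIVERSITY_TOKENS = {
    "univ", "university", "universidade", "universidad", "univerzitet",
    "universita", "universitat", "universiti", "universite",
    "universitet", "universiteit",
}
CORPORATE_SUFFIXES = {"corp", "corporation"}

# Step 2: hand-built table of known variants and misspellings.
VARIANT_LOOKUP = {
    "INT BUISNESS MACHINES": "IBM",
    "INTERNATIONAL BUSINESS MACHINES": "IBM",
    "INT BUSINESS MACHINES OPERATION": "IBM",
}

def standardize_org_name(name):
    """Standardize common identifiers, then resolve known variants."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    cleaned = []
    for token in tokens:
        if token in CORPORATE_SUFFIXES:
            continue  # "IBM Corp." and "IBM Corporation" both become "IBM"
        if token == "of" and cleaned and cleaned[-1] == "univ":
            continue  # treat "University of" the same as "Univ"
        cleaned.append("univ" if token in UNIVERSITY_TOKENS else token)
    key = " ".join(cleaned).upper()
    return VARIANT_LOOKUP.get(key, key)
```

Names that standardize to the same string would then share the numeric part of the organization code.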