LIS618 lecture 4 Thomas Krichel 2003-10-19. Structure Document preprocessing Practice: Nexis –document preprocessing –segment theory and practice Practice:

Presentation transcript:

LIS618 lecture 4 Thomas Krichel

Structure
Document preprocessing
Practice: Nexis
–document preprocessing
–segment theory and practice
Practice: Factiva

document preprocessing
There are some operations that may be done to the documents before indexing:
–lexical analysis
–stemming of words
–elimination of stop words
–selection of index terms
–construction of term categorization structures
We will look at those in turn.
In many cases, document preprocessing is not well documented by the provider, but searchers need to be aware of these operations.

lexical analysis
divides a stream of characters into a stream of words
seems easy enough, but…
–should we keep numbers?
–hyphens: compare "state-of-the-art" with "b-52"
–removal of punctuation, but "333B.C."
–casing: compare "bank" and "Bank"
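
The choices raised on the lexical analysis slide can be made explicit as tokenizer options. The following is only a minimal sketch: the function name `tokenize`, its flags, and the token pattern are assumptions made for illustration and do not describe how any particular provider tokenizes.

```python
import re

def tokenize(text, keep_numbers=True, split_hyphens=False, fold_case=True):
    """Split a character stream into word tokens under explicit policy choices."""
    if fold_case:
        text = text.lower()  # "Bank" and "bank" become the same token
    if split_hyphens:
        text = text.replace("-", " ")  # "state-of-the-art" becomes four tokens
    # A token is a run of letters/digits, optionally joined by inner hyphens or periods.
    tokens = re.findall(r"[A-Za-z0-9]+(?:[-.][A-Za-z0-9]+)*", text)
    if not keep_numbers:
        tokens = [t for t in tokens if not t.isdigit()]  # drop purely numeric tokens
    return tokens

print(tokenize("The B-52 is state-of-the-art."))
print(tokenize("The B-52 is state-of-the-art.", split_hyphens=True, keep_numbers=False))
```

Each flag changes what ends up in the index, so a searcher who does not know the provider's settings cannot tell whether "b-52" is stored as one token or two.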

stemming
In general, users search for the occurrence of a term irrespective of grammar.
Plural, gerund forms, and past tense can be subject to stemming.
There is an important algorithm by Porter.
Evidence about the effect of stemming on information retrieval is mixed.
Stemming is relatively rare these days.
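
To make the idea of stemming concrete, here is a deliberately crude suffix stripper. It is not the Porter algorithm, which applies several ordered rule phases; the suffix list, the minimum stem length, and the name `naive_stem` are all assumptions chosen for illustration.

```python
# Suffix -> replacement, tried in order. Illustrative only, far cruder than Porter.
_SUFFIXES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))

def naive_stem(word):
    """Strip one common English suffix so that grammatical variants conflate."""
    word = word.lower()
    for suffix, replacement in _SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words such as "class" alone
        # Require a stem of at least three letters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

for w in ("documents", "indexing", "indexed", "queries"):
    print(w, "->", naive_stem(w))
```

Even this toy version shows why the evidence is mixed: conflating "indexing" and "indexed" helps recall, but any rule this simple will also merge or mangle words that should stay distinct.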

elimination of stop words
Some words carry no meaning and should be eliminated.
In fact, any word that appears in 80% of all documents is pretty much useless.
But consider a searcher for "to be or not to be".
It is better to reduce the index weight of terms that appear very frequently.
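
One common way to reduce the weight of very frequent terms, rather than deleting them, is inverse document frequency. The sketch below assumes the plain log(N / df) form and whitespace tokenization; the slide does not say which weighting any given system uses.

```python
import math

def idf_weights(documents):
    """Map each term to log(N / df), where df counts the documents containing it."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc.lower().split()):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

docs = ["the bank", "the river bank", "the question"]
weights = idf_weights(docs)
print(weights["the"], weights["bank"], weights["river"])
```

A term that occurs in every document gets weight zero but stays in the index, so a phrase search such as "to be or not to be" remains possible.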

index term selection
In printed indexes, we use nouns only.
Some nouns that appear heavily together can be considered to be one index term, such as "computer science".
Dialog deals with this through phrase indexing.
Most web engines, however, index all words, and all of them individually.

thesauri
A thesaurus is a list of words and, for each word, a list of related words:
–synonyms
–broader terms
–narrower terms
It is used
–to provide a consistent vocabulary for indexing and searching
–to assist users with locating terms for query formulation
–to allow users to broaden or narrow a query

use of thesauri
Thesauri are limited to experimental systems, or some high-quality systems; see bin/thesaurus.pl for an example, or look at Nexis.
A thesaurus can be confusing to users.
Frequently the relationship between terms in the query is badly served by the relationships in the thesaurus.
Thus thesaurus expansion of an initial query (if performed automatically) can lead to bad results.
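
The risk of automatic expansion is easy to see with a toy example. The thesaurus entries below are made up for illustration, and `expand_query` is a hypothetical helper, not the behaviour of Nexis or any other system.

```python
# Hypothetical toy thesaurus: word -> related words by relation type.
THESAURUS = {
    "bank": {
        "synonyms": ["financial institution", "riverbank"],
        "broader": ["organization"],
        "narrower": ["savings bank"],
    },
}

def expand_query(terms, thesaurus, relation="synonyms"):
    """Add the related words of each query term, keeping order and dropping repeats."""
    expanded = []
    for term in terms:
        for candidate in [term] + thesaurus.get(term, {}).get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["river", "bank"], THESAURUS))
```

The query is clearly about rivers, yet the expansion also pulls in "financial institution", because the thesaurus knows nothing about how the query terms relate to each other.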

Back to Nexis: word limits
The following are always considered word limits:
–hyphens
–slashes
–parentheses
–spaces

plurals
Nexis indexes plural and possessive forms as the singular. But in power search, you can use the following:
–PLURAL (term) only the plural of term
–SINGULAR (term) only the singular of term
–ALLCAPS (term) only capitals of term
–NOCAPS (term) no capitals of term
–CAPS (term) capitalized term only

Document preprocessing in Nexis
ampersand: if it is surrounded by blanks, Nexis treats it as "and". If it is not, Nexis treats it as a normal character, as in company(at&t).
apostrophe: works if not followed by "s", in which case it is a possessive.
at-sign: used for sections in case law, ignored otherwise, e.g. in addresses: presidentwhitehouse.com

Document preprocessing in Nexis
colon and comma are read as a space unless adjacent characters are numbers.
hyphen, / and \ are read as a space.
percent and pound sign mean themselves and are not equivalent to anything.
" ? $ ; are all ignored.
® is replaced by the word "R", ™ is replaced by the word "TM".
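
The character rules on the two preprocessing slides can be read as a small normalization pipeline. This is a sketch of that reading, not Nexis code: the order of the steps, the treatment of "ignored" as "deleted", the padding around "R" and "TM", and the name `normalize_nexis_like` are assumptions. The apostrophe and at-sign rules are left out.

```python
import re

def normalize_nexis_like(text):
    """Apply the listed character rules in a fixed, assumed order."""
    text = text.replace(" & ", " and ")   # ampersand surrounded by blanks
    text = text.replace("\u00ae", " R ")  # registered sign becomes the word "R"
    text = text.replace("\u2122", " TM ")  # trademark sign becomes the word "TM"
    # Colon and comma become a space unless both neighbours are digits.
    text = re.sub(r"(?<!\d)[:,]|[:,](?!\d)", " ", text)
    text = re.sub(r"[-/\\]", " ", text)   # hyphen, slash, backslash read as a space
    text = re.sub(r'["?$;]', "", text)    # ignored characters are dropped
    return " ".join(text.split())         # collapse runs of whitespace

print(normalize_nexis_like('AT&T & Sons: "profit" of $1,000; up?'))
print(normalize_nexis_like("state-of-the-art at 10:30"))
```

Note that "AT&T" survives intact while the free-standing ampersand turns into "and", which matches the company(at&t) example on the slide.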

equivalents
Nexis has a number of "equivalents" where, depending on sources, it replaces one with the other. Contrary to their claims, they also work in quick search.
Term                              Equivalent
First (second, third, etc.)       1st (2nd, 3rd, etc.)
Monday (all days ex. Sunday)      Mon (Tues, Weds, etc.)
January (abbreviations work)      Jan (Feb, Mar, etc.)
One (all numbers < 20)            1 (2, 3, etc.)
and                               &
company                           co
corporation                       corp
incorporated                      inc
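
One way to picture the equivalents is as a lookup table applied to both the query term and the indexed term. The pairs below come from the slide, but the direction of replacement (long form to short form), the handling of a trailing period, and the function names are assumptions.

```python
# A few pairs from the equivalents list; long form -> short form is an assumed direction.
EQUIVALENTS = {
    "first": "1st", "second": "2nd", "third": "3rd",
    "monday": "mon", "january": "jan", "one": "1",
    "and": "&", "company": "co", "corporation": "corp", "incorporated": "inc",
}

def canonical(term):
    """Return the short form if the table lists one, else the lowercased term."""
    key = term.lower().rstrip(".")  # treat "corp." like "corp"
    return EQUIVALENTS.get(key, key)

def are_equivalent(a, b):
    """Two terms match if they share the same canonical form."""
    return canonical(a) == canonical(b)

print(are_equivalent("Corporation", "corp."))
print(are_equivalent("first", "2nd"))
```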

noise and reserved words
Noise words are common words
–in power search, noise words are ignored and replaced by a space
–in quick search, you can use phrases
–there is no list of noise words
Reserved words are
–and
–or
–not
They are used in Boolean expressions. They are not indexed.

Nexis segments
Nexis does some document preprocessing for characters, discussed in a later slide.
The processed document has a number of field/value pairs that are called segments.
Not every source has every segment.
I make a distinction between
–native
–smart-indexed
segments.

some segments in legal docs
CITE
CLASS
DATE common search for any date field
FIRST-ACTION date
HISTORY
ISSUED-BY
LAST-ACTION date
NAME
REFERENCES
TEXT full text
TITLE same as name
TYPE

typical segments in news
BYLINE
CORRECTION
CORRECTION-DATE
DATE
DATELINE (not a date)
GRAPHIC
HEADLINE
HIGHLIGHT
LEAD
HLEAD is HEADLINE, HIGHLIGHT, & LEAD

typical segments in news
PUBLICATION name and copyright
SECTION
SERIES
SOURCE
TICKER
TYPE

typical smart-indexed segments
CITY
COMPANY
COUNTRY
GEOGRAPHIC
INDUSTRY
KEYWORD
ORGANIZATION
PERSON
PRODUCT
SUBJECT
TICKER
TYPE
TERMS includes all these

segment search You can place query terms and connectors in a segment and then search for it. Example: hlead((drug or substance) w/10 abuse)
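
The example query can be mimicked on a document stored as a dictionary of segments. This is a sketch only: it assumes that w/10 means "within ten word positions, in either order", that HLEAD is the concatenation of HEADLINE, HIGHLIGHT and LEAD, and that words are split on whitespace. The function names and the sample documents are invented.

```python
def within(words, left_terms, right_term, distance):
    """True if any left term occurs within `distance` positions of right_term."""
    left_pos = [i for i, w in enumerate(words) if w in left_terms]
    right_pos = [i for i, w in enumerate(words) if w == right_term]
    return any(abs(i - j) <= distance for i in left_pos for j in right_pos)

def hlead_matches(document):
    """Evaluate hlead((drug or substance) w/10 abuse) against one document."""
    # Only the three HLEAD segments are searched; other segments are ignored.
    text = " ".join(document.get(seg, "") for seg in ("HEADLINE", "HIGHLIGHT", "LEAD"))
    words = text.lower().split()
    return within(words, {"drug", "substance"}, "abuse", 10)

doc_a = {"HEADLINE": "Substance abuse on the rise", "BODY": "Details follow"}
doc_b = {"HEADLINE": "Markets rally", "BODY": "A report on drug abuse"}
print(hlead_matches(doc_a), hlead_matches(doc_b))
```

The second document mentions the terms only in its body, so a segment-restricted search skips it; that is the point of searching in a segment.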

using segments for news
uses power search expressions, plus
–hlead (expression) ?
–headline (expression)
–company (expression) for a company
–byline (expression) for the author
–show (expression) for a television show transcript
expression is a Boolean expression or simple keyword.

power search for legal data
uses power search expressions, plus
–name (expression) for the name of a party
–cite (expression) for a citation expression for case law
–title (expression) for the title of a law article
expression is a Boolean expression or simple keyword.

Search forms
There are special forms for
–News
–Company reports
–Market indicators
–Portfolio
–News and quotes about companies

Personal news alert
Do a search, then click on track in personal news to get to a screen where you can enter
–periodicity
–what documents are to be sent
–subject
This works for real estate for me.

Real time news
This uses a different query language
–terms are implicitly ANDed
–explicit AND and OR allowed
–phrases have to be put in quotes
–* stands for any number of characters, not just one as in power search
–parentheses can be used
I have poor experience with this.
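
Two of the rules above, implicit AND and the * wildcard, can be sketched in a few lines. This is an illustration under assumptions, not the real-time news parser: quoted phrases, explicit AND/OR and parentheses are not handled, and the wildcard is assumed to match word characters only. The function names are invented.

```python
import re

def term_to_regex(term):
    """Translate a term with * wildcards into an anchored, case-insensitive regex."""
    parts = [re.escape(p) for p in term.split("*")]
    return re.compile("^" + r"\w*".join(parts) + "$", re.IGNORECASE)

def matches_all(query, text):
    """Implicit AND: every whitespace-separated query term must match some word."""
    words = re.findall(r"\w+", text)
    return all(
        any(term_to_regex(term).match(word) for word in words)
        for term in query.split()
    )

print(matches_all("librar* search", "Libraries improve search tools"))
print(matches_all("librar* nexis", "Libraries improve search tools"))
```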

Summary on Nexis
Nexis has a rich set of resources.
It can be searched by inexperienced users, but they are likely to get poor results.
Clever learning about its features can get you quite far; however, the features are not well documented online. There is not enough detail.

Factiva
Nexis is news with legal "stuff". Factiva is news with business "stuff".
It will only work with Microsoft Internet Explorer! This violates the most important rule of web site design.
It is because they use ASP technology. A bad choice!

Login to factiva
We have a public account that will serve up to 30 users concurrently up to
–user id: mls003
–password: transcripts
–name space: 16
asp has the login
Sessions time out after 30 minutes.

More on Factiva
has
–downloadable brochures
–case studies
–white papers
–product tour
I looked at the brochure "Inside-Out". Well written, ordered copies.

Free text search
similar to Nexis power search
operators "and" "or" "not" "w/i", "near/n" where n is a number.
/f/n requires the preceding expression to be in the first n words in the full text.
"same" stands for same paragraph.
"atleastn" requires at least n occurrences.
"wc" is a word count; use a comparison operator and then a number, e.g. wc<1000.
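
Three of the operators above are simple counting tests, which the following sketch mirrors as plain functions. The function names and the tokenization (lowercased runs of letters and digits) are assumptions; Factiva's own counting rules are not documented here.

```python
import re

def words_of(text):
    """Lowercased word tokens; the tokenization is an assumption."""
    return re.findall(r"[a-z0-9]+", text.lower())

def at_least(text, term, n):
    """Like atleastn: the term must occur at least n times."""
    return words_of(text).count(term.lower()) >= n

def in_first_words(text, term, n):
    """Like /f/n for a single term: it must occur in the first n words."""
    return term.lower() in words_of(text)[:n]

def word_count_below(text, limit):
    """Like wc<limit: the document must have fewer than `limit` words."""
    return len(words_of(text)) < limit

article = "Oil prices rose as oil supply fell"
print(at_least(article, "oil", 2), in_first_words(article, "prices", 2), word_count_below(article, 1000))
```

Filters of this kind are useful for narrowing a result set to documents that are substantially about a term rather than merely mentioning it.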

but as well
You can add codes from indexing terms.
Note that the + shows that there is more.
When you press the triangle, the code is dropped into the text box.

Thank you for your attention!