Published by Paul Moody. Modified over 9 years ago.
1
Alexandria Digital Library Project
Textual-Geospatial Integration Project
James Frew
University of California, Santa Barbara
2
Textual-Geospatial Integration (Frew, JCDL 2002 gazetteer workshop, 2002-07-18)
Geospatially-Augmented Search
o What’s here? Find library objects associated with a given location:
  – Place name(s)
  – “Footprint” (geographic extent)
o Where’s this? Find the location(s) associated with a given library object
3
Examples (from TREC-9)
o Find documents that contain residential real estate listings within New Jersey.
o Find reports on automobile traffic in the Washington, DC metropolitan area.
o What forms of entertainment are available in Newport Beach, California?
4
Why Is GAS® Difficult?
o Few library objects have explicit locations
  – Assigned reliably
  – Identified in object’s metadata
o Many objects (especially text documents) have implicit locations
  – Present in, or inferable from, object’s content
  – Not necessarily identified as locations
5
“Where’s This” Service
Processing stages (from the slide diagram):
o PARSE: text document in; potential names, types, coordinates out
o LOOKUP: consults type thesaurus and gazetteer; gazetteer entries (known places) out
o ANALYZE: ranked footprints and placenames out
o EVALUATE: “best” name(s) and composite footprint out
6
Geo-parsing
o Extract “geographic facts” from text
o Characterize by
  – Potential place component: name, type, footprint
  – Related fact (with preposition): “in …”, “northeast of …”, etc.
  – Frequency
  – Importance
  – Context
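The characterization above maps naturally onto a small record type. The following is a minimal sketch, not the project's actual data model: the class name `GeoFact`, its field names, and the `describe` helper are all hypothetical, and the slides do not define a scale for importance, so that field is left unset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GeoFact:
    """One candidate geographic fact extracted from a text (illustrative only)."""
    name: Optional[str] = None          # potential place name, e.g. "Yreka"
    feature_type: Optional[str] = None  # potential feature type, e.g. "fault"
    footprint: Optional[str] = None     # raw coordinate string found in the text
    relation: Optional[Tuple[str, Optional[str]]] = None  # (preposition, related place)
    frequency: int = 1                  # number of occurrences
    context: Optional[str] = None       # where in the document the fact was found
    importance: Optional[float] = None  # unset: the slides give no scale for this

    def describe(self) -> str:
        """Render the fact as a short human-readable phrase."""
        head = self.name or self.feature_type or self.footprint or "(unnamed)"
        if self.relation:
            prep, target = self.relation
            head = f"{head} {prep} {target}" if target else f"{head} ({prep})"
        return f"{head} x{self.frequency}"


fact = GeoFact(name="Callahan", relation=("in", "California"), context="K")
print(fact.describe())
```

A frozen dataclass is used here so that facts can be stored in sets or used as dictionary keys when duplicates are merged; that is a design choice of this sketch, not something the slides specify.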
7
Geo-parsing Example (1/2)
(California,,,,1,K)
(Callahan,,,(in,California),1,K)
(Callahan-Yreka,,,(area of,),1,T)
(Early Cambrian,,,,1,B)
(Klamath Mountains,,,(eastern,),1,T)
(Klamath Mountains,,,(within,),1,B)
(Klamath Mountains,,,,1,K)
(Northern California,,,,1,T)
(Ordovician,,,,1,B)
(Ordovician,,,,1,K)
(Paleozoic,,,(in,California),1,B)
(Paleozoic,,,,1,K)
8
Geo-parsing Example (2/2)
(Silurian,,,,1,K)
(Siskiyou County,,,(in,California),1,K)
(Skookum Gulch,,,,1,K)
(Skookum Gulch,,,,1,T)
(Skookum Gulch,,,,2,B)
(United States,,,,1,K)
(Yreka,,,(in,California),1,K)
(,fault,,,2,B)
(,rocks,,,6,B)
(,,N410000N420000W1220000W1230000,,1,C)
(,,,(in,North America),1,B)
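The tuples in the two example slides can be read mechanically. The sketch below rests on two assumptions that the slides do not state: that the six fields are (name, type, footprint, related fact, frequency, context code), and that the coordinate string packs two latitudes as N/S plus DDMMSS followed by two longitudes as E/W plus DDDMMSS. The function names `parse_fact` and `parse_box` are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed field order: name, type, footprint, (preposition, place), frequency, context.
FACT_RE = re.compile(
    r"\(([^,()]*),([^,()]*),([^,()]*),"
    r"(?:\(([^,()]*),([^,()]*)\))?,"
    r"(\d+),([A-Z])\)"
)

# Assumed layout: N/S + DDMMSS twice, then E/W + DDDMMSS twice.
BOX_RE = re.compile(
    r"([NS])(\d{2})(\d{2})(\d{2})([NS])(\d{2})(\d{2})(\d{2})"
    r"([EW])(\d{3})(\d{2})(\d{2})([EW])(\d{3})(\d{2})(\d{2})"
)


def parse_fact(text: str) -> Dict[str, object]:
    """Split one tuple such as '(Callahan,,,(in,California),1,K)' into named fields."""
    m = FACT_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized fact: {text!r}")
    name, ftype, footprint, prep, target, freq, context = m.groups()
    return {
        "name": name or None,
        "feature_type": ftype or None,
        "footprint": footprint or None,
        "relation": (prep, target or None) if prep is not None else None,
        "frequency": int(freq),
        "context": context,
    }


def _degrees(hemisphere: str, d: str, m: str, s: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = int(d) + int(m) / 60 + int(s) / 3600
    return -value if hemisphere in "SW" else value


def parse_box(code: str) -> Tuple[float, float, float, float]:
    """Decode a packed coordinate string into (south, west, north, east)."""
    m = BOX_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"unrecognized footprint: {code!r}")
    g = m.groups()
    lats = [_degrees(*g[0:4]), _degrees(*g[4:8])]
    lons = [_degrees(*g[8:12]), _degrees(*g[12:16])]
    return (min(lats), min(lons), max(lats), max(lons))


print(parse_fact("(Callahan,,,(in,California),1,K)"))
print(parse_box("N410000N420000W1220000W1230000"))
```

Under the assumed coordinate layout, the footprint string in the example decodes to a one-degree box spanning 41 to 42 degrees north and 122 to 123 degrees west.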
9
Lookup Example: Feature Type
o Fault: partial match: fault zones
o Rocks: use: natural rock formations
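The two outcomes above, a partial match and a "use" reference from a non-preferred term to a preferred one, can be illustrated with a toy thesaurus. This is a minimal sketch containing only the two entries from the slide; the lookup order (exact, then "use", then partial) and the function name `lookup_feature_type` are assumptions, not the behaviour of any real thesaurus service.

```python
from typing import List, Tuple

# Toy thesaurus built only from the two entries on the slide.
PREFERRED_TERMS = ["fault zones", "natural rock formations"]
USE_REFERENCES = {"rocks": "natural rock formations"}  # non-preferred -> preferred


def lookup_feature_type(term: str) -> Tuple[str, List[str]]:
    """Return (match kind, preferred terms) for a parsed type term."""
    key = term.strip().lower()
    if key in PREFERRED_TERMS:
        return ("exact", [key])
    if key in USE_REFERENCES:
        return ("use", [USE_REFERENCES[key]])
    # Partial match: the term equals one whole word of a preferred term.
    partial = [p for p in PREFERRED_TERMS if key in p.split()]
    if partial:
        return ("partial", partial)
    return ("none", [])


print(lookup_feature_type("fault"))
print(lookup_feature_type("rocks"))
```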
10
Lookup Example: Gazetteer
Place Name            exact  partial
Skookum Gulch             1        0
Klamath Mountains         1        0
Northern California       1        0
California                1      492
Callahan*                 1        1
Silurian                  0        5
Siskiyou County*          1       14
United States             1      273
Yreka*                    1       12
North America             0        8
*within footprint of California
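The exact and partial columns can be illustrated with a simple counting routine. This is a sketch under assumptions: "partial" is taken to mean that the query occurs as a whole-word phrase inside a longer gazetteer name, which the slides do not confirm, and the toy gazetteer reuses a few names from the slides, so its counts will not reproduce the table.

```python
from typing import List, Tuple

# Toy gazetteer: names borrowed from the slides, not a real gazetteer extract.
TOY_GAZETTEER = [
    "Yreka",
    "Yreka City",
    "Skookum Gulch",
    "Klamath Mountains",
    "Eastern Klamath Mountains",
]


def _contains_phrase(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of words inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def match_counts(query: str, names: List[str]) -> Tuple[int, int]:
    """Count (exact, partial) matches of `query` against gazetteer names."""
    q = query.lower().split()
    exact = partial = 0
    for name in names:
        tokens = name.lower().split()
        if tokens == q:
            exact += 1
        elif q and _contains_phrase(tokens, q):
            partial += 1
    return exact, partial


for place in ["Yreka", "Klamath Mountains", "Silurian"]:
    print(place, match_counts(place, TOY_GAZETTEER))
```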
11
Analysis Criteria
o Placement in document
  – e.g. keywords, title > body
o Frequency in document
o Exact match in gazetteer
o Accuracy of gazetteer footprint
  – e.g. points < bounding boxes
o Scale of gazetteer footprint
  – Size of focus area / size of footprint
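One way to combine these criteria is a multiplicative score. The sketch below is only an illustration: every weight is invented, the reading of the context codes (K = keywords, T = title, B = body) is a guess, and the areas in the usage example are made-up relative units. The slides give the criteria but not how they are combined.

```python
# Assumed meaning of context codes: K = keywords, T = title, B = body.
PLACEMENT_WEIGHT = {"K": 3.0, "T": 3.0, "B": 1.0}  # invented weights


def confidence(context: str, frequency: int, exact_match: bool,
               footprint_kind: str, focus_area: float,
               footprint_area: float) -> float:
    """Combine the five analysis criteria into one score (higher is better)."""
    score = PLACEMENT_WEIGHT.get(context, 1.0) * frequency  # placement and frequency
    if exact_match:
        score *= 2.0          # exact gazetteer match beats a partial one
    if footprint_kind == "box":
        score *= 1.5          # points < bounding boxes
    if footprint_area > 0:
        # Scale: size of focus area / size of footprint, capped at 1.
        score *= min(1.0, focus_area / footprint_area)
    return score


# Made-up relative areas: the focus area is 1 unit.
county = confidence("K", 1, True, "box", focus_area=1.0, footprint_area=2.0)
country = confidence("K", 1, True, "box", focus_area=1.0, footprint_area=1000.0)
print(county, country)
```

The scale factor is what pushes very large footprints down the ranking in this sketch: a place whose footprint dwarfs the focus area scores low even with an exact gazetteer match.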
12
Analysis Example: Results
o High confidence
  – Callahan in California
  – Yreka in California
  – Skookum Gulch
  – Klamath Mountains (eastern)
  – Siskiyou County
o Low confidence
  – Northern California
  – United States
  – North America
13
Evaluation Example
o Placenames
  – Skookum Gulch
  – Klamath Mountains
  – California
  – Callahan in California
  – Siskiyou County in California
  – United States
  – Yreka in California
o Additional placenames
  – Shasta Butte City
  – Yreka City
  – Thompson's Dry Diggings
  – Eastern Klamath Mountains
  – Area of Callahan-Yreka
  – Skookum Gulch
o Derived footprint
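The slide does not say how the derived footprint is computed. One simple possibility, shown here purely as an assumption, is the smallest rectangle enclosing the footprints of the high-confidence places. The function name `composite_footprint` and the coordinates in the usage example are illustrative, and the sketch ignores boxes that cross the antimeridian.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (south, west, north, east) in degrees


def composite_footprint(boxes: Iterable[Box]) -> Box:
    """Return the smallest box that encloses every input box."""
    items = list(boxes)
    if not items:
        raise ValueError("at least one footprint is required")
    return (
        min(b[0] for b in items),  # southernmost edge
        min(b[1] for b in items),  # westernmost edge
        max(b[2] for b in items),  # northernmost edge
        max(b[3] for b in items),  # easternmost edge
    )


# Illustrative boxes only; these are not real gazetteer footprints.
print(composite_footprint([
    (41.0, -123.0, 41.5, -122.5),
    (41.4, -122.8, 42.0, -122.0),
]))
```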
14
“What’s Here” Service
o Components (from the slide diagram): Gazetteer, AIRE, Document Ranker, User Interface, Query Parser
o Query Expansion
  – Example Query: Bodies of Water near Chicago
  – Expansion Terms: Lake Michigan, Chicago River
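The query expansion example can be sketched as a two-step process: split the query into a type term and a place name, then ask a gazetteer for matching features. Everything below is hypothetical: the regular expression, the function name `expand_query`, and the toy gazetteer, which holds only the single example from the slide and stands in for a real gazetteer query.

```python
import re
from typing import List

# Splits "<type terms> near|in|within <place name>"; an assumed query shape.
QUERY_RE = re.compile(
    r"^(?P<type>.+?)\s+(?P<relation>near|in|within)\s+(?P<place>.+)$",
    re.IGNORECASE,
)

# Toy stand-in for a gazetteer lookup, holding only the slide's example.
TOY_GAZETTEER = {
    ("chicago", "bodies of water"): ["Lake Michigan", "Chicago River"],
}


def expand_query(query: str) -> List[str]:
    """Return expansion terms for a spatial query, or an empty list."""
    m = QUERY_RE.match(query.strip())
    if m is None:
        return []
    key = (m.group("place").lower(), m.group("type").lower())
    return list(TOY_GAZETTEER.get(key, []))


print(expand_query("Bodies of Water near Chicago"))
```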
15
Manual Relevance Feedback
o Components (from the slide diagram): Gazetteer, AIRE, Query Parser, User Interface
o Place Names: “Chicago”
o Spatial Synonyms: “Chicago, IL”, “Chicago River”
o Query
16
Automatic Relevance Feedback
o Components (from the slide diagram): Gazetteer, AIRE, Document Ranker, RF System
o Place Names, Surrounding Type Terms: “Bodies of Water”
o Spatial Query Results: “Chicago River, Lake Michigan”
o Expanded Query
17
“What’s Here” Components
o Place names → footprints
  Requires: place name ranking scheme
  – Chicago, IL > Chicago tectonic plate in Brazil
o Type terms → classes
  Requires: class thesaurus API
  – “Bodies of Water” → “Water Bodies”
o Gazetteer → spatial synonyms
  Requires: gazetteer API; results ranking
  – “Bodies of Water near Chicago” → set of gazetteer queries
18
The Light at the End of the Tunnel
o You submit: a document
o You get: a place
  – Best
  – Also-rans
  – Alternatives
o What you do with this is your business
19
Brought To You By
o UCSB
  – Linda Hill
  – Greg Janée
  – Dave Valentine
  – Satoshi Ikeda (Japan Patent Office)
o IIT
  – Steven Beitzel
  – Ophir Frieder
  – David Grossman
  – Eric Jensen
  – Vasif Shaikh