Published by Regina Day. Modified over 9 years ago.
1
Geographic Text Search
Corporate Proprietary, Copyright 1999-2003, MetaCarta, Inc.
On building a high-performance gazetteer database
Amittai Axelrod
MetaCarta Inc.
2
Thanks to
  Keith Baker
  Kenneth Baker
  Michael Bukatin
  András Kornai
3
Plan of the talk
  Database background
  Relating geographic names and features
  Handling ambiguities and inconsistencies in geographic names
  Classification and storage system for geographic features
4
Databases
  No DB (faking it with flat files) -- clumsy
  Record-oriented -- still runs the world
  Relational -- making headway
  Object-oriented -- still very academic
For MetaCarta GazDB, the relational approach made the most sense:
  Overlapping records (McKinley/Denali)
  Need for frequent updates of subparts of records
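To illustrate why overlapping records favor a relational layout, here is a minimal sketch in Python with SQLite. The table and column names (`feature`, `name`, `feature_name`) and the approximate coordinates are assumptions made for this example; they are not the actual GazDB schema, which the slides do not show.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real GazDB layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    lat REAL NOT NULL,
    lon REAL NOT NULL
);
CREATE TABLE name (
    name_id INTEGER PRIMARY KEY,
    text TEXT NOT NULL UNIQUE
);
-- Link table: one feature may carry many names and vice versa.
CREATE TABLE feature_name (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    name_id INTEGER NOT NULL REFERENCES name(name_id),
    PRIMARY KEY (feature_id, name_id)
);
""")

# One mountain stored once (approximate coordinates), with two names.
conn.execute("INSERT INTO feature VALUES (1, 63.07, -151.01)")
conn.executemany("INSERT INTO name VALUES (?, ?)",
                 [(1, "Mount McKinley"), (2, "Denali")])
conn.executemany("INSERT INTO feature_name VALUES (?, ?)", [(1, 1), (1, 2)])


def names_for(feature_id):
    """Return all names attached to a feature, sorted alphabetically."""
    rows = conn.execute(
        "SELECT n.text FROM name AS n "
        "JOIN feature_name AS fn ON fn.name_id = n.name_id "
        "WHERE fn.feature_id = ? ORDER BY n.text",
        (feature_id,),
    )
    return [row[0] for row in rows]


# Updating a subpart of the record: only the coordinates change,
# and the names are untouched.
conn.execute("UPDATE feature SET lat = ? WHERE feature_id = ?", (63.0695, 1))
print(names_for(1))
```

With a flat file, the McKinley and Denali entries would each carry their own copy of the coordinates, and a correction would have to be applied to both.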
5
Gazetteer production process
6
Conversion scripts
  Enforce uniform structure on the data
  Normalize across sources (e.g. lat/lon to decimal degrees, spelling, …)
  Configuration required once per source
  Load data into GazDB
  Combination of Perl and SQL
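As an example of the kind of normalization a conversion script performs, the sketch below converts a degrees/minutes/seconds coordinate to signed decimal degrees. It is written in Python purely for illustration (the slides say the real scripts combine Perl and SQL), and the function name `dms_to_decimal` is hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    Southern and western hemispheres yield negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


print(dms_to_decimal(42, 30, 0, "N"))   # 42.5
print(dms_to_decimal(71, 15, 0, "W"))   # -71.25
```

A per-source configuration would then only need to say which input columns hold the degrees, minutes, seconds, and hemisphere.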
7
Relating features and names
8
Other tables used in GazDB
  Population
  Elevation
  Language
  Feature type
  Source/versioning info
  Temporal extent
  Hierarchical information
  Confidence
  Comments
  Change logs (full auditing)
9
Geographic names
  Internationalization
    Full Unicode (UTF-8) support
    Maintain detailed language information (SIL)
  Name resolution
    Canonical form (16-bit)
    Display form (8-bit)
    Search form (6-bit)
  Authoritativeness
  Explicitness
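The slide does not say how the three name forms are derived, so the following Python sketch rests on stated assumptions: the display form is taken to be a Latin-1 (8-bit) rendering, and the search form is taken to be an uppercase alphabet of letters, digits, and space (37 symbols, which fits in 6 bits). The function names and the folding rules are hypothetical, not MetaCarta's.

```python
import unicodedata

# Assumed 6-bit search alphabet: 26 letters + 10 digits + space = 37 symbols.
SEARCH_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")


def strip_marks(text):
    """Decompose characters and drop combining marks (accents, carons, ...)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def display_form(canonical):
    """Assumed 8-bit form: keep Latin-1 characters, fold the rest if possible."""
    out = []
    for ch in unicodedata.normalize("NFC", canonical):
        if ord(ch) < 256:
            out.append(ch)
        else:
            fallback = strip_marks(ch)
            # Use "?" when no Latin-1 fallback exists for this character.
            out.append(fallback if all(ord(c) < 256 for c in fallback) else "?")
    return "".join(out)


def search_form(canonical):
    """Assumed 6-bit form: unaccented, uppercase, punctuation turned into spaces."""
    folded = strip_marks(canonical).upper()
    kept = "".join(c if c in SEARCH_ALPHABET else " " for c in folded)
    return " ".join(kept.split())


print(display_form("Český Krumlov"))  # Ceský Krumlov
print(search_form("Český Krumlov"))   # CESKY KRUMLOV
print(search_form("Saint-Étienne"))   # SAINT ETIENNE
```

The point of the reduced forms is that queries can match on the coarse search form while results are still shown using a richer form.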
10
Updating a name in the GazDB
11
Geographic features
  Spatial representations
    Point, line, area, …
  Functional classes
    Building, field, campus, city, …
  Administrative types
    Nation, province, county, international org, …
12
Export scripts
  Read GazDB
  Select which fields to include in custom output
  Create .gbdm (MetaCarta format) binaries
  Combination of Perl and SQL
  Not yet general across binary output formats
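The field-selection step can be sketched as follows. The binary layout used here (a record count followed by packed fields) is invented for the example and is not the .gbdm format, whose structure the slides do not describe; the names `PACKERS` and `export_records` are likewise hypothetical, and Python stands in for the Perl/SQL of the real scripts.

```python
import struct


def pack_float(value):
    """Pack a number as a little-endian 8-byte float."""
    return struct.pack("<d", value)


def pack_int(value):
    """Pack an integer as a little-endian signed 8-byte value."""
    return struct.pack("<q", value)


def pack_text(value):
    """Pack text as a 2-byte length prefix followed by UTF-8 bytes."""
    encoded = value.encode("utf-8")
    return struct.pack("<H", len(encoded)) + encoded


# Hypothetical per-field encoders; NOT the real .gbdm layout.
PACKERS = {
    "name": pack_text,
    "lat": pack_float,
    "lon": pack_float,
    "population": pack_int,
}


def export_records(records, fields):
    """Serialize only the requested fields of each record, in the given order."""
    out = bytearray(struct.pack("<I", len(records)))  # 4-byte record count
    for record in records:
        for field in fields:
            out += PACKERS[field](record[field])
    return bytes(out)


rows = [{"name": "Denali", "lat": 63.0695, "lon": -151.0074, "population": 0}]
blob = export_records(rows, ["name", "lat"])
print(len(blob))  # 20 bytes: 4 (count) + 2 + 6 (name) + 8 (lat)
```

Choosing the fields at export time is what lets one database feed several purpose-built gazetteers of different sizes.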
13
Conclusions
  Accept multiple sources (only configure once per source)
  Fast loading of large datasets (1M entries per hour on a Linux desktop)
  Simple update procedure
  Output large custom binary gazetteers for different purposes at extreme speeds (1M entries per minute)