BioGeomancer: Semi-automated Georeferencing Engine

BioGeomancer: Semi-automated Georeferencing Engine
John Wieczorek, Aaron Steele, Dave Neufeld, P. Bryan Heidorn, Robert Guralnick, Reed Beaman, Chris Frazier, Paul Flemons, Nelson Rios, Greg Hill, Youjun Guo

Spatially Challenged Occurrence Data
- LA PEÑITA; 5.5. KM N
- Baird Mtns.; Salmon R. headwaters
- CALIENTE MOUNTAIN
- 10 MI SW CANAS, RIO HIGUERON
- near Sedan
- 4.4 MI N, 6.2 MI W SEMINOLE

Spatially Enabled Occurrence Data

Input - Verbatim Locality Strings
- LA PEÑITA; 5.5. KM N
- Baird Mtns.; Salmon R. headwaters
- CALIENTE MOUNTAIN
- 10 MI SW CANAS, RIO HIGUERON
- near Sedan
- 4.4 MI N, 6.2 MI W SEMINOLE

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
We need these to start processing.

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
We need these to start processing. These are assumptions we should not hold to be true.

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
- Apply rules for locality string interpretation

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
- Apply rules for locality string interpretation
There is more than one way to accomplish string interpretation.

Locality Interpretation Methods
- Regular expression analysis
  - GEOLocate (Tulane), enhanced
  - BioGeomancer Classic (Yale)
- Machine learning / natural language processing
  - U. Illinois, Urbana-Champaign
  - Inxight Software, Inc.
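
As a concrete illustration of the regular-expression approach, here is a minimal sketch in Python. The pattern is invented for this example, not GEOLocate's or BioGeomancer's actual grammar; it captures the offset/units/heading core of a clause such as "10 MI SW CANAS" with named groups:

    import re

    # Hypothetical pattern (not the engine's real grammar): the numeric
    # offset, distance units, and compass heading at the core of clauses
    # like "10 MI SW CANAS" or "LA PEÑITA; 5.5. KM N".
    OFFSET_HEADING = re.compile(
        r"""(?P<offset>\d+(?:\.\d+)?)\.?\s*   # numeric offset; tolerate a stray dot
            (?P<units>KM|MI)\s+               # distance units
            (?P<heading>[NSEW]{1,2})\b        # compass heading (N, SW, ...)
        """,
        re.IGNORECASE | re.VERBOSE,
    )

    print(OFFSET_HEADING.search("10 MI SW CANAS").groupdict())
    # {'offset': '10', 'units': 'MI', 'heading': 'SW'}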

Locality Types
- F – feature
- P – path
- FO – offset from a feature, sans heading
- FOH – offset from feature at a heading
- FO+ – orthogonal offsets from a feature
- FPOH – offset at a heading from a feature along a path
31 other locality types are known so far.
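
In code, these type codes might be carried as an enumeration. A sketch with assumed names, covering only the six types listed above:

    from enum import Enum

    # Assumed encoding of the locality-type codes above; the slides count
    # 31 more types beyond these six.
    class LocalityType(Enum):
        F = "feature"
        P = "path"
        FO = "offset from a feature, sans heading"
        FOH = "offset from feature at a heading"
        FO_PLUS = "orthogonal offsets from a feature"   # written FO+ on the slide
        FPOH = "offset at a heading from a feature along a path"

    print(LocalityType.FOH.value)   # offset from feature at a heading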

Five Most Common Locality Types*
- 51.0% – feature
- 21.4% – locality not recorded
- 17.6% – offset from feature at a heading
- 8.6% – path
- 5.8% – undefined
*Based on 500 records randomly selected from the 296k records georeferenced manually in the MaNIS Project.

Clause
A subset of a locality description to which a locality type can be applied.

Step 1: Define Clause Boundaries
LA PEÑITA; 5.5. KM N
Baird Mtns.; Salmon R. headwaters
CALIENTE MOUNTAIN
10 MI SW CANAS, RIO HIGUERON
near Sedan
4.4 MI N, 6.2 MI W SEMINOLE

Step 1: Define Clause Boundaries
<LA PEÑITA; 5.5. KM N>
<Baird Mtns.; ><Salmon R. headwaters>
<CALIENTE MOUNTAIN>
<10 MI SW CANAS, ><RIO HIGUERON>
<near Sedan>
<4.4 MI N, 6.2 MI W SEMINOLE>
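
A toy version of this step, under an assumed heuristic rather than the engine's real rules: break at ";" or "," unless the text after the delimiter begins with an offset phrase (number plus units), which binds the delimiter into the current clause, as in "LA PEÑITA; 5.5. KM N":

    import re

    # Assumed heuristic, not BioGeomancer's actual clause grammar.
    OFFSET_AHEAD = re.compile(r"\s*\d+(?:\.\d+)?\.?\s*(?:KM|MI)\b", re.IGNORECASE)

    def split_clauses(locality):
        clauses, start = [], 0
        for m in re.finditer(r"[;,]", locality):
            if OFFSET_AHEAD.match(locality, m.end()):
                continue                      # delimiter stays inside the clause
            clauses.append(locality[start:m.end()].strip())
            start = m.end()
        clauses.append(locality[start:].strip())
        return [c for c in clauses if c]

    print(split_clauses("LA PEÑITA; 5.5. KM N"))          # ['LA PEÑITA; 5.5. KM N']
    print(split_clauses("10 MI SW CANAS, RIO HIGUERON"))  # ['10 MI SW CANAS,', 'RIO HIGUERON']
    print(split_clauses("4.4 MI N, 6.2 MI W SEMINOLE"))   # ['4.4 MI N, 6.2 MI W SEMINOLE']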

Step 2: Determine Locality Types
<FOH>LA PEÑITA; 5.5. KM N</FOH>
<F>Baird Mtns.; </F><PS>Salmon R. headwaters</PS>
<F>CALIENTE MOUNTAIN</F>
<FOH>10 MI SW CANAS, </FOH><P>RIO HIGUERON</P>
<NF>near Sedan</NF>
<FO+>4.4 MI N, 6.2 MI W SEMINOLE</FO+>
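
A rough sketch of the typing step with hypothetical patterns; the engine distinguishes 37 locality types with far richer rules, and only a few are approximated here:

    import re

    OFFSET = r"\d+(?:\.\d+)?\.?\s*(?:KM|MI)\s+[NSEW]{1,2}"

    RULES = [
        ("FO+", re.compile(OFFSET + r"\s*,\s*" + OFFSET, re.I)),  # paired orthogonal offsets
        ("FOH", re.compile(OFFSET, re.I)),                        # offset at a heading
        ("NF",  re.compile(r"\bnear\b", re.I)),                   # "near" a feature
        ("PS",  re.compile(r"\bheadwaters\b", re.I)),             # subsection of a path
        ("P",   re.compile(r"\b(?:RIO|RIVER|HWY)\b", re.I)),      # path-like keywords
    ]

    def locality_type(clause):
        for name, pattern in RULES:
            if pattern.search(clause):
                return name
        return "F"  # default: treat the clause as a plain feature

    for c in ["LA PEÑITA; 5.5. KM N", "near Sedan", "4.4 MI N, 6.2 MI W SEMINOLE"]:
        print(locality_type(c), "-", c)
    # FOH - LA PEÑITA; 5.5. KM N
    # NF - near Sedan
    # FO+ - 4.4 MI N, 6.2 MI W SEMINOLE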

Step 3: Interpret Clauses
<FOH>LA PEÑITA; 5.5. KM N</FOH>
Feature: LA PEÑITA
Offset: 5.5
Offset Units: KM
Heading: N

Step 4: Find Feature Descriptions
<FOH>LA PEÑITA; 5.5. KM N</FOH>
Feature: LA PEÑITA
Offset: 5.5
Offset Units: KM
Heading: N
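
Steps 3 and 4 can be sketched as one parse over an already-typed FOH clause. The grammar below is hypothetical and handles only the feature-first word order ("LA PEÑITA; 5.5. KM N"); clauses like "10 MI SW CANAS" need a second pattern:

    import re

    # Hypothetical FOH grammar for the feature-first word order.
    FOH = re.compile(
        r"^(?P<feature>.+?)[;,]?\s+(?P<offset>\d+(?:\.\d+)?)\.?\s*"
        r"(?P<units>KM|MI)\s+(?P<heading>[NSEW]{1,2})\s*$",
        re.IGNORECASE,
    )

    m = FOH.match("LA PEÑITA; 5.5. KM N")
    print(m.groupdict())
    # {'feature': 'LA PEÑITA', 'offset': '5.5', 'units': 'KM', 'heading': 'N'}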

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
- Apply rules for locality string interpretation
- Treat spatial data references as accurate

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
- Apply rules for locality string interpretation
- Treat spatial data references as accurate
This is another assumption we should not hold to be true.

[Map slides: "Davis, Yolo County, California"]

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
- Apply rules for locality string interpretation
- Treat spatial data references as accurate
- Apply rules for spatial description building

Step 5: Construct Spatial Description for Each Clause

Step 5: Construct Spatial Description for Each Clause
[Diagram: the region "West of B"]
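
For an FOH clause, one concrete spatial description is the point-radius form that appears later under Output: displace the feature's center along the heading and attach an uncertainty radius. A flat-earth sketch with an invented uncertainty rule; real georeferencing uses geodesic math and more error sources, such as datum and coordinate precision:

    import math

    HEADING_DEG = {"N": 0, "NE": 45, "E": 90, "SE": 135,
                   "S": 180, "SW": 225, "W": 270, "NW": 315}

    def point_radius(lat, lon, offset_km, heading, feature_extent_km):
        """Displace the feature's center by the offset and return a
        (lat, lon, uncertainty_km) point-radius description."""
        theta = math.radians(HEADING_DEG[heading])
        lat2 = lat + (offset_km * math.cos(theta)) / 111.32   # ~km per degree of latitude
        lon2 = lon + (offset_km * math.sin(theta)) / (111.32 * math.cos(math.radians(lat)))
        # "SW" only constrains the direction to a 45-degree sector, so the
        # heading alone contributes offset * sin(22.5 deg) of uncertainty;
        # this crude sum stands in for the real combination rules.
        uncertainty_km = feature_extent_km + offset_km * math.sin(math.radians(22.5))
        return lat2, lon2, uncertainty_km

    # "10 MI SW CANAS" with illustrative inputs: 10 mi is about 16.1 km.
    print(point_radius(10.43, -85.09, 16.1, "SW", feature_extent_km=2.0))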

Step 6: Construct Final Spatial Interpretation
10 MI SW CANAS, RIO HIGUERON
Clause 1: <FOH>10 MI SW CANAS, </FOH>
Clause 2: <P>RIO HIGUERON</P>
We hold these clauses to be simultaneously true.
The final spatial description is the intersection of the spatial descriptions of all clauses.
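
A sketch of the intersection step using the shapely library as an assumed stand-in for the engine's spatial machinery; coordinates and buffer sizes are invented, and units are plain degrees:

    from shapely.geometry import LineString, Point

    # Each clause contributes a region; the final interpretation is the
    # intersection of all of them.
    foh_region = Point(-85.20, 10.33).buffer(0.15)           # "10 MI SW CANAS" as a disc
    river = LineString([(-85.30, 10.25), (-85.10, 10.40)])   # "RIO HIGUERON" as a line
    path_region = river.buffer(0.02)                         # corridor around the river

    final = foh_region.intersection(path_region)
    print(final.is_empty)         # False: the two clauses are mutually consistent
    print(round(final.area, 4))   # area (square degrees) of the final description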

Legacy Locality Data Issues
- Treat locality description as accurate
- Treat locality description as complete
- Apply rules for locality string interpretation
- Treat spatial data references as accurate
- Apply rules for spatial description building
- Apply criteria to reject unwanted hypotheses

Additional Input - Preferences
- Assume terrestrial locations
- Assume aquatic locations
  - marine only
  - freshwater only
- Assume direct offsets
- Assume offsets by road, if possible
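
Such preferences might travel with a georeferencing request as a small configuration object; a sketch with invented field names:

    from dataclasses import dataclass
    from typing import Optional

    # Invented field names mirroring the preference list above.
    @dataclass
    class Preferences:
        assume_terrestrial: bool = True
        aquatic_scope: Optional[str] = None   # "marine", "freshwater", or None
        offsets_by_road: bool = False         # False = assume direct offsets

    prefs = Preferences(offsets_by_road=True)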

Output
- Original data
- Zero, one, or more spatial interpretations
  - spatial footprint
  - point-radius description
- Process metadata
  - preferences (e.g., GEOLocate method, assume by road)
  - omissions (e.g., unused information)
  - confidence values
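
Put together, a single output record might look like the following; field names and values are invented for illustration, not BioGeomancer's actual schema:

    # Hypothetical shape of one output record; every name here is invented.
    interpretation = {
        "verbatim_locality": "10 MI SW CANAS, RIO HIGUERON",
        "point_radius": {"lat": 10.33, "lon": -85.20, "uncertainty_km": 4.5},
        "footprint_wkt": "POLYGON ((...))",   # spatial footprint, elided
        "metadata": {
            "preferences": {"offsets": "by road"},
            "omissions": [],                  # unused parts of the locality, if any
            "confidence": 0.87,
        },
    }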

Conclusion
- Georeferences are hypotheses.
- Hypotheses require testing.
- Tested hypotheses should be so noted.