Building a TAPIR-Lite Toolkit for the Global Invasive Species Information Network (GISIN)
Jim Graham and Catherine Jarnevich
GISIN Applications
GISIN will help:
–Resource managers find information on new or potential invaders so they can manage them and prevent invasions more effectively
–Data modelers give decision makers improved predictive capability, so funding can be prioritized to the areas where it will be most effective
–The public understand invasive species and what they can do to minimize damage
Highlights of Survey & Interview Results
At least three languages/frameworks are important (PHP, ASP, JSP)
Data providers are willing to commit anywhere from one hour to “as long as it takes”
Providers have minimal web service expertise
Installation scenarios vary
DiGIR did not meet all needs:
–Complex queries were not needed
–Database performance problems
Results Since Last Year
Initial web portal
Three data models finalized
PHP toolkit available
ASP toolkit in the works
Two meetings held for data providers and standards development
GISIN Web Portal
Protocol Approach
TAPIR-Lite:
–Eliminated complex queries (key-value pairs only)
–“Flat” data models (no complex hierarchies)
Data models defined for invasive species research and management
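GISIN Protocol Transaction
[Diagram: a GISIN request is translated into a SQL query on the provider's database, and the results are returned as the response; diagram elements are labeled Location, Obser., Org1, and Org2.]
–Request: ?Op=Inventory&Model=Occurrences&Count=true&Genus=Tamarix&Concept=Latitude&Concept=Longitude&Concept=Date&Concept=ScientificName
–SQL query: SELECT * FROM Areas JOIN Surveys… JOIN Organisms… WHERE Genus='Tamarix'
–Query results: columns Latitude, Longitude, Date, and Scientific Name (rows include Tamarix ramosissima and Tamarix chinensis)
–Response: the matching records (e.g. Tamarix ramosissima) …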
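The request in this transaction is an ordinary URL query string, which is what makes key-value-pair-only access easy to implement in any language. The sketch below is in Python rather than the toolkit's PHP. It treats Op, Model, Count, and Concept (taken from the example request) as reserved keys; the rule that every other key becomes an equality filter is an assumption made for illustration, not the published protocol.

```python
from urllib.parse import parse_qs

def parse_gisin_request(query: str) -> dict:
    """Split a TAPIR-Lite style key-value request into its parts.

    Reserved keys come from the example request; treating all other keys
    as equality filters is a hypothetical rule for this sketch.
    """
    params = parse_qs(query.lstrip("?"), keep_blank_values=True)
    reserved = {"Op", "Model", "Count", "Concept"}
    return {
        "op": params.get("Op", [""])[0],
        "model": params.get("Model", [""])[0],
        "count": params.get("Count", ["false"])[0].lower() == "true",
        # Concept may repeat; parse_qs keeps every value, in order
        "concepts": params.get("Concept", []),
        "filters": {k: v[0] for k, v in params.items() if k not in reserved},
    }

request = ("?Op=Inventory&Model=Occurrences&Count=true&Genus=Tamarix"
           "&Concept=Latitude&Concept=Longitude&Concept=Date&Concept=ScientificName")
parsed = parse_gisin_request(request)
print(parsed["concepts"])  # ['Latitude', 'Longitude', 'Date', 'ScientificName']
print(parsed["filters"])   # {'Genus': 'Tamarix'}
```

Because there are no nested query structures, the whole request fits in one dictionary, which is the simplification TAPIR-Lite trades complex queries for.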
Data Model Needs
Single, standardized data models
Controlled vocabularies
As flat as possible
Include accuracy, precision, and process information
Able to eliminate duplicate records
Able to trace data to the original source
Citations
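Two of these needs, eliminating duplicates and tracing data back to its source, interact: whichever copy is kept should still say where it came from. A minimal Python sketch, assuming a flat occurrence record with hypothetical ScientificName, Latitude, Longitude, Date, and Source fields and an arbitrary four-decimal rounding rule for coordinates:

```python
from typing import Iterable, List, Tuple

def dedupe_key(record: dict, places: int = 4) -> Tuple:
    """Comparison key for one occurrence; the fields and the rounding
    precision are assumptions, not a GISIN rule."""
    return (
        " ".join(record["ScientificName"].split()).lower(),
        round(float(record["Latitude"]), places),
        round(float(record["Longitude"]), places),
        record["Date"],
    )

def remove_duplicates(records: Iterable[dict]) -> List[dict]:
    """Keep the first record for each key, so its Source field (the trail
    back to the original provider) is preserved."""
    seen = set()
    unique = []
    for record in records:
        key = dedupe_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical sample records from two providers
records = [
    {"ScientificName": "Tamarix chinensis", "Latitude": 35.12341,
     "Longitude": -106.5, "Date": "1999-10-01", "Source": "ProviderA"},
    {"ScientificName": " tamarix  chinensis", "Latitude": 35.12344,
     "Longitude": -106.5, "Date": "1999-10-01", "Source": "ProviderB"},
    {"ScientificName": "Tamarix ramosissima", "Latitude": 36.0,
     "Longitude": -107.0, "Date": "2000-12-01", "Source": "ProviderB"},
]
unique = remove_duplicates(records)
print([r["Source"] for r in unique])  # ['ProviderA', 'ProviderB']
```

Rounding is a blunt tool: two nearly identical points can still fall on opposite sides of a rounding boundary, so a production check would likely need a distance tolerance or stable record identifiers from providers.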
Controlled Vocabularies
Example of ambiguous data:
–United States, USA, United States of America, US, Estados Unidos, etc.
GISIN has chosen to use country codes:
–CA = Canada, NZ = New Zealand, etc.
Vocabulary mapping (cross-walking) allows fast, reliable searching
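A crosswalk can be as simple as a lookup table from normalized variants to the chosen codes. In this Python sketch the table contents and the normalization rules (lower-casing, dropping periods, collapsing whitespace) are assumptions; unmapped variants return None so they can be flagged for review rather than passed through silently.

```python
from typing import Optional

# Hypothetical crosswalk table: free-text variants -> country codes
COUNTRY_CROSSWALK = {
    "united states": "US",
    "usa": "US",
    "united states of america": "US",
    "us": "US",
    "estados unidos": "US",
    "canada": "CA",
    "new zealand": "NZ",
}

def normalize(value: str) -> str:
    """Lower-case, drop periods, and collapse whitespace ("U.S.A." -> "usa")."""
    return " ".join(value.replace(".", "").split()).lower()

def to_country_code(value: str) -> Optional[str]:
    """Return the controlled code, or None if the variant is not mapped yet."""
    return COUNTRY_CROSSWALK.get(normalize(value))

print(to_country_code("U.S.A."))               # US
print(to_country_code("  Estados   Unidos "))  # US
print(to_country_code("Nueva Zelanda"))        # None (needs a new mapping)
```

Once values are stored as codes, a search for the United States becomes a single exact match on US instead of a list of spellings.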
Data Models
Implemented in the protocol:
–SpeciesStatus: Indigenous, Harmful, etc.
–Occurrences: X, Y coordinates (Darwin Core)
–ResourceURLs: results return lists of URLs and their language
Defined, but to be reviewed:
–ImpactStatus: harm type (environment, economy, health); harm impact (strong, weak, unknown)
–ManagementStatus: prevention, interception, control, etc.
–DispersalStatus: cause of introduction, date, vector, etc.
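Because the data models are flat and their values come from controlled vocabularies, one record type with enumerated fields is enough to reject off-vocabulary values at load time. This Python sketch covers only the ImpactStatus terms listed above; the class and field names are hypothetical and are not the published GISIN schema.

```python
from dataclasses import dataclass
from enum import Enum

class HarmType(Enum):
    ENVIRONMENT = "environment"
    ECONOMY = "economy"
    HEALTH = "health"

class HarmImpact(Enum):
    STRONG = "strong"
    WEAK = "weak"
    UNKNOWN = "unknown"

@dataclass(frozen=True)
class ImpactStatusRecord:
    """One flat ImpactStatus row (hypothetical field names)."""
    scientific_name: str
    country_code: str
    harm_type: HarmType
    harm_impact: HarmImpact

    @classmethod
    def from_row(cls, row: dict) -> "ImpactStatusRecord":
        # Enum(value) raises ValueError for any term outside the vocabulary
        return cls(
            scientific_name=row["ScientificName"],
            country_code=row["CountryCode"],
            harm_type=HarmType(row["HarmType"].strip().lower()),
            harm_impact=HarmImpact(row["HarmImpact"].strip().lower()),
        )

# Hypothetical provider row
rec = ImpactStatusRecord.from_row({"ScientificName": "Tamarix ramosissima",
                                   "CountryCode": "US",
                                   "HarmType": "Environment",
                                   "HarmImpact": "Unknown"})
print(rec.harm_type, rec.harm_impact)  # HarmType.ENVIRONMENT HarmImpact.UNKNOWN
```

Keeping each record a single flat row (no nested objects) is what lets the same structure map directly onto one result row in a key-value protocol.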
GISIN Toolkit Needs to Be...
Easy to install
As small as possible
Available in multiple programming languages
Customizable
Testable with built-in turn-on tests
–Other tests will be available in the portal
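One way to read “built-in turn-on tests” is a short list of checks the toolkit runs when it starts, before it answers any request. The Python sketch below is a hypothetical illustration, not the toolkit's actual test suite: it checks that the database answers a trivial query and that every configured concept points at a column that exists, using SQLite's PRAGMA table_info to list columns. The “table.column” mapping format is an assumption.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

def check_connection(conn: sqlite3.Connection) -> None:
    """Fails if the database cannot answer a trivial query."""
    conn.execute("SELECT 1").fetchone()

def check_mapping(conn: sqlite3.Connection, mapping: Dict[str, str]) -> None:
    """Fails if any mapped 'table.column' does not exist (format is hypothetical)."""
    for concept, qualified in mapping.items():
        table, column = qualified.split(".")
        # PRAGMA table_info yields one row per column (index 1 is the name) and
        # no rows for a missing table. The table name comes from the provider's
        # own configuration, never from request input.
        columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if column not in columns:
            raise LookupError(f"{concept} -> {qualified} not found")

def run_turn_on_tests(conn: sqlite3.Connection,
                      mapping: Dict[str, str]) -> List[Tuple[str, bool, str]]:
    checks: List[Tuple[str, Callable[[], None]]] = [
        ("connection", lambda: check_connection(conn)),
        ("mapping", lambda: check_mapping(conn, mapping)),
    ]
    results = []
    for name, check in checks:
        try:
            check()
            results.append((name, True, "ok"))
        except Exception as exc:  # record every failure instead of stopping
            results.append((name, False, str(exc)))
    return results

# Hypothetical setup: the Latitude mapping points at a table that does not exist
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organisms (id INTEGER PRIMARY KEY, sci_name TEXT)")
results = run_turn_on_tests(conn, {"ScientificName": "organisms.sci_name",
                                   "Latitude": "areas.lat"})
for name, ok, message in results:
    print(name, "PASS" if ok else "FAIL", message)
```

Reporting every check rather than stopping at the first failure gives a provider with little web service expertise one complete list of what to fix.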
GISIN Toolkit Characteristics
100% open source
Offers standard URL web service access
Databases supported: MySQL, PostgreSQL, SQL Server, MS Access, etc.
Easy to support and customize:
–8 files
–~3,000 lines of annotated code
Easy to Install
PHP version now available for beta testing!
–One-folder installation
–PHP 5 required, but no additional extensions
–Web-based user interface to map database fields to the GISIN data models
Installation requirements:
–Ability to copy a folder to a directory on a web server that is accessible from the Internet
–Ability to use a web page to configure the service
–Understanding of the provider’s database structure
The toolkit is preconfigured with a sample database for quick startup and testing
Admin UI
Toolkit Design: Data Flow
[Diagram of the data flow between GISIN (over the Internet) and the provider database. Components: Provider Web Service, Database Connection, SQLBuilder, Date Utilities, Admin Web Site, Metadata.xml, Capabilities.xml, and Provider.xml. A subset of these files is marked “typically the only files to modify.”]
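The SQLBuilder step can be sketched as: look up each requested concept in the provider's field mapping, emit a SELECT over the mapped columns, and bind filter values as parameters rather than pasting them into the SQL. This Python sketch uses an in-memory SQLite database; the table names echo the Areas/Surveys/Organisms query in the protocol example, but the column names, join keys, and sample row are hypothetical. SQLite uses ? placeholders; other database drivers may use a different placeholder style.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical field mapping of the kind the admin UI collects:
# GISIN concept -> provider "table.column"
CONCEPT_MAP = {
    "ScientificName": "organisms.sci_name",
    "Genus": "organisms.genus",
    "Latitude": "areas.lat",
    "Longitude": "areas.lon",
    "Date": "surveys.survey_date",
}
# Hypothetical join path through the provider's schema
FROM_CLAUSE = ("areas JOIN surveys ON surveys.area_id = areas.id "
               "JOIN organisms ON organisms.survey_id = surveys.id")

def build_select(concepts: List[str], filters: Dict[str, str]) -> Tuple[str, list]:
    """Return (sql, params); filter values are bound, never interpolated."""
    unknown = [c for c in [*concepts, *filters] if c not in CONCEPT_MAP]
    if unknown:
        raise ValueError(f"unmapped concepts: {unknown}")
    # Column names come only from CONCEPT_MAP, never from the request itself
    sql = "SELECT " + ", ".join(CONCEPT_MAP[c] for c in concepts)
    sql += " FROM " + FROM_CLAUSE
    if filters:
        sql += " WHERE " + " AND ".join(f"{CONCEPT_MAP[k]} = ?" for k in filters)
    return sql, list(filters.values())

# Hypothetical sample database with one survey row
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE areas (id INTEGER PRIMARY KEY, lat REAL, lon REAL);
    CREATE TABLE surveys (id INTEGER PRIMARY KEY, area_id INTEGER, survey_date TEXT);
    CREATE TABLE organisms (id INTEGER PRIMARY KEY, survey_id INTEGER,
                            genus TEXT, sci_name TEXT);
    INSERT INTO areas VALUES (1, 35.1, -106.6);
    INSERT INTO surveys VALUES (1, 1, '2000-12-01');
    INSERT INTO organisms VALUES (1, 1, 'Tamarix', 'Tamarix ramosissima');
""")
sql, params = build_select(["Latitude", "Longitude", "Date", "ScientificName"],
                           {"Genus": "Tamarix"})
rows = conn.execute(sql, params).fetchall()
print(rows)  # [(35.1, -106.6, '2000-12-01', 'Tamarix ramosissima')]
```

Rejecting unmapped concepts up front keeps arbitrary request keys from ever reaching the SQL text.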
Current Providers
What We Have Learned
Harvesting is required to resolve performance problems
Funding is being sought to manage a centralized database cache and thereby improve system performance
This will allow the toolkit and protocol to be simplified further
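A harvester replaces many slow live queries against providers with one periodic bulk copy into the central cache. The Python sketch below assumes a hypothetical fetch_page(start, limit) call for paging through a provider and a stable id on each record; neither is defined by the source. Keying the cache on provider and id means repeated harvests refresh records instead of duplicating them.

```python
from typing import Callable, Dict, List

def harvest(fetch_page: Callable[[int, int], List[dict]],
            cache: Dict[str, dict], provider: str, limit: int = 100) -> int:
    """Copy every record from one provider into the central cache.

    fetch_page(start, limit) is a hypothetical paged provider call; a page
    shorter than `limit` is taken to mean there are no more records.
    """
    start = stored = 0
    while True:
        page = fetch_page(start, limit)
        for record in page:
            # Keying on provider + id means a re-harvest overwrites, not duplicates
            cache[f"{provider}:{record['id']}"] = record
            stored += 1
        if len(page) < limit:
            return stored
        start += limit

# Hypothetical provider holding 250 records
provider_records = [{"id": str(i), "ScientificName": "Tamarix chinensis"}
                    for i in range(250)]
requested_starts = []

def fake_fetch(start: int, limit: int) -> List[dict]:
    requested_starts.append(start)
    return provider_records[start:start + limit]

cache: Dict[str, dict] = {}
print(harvest(fake_fetch, cache, "ProviderA"))  # 250
print(requested_starts)                         # [0, 100, 200]
harvest(fake_fetch, cache, "ProviderA")
print(len(cache))                               # 250
```

Portal searches can then run against the cache alone, which is what would let the provider-side toolkit shrink to little more than a paged export.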
Next Steps
ASP version of the toolkit
Testing:
–More databases connected
–Improved error tracking
Portal:
–Initial harvest model in place
–Incremental improvements
More standards group meetings
More provider meetings
Current Web Sites
GISIN organization site:
–Meeting documents
–List of online invasive species databases
–Network news
GISIN Directory:
–Browse directory
–Search for data from providers: BioStatus, Occurrences, ProfileURLs
–Technical information: Edit Registry, Get Toolkit, Sample Provider (based on the toolkit), and manual exercising of TAPIR-GISIN web services
–Automated tests are coming!
Acknowledgements
Funded by NSF, NBII (USGS), GBIF, TDWG, and GEO
Thanks to: Jerry Cooper, Renato De Giovanni, Roger Hyam, Donald Hobern, Markus Döring, Hannu Saarenmaa, Kevin Richards, Peter Fox, Deborah McGuinness, Michael Browne, Brian Steves, Pam Fuller, John Pickering, Shawn Dalton, Greg Ruiz, and other GISIN members
Contacts: