1
INCOFISH WP3 - Campinas, April 2006
WEB Tools and Data Cleaning
Alexandre Marino (marino@cria.org.br)
Centro de Referência em Informação Ambiental, CRIA
2
WEB Tools and Data Cleaning
These tools were developed within the scope of the speciesLink project, so some of them depend entirely on the architecture, the local database, and the libraries developed by CRIA. Data Cleaning started as an idea without a clear direction and became a very specific system.
3
The speciesLink project was funded by FAPESP (the São Paulo State research funding agency) from October 2001 to October 2005.
4
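Different data sources, software and systems
[Diagram: collections Col 1 to Col 5 run on different platforms and software (Win2000/Brahms, Linux/MySQL, Win98/Access, Win98/biota, FreeBSD/PostgreSQL), plus others of unknown configuration ("?"), all to be reached from one program search interface]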
5
Protocol and Content Schema
- DiGIR protocol (Distributed Generic Information Retrieval): potential to be globally accepted
- DiGIR software (Java Portal & PHP Provider): collaborative development
- DarwinCore v.2: covers the basic content elements (taxonomic identification, location and date of the collecting event)
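To make the three kinds of basic content element concrete, here is a minimal Python sketch of one record. The field names are modeled on Darwin Core v.2-style element names (ScientificName, DecimalLatitude, YearCollected and so on), but treat them as an illustrative subset chosen for this example, not the normative schema; the `has_basic_elements` check is a hypothetical helper, not part of DiGIR or Darwin Core.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DarwinCoreRecord:
    """Illustrative subset of Darwin Core v.2-style elements, not the full schema."""
    # Record identity
    InstitutionCode: str
    CollectionCode: str
    CatalogNumber: str
    # Taxonomic identification
    ScientificName: Optional[str] = None
    # Location of the collecting event
    Country: Optional[str] = None
    StateProvince: Optional[str] = None
    Locality: Optional[str] = None
    DecimalLatitude: Optional[float] = None
    DecimalLongitude: Optional[float] = None
    # Date of the collecting event
    YearCollected: Optional[int] = None
    MonthCollected: Optional[int] = None
    DayCollected: Optional[int] = None

    def has_basic_elements(self) -> bool:
        """Hypothetical check: a name, some location, and at least a year."""
        has_name = bool(self.ScientificName)
        has_place = bool(self.Locality) or (
            self.DecimalLatitude is not None and self.DecimalLongitude is not None)
        return has_name and has_place and self.YearCollected is not None
```

The fields keep the element-style capitalization so they map one-to-one onto element names, at the cost of PEP 8 naming.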
6
System's Architecture
[Diagram: the speciesLink site (presentation layer) sits on a DiGIR Portal (Java) with Perl components. Collection A, with fast and stable connectivity, exposes its Collection Management System (via SQL) through a PHP Provider. Collections B and C, with slow or unstable connectivity, extract data from their Collection Management Systems (via SQL) into a data repository and send it with a SOAP client to a SOAP server on a mirror server (Postgres), which serves it through its own PHP Provider.]
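One way to read the architecture diagram: a collection with a fast, stable link is queried live through its own provider, while a collection with a slow or unstable link pushes copies of its records to the mirror server, and queries are answered from that copy. The Python sketch below illustrates only that routing and caching idea, assuming records are keyed by catalog number; `MirrorServer`, `query_collection` and the "newer copy replaces older" rule are hypothetical names and choices, and the SOAP and DiGIR transport layers are left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MirrorServer:
    """Hypothetical mirror: keeps copies of records pushed by remote collections."""
    # collection code -> {catalog number -> record dict}
    store: dict = field(default_factory=dict)

    def receive_batch(self, collection: str, records: list[dict]) -> int:
        """Upsert a pushed batch; returns how many records the mirror now holds."""
        coll = self.store.setdefault(collection, {})
        for rec in records:
            coll[rec["CatalogNumber"]] = rec  # a newer copy replaces the older one
        return len(coll)

    def search(self, collection: str, name: str) -> list[dict]:
        """Return mirrored records of one collection with the given name."""
        return [r for r in self.store.get(collection, {}).values()
                if r.get("ScientificName") == name]


def query_collection(collection: str, name: str,
                     direct_providers: dict, mirror: MirrorServer) -> list[dict]:
    """Query the live provider if the collection has one (fast, stable link);
    otherwise answer from the mirrored copy (slow or unstable link)."""
    provider = direct_providers.get(collection)
    if provider is not None:
        return provider(name)
    return mirror.search(collection, name)
```

The trade-off in this reading is freshness for availability: mirrored answers are only as current as the last batch a collection managed to send.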
7
speciesLink network, March 2006: ~40 connected collections, ~940,000 online records
[Map of the network; label: JBRJ]
8
WEB Tools
- geoLoc
- spOutlier
- infoXY
- conversor
- speciesMapper
- data cleaning
9
Tools: About geoLoc
- purpose: to assist biological collections in georeferencing their data
- the database includes approximately 110,000 names of Brazilian localities, obtained from:
  - the Brazilian Institute of Geography and Statistics (IBGE)
  - the GEOnet Names Server (GNS)
  - speciesLink/FAPESP
- an algorithm based on concepts in the Egaz program (Shattuck 1997) calculates a coordinate from a distance and direction
10
geoLoc example: distance 26, direction Noroeste (NW, northwest), from Campinas, São Paulo
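geoLoc's own algorithm follows concepts from the Egaz program, which the slides do not detail. As a stand-in, this sketch applies the standard spherical-Earth destination-point formula to a query like the 26 NW of Campinas example. Assumptions: the distance is in kilometres, the direction is one of eight compass points, and the starting coordinates for Campinas are approximate values typed in for illustration rather than an entry from the geoLoc gazetteer.

```python
import math

# Mean Earth radius in km; the formulas below treat the Earth as a sphere.
EARTH_RADIUS_KM = 6371.0088

# Eight-point compass rose -> bearing in degrees clockwise from north.
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def destination(lat_deg, lon_deg, distance_km, direction):
    """Return (lat, lon) reached by moving distance_km from a start point
    along a great circle with the given compass direction."""
    theta = math.radians(BEARINGS[direction])
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi1, lam1 = math.radians(lat_deg), math.radians(lon_deg)
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon = (math.degrees(lam2) + 540) % 360 - 180  # wrap into [-180, 180)
    return math.degrees(phi2), lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, used here to check the result."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example: 26 km NW of an approximate Campinas location (illustrative input).
point = destination(-22.91, -47.06, 26, "NW")
```

Measuring the haversine distance back to the start is a cheap sanity check that the computed point really lies 26 km away.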
11
Tools: About spOutlier
- purpose: to assist biological collections in identifying possibly suspect points in existing records
- uses techniques modified from Chapman 1999 to detect outliers in latitude, longitude and altitude
- allows users to indicate whether their data set is terrestrial or marine
- useful to biologists around the world who wish to identify possible errors in their data
12
spOutlier example input (id, longitude, latitude, altitude):
1, -63.25, -4.916666667, 795
2, -67.05, -10.96666667, 805
3, -68.0125, -12.66666667, 809
4, -68.75, -13.60111111, 815
5, -68.9102, -13.83333, 810
6, -72.3666, -14.36611111, 790
7, -78.3166, -14.38916667, 801
8, -72.137, -11.8647, 700
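spOutlier's techniques are modified from Chapman 1999 and are not reproduced here. To show the general idea of flagging suspect values column by column, this Python sketch swaps in a different, widely used rule: the median/MAD "modified z-score" with the conventional 3.5 cut-off. It assumes the columns of the example records are id, longitude, latitude and altitude, and it tests each column independently, which is a simplification.

```python
import statistics

# Example records from the slide: id, longitude, latitude, altitude (assumed order).
RECORDS = """\
1, -63.25, -4.916666667, 795
2, -67.05, -10.96666667, 805
3, -68.0125, -12.66666667, 809
4, -68.75, -13.60111111, 815
5, -68.9102, -13.83333, 810
6, -72.3666, -14.36611111, 790
7, -78.3166, -14.38916667, 801
8, -72.137, -11.8647, 700"""


def parse(text):
    """Parse 'id, lon, lat, alt' lines into (id, lon, lat, alt) tuples."""
    rows = []
    for line in text.splitlines():
        rid, lon, lat, alt = (part.strip() for part in line.split(","))
        rows.append((rid, float(lon), float(lat), float(alt)))
    return rows


def robust_outliers(values, threshold=3.5):
    """Indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the column, so this rule cannot flag anything
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


def flag_records(rows):
    """Map each column name to the ids of records flagged in that column."""
    names = ("longitude", "latitude", "altitude")
    return {name: [rows[i][0] for i in robust_outliers([r[col] for r in rows])]
            for col, name in enumerate(names, start=1)}
```

On the eight example records this rule flags record 1 on latitude and record 8 on altitude, and nothing on longitude; spOutlier's own method may well flag a different set.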
13
marine
14
Example input (id, longitude, latitude):
1, -63.25, -4.91667
2, 34.3239, 67.9836
aus, 150.0417, -34.9081
3, -68.0125, -12.6667
4, -22.0400, 63.9514
id_teste, -45, -22
6, -75.3667, -14.3661
7, 71.37, -19.37
eua, -80.8011, 26.0506
9, -120.7642, 58.7217
10, 26.0089, -29.5197
11, -95.3781, 16.7639
15
Input/Output:
- degrees, min, sec
- decimal degrees
- UTM
DATUM:
- WGS84 (World)
- SAD69 (Brazil)
- Córrego Alegre (SP)
Example input (decimal degrees):
-3.5800, 52.0633
34.3239, 67.9836
-45, -22
Example output (degrees, min, sec):
03d34'47"W, 52d3'47"N
34d19'23"E, 67d59'0"N
44d59'58"W, 21d59'58"S
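The notation part of the conversion can be sketched in a few lines of Python, using the slide's `03d34'47"W` style. This covers only decimal degrees to and from degrees/minutes/seconds, rounding to whole seconds; it does not perform UTM projection or any datum shift between WGS84, SAD69 and Córrego Alegre, which need published transformation parameters, so its results need not match the tool's example output. The function names are hypothetical.

```python
import re

# Matches text like 03d34'47"W or 67d59'0"N (degrees, minutes, seconds, hemisphere).
DMS_PATTERN = re.compile(r"(\d+)d(\d+)'(\d+(?:\.\d+)?)\"([NSEW])")


def dd_to_dms(value, is_latitude):
    """Decimal degrees -> text like 3d34'48"W, with seconds rounded to whole numbers."""
    if is_latitude:
        hemi = "N" if value >= 0 else "S"
    else:
        hemi = "E" if value >= 0 else "W"
    total_seconds = round(abs(value) * 3600)  # round once, then split exactly
    degrees, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{degrees}d{minutes}'{seconds}\"{hemi}"


def dms_to_dd(text):
    """Text like 03d34'47"W -> signed decimal degrees (S and W are negative)."""
    match = DMS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    d, m, s, hemi = match.groups()
    value = int(d) + int(m) / 60 + float(s) / 3600
    return -value if hemi in "SW" else value
```

Rounding the total number of seconds once, instead of truncating minutes and seconds separately, avoids floating-point artefacts such as 47.999" being printed as 47".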
16
Plot georeferenced points on a map. Available layers:
- World
- South and Central America
- Brazil
- São Paulo State
Example input (longitude latitude):
-95.6 -39.5166
-70.2833 -4.2
-70.033333 -4.35
-69.914889 0.274694
-69.7333 -4.2333
-69.6661 -3.908333
...
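How speciesMapper renders its layers is not described. A minimal Python sketch of the placement step, assuming a plain equirectangular (plate carrée) base image and input lines of "longitude latitude" pairs; only the whole-world extent is filled in, since bounding boxes for the other layers would be guesses. `LAYERS`, `to_pixel` and `plot_points` are hypothetical names.

```python
# Hypothetical layer extents: (min_lon, min_lat, max_lon, max_lat) in degrees.
LAYERS = {
    "World": (-180.0, -90.0, 180.0, 90.0),
}


def to_pixel(lon, lat, extent, width, height):
    """Equirectangular mapping of (lon, lat) onto a width x height image with the
    origin at the top-left corner; returns None for points outside the extent."""
    min_lon, min_lat, max_lon, max_lat = extent
    if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
        return None
    x = (lon - min_lon) / (max_lon - min_lon) * width
    y = (max_lat - lat) / (max_lat - min_lat) * height  # image y grows downwards
    return x, y


def plot_points(text, extent, width, height):
    """Parse 'lon lat' lines and return pixel positions of points inside the layer."""
    pixels = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pos = to_pixel(float(parts[0]), float(parts[1]), extent, width, height)
        if pos is not None:
            pixels.append(pos)
    return pixels
```

Dropping points that fall outside the chosen layer, rather than clamping them to the edge, keeps a regional map from showing misleading markers on its border.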
17
Trachurus trachurus Pteroscion pele Gaidropsarus biscayensis
18
Using Data
[Diagram: spOutlier and geoLoc offered as a SOAP web service over PostgreSQL; jobs (job1, job2) produce maps with PostGIS]
19
Tools: About Data Cleaning
- aims to help curators identify possible errors and standardize data
- records are not modified
- the system only presents "suspect" records
20
How Data Cleaning Works
[Diagram: national collections (Col 1, Col 2, Col 3, ..., Col n) and international collections (Col 1, Col 2, ...) feed a local PostgreSQL database; a Perl process detects suspect records and writes them to tables of suspect records (dc_tax, dc_geo), which are presented on the web via chart.pm (Perl) and the speciesLink portal (Java)]
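The slides establish that suspect records go into separate tables (dc_tax, dc_geo) and that the original records are never modified, but they do not list the checks. This Python sketch (the production system used Perl) shows that read-only pattern with two hypothetical rules: a taxonomic check that the name has "Genus epithet" form, and a geographic check for missing, out-of-range or (0, 0) coordinates. `Record`, `detect_suspects` and the rules themselves are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule: "Genus epithet" with an optional infraspecific epithet.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+(?: [a-z]+)?")


@dataclass(frozen=True)  # frozen: the checks below can read records but never change them
class Record:
    catalog_number: str
    scientific_name: str
    latitude: Optional[float]
    longitude: Optional[float]


def detect_suspects(records):
    """Return (dc_tax, dc_geo): lists of (catalog_number, reason) for suspect records."""
    dc_tax, dc_geo = [], []
    for r in records:
        if not BINOMIAL.fullmatch(r.scientific_name.strip()):
            dc_tax.append((r.catalog_number, "name not in 'Genus epithet' form"))
        if r.latitude is None or r.longitude is None:
            dc_geo.append((r.catalog_number, "missing coordinates"))
        elif not (-90 <= r.latitude <= 90 and -180 <= r.longitude <= 180):
            dc_geo.append((r.catalog_number, "coordinates out of range"))
        elif r.latitude == 0 and r.longitude == 0:
            dc_geo.append((r.catalog_number, "coordinates at (0, 0)"))
    return dc_tax, dc_geo
```

Writing findings to separate lists, keyed by catalog number, mirrors the idea of presenting suspects to curators while leaving the source data untouched.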
21
Online demonstration
22
Thank you! marino@cria.org.br Obrigado!