Published by Bryce Rogers. Modified over 9 years ago.
1
OpenUp! BioCASe Workshop
Jörg Holetschek, Gabriele Dröge
Botanic Garden & Botanical Museum Berlin-Dahlem
Dept. of Biodiversity Informatics and Laboratories
Königin-Luise-Straße 6-8, 14195 Berlin
BioCASe Workshop, Berlin, May 30th/31st 2011
2
BioCASe Workshop, Berlin, May 30-31st 2011
Agenda
Monday
- 11.00 Welcome by Walter Berendsohn, Housekeeping
- 11.20 – 12.00 The BioCASe Architecture: An Overview
- 12.00 – 13.00 The BioCASe Provider Software I: An Overview
- 13.00 – 14.00 Lunch break
- 14.00 – 15.45 The BioCASe Provider Software II: Installation (Hands-on)
- 16.00 – 17.00 The ABCD data standard: Intention, Structure, Elements, Use
- 17.00 – 18.00 Preparing the database for BioCASe/ABCD
- 19.00 Dinner
Tuesday
- 09.30 – 12.00 Setting Up Datasources with the BPS (Hands-on): DB connection, Table Setup, Mapping; Testing, Data Backups
- 12.00 – 13.00 Lunch break
- 13.00 – 14.30 Setting up Networks with BioCASe (Hands-on)
- 15.00 – 15.30 A Thematic BioCASe Network: The DNA Bank Network
- 15.30 – 17.00 Questions (and answers?)
3
Workshop Presentation
http://www.biocase.org/files/BioCASe_Workshop_Berlin_2011.ppt
WiFi Network: Conference
Key: g59mn3w2
4
1. BioCASe Technology: Motivation, Idea and Architecture
5
Primary Biodiversity Information
© Agnes Kirchhoff, J. Holstein et al.
6
Primary Biodiversity Data
Items:
- Living specimen
- Preserved specimen
- Multimedia document (drawing, photo, video, sound)
- Observation
= Primary Biodiversity Data Record: documentation of the occurrence of one species at a given location at a certain point in time
Biological Collection Access Service
7
Data sources worldwide
- Index Herbariorum: 3,293 herbaria, 400 million herbarium sheets
- 50,000-100,000 natural history collections, 1.5-2 billion specimens
- With observations added, occurrence records 3+ billion (10b?)
Over 75% of biodiversity information is stored in developed countries. An estimated 75% of all species are found in the developing world.
Source: BARTHLOTT et al. 1999
8
Accessibility
- Stage 0: Only in real world (paper catalogues, just stacks)
- Stage 1: Only meta information available on the web
- Stage 2: Online catalogue, digitization of specimens
9
Biodiversity Data Level 3: Networking the databases
10
Global Biodiversity Information Facility (GBIF)
11
Biological Collection Access Service (BioCASe)
12
Architecture of Biodiversity Networks
1. Protocols/Data Standards: BioCASe Protocol/ABCD
2. Wrapper Software: BioCASe Provider Software
3. Applications: Data Portal, Data Quality Checker, Data Mining
13
BioCASe Design Principles
- No central database
  - Data remain in the existing DB systems
  - Data provider gets full credit
- Full control over published data by collection holder
- Partial publication possible
  - Collection holder can withhold information from publication (e.g. locality data for endangered species) or exclude records (e.g. until research results are published)
- Wrapper principle
  - Data remain in original collection management system
  - No changes in workflow for curator/local users
14
2. The BioCASe Provider Software
(Architecture diagram: Protocols/Data Standards; Wrapper: BioCASe Provider Software; Applications: Data Portal, Data Quality Checker, Data Mining)
15
BioCASe Provider Software (Wrapper)
Software package that "wraps" around the collection database and equips it with a BioCASe protocol compliant interface:
1. Accepts requests from the network (e.g. "Marmota marmota?")
2. Translates queries to the collection database:
   SELECT * FROM specimen WHERE ScientificName LIKE 'Marmota marmota%'
3. Transforms results into ABCD documents and sends them back
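Step 2 above can be sketched in Python. This is a minimal illustration, not the BPS implementation: the table `specimen` and the column `ScientificName` come from the slide, while the function name `find_units`, the `id` column, the sample rows and the in-memory SQLite database are assumptions made for the example. A parameterized query is used instead of pasting the search term into the SQL string.

```python
import sqlite3


def find_units(conn, name_prefix):
    """Return specimen rows whose ScientificName starts with name_prefix."""
    # The search term is passed as a parameter, not concatenated into the SQL.
    cursor = conn.execute(
        "SELECT id, ScientificName FROM specimen "
        "WHERE ScientificName LIKE ? ORDER BY id",
        (name_prefix + "%",),
    )
    return cursor.fetchall()


# Hypothetical sample data standing in for a collection database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, ScientificName TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [
        (1, "Marmota marmota"),
        (2, "Marmota marmota latirostris"),
        (3, "Lepus europaeus"),
    ],
)

rows = find_units(conn, "Marmota marmota")
print(rows)
```

A real wrapper would go on to turn the returned rows into an ABCD document (step 3), which is left out here.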
16
BioCASe Provider Software (Wrapper)
- Compatible with several protocols (BioCASe, DiGIR) and data schemas (ABCD, DarwinCore, ABCD-EFG, ABCD-DNA)
- Works with most SQL-compliant databases (Access, MySQL, Postgres, SQL Server, ...)
- Currently ~95 production installations serving ~1,500 collections with ~33.5 million records to GBIF and BioCASe
- Platform independent
- Support available!
17
BioCASe Providers Worldwide
~95 production installations serving ~1,500 collections
18
Requirements
1. SQL-compliant database with existing Python connectivity module: MySQL, SQL Server, Postgres, Access, Foxpro, Excel
2. Webserver (preferably Apache), allowing the execution of Python scripts
3. Privileges to install additional Python packages
19
Steps
1. Installing Apache
2. Installing Python
3. Downloading BPS
4. Installing BPS (from repository/archive)
5. Creating the link Apache/BPS
6. Test of Installation
7. Changing directory permissions
8. Setup of additional packages (DB Connectivity Package)
20
1. Installing Apache
http://httpd.apache.org/download
21
2. Installing Python
http://www.python.org/download/
22
3. Downloading BPS
Archive: http://www.biocase.org/products/provider_software/
Subversion repository:
- Latest stable version: http://ww2.biocase.org/svn/bps2/branches/stable
- Defined version: http://ww2.biocase.org/svn/bps2/tags/release_2.5.3
Linux: svn co
Windows: Tortoise client
23
4. Installing the BPS
Setup.py
No files copied, only adapted!
24
5. Linking BPS with Apache
httpd.conf
25
6. Testing BPS, Installing Additional Packages
http://localhost/biocase
Utilities
Library Test
26
6. Write permissions
…/bps2/configuration
…/bps2/log
27
7a: mysqldb
http://sourceforge.net/projects/mysql-python/
28
Changing the Password
... /bps/configuration.ini
29
3. ABCD Standard
(Architecture diagram: Protocols/Data Standards; Wrapper Software; Applications: Data Portal, Data Quality Checker, Data Mining)
30
ABCD Data Schema
Access to Biological Collection Data:
- Data schema for all types of primary biodiversity data (living/preserved/observational, botanical/zoological/bacterial/viral, marine/terrestrial)
- XML (eXtensible Markup Language) based: can be consumed by humans and machines
- Highly complex, hierarchical, currently 1,055 data elements: almost every data item will fit in
- Extendable (plug-in slot for additional information)
- Standard (currently version 2.06)
31
ABCD: Structure
Namespace: http://www.tdwg.org/schemas/abcd/2.06
32
ABCD Metadata: Technical/Content Contact
33
ABCD Metadata: Description
34
ABCD Metadata: Coverage
35
ABCD Metadata: Revision/Version
36
ABCD Metadata: Ownership
37
ABCD Metadata: Intellectual Property Rights
38
ABCD Metadata
39
ABCD: Triple ID, Record Basis
40
ABCD: Identification (multiple)
41
ABCD: Gathering Event
42
ABCD: Multimedia
- OpenUp: Thumbnails will be created
- Always provide link to image file!
43
ABCD: Unit Associations
44
ABCD: Specialised Portions
- Specimen Unit: Acquisition, Accession, Preparation, Duplicate Distribution, Type Status
- Herbarium Unit: Loan Information
- Botanical Garden Unit: Location in Garden, Hardiness, Lineage, Cultivation, Planting Date
Other specialised subtrees for:
- Observations
- Culture Collections
- Mycological Units
- Zoological Units
- Paleontological Units
- Plant Genetic Resources
45
ABCD: UnitExtension
Own namespace for extension: http://www.chah.org.au/schemas/hispid/5
Other extensions:
- Extension for Geosciences (ABCD-EFG)
- DNA Bank Network (ABCD-DNA)
46
BioCASe Protocol
Biological Collection Access Service Protocol:
- Manages data exchange between data providers (collections) and applications (data portals)
- Vehicle for transporting requests from the data portal to the collection, and responses (ABCD documents) from the collection database to the data portal
- XML based
47
BioCASe Protocol: Capabilities Request
48
BioCASe Protocol: Inventory Request
49
BioCASe Protocol: Search Request
50
4. Preparing the database for BioCASe
51
4. Reasons for not publishing the live DB
1. Publishing the live DB is not desired: create snapshots for publication
2. DBMS not accessible for the BPS: export into another DBMS
3. Performance considerations (too highly normalized): partial, controlled denormalization
4. Repeatable elements kept in columns, not in separate rows: move repeatable elements to separate records
52
Each repeatable element needs its own primary key!

Repeatable elements kept in columns:

specimen_id  ...  class             order        family
3476         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3477         ...  Conjugatophyceae  Desmidiales  Desmidiaceae
3478         ...  Conjugatophyceae  Desmidiales  Closteriaceae

Repeatable elements moved to separate records:

specimen_id  ...
3476         ...
3477         ...
3478         ...

sp_id  ht_entry  ht_rank  ht_name
3476   456765    class    Conjugatophyceae
3476   456766    order    Desmidiales
3476   456767    family   Desmidiaceae
3477   456768    class    Conjugatophyceae
3477   456769    order    Desmidiales
3477   456770    family   Desmidiaceae
3478   456771    class    Conjugatophyceae
3478   456772    order    Desmidiales
3478   456773    family   Closteriaceae
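The restructuring shown in the two tables can be sketched as a small Python function. The column names (`specimen_id`, `sp_id`, `ht_entry`, `ht_rank`, `ht_name`) and the sample values are taken from the slide; the function name `unpivot_higher_taxa` and the way the primary key is generated (a simple running counter) are assumptions for illustration only.

```python
def unpivot_higher_taxa(specimens, ranks=("class", "order", "family"), first_entry_id=1):
    """Turn one-column-per-rank records into one row per higher taxon.

    Every generated row receives its own primary key (ht_entry).
    """
    rows = []
    entry_id = first_entry_id
    for specimen in specimens:
        for rank in ranks:
            name = specimen.get(rank)
            if not name:
                # No value stored for this rank: create no row at all.
                continue
            rows.append(
                {
                    "sp_id": specimen["specimen_id"],
                    "ht_entry": entry_id,
                    "ht_rank": rank,
                    "ht_name": name,
                }
            )
            entry_id += 1
    return rows


specimens = [
    {"specimen_id": 3476, "class": "Conjugatophyceae", "order": "Desmidiales", "family": "Desmidiaceae"},
    {"specimen_id": 3477, "class": "Conjugatophyceae", "order": "Desmidiales", "family": "Desmidiaceae"},
    {"specimen_id": 3478, "class": "Conjugatophyceae", "order": "Desmidiales", "family": "Closteriaceae"},
]

higher_taxa = unpivot_higher_taxa(specimens, first_entry_id=456765)
for row in higher_taxa:
    print(row["sp_id"], row["ht_entry"], row["ht_rank"], row["ht_name"])
```

In practice the same result is usually produced inside the database (for example with a view), so that the published structure stays in sync with the source table.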
53
Example View

    CREATE VIEW [dbo].[vwHigherTaxa] AS
    SELECT 'k_' + [EDIT_ATBI_RecordID] AS id,
           [EDIT_ATBI_RecordID] AS unit_id,
           [kingdom] AS name,
           'kingdom' AS rank
    FROM unit_data
    WHERE [kingdom] IS NOT NULL
    UNION
    SELECT 'p_' + [EDIT_ATBI_RecordID], [EDIT_ATBI_RecordID], [phylum], 'phylum'
    FROM unit_data
    WHERE [phylum] IS NOT NULL
    UNION
    ...
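To try the UNION approach without a SQL Server instance, the view can be approximated in SQLite from Python. This is a sketch under assumptions: SQLite concatenates strings with `||` instead of `+`, the column `record_id` stands in for `EDIT_ATBI_RecordID`, and the two sample records are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE unit_data (record_id TEXT PRIMARY KEY, kingdom TEXT, phylum TEXT);
    INSERT INTO unit_data VALUES ('u1', 'Plantae', 'Charophyta');
    INSERT INTO unit_data VALUES ('u2', 'Plantae', NULL);

    -- One SELECT per rank; the prefix keeps the generated ids unique.
    CREATE VIEW vwHigherTaxa AS
        SELECT 'k_' || record_id AS id, record_id AS unit_id,
               kingdom AS name, 'kingdom' AS rank
        FROM unit_data WHERE kingdom IS NOT NULL
        UNION
        SELECT 'p_' || record_id, record_id, phylum, 'phylum'
        FROM unit_data WHERE phylum IS NOT NULL;
    """
)

view_rows = conn.execute(
    "SELECT id, unit_id, name, rank FROM vwHigherTaxa ORDER BY id"
).fetchall()
for view_row in view_rows:
    print(view_row)
```

The record without a phylum yields only a kingdom row, which is the effect of the `IS NOT NULL` filters in the original view.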
54
Commonly used repeatable elements
- Identification
- HigherTaxon
- GatheringSite/NamedArea
- Metadata/Scope/GeoecologicalTerms
- Metadata/Scope/TaxonomicTerms
- MultimediaObjects
- MeasurementsOrFacts
- ...
55
Controlled Denormalization

    INSERT INTO [dbo].[abcd_Object]
    SELECT
        dbo.CollectionObject.CollectionObjectID,
        ISNULL(dbo.CatalogSeries.SeriesName, '') + '-'
            + ISNULL(CAST(dbo.CollectionObjectCatalog.SubNumber AS nvarchar(20)), '') + '-'
            + ISNULL(CAST(dbo.CollectionObjectCatalog.CatalogNumber AS nvarchar(20)), ''),
        dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID),
        dbo.f_getCollectingEventID(dbo.CollectionObject.CollectionObjectID),
        dbo.f_getFieldNumber(dbo.CollectionObject.CollectionObjectID),
        CAST(dbo.CollectionObjectCatalog.CatalogNumber AS int),
        dbo.CollectionObject.PreparationMethod,
        CASE WHEN Sex = ' ' THEN NULL ELSE Sex END,
        CASE WHEN Stage = ' ' THEN NULL ELSE Stage END,
        CASE WHEN dbo.CollectionObject.Text1 IS NULL THEN ''
             ELSE 'Barcode: ' + dbo.CollectionObject.Text1 + '; ' END
            + CASE WHEN dbo.Accession.Number IS NULL THEN ''
                   ELSE 'Specimen Location: ' + dbo.Accession.Number END
            + CASE WHEN DerivedFrom.Remarks IS NULL THEN ''
                   ELSE ' ' + CAST(DerivedFrom.Remarks AS nvarchar(2000)) END
    FROM dbo.BiologicalObjectAttributes
        RIGHT OUTER JOIN dbo.CollectionObject
            ON dbo.BiologicalObjectAttributes.BiologicalObjectAttributesID
               = dbo.f_getParentID(dbo.CollectionObject.CollectionObjectID)
        LEFT OUTER JOIN dbo.CollectionObjectCatalog
            LEFT OUTER JOIN dbo.CatalogSeries
                ON dbo.CollectionObjectCatalog.CatalogSeriesID = dbo.CatalogSeries.CatalogSeriesID
            ON dbo.CollectionObject.CollectionObjectID = dbo.CollectionObjectCatalog.CollectionObjectCatalogID
        LEFT JOIN dbo.Accession
            ON Accession.AccessionID = CollectionObjectCatalog.AccessionID
        LEFT JOIN dbo.CollectionObject AS DerivedFrom
            ON CollectionObject.DerivedFromID = DerivedFrom.collectionObjectID
    WHERE (dbo.f_hasChildObjects(dbo.CollectionObject.CollectionObjectID) = 0)
        AND ...
56
How Do I See Something Is Wrong?
Errors in ABCD documents:
- Several datasets (one for each unit)
  Reason: Metadata field stored in Units table (no separate PK, so several datasets need to be created)
- Several units for one specimen record
  Reason: Several records in DB for non-repeatable elements (several ABCD objects are necessary to create a valid document)
57
5. Setting Up a BioCASe Data Source: Database connection, Table Setup, Schema Mapping
58
BPS Datasource
URL for a BioCASe protocol compliant webservice:
http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=AlgenEngels
Sample request (XML markup lost in extraction; remaining values): search; http://www.tdwg.org/schemas/abcd/2.06; http://www.tdwg.org/schemas/abcd/2.06; A*; false
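The access URL above follows a simple pattern: the address of `pywrapper.cgi` plus a `dsa` query parameter naming the datasource. A minimal Python sketch of composing and parsing such URLs is given below; the helper names `datasource_url` and `datasource_name` are hypothetical, and only the URL from the slide is used as data. No request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def datasource_url(wrapper_url, datasource):
    """Build the access URL for one datasource of a BPS installation."""
    return wrapper_url + "?" + urlencode({"dsa": datasource})


def datasource_name(url):
    """Read the datasource name (dsa parameter) back out of an access URL."""
    return parse_qs(urlsplit(url).query)["dsa"][0]


url = datasource_url("http://ww3.bgbm.org/biocase/pywrapper.cgi", "AlgenEngels")
print(url)
print(datasource_name(url))
```

Using `urlencode` rather than plain string concatenation keeps datasource names with spaces or special characters valid in the URL.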
59
BPS QueryForms
- Tool for sending Scan, Search and Capabilities requests to a datasource
- Choose Datasource, then "Test and Debug"
60
Steps for Setting Up a Datasource
1. Create a new Datasource
2. Configure Datasource:
   1. Database Connection
   2. Table Setup
   3. Create new empty Mapping
   4. Edit Mapping:
      1. Choose root table
      2. Edit mandatory ABCD elements (red)
      3. Save Configuration, test datasource (QueryForms)
      4. Add additional ABCD elements, occasional testing
3. Test/Debug Datasource
61
FloraExsiccataBavarica: Additional Fields

Concept                                                 Table/Column
Metadata/…
  Description/Representation/Details                    metadata.description (text)
  IconURI                                               metadata.logo_url (text)
  Version/Major                                         metadata.source_version (text)
Metadata/IPRStatements/…
  Citations/Citation/Text                               metadata.citationsText (text)
  Copyrights/Copyright/Text                             metadata.copyright (text)
  Disclaimers/Disclaimer/Text                           metadata.disclaimer (text)
  Acknowledgements/Acknowledgement/Text                 metadata.acknowledgement (text)
  TermsOfUseStatements/TermsOfUse/Text                  metadata.terms_of_use (text)
Units/Unit/Gathering/…
  Agents/GatheringAgent/Person/FullName                 unit.sammler (text)
  Altitude/MeasurementOrFactText                        unit.hoehe (text) + "m"
  Altitude/MeasurementOrFactAtomised/LowerValue         unit.hoehe (text)
  Altitude/MeasurementOrFactAtomised/UnitOfMeasurement  "m"
  Country/ISO3166Code                                   "DE"
  Country/Name                                          "Germany"
  DateTime/DateText                                     unit.datum1 (text)
  LocalityText                                          unit.fundort (text)
  NamedAreas/NamedArea/AreaClass                        "State"
  NamedAreas/NamedArea/AreaName                         "Bavaria"
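The table mixes three kinds of mapping: a database column, a fixed literal, and a column concatenated with a literal. The following Python sketch models that idea for a few of the Gathering concepts. It is not how the BPS stores or evaluates mappings; the `("col", ...)`/`("lit", ...)` notation, the function `apply_mapping`, the sample values and the rule that a concept is skipped when one of its columns is empty are all assumptions made for illustration.

```python
# Each concept maps to a list of parts: ("col", column name) or ("lit", fixed text).
MAPPING = {
    "Gathering/Altitude/MeasurementOrFactText": [("col", "hoehe"), ("lit", "m")],
    "Gathering/Altitude/MeasurementOrFactAtomised/LowerValue": [("col", "hoehe")],
    "Gathering/Country/ISO3166Code": [("lit", "DE")],
    "Gathering/Country/Name": [("lit", "Germany")],
    "Gathering/LocalityText": [("col", "fundort")],
}


def apply_mapping(row, mapping):
    """Evaluate every concept of the mapping against one database row."""
    result = {}
    for concept, parts in mapping.items():
        values = []
        for kind, value in parts:
            # Columns are looked up in the row, literals are used as they are.
            values.append(row.get(value) if kind == "col" else value)
        if any(v is None or v == "" for v in values):
            # Assumed rule: an empty column suppresses the whole concept.
            continue
        result[concept] = "".join(str(v) for v in values)
    return result


unit_row = {"hoehe": "520", "fundort": "Regensburg"}  # invented sample record
mapped = apply_mapping(unit_row, MAPPING)
for concept, value in mapped.items():
    print(concept, "=", value)
```

The literals for country show how implicit knowledge about a collection can be made explicit in the mapping even though no column holds it.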
62
How the BPS performs requests
1. Get an ID list of records matching the filter
2. Load all details for the matching IDs: joining of ALL tables, beginning with the root table (table with UnitID, one record per Unit)
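The two-step strategy can be illustrated with SQLite in Python. This is a sketch of the general technique, not BPS code: the tables `unit` (root table) and `identification`, their columns, the sample rows and the function `search_units` are assumptions.

```python
import sqlite3


def search_units(conn, name_pattern):
    """Two-step search: first matching unit IDs, then all details for those IDs."""
    # Step 1: ID list of root-table records matching the filter.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT u.unit_id FROM unit u "
            "JOIN identification i ON i.unit_id = u.unit_id "
            "WHERE i.name LIKE ? ORDER BY u.unit_id",
            (name_pattern,),
        )
    ]
    if not ids:
        return {}
    # Step 2: load the details by joining from the root table, restricted to the IDs.
    placeholders = ",".join("?" * len(ids))
    units = {}
    for unit_id, locality, name in conn.execute(
        "SELECT u.unit_id, u.locality, i.name FROM unit u "
        "LEFT JOIN identification i ON i.unit_id = u.unit_id "
        f"WHERE u.unit_id IN ({placeholders}) ORDER BY u.unit_id, i.name",
        ids,
    ):
        unit = units.setdefault(unit_id, {"locality": locality, "identifications": []})
        if name is not None:
            unit["identifications"].append(name)
    return units


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE unit (unit_id INTEGER PRIMARY KEY, locality TEXT);
    CREATE TABLE identification (id INTEGER PRIMARY KEY, unit_id INTEGER, name TEXT);
    INSERT INTO unit VALUES (1, 'Berlin'), (2, 'Munich');
    INSERT INTO identification VALUES
        (1, 1, 'Closterium acerosum'), (2, 1, 'Closterium sp.'), (3, 2, 'Cosmarium botrytis');
    """
)

units = search_units(conn, "Closterium a%")
print(units)
```

Note that the second step returns every identification of a matching unit, not only the one that satisfied the filter; this is why the details are loaded separately from the ID lookup.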
63
Typical Mapping Errors
- Incomplete mappings
- Missing explicit mappings for implicit knowledge (e.g. Country = "Germany" for a German collection)
- Abusing the MultimediaObject for non-multimedia documents (e.g. links to taxon pages)
- Providing "0" values for non-existent data
64
Datasource Loglevel
The lower the loglevel, the more information is logged:
Debug < Info < Warning < Error
Datasource Configuration Settings
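The ordering Debug < Info < Warning < Error matches the level scheme of Python's standard `logging` module, so its effect can be demonstrated with a short sketch. Whether the BPS uses this module internally is not stated in the slides; the helper `messages_logged` and the logger names are assumptions for the demonstration.

```python
import io
import logging


def messages_logged(level):
    """Emit one message per level and return the level names that were written."""
    stream = io.StringIO()
    logger = logging.getLogger(f"loglevel_demo_{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)

    logger.debug("details of the generated SQL")
    logger.info("request received")
    logger.warning("unmapped element")
    logger.error("database connection failed")

    logger.removeHandler(handler)
    return stream.getvalue().split()


debug_output = messages_logged(logging.DEBUG)
error_output = messages_logged(logging.ERROR)
print(debug_output)
print(error_output)
```

With the lowest level all four messages are written, with the highest only the error, which is why a low loglevel is useful while debugging a mapping and a high one in production.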
65
Datasources folder
... /configuration/datasources/
- querytool_prefs.xml: Just what its name says.
- cmf_xxx.xml: Concept mapping; one for each supported schema.
- provider_setup_file.xml: Database connection, table setup, supported schemas.
Regular backup of configuration folder is highly recommended!
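One simple way to follow the backup recommendation is to zip the whole configuration folder on a schedule. The sketch below uses only the Python standard library; the function `backup_configuration`, the archive naming and the throwaway demo folder are assumptions, and a real call would point at the configuration folder of the actual installation.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def backup_configuration(config_dir, backup_dir, label):
    """Zip the complete configuration folder and return the archive path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive_base = backup_dir / f"bps_configuration_{label}"
    # make_archive appends ".zip" and stores paths relative to root_dir.
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(config_dir))
    return Path(archive)


# Demo on a temporary folder that mimics the layout described above.
work = Path(tempfile.mkdtemp())
config = work / "configuration"
(config / "datasources" / "demo").mkdir(parents=True)
(config / "datasources" / "demo" / "provider_setup_file.xml").write_text("<setup/>")

archive = backup_configuration(config, work / "backups", "2011-05-31")
with zipfile.ZipFile(archive) as backup_zip:
    names = backup_zip.namelist()
print(archive.name)
print(names)
```

Keeping the date in the archive name makes it easy to retain several generations and to restore the state from before a mapping change.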
66
Metadata tables
- If metadata differ for each or some of the records: several records in metadata table, linked to unit by foreign key
- If metadata are identical for all records: possible to hold data in one record; no reference key is needed (static table)
67
Applications
1. Protocols/Data Standards
2. Wrapper Software
3. Applications: Data Portal, Data Quality Checker, Data Mining
68
Local QueryTool
69
Distributed Search: BioCASe Simple UI
BioCASe Distributed Search: http://search.biocase.org/simple-ui
70
Harvesting: GBIF Data Portal
71
GBIF Registration
72
GBIF Indexing History
73
EDIT Specimen Explorer: Interactive filters
74
Distributed Search vs. Harvesting
Distributed Search
+ No harvesting application/database required
+ No delay with data updates (instantly visible)
- Dependent on provider availability
- Slow
- No data verification
- No maps, taxon lists, …
Harvesting
- Need for a harvester/cache database
- Delays when records get updated/added/removed
+ No heavy dependency on provider availability
+ Fast (as long as your portal is)
+ Data verification/improvements/transformation in harvesting process
+ Maps, suggestion lists, interactive filters, …
75
OpenUp! Harvesting
(Diagram labels: BioCASe, OpenUp! Harvester, OAI-PMH Harvester; formats: ABCD, ESE, EDM)
76
Jörg Holetschek, Gabriele Dröge
Botanic Garden & Botanical Museum
Dept. of Biodiversity Informatics & Laboratories
Königin-Luise-Straße 6-8, 14195 Berlin-Dahlem
j.holetschek@bgbm.org
Tel. +49 30 838 50150 0448 831 980
www.bgbm.org/biodivinf
www.biocase.org
search.biocase.org
search.biocase.de
http://www.biocase.org/files/BioCASe_Workshop_Berlin_2011.ppt