The BHL way to content William Ulate BHL Technical Director Global BHL Coordinator Leiden, Netherlands February 14, 2013.

What is BHL? The Biodiversity Heritage Library is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.”

Extensive

Global…

New Partners and Geographies

“Dear Sir / Madam, Can I just congratulate you on an absolutely brilliant online resource. I am compiling a report on an invasive hydromedusae and could not believe the ease and efficiency of this web page, which genuinely saved me weeks of my life.”
“Research that previously took months now takes only a few hours.”
“La plus grande #bibliotheque #botanique & #zoologique online. The largest online botanical & zoological #library #BHL”
“The freeing of knowledge may lead to new discoveries and changes in the way the natural world is perceived.”

More Online Content

Global Replication & Serving
Diagram labels: San Francisco, Woods Hole, London, Alexandria, Beijing; Replicated Data Center; Portal Application

> 390,000 views in 10 months; > 1,200 sets; > 60,000 images

The Art of Life project: describing and providing access to natural history illustrations from the Biodiversity Heritage Library (BHL)
Example of illustration described using Art of Life schema:
Title: Stictospiza formosa
Type: Illustrations
Date: Publication: 1898
Agent: Author: Arthur G. Butler ( ); Illustrator: F.W. Frohawk ( )
Description: A pair of finches with green and yellow bodies resting on reeds
Subjects: Scientific name: Amandava formosa (Latham, 1790); Vernacular Name: Green Avadavat or Green Munia; Accepted Name: Amandava formosa (Latham, 1790); Birds, finches
Inscriptions: bottom center: Green Amaduvade Waxbill (Stictospiza formosa)
Source: Butler, Arthur Gardiner. Foreign finches in captivity. Hull and London: Brumby and Clarke, limited, 1889 (2nd edition). This image comes from the Biodiversity Heritage Library, and is available online at biodiversitylibrary.org/page/
Rights: Public domain
Art of Life schema (Element, Definition, Examples, Repeat); elements required in Red:
Agents. Definition: person or corporate entity involved in the creation, design, production, or publication of a visual resource. Examples: Curtis, John; publisher. Repeat: Y
Copyright. Definition: The copyright status of the visual resource. Examples: Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) nc/2.0/deed.en. Repeat: N
Date. Definition: Date or range of dates associated with the creation or publication of the visual resource. Repeat: Y
Description. Definition: A free-text note about content of the image, including comments, description, or interpretation, that gives additional information not recorded in other categories. Examples: This illustration shows a scale, coloured illustration of Sepsis annulipes (now known as Encita annulipes) beside the Trifolium ochroleucum plant. Several dissections from Sepsis cylindrica Fab. (all these details are provided on the next page of this book and the subsequent page). Repeat: Y
Inscriptions. Definition: All marks, caption, or written words added to the object at the time of production or in its subsequent history, including signatures, dates, dedications, texts, and colophons, as well as marks, such as the stamps of silversmiths, publishers, or printers. Examples: bottom Radula of L. souleyetianum on a more reduced scale. Repeat: Y
Source. Definition: A citation for the book, journal or resource that hosts the visual resource. Examples: Butler, Arthur Gardiner. Foreign finches in captivity. Hull Brumby and Clarke, limited, 1889 (2nd edition). Repeat: N
Subject. Definition: Terms or phrases that describe, identify, or interpret the visual resource. Examples: Carl Linnaeus; Plant: Picea abies; Plant: Picea abies; Plant: Norway spruce. Repeat: Y
Title. Definition: The title or identifying phrase given to an Image. Examples: Sepsis annulipes; Orangutan. Repeat: Y
Type. Definition: Identifies a general category for the visual resource. Examples: maps; forestry maps. Repeat: Y
We welcome your feedback on the schema!

Where are we?
Scientific Name Extraction
– Improved algorithm (Thanks uBio!)
Articles
– Extended BHL data model to store article metadata
– Content and Process to harvest data from BioStor in place
Create user interfaces for adding article metadata and associated files
– Functional requirements defined
– Process flow for adding article metadata and associated files
– Implement UI changes
Change BHL UI to accommodate article search
Change BHL UI to accommodate article display (TOC)

Scientific Name Extraction
TaxonFinder algorithm in production since 2008
– More than 100 million candidate name strings
– More than 1.5 million unique, verified names
– Available through UI, APIs, Data Exports & Internet Archive
New collaboration with Global Names
– Improved algorithm, better precision & recall
– More data with TaxonFinder and Neti Neti!
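The distinction the slide draws between candidate name strings and verified names can be illustrated with a small sketch. This is not TaxonFinder or Neti Neti; it is a deliberately naive regular-expression finder followed by a lookup against a set of known names. The function names, the pattern, and the sample sentence are all assumptions made for illustration.

```python
import re

# Naive pattern for Latin binomials: a capitalized genus followed by a
# lowercase epithet, each at least three letters long. Real name finders
# are far more sophisticated; this is only an illustration.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def candidate_names(text):
    """Return candidate name strings in order of appearance, without duplicates."""
    seen = []
    for genus, epithet in BINOMIAL.findall(text):
        name = f"{genus} {epithet}"
        if name not in seen:
            seen.append(name)
    return seen


def verify(candidates, known_names):
    """Keep only the candidates that appear in a reference set of known names."""
    return [name for name in candidates if name in known_names]


page_text = "In Europe the Picea abies is common."
candidates = candidate_names(page_text)        # includes a false positive
verified = verify(candidates, {"Picea abies"})  # the lookup removes it
```

Here the pattern also picks up "Europe the", which is why a candidate list is much larger than the verified list and why verification against a names source matters.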

Taxon Names
BEFORE
Name Instances    101,591,       ,288,804
Unique Names      7,498,554      7,464,924
Verified Names    1,905,507      1,902,803
EOL Names         63,130,350     62,963,582
EOL Pages         13,579,868     13,532,684
AFTER
Name Instances    151,222,       ,066,425
Unique Names      29,246,382     29,091,767
Verified Names    10,153,165     10,109,540
EOL Names         87,791,695     87,135,089
EOL Pages         15,466,713     15,342,867

Article-level metadata; Chapter-level metadata; Treatment-level metadata

Part-level metadata
Disambiguating and locating structural components in the corpus
Done by automated and crowdsourced means
– Thanks Rod Page! Welcome others!
Greatly increases semantic value of the dataset
Addressing important
– makes data addressable and thus linkable

Articles in the BHL UI

Images

PDF Generator

Support citation reconciliation
L. Sp. Pl. 2:
Linneaus, C. Species Plantarum, vol. 2 p
Linné, Carl von. Sp. Pl. Vol. 2 Page
Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE digestas.. 2:
Zea mays
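As a rough illustration of what reconciling the citation variants above involves, the sketch below maps differently abbreviated citations of the same work to one (title, volume) key. The alias table, the function name, and the volume pattern are assumptions for illustration only and do not describe how BHL or its citation providers actually resolve citations. The sample strings are the slide's variants with the page numbers left out.

```python
import re

# Hypothetical alias table: lowercased title fragment -> canonical title.
TITLE_ALIASES = {
    "sp. pl.": "Species Plantarum",
    "species plantarum": "Species Plantarum",
}

# A volume number following either "vol." or the abbreviated title "pl.".
VOLUME = re.compile(r"(?:vol\.|pl\.)\s*(\d+)")


def reconcile(citation):
    """Return (canonical title, volume) for a citation string, or None if unknown."""
    lowered = citation.lower()
    title = next(
        (canon for alias, canon in TITLE_ALIASES.items() if alias in lowered),
        None,
    )
    if title is None:
        return None
    match = VOLUME.search(lowered)
    volume = int(match.group(1)) if match else None
    return title, volume


variants = [
    "L. Sp. Pl. 2",
    "Linneaus, C. Species Plantarum, vol. 2",
    "Linné, Carl von. Sp. Pl. Vol. 2",
]
keys = {reconcile(v) for v in variants}  # all variants collapse to one key
```

A production resolver would also need to handle author name variants, page ranges, and full Latin titles, none of which this sketch attempts.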

Citations Providers

What we’d like to do
Improve OCR
Rekeying Tables of Contents
Researching candidate Scientific Names
Image identification & extraction
– Currently funded by NEH
^ Challenges framed as games

2007 Name Finding Study
>35% OCR error rate for names only
35.16%: Of the 3,003 names, 1,056 were incorrectly transcribed by OCR.
Top OCR errors:
1 Insert Space; 2 Omit Space; 3 e->c; 4 u->I; 5 u->n; 6 i->l; 7 c->e; 8 n->v; 9 l->i; 10 r->i; 11 u->ii; 12 h->l; 13 h->ii; 14 e->o
Wei, et al. An Evaluation of Taxonomic Name Recognition (TNR) in the Biodiversity Heritage Library. Proceedings of TDWG
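The reported rate follows directly from the two counts, and the list of common substitutions suggests one way garbled names might be matched back to known ones. The sketch below is an illustration under assumptions: it reads each listed error as "printed character -> character produced by OCR", uses only a handful of the single-character substitutions, and the function names are hypothetical. It is not the method used in the study.

```python
# Counts reported on the slide.
total_names = 3003
incorrect = 1056
error_rate = incorrect / total_names  # about 0.3516

# A few of the listed substitutions, assumed to mean (printed, OCR output).
CONFUSIONS = [("e", "c"), ("u", "n"), ("i", "l"), ("c", "e"), ("h", "l"), ("e", "o")]


def ocr_variants(name):
    """Return strings that differ from `name` by exactly one listed substitution."""
    variants = set()
    for printed, produced in CONFUSIONS:
        for i, ch in enumerate(name):
            if ch == printed:
                variants.add(name[:i] + produced + name[i + 1:])
    return variants


def possible_sources(ocr_string, known_names):
    """Return known names that could have been misread as `ocr_string`."""
    return [name for name in known_names if ocr_string in ocr_variants(name)]


matches = possible_sources("Plcea", ["Picea", "Zea"])
```

Generating variants from a confusion list only covers single-character errors; inserted or omitted spaces and multi-character substitutions such as u->ii would need separate handling.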

Abbild ungen und Beschreibungen der Fische Syriens, nebst einer neuen Classification und Characteristik sämmtlicher Gattungen der i JOH. JAKOB HECKEL, Inipectoi am k. k. Hof-Natur.-iUenkabinete in Wien, mehr, yelelirt. UeHtllMeii. MIfglivd. STUTTGART. E. Schweizerbart' sehe Verlagshandlung, 1843.

Older material
Great deal of material is pre-1923
Irregular fonts – blackletter
Multiple languages on same page – English text with Latin scientific names
Changes in geographic names
Changes in scientific names

*E.xvi � c � piteI von c. cXx.WptdvonfnrWmn bu � fbe;bcn.5 am cix bIa � S &3rn~ 41X a � m cv(f b1air � 'o � et ert oiensr � ; � ', : � hlrfc � c wa ff � 4am.diug bist a 6aiw~s ff oJrJtwt nof bL4ecImt& blfafra mem b t wag `wr 4 cn wiu 4 e8t5m.ed bvUratflb ck wuo, ma144'*4I bttE5rmbebt =rt3'kn am4ra tif vrmr Waff C * t6rmnli an `tn � ciblatGteaM w ?ffoaifrn w4wmeu nu weib e, wpiteI voE5teiri ct c ober gtUcr cit cm` 91 cLi biar J ' >bSciatl � Oiff ;Bruet wacfttc n qmcx b1a bl: bt5c lttmtt bb9 lkr w.llr#e iti ncn xoa ff cu :r trtuft *e t � B Rn " � trv W1Rt' ?Cm c blas waIwutr Ober � ci ti 1V Ces ' wt gbtiemwwajfu tpctt, afferain 9 c: b � titbfof � r f eran m rs bra wlg auig4;f aer � m *mc vrt blatcabtfm wfru an'deg~m rt blas Iaum bwWt � run f ncmai b14ianf tJobrrfan ebrut4net vnber Brwt Ober awawi*m.crriii btafwfm uww c on$ 'it ttu wttkc 5,10 $ m~C fca trc* cx u W � e � &mcyfbq4 Mabtt mmw rc a iiu bc Jcn ncI.end.*, blat s. a\ u: � rprd3 rw4ftf wm c ii,+ ttCC tn wa frr9fr orfab fcfbt enb c optiti bt -r9 ceDa ttDcn i34M sn Sem i

Expanding scope
Manuscripts, field notebooks – mostly handwritten, often with drawings
Global expansion means dealing with non-Western script systems and a whole new set of OCR problems – Arabic materials from Bibliotheca Alexandrina in Egypt

Images

OCR Improvements Gaming Transcription

OCR Improvements Transcription Purposeful Gaming Crowdsource Markup

Transcribe Bentham
A collaboration of the University of London Computer Centre, UCL Library Services and UCL Learning and Media Services with consultation from the UCL Centre for Digital Humanities.
Volunteer users can log in and transcribe previously unstudied and unpublished manuscripts from the Bentham Papers collection in UCL Library's Special Collections in the Transcription Desk.
Since launch, volunteers from around the world have transcribed several thousand Bentham manuscripts to an extremely high standard.
Results and findings:

Transcribe Bentham Who were the volunteers?

Transcribe Bentham Age ranges

Purposeful Gaming: Space, Climate, Humanities, Nature, Biology

Purposeful Gaming: DIGITALKOOT
Joint project run by the National Library of Finland and Microtask to index the library's enormous archives so that they are searchable on the Internet for easier access to the Finnish cultural heritage.
Launched on Feb , nearly participants completed over 8 million word fixing tasks by Nov
DigiTalkoot enabled volunteers to participate in this fixing work by playing games.

OCR Improvements German text interpreted by the OCR process as: “unb auf ben ©elnrgen be6 fublic{)en”

OCR Improvements
Different resulting texts from parsing the phrase: “und auf den Gebirgen des südlichen Deutschlands” (“and on the mountains of southern Germany”)
#   IA OCR            OCR 2          Transcription 1   Transcription 2   Result
1   unb               und                                                Ok
2   den               ben            den                                 Ok
3   ©elnrgen          ©ebirgen       Bebirgen          Gebirgen          X
4   be6               des            de5               des               Chk
5   fublic{)en        fublichen      Füdlichen         Südlichen         X
6   £)eittfc{)(anb6   Deutfchlanbs   Deutfchlands      Deutschlands      X
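The comparison above hints at how disagreement between independent readings can be used to decide which words need human attention. The sketch below is an assumption about one simple way to do that, a strict-majority vote across readings of the same word; it is not the rule behind the slide's Ok / Chk / X column, and the function name is hypothetical.

```python
from collections import Counter


def consensus(readings):
    """Return the reading given by a strict majority of sources, else None."""
    word, count = Counter(readings).most_common(1)[0]
    return word if count > len(readings) / 2 else None


# Readings for four words, taken from the comparison above
# (IA OCR, OCR 2, Transcription 1, Transcription 2).
rows = {
    "Gebirgen": ["©elnrgen", "©ebirgen", "Bebirgen", "Gebirgen"],
    "des": ["be6", "des", "de5", "des"],
    "südlichen": ["fublic{)en", "fublichen", "Füdlichen", "Südlichen"],
    "Deutschlands": ["£)eittfc{)(anb6", "Deutfchlanbs", "Deutfchlands", "Deutschlands"],
}

# Words without a majority reading are the ones worth sending to volunteers.
needs_review = [target for target, readings in rows.items() if consensus(readings) is None]

# A made-up three-source case where two of the sources agree.
agreed = consensus(["und", "und", "unb"])
```

With these four rows no reading reaches a strict majority, so every word would be queued for review, which is the kind of task a transcription game could present to players.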

Purposeful Gaming

Crowdsource Markup
Display text                                  Species Profile Model category
General/summary                               TaxonBiology
Geographic range                              Distribution
Habitat
Food sources and feeding behavior             TrophicStrategy
Physical description (general)                Description
Physical description (detailed morphology)    DiagnosticDescription

Thank you William Ulate Global BHL Project Manager / Technical Director Missouri Botanical Garden Skype: william_ulate_r