Article Semanticizer – Stitching Data Mining Services Into a Standalone Search Appliance
David P. Shorthouse, Université de Montréal / Canadensys
Dmitry Mozzherin, Marine Biological Laboratory /

Biota of Canada

We want to find and then organize data from printed materials, but search is exasperatingly limited

15,000 OCR'd articles and their scanned images (9 GB)

Key Players

Global Names

Named Entity Extraction: people, companies, organizations, cities, geographic features

elasticsearch

Search Characteristics
Tokenizers: path hierarchy
Filters: edge n-gram, pattern replace (abbreviated genera), stemmer (English), elision (French)
Analyzers: lowercase, ASCII folding, autocomplete
Full text
Thanks to: Christian Gendreau (Canadensys)
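
The search characteristics listed on this slide could be expressed as elasticsearch analysis settings roughly as follows. This is a minimal sketch, not the presenters' actual configuration: every custom name (path_tokenizer, autocomplete_edge, genus_abbrev, french_elision, and the analyzer names), the gram sizes, the regular expression, and the list of French articles are assumptions made for illustration. Only the built-in component types (path_hierarchy, edge_ngram, pattern_replace, stemmer, elision, lowercase, asciifolding) come from elasticsearch itself. The two small helper functions imitate, in plain Python, what the edge n-gram and genus-abbreviation filters would emit.

```python
import json
import re

# Hypothetical analysis settings; all custom names and values are assumptions.
INDEX_SETTINGS = {
    "analysis": {
        "tokenizer": {
            # Splits "a/b/c" into "a", "a/b", "a/b/c".
            "path_tokenizer": {"type": "path_hierarchy", "delimiter": "/"}
        },
        "filter": {
            # Prefixes of each token, used for autocomplete.
            "autocomplete_edge": {"type": "edge_ngram", "min_gram": 2, "max_gram": 15},
            # Rewrites "pardosa moesta" as "p. moesta" (abbreviated genus).
            "genus_abbrev": {
                "type": "pattern_replace",
                "pattern": "^(\\p{L})\\p{L}+\\s+",
                "replacement": "$1. ",
            },
            "english_stemmer": {"type": "stemmer", "language": "english"},
            # Strips leading French articles such as l' and d'.
            "french_elision": {"type": "elision", "articles": ["l", "d", "qu", "n", "s", "j"]},
        },
        "analyzer": {
            "path": {"type": "custom", "tokenizer": "path_tokenizer"},
            "autocomplete": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding", "autocomplete_edge"],
            },
            "full_text_en": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding", "english_stemmer"],
            },
            "full_text_fr": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["french_elision", "lowercase", "asciifolding"],
            },
            # Keyword tokenizer keeps the whole name as one token so the
            # pattern can see both genus and epithet.
            "abbreviated_name": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase", "genus_abbrev"],
            },
        },
    }
}


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Plain-Python imitation of an edge n-gram filter on a single token."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def abbreviate_genus(name):
    """Plain-Python imitation of the lowercase + genus_abbrev filter chain."""
    return re.sub(r"^(\w)\w+\s+", r"\1. ", name.lower())


if __name__ == "__main__":
    # The JSON printed here is what would be sent as the index "settings" body.
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(edge_ngrams("pardosa"))
    print(abbreviate_genus("Pardosa moesta"))
```

With settings along these lines, a query for "P. moesta" and a query for "Pardosa moesta" could both match the same indexed name, and typing "pard" could surface "Pardosa" as a suggestion.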

Possible Next Steps
Generalize the design to best support content types (e.g. specimen labels)
Better recognition of other entities, text blocks
Scientific name plugin for elasticsearch (hackathon?)
Share with Journal Map and Mining Biodiversity
Engage scientific societies, journals