1 Writeslike.us Em Tonkin, Andrew Hewson

Presentation transcript:

2 Background
Relevant research themes:
- Metadata harvesting and reuse
- Automatic metadata extraction
- Text analysis
- Social network analysis
- Scholarly communication, particularly informal communication

3 Aim
Helping people to find each other:
- Finding other researchers with similar interests to yourself in your geographic area
- Or in your area of research
- Not everybody with similar interests will attend the same conferences!
- Helping students find potential research supervisors
- Encouraging serendipity

4 Relevant technologies
In fact there are an awful lot of these.
Social network analysis:
- Requires a very large dataset
- Solvable either by
  a) being Facebook or similar (but adoption rates are far from 100%)
  b) automated analysis of relevant data
- Solution b) is cheap, simple, and very fallible
- Not a new approach: it is at the core of bibliometrics

5 Relevant technical problems
Author identity disambiguation:
- Formal social networks disambiguate between instances of individual names (for example, if there are many people called 'John Smith', the system can tell you which is which)
- Needs to be solved to acceptable level
- Need to define how good 'acceptable' is
- Formal solutions usually depend on unique identifiers + registries
- Cheap, moderately effective solution: disambiguate via textual characteristics + metadata
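
The "cheap, moderately effective" route can be illustrated with a small sketch. This is not the Writeslike.us implementation: the record keys ('title', 'source'), the stopword list, the Jaccard measure, the repository bonus and the threshold are all assumptions chosen for illustration. It takes records that share one author-name string and greedily groups those whose titles look alike, treating each group as one probable person.

```python
import re

# A tiny illustrative stopword list (assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def keywords(text):
    """Lowercase word tokens from a title, minus a few stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records, threshold=0.2):
    """Greedily group records that carry the same author-name string.

    Each record is a dict with the hypothetical keys 'title' and 'source'.
    A record joins the first cluster whose pooled keywords overlap enough;
    a shared repository adds a small bonus to the score.
    """
    clusters = []
    for rec in records:
        kw = keywords(rec["title"])
        for cluster in clusters:
            score = jaccard(kw, cluster["keywords"])
            if rec["source"] in cluster["sources"]:
                score += 0.1
            if score >= threshold:
                cluster["records"].append(rec)
                cluster["keywords"] |= kw
                cluster["sources"].add(rec["source"])
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append(
                {"records": [rec], "keywords": set(kw), "sources": {rec["source"]}}
            )
    return [c["records"] for c in clusters]
```

Deciding how good 'acceptable' is then amounts to choosing the threshold: a lower value merges more records into one person, a higher value splits them apart.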

6 Methodology
- Harvest OAI metadata: captures large list of:
  - Author names (somewhat randomly formatted)
  - Digital object titles, descriptions (sometimes), dates (sometimes) and content (sometimes)
  - Citations (sometimes)
- Spider digital objects, analyse them for formal metadata: retrieve addresses, etc.
- Retain OAI source: useful clue regarding author affiliations (sometimes)
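
A minimal sketch of the harvesting step, assuming the repository answers an OAI-PMH ListRecords request with the standard oai_dc metadata format. The function name, the output dict keys and the sample response are made up for illustration; only the three XML namespaces come from the OAI-PMH and Dublin Core specifications. Every field except the identifier is kept as a possibly empty list, which matches the "sometimes" in the slide, and the source repository is retained with each record.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text, source):
    """Turn one ListRecords response into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Deleted records carry a header but no metadata.
            continue

        def values(tag):
            # All non-empty values of one Dublin Core element.
            return [
                e.text.strip()
                for e in dc.findall(f"dc:{tag}", NS)
                if e.text and e.text.strip()
            ]

        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "creators": values("creator"),
            "titles": values("title"),
            "descriptions": values("description"),
            "dates": values("date"),
            "source": source,  # retained as a clue to author affiliation
        })
    return records


# A made-up response for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Smith, John</dc:creator>
          <dc:creator>J. Doe</dc:creator>
          <dc:date>2009</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

harvested = parse_list_records(SAMPLE_RESPONSE, source="repoA")
```

The two creator strings in the sample ('Smith, John' and 'J. Doe') show the kind of inconsistent name formatting that later steps have to cope with.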

7 Methodology (II)
- Analyse text for noun-phrase-like structures: useful clue as to theme
- Background information required, such as institution name, domains/URLs associated with each institution
  - Retrieved via harvesting from Wikipedia
  - Much of this information is not well-structured, so unavailable via DBpedia
- Poorly structured information needs filtering: for example, author names are not consistently structured between repositories. This is a machine learning problem.
- Search with contextual network graph algorithm
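
The slides name a contextual network graph algorithm for search but give no details. The sketch below is a generic spreading-activation search over a bipartite author/term graph, which is the usual idea behind that family of algorithms; it should not be read as the project's actual code. The function names, the decay factor and the energy floor are assumptions.

```python
from collections import defaultdict


def build_graph(author_terms):
    """Build an undirected bipartite graph linking authors to their terms."""
    graph = defaultdict(set)
    for author, terms in author_terms.items():
        for term in terms:
            graph[("author", author)].add(("term", term))
            graph[("term", term)].add(("author", author))
    return graph


def spread(graph, start, energy=1.0, decay=0.5, floor=0.05):
    """Spread activation energy outwards from a start node.

    Each node passes on its received energy, reduced by `decay` and split
    equally among its neighbours. Shares smaller than `floor` are dropped,
    which guarantees that propagation terminates.
    """
    activation = defaultdict(float)
    frontier = [(start, energy)]
    while frontier:
        node, received = frontier.pop()
        activation[node] += received
        neighbours = graph.get(node, ())
        if not neighbours:
            continue
        share = received * decay / len(neighbours)
        if share < floor:
            continue
        for neighbour in neighbours:
            frontier.append((neighbour, share))
    return activation


def similar_authors(author_terms, name):
    """Rank other authors by the activation they receive from `name`."""
    graph = build_graph(author_terms)
    start = ("author", name)
    activation = spread(graph, start)
    scores = {
        node[1]: value
        for node, value in activation.items()
        if node[0] == "author" and node != start
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this sketch the terms would be the noun-phrase-like structures extracted from each author's texts. Authors who share no term with the query author, directly or through intermediaries, receive no activation and do not appear in the ranking.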

8 'Sometimes' and 'usually'
Statistics are:
- Cheap
- Imperfect
- Available
Rapid innovation philosophy:
- Cheap is good
- Simple is good
- Solutions requiring novel/additional uptake of infrastructure are out of reach

9 Results
- Basic concept worked well
- Law of diminishing returns: beyond the first 80-90%, increasing effort led to only minor improvements in dataset (minor niggles!)
- Interface development actually required more time than the dataset development, and exceeded project length...
- But useful dataset can be released as linked data, reused for various purposes

10 Walkthrough: Basic search (the harder method!)

11 Advanced search

16 Walkthrough

17 Conclusion
- OAI-DC (and Wikipedia!) is a good source for 'semi-structured' data
- There is a great deal of potential for using this together with appropriate analysis tools, such as those explored within the FixRep project, to develop social network-like graphs
- Application of this type of data for the purpose of encouraging informal academic communication/collaboration is an interesting research field with many potential applications