Free author registration Thomas Krichel LIU & НГУ 2008-12-11.

Similar presentations
Zetoc.mimas.ac.uk Zetoc Electronic Table of Contents from the British Library Zetoc Support.

Publishers Web Sites Standard Features. Objectives Access publishers websites Identify general features available on most publishers websites Know how.
Step 1 Start your web browser (Internet Explorer or Firefox). Step 2 Type: in the Address box Step 3 Press Enter on the keyboard.
Search, access and impact: Web citation services Tim Brody Intelligence, Agents, Multimedia Group University of Southampton.
IRRA DSpace April 2006 Claire Knowles University of Edinburgh.
28 April 2004Second Nordic Conference on Scholarly Communication 1 Citation Analysis for the Free, Online Literature Tim Brody Intelligence, Agents, Multimedia.
Open Archives and Free Online Scholarship Thomas Krichel (RePEc & Long Island University) Simeon M. Warner (ArXiv & Cornell University)
Towards an open library of relational metadata: the experience of RePEc (Research Papers in Economics) Thomas Krichel
Anwendung von open source Ideen in digitalen Bibliotheken: die Beispiele von RePEc und rclis Thomas Krichel
From RePEc to 3lib. the long march for free bibliographic data Thomas Krichel
Digital scholarly communication in Economics: from NetEc to RePEc Thomas Krichel work partly sponsored by the Joint Information.
Acknowledgements Ellen Fischer for her hospitality. Michael Heinz for organizing the seminar.
RePEc, a digital commons for economics Thomas Krichel
Что делать? Thomas Krichel
RePEc, a case to illustrate the evolution and future trends of repositories and open access Thomas Krichel
RePEc: a public-access database that promotes scholarly communication in Economics Thomas Krichel
Designing for the Discipline: Open Libraries and Scholarly Communication Thomas Krichel
Rclis in vision and reality Thomas Krichel
RePEc and OLS Thomas Krichel prepared for the first retreat for disciplinary repositories Monterey
RePEc: An Open Library for Economics Thomas Krichel Work partly supported by the Joint Information Systems Committee of.
Transforming scholarly communities with open libraries Thomas Krichel
RePEc as frontier repository, the business model and what it means to survive as network in a more and more web-collaborative academia and a developing.
Bringing scholarly communication in kicking and screaming into the Internet age Thomas Krichel
Bringing scholarly communication in Economics kicking and screaming into the Internet age: NetEc, RePEc and more to come Thomas Krichel
Disintermediation of Academic Publishing through the Internet: An Intermediate Report from the Front Line Thomas Krichel
Information policy issues in RePEc Thomas Krichel
Open Archives and Open Libraries Thomas Krichel
RePEc: a early example of an open library Thomas Krichel
The future of scholarly communication in Economics Thomas Krichel work partly sponsored by the Joint Information Systems.
Academic self-organization on the Internet. The example of RePEc Thomas Krichel
Document data & personal data Thomas Krichel Long Island University & Novosibirsk State University
How to become an 800 pound gorilla: the case of RePEc. Thomas Krichel 2008–10–29.
Ariw and AuthorClaim: current state Thomas Krichel prepared for the first retreat for disciplinary repositories Monterey.
Use your bean. Count it. Thomas Krichel
My life and times Thomas Krichel LIU & НГУ
LIS510 lecture 0 Thomas Krichel feeling nervous? So am I. It is my second time. Overall approach –I follow what has been done before. –I am.
EndNote Web Reference Management Software (module 5.1)
In the Format section, we have activated the Bibliographic style drop down menu. From this page, you can choose a specific journal or format (e.g. BMC.
EndNote Web Reference Management Software (module 5)
Building Repositories of eprints in UK Research Universities Bill Hubbard SHERPA Project Manager University of Nottingham.
Data and Publication Discovery Brian Matthews, Information Management Group, STFC Rutherford Appleton Laboratory CLADDIER workshop, Chilworth, Southampton,
NIH Public Access Compliance Cleveland Health Sciences Library Case Western Reserve University Kathleen C. Blazar.
Indispensable tools for research at its best COS Pivot: Accessing Pivot and Managing Your Profile.
Open Stirling: Open Access Publishing and Research Data Management at Stirling Monday 25 th March 2013 Michael White, Information Services STORRE Co-Manager/RMS.
Sunday October 28, www.eprints.org Tim Brody - Stevan Harnad -
Reference Management Software Tools Part A: EndNote Web.
“Facts are stubborn things, but statistics are pliable.”
How the University Library can help you with your term paper
EndNote. What is EndNote:  EndNote is referencing software that enables you to create a database of references from your readings. Your database of references.
Software Documentation Written By: Ian Sommerville Presentation By: Stephen Lopez-Couto.
MyiLibrary® ‘Search & View’ Website Training June 8, 2010.
Getting started on informaworld™ How do I register my institution with informaworld™? How is my institution’s online access activated? What do I do if.
Open Bibliographic Data and Author Claiming James R. Griffin III 1, 3 and Thomas Krichel 1, 2, 3 1 Long Island University 2 Novosibirsk State University.
Libra: Thesis and Dissertation Submission. What is Libra? UVA’s institutional repository, providing online archiving and access for the scholarly output.
Online Autonomous Citation Management for CiteSeer CSE598B Course Project By Huajing Li.
Research evaluation requirements José Manuel Barrueco Universitat de València (SPAIN) Servei de Biblioteques i Documentació May, 2011.
My Bibliography/eRA Commons Integration More utility, less work Bart Trawick Neil Thakur Commons Working Group, 9/22/09.
EndNote. What is EndNote? EndNote is referencing software that enables you to create a database of references from your readings.
Bibliography and reference manager programs (EndNote, Mendeley, Zotero) 2015 Attila Skulteti
CitEc as a source for research assessment and evaluation José Manuel Barrueco Universitat de València (SPAIN) May, …th International Scientific-Practical.
Reference Management Module I: Introduction By Rehema Chande-Mallya(PhD)
Bibliography and reference manager programs (EndNote, Mendeley, Zotero) 2016 Attila Skulteti
Data Management: Documentation & Metadata
Open Access to your Research Papers and Data
Zetoc: Electronic Table of Contents from the British Library
The RePEc database about Economics
Introduction of KNS55 Platform
IDEALS at the University Of Illinois: A Case Study of Integration Between an IR and Library Discovery Systems Sarah L. Shreeves University of Illinois.
Presentation transcript:

free author registration Thomas Krichel LIU & НГУ

me today
I am working for the Palmer School of Library and Information Science in the College of Information and Computer Science of the CW Post Campus of Long Island University in Brookville, NY, U.S.A., and for the Division of Information Systems in the Faculty of Information Technology at Novosibirsk State University in Novosibirsk, Russia. I do a lot of programming & sysadmin work.

formerly
I am a trained economist. My main claim to fame is the creation and coordination of the RePEc digital library for economics. My main area of work within RePEc is NEP: New Economics Papers, a current awareness service. That's a totally different topic.

RePEc now
It is a collection of data about academic economics. The bulk of the data is data about documents, and the bulk of that is
–published article data
–working paper data
But the interesting data are the author, institution and usage data.

RePEc principle of 1997
many archives
–archives offer metadata about digital objects (mainly working papers & journal articles)
one database
–the data from all archives forms one single logical database
many services
–users can access the data through many services
–providers of archives offer their data to all services
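
The 1997 principle can be pictured as a merge step followed by independent readers. The sketch below is only an illustration under our own assumptions: the `Archive` class, the `build_logical_database` and `service_view` functions, and the rule that handles are globally unique are hypothetical, not RePEc's actual software.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """One contributing archive (hypothetical shape for illustration)."""
    code: str                                                # archive code, e.g. "sur"
    records: dict[str, dict] = field(default_factory=dict)  # handle -> metadata


def build_logical_database(archives: list[Archive]) -> dict[str, dict]:
    """Merge the records of all archives into one handle-keyed database."""
    database: dict[str, dict] = {}
    for archive in archives:
        for handle, record in archive.records.items():
            if handle in database:
                # Handles are assumed to be globally unique, so a clash is an error.
                raise ValueError(f"duplicate handle {handle!r} from {archive.code!r}")
            database[handle] = {**record, "archive": archive.code}
    return database


def service_view(database: dict[str, dict], predicate) -> list[str]:
    """A 'service' is modelled as any reader that selects from the shared data."""
    return sorted(h for h, r in database.items() if predicate(r))
```

Because every service reads the same merged data, an archive that contributes once is visible in all of them.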

RePEc is based on
900+ archives, among them Blackwell, MPRA, DEGREE, S-WoPEc, NBER, CEPR, Taylor & Francis, US Fed in Print, IMF, OECD, MIT, University of Surrey, CO PAH, Elsevier

to form a 630k item dataset
–254,000 working papers
–370,000 journal articles
–1,600 software components
–4,200 book and chapter listings
–17,600 author records
–10,800 institutional contact listings

RePEc is used in many services
EconPapers, NEP: New Economics Papers, Google Scholar, RePEc Author Service, Twitter bulk posting (planned), LogEc, IDEAS, RuPEc, EDIRC, CitEc, MPRA

… describes documents
template-type: redif-paper 1.0
title: dynamic aspect of growth and fiscal policy
author-name: thomas krichel
author-person: repec:per: :thomas_krichel
author-
author-name: paul levine
author-
author-workplace-name: university of surrey
classification-jel: c61; e21; e23; e62; o41
file-url: ftp:// pub/repec/sur/surrec/surrec9601.pdf
file-format: application/pdf
creation-date:
revision-date:
handle: repec:sur:surrec:9601

… describes persons (RAS)
template-type: redif-person 1.0
name-full: mankiw, n. gregory
name-last: mankiw
name-first: n. gregory
handle: repec:per: :n__gregory_mankiw
homepage: mankiw/mankiw.html
workplace-institution: repec:edi:deharus
workplace-institution: repec:edi:nberrus
author-article: repec:aea:aecrev:v:76:y:1986:i:4:p:
author-article: repec:aea:aecrev:v:77:y:1987:i:3:p:
author-article: repec:aea:aecrev:v:78:y:1988:i:2:p:
…
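
Because a person template lists document handles, joining person records to document records is a plain lookup by handle. A minimal sketch, assuming the records have already been parsed into dicts; `documents_by_person` is a hypothetical helper, and treating `author-paper` as a second handle-bearing field is an assumption.

```python
def documents_by_person(persons: list[dict], documents: dict[str, dict]):
    """Resolve the document handles listed in person records.

    persons:   dicts with a 'handle' plus lists under 'author-article' and,
               as an assumption, 'author-paper'.
    documents: document handle -> document record.
    Returns (index, dangling): person handle -> resolved records, and the
    (person, document handle) pairs that point at no known document.
    """
    index: dict[str, list[dict]] = {}
    dangling: list[tuple[str, str]] = []
    for person in persons:
        claimed = person.get("author-article", []) + person.get("author-paper", [])
        resolved = []
        for doc_handle in claimed:
            if doc_handle in documents:
                resolved.append(documents[doc_handle])
            else:
                # Keep broken links visible instead of silently dropping them.
                dangling.append((person["handle"], doc_handle))
        index[person["handle"]] = resolved
    return index, dangling
```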

… describes institutions
template-type: redif-institution 1.0
primary-name: university of surrey
primary-location: guildford
secondary-name: department of economics
secondary-phone: (01483)
secondary-
secondary-fax: (01483)
secondary-postal: guildford, surrey gu2 5xh
secondary-homepage:
handle: repec:edi:desuruk
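
The three templates above share a simple line-oriented layout of `field: value` pairs. A minimal reader for that layout might look as follows. It is a sketch under simplifying assumptions stated in the docstring, not a full ReDIF implementation: it does not group clustered fields such as the author-* lines, and `parse_templates` is a hypothetical name.

```python
from collections import defaultdict


def parse_templates(text: str) -> list[dict[str, list[str]]]:
    """Split ReDIF-like text into templates mapping field -> list of values.

    Simplifying assumptions (not the full ReDIF specification):
    - a new template starts at every 'template-type:' line;
    - field names are case-insensitive and may repeat;
    - a line that does not start with 'name:' continues the previous value.
    """
    templates = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key and " " not in key:
            if key == "template-type":
                current = defaultdict(list)
                templates.append(current)
            if current is None:
                continue                          # ignore text before the first template
            current[key].append(value.strip())
            last_key = key
        elif current is not None and last_key is not None:
            current[last_key][-1] += " " + line   # continuation line
    return [dict(t) for t in templates]
```

Repeated fields such as workplace-institution or author-name simply accumulate in order.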

author registration
It started when JISC funding allowed us to hire a student to write an author registration system. The system went online as HoPEc in late …; it has since been renamed RePEc Author Service (RAS). A 2002 grant from OSI allowed a rewrite and expansion.

researcherID
ResearcherID is a system by Thomson ISI. It allows authors to find their documents. It has been modeled after the RePEc Author Service, but its document and personal records are not freely available.

success of RAS
Measuring the success of an author registration service is difficult in general. In RePEc we are fortunate that an independent list of the top 1000 authors exists. Of those, 80% are registered.

author registration?
Author registration is not name disambiguation. Author registration is not authority control. Author registration is usually done by the authors themselves. It involves two steps:
–Registrants enter some personal data.
–Registrants find, in the document data, records about documents they have written.

personal data
These contain the required elements:
–the person's name
–the e-mail address
and the optional elements:
–institutional affiliation
–homepage URL

search for authorships
This is based on a set of name variations. A name variation is a string by which the authors of document metadata may have referred to the registrant. Example:
–Thomas Krichel
–Крихель, Т.
Registrants maintain a name variations profile.
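
A search over name variations needs some normalization so that, for example, "Krichel, Thomas" and "Thomas Krichel" compare equal. The sketch below assumes a very simple rule of our own (move the part before a comma to the end, case-fold, drop punctuation); `normalize` and `find_candidates` are hypothetical names, and the matching actually used in RAS may differ.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Comparison key for a name string (a simplified, assumed rule).

    'Last, First' is turned into 'First Last'; the result is case-folded,
    punctuation becomes spaces and runs of whitespace collapse to one.
    """
    name = unicodedata.normalize("NFKC", name)
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    name = re.sub(r"[^\w\s]", " ", name.casefold())
    return " ".join(name.split())


def find_candidates(variations: list[str], documents: dict[str, dict]) -> list[str]:
    """Handles of documents with at least one author matching a variation."""
    wanted = {normalize(v) for v in variations}
    return sorted(
        handle
        for handle, doc in documents.items()
        if any(normalize(author) in wanted for author in doc.get("authors", []))
    )
```

Since `\w` matches Cyrillic letters too, the Russian variation in the example is handled by the same rule.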

authors
An author is a registrant who has at least one work claim. Since author registration is a pioneering innovation by yours truly, its purpose is not yet clearly understood. A user who registers to gain access to data is called a bozo registrant. RAS managers periodically clear out presumed bozo registrants.

free? as in $0
Registrants don't pay money for registration. Document data providers don't pay to have their document data listed. Registrants' data is freely available if they allow it.

free? as in freedom
Author records are freely available for any purpose, as long as we have the registrants' consent. Registrants' consent is assumed for everything but the e-mail address. By default, e-mail addresses are not exported.

freedom is crucial
Users will not register unless they intend their records to be used. They will prefer a system that has high re-usage. Therefore I am confident that an open system will win over a closed system.

free document data
In principle, document data has to contain only three fields:
–title
–author name expressions
–URL for further information and/or …
Such data is in principle not copyrightable. But there are still only a few sources that have such data readily available.

service implementation scale
Registration of authors can be conducted against any document dataset. What is the appropriate set?
–type scale?
–subject scale?
RAS shows that it works at a single-discipline scale with research paper documents, both articles and working papers. But economics is fairly insular.

AuthorClaim.org
Since 2008 yours truly has been working on an interdisciplinary system. This will be the last important project before my death. The idea is that it will help the fledgling institutional repository (IR) movement. Since IRs currently are either empty or contain rubbish, AuthorClaim has to be primed with other content.

datasets
The data used in AuthorClaim are
–PubMed (problematic)
–DBLP (XML file only)
–CiteSeer
–arXiv (not announced yet)
–CIS (non-free dataset)
–E-LIS
Work is under way to include a broad range of the repositories listed in DOAR.

PubMed
The 800-pound gorilla of bibliographic datasets, with 17 million records. Free only as in $0, through a convoluted license. In addition, NLM added the condition that I would not offer the personal records to them. It was not enough for them that I simply say they could refuse the records if I offered them.

DBLP
Not freely available either.
–only an XML dump of some records (individual documents)
–only for non-commercial purposes
The overlap with CiteSeer would be nice to clean up.

CIS
This is the Current Index to Statistics. It is not a free dataset at all, but yours truly has access to a database version from which we extract the three metadata fields that are required.

DOAR repositories
DOAR repositories use the OAI-PMH protocol. Dirty UTF-8/XML seems to be the main culprit for failures. Roughly, out of 1200 registered repositories, half work on any particular day. For roughly two thirds we can get some records by trying and stopping when the first error occurs. BTW, RePEc is the second-largest DOAR repository by number of records.
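
The "try and stop at the first error" strategy can be sketched with the Python standard library. The OAI-PMH parts used here (the ListRecords verb, metadataPrefix, resumptionToken and the error element in the OAI 2.0 namespace) come from the protocol; the `harvest` function, its `fetch` parameter and the choice of oai_dc are assumptions made for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"   # OAI-PMH 2.0 namespace


def fetch_url(url: str) -> bytes:
    """Default fetcher: plain HTTP GET."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def harvest(base_url: str, fetch=fetch_url, prefix: str = "oai_dc"):
    """Follow ListRecords pages until done or until the first error.

    Returns (records, error). Records gathered before an error are kept,
    so a repository with dirty data still yields its first good pages.
    """
    records = []
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        try:
            root = ET.fromstring(fetch(url))
        except (OSError, ET.ParseError) as exc:   # network trouble or dirty UTF-8/XML
            return records, exc
        error = root.find(f"{OAI}error")
        if error is not None:                     # protocol-level error reply
            return records, RuntimeError(error.get("code", "unknown"))
        listing = root.find(f"{OAI}ListRecords")
        if listing is None:
            return records, RuntimeError("reply has no ListRecords element")
        records.extend(listing.findall(f"{OAI}record"))
        token = listing.find(f"{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return records, None                  # no more pages
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` as a parameter keeps the loop testable without network access.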

subject coverage and overlap
The subject coverage of AuthorClaim will remain uneven unless publishers give data directly (replacing libraries, eventually). Overlap is less of a problem than the lack of good data. RePEc routinely groups various versions of authors' works together. This is feasible once the versions are in the claimed set of a person.

scaling issue
With 30 times the number of records, and with PubMed using only initials (phew!), registrants with common names have large sets of potential documents to work through. Clearly, they also derive more benefit. Example: Joanna P. Davies currently has 795 proposed documents. Now think about Chen or Li.

machine learning
In a new project, Илья Королёв (Ilya Korolev) and Thomas Krichel are working on enhancing ACIS to provide help through machine learning. The idea is that users submit a few positive and negative examples, and machine learning sorts the documents they most likely authored to the front. Assessing such a system is really interesting.
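
To make the idea concrete, here is one simple way to sort candidates from a few positive and negative examples: a naive-Bayes-style log-likelihood ratio over title words and co-author names. This is a swapped-in textbook technique, not the method of the project described above; the feature choice and the field names `title` and `authors` are assumptions.

```python
import math
from collections import Counter


def features(doc: dict) -> list[str]:
    """Assumed features: lower-cased title words plus co-author names."""
    words = doc["title"].lower().split()
    return words + ["author:" + a.lower() for a in doc.get("authors", [])]


def train(positives: list[dict], negatives: list[dict]):
    """Count feature occurrences in the claimed and refused examples."""
    pos = Counter(f for d in positives for f in features(d))
    neg = Counter(f for d in negatives for f in features(d))
    return pos, neg, len(set(pos) | set(neg)) + 1   # +1 slot for unseen features


def score(model, doc: dict) -> float:
    """Log-likelihood ratio with add-one smoothing; higher = more likely authored."""
    pos, neg, vocab = model
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    return sum(
        math.log((pos[f] + 1) / (n_pos + vocab)) - math.log((neg[f] + 1) / (n_neg + vocab))
        for f in features(doc)
    )


def rank(candidates: list[dict], positives: list[dict], negatives: list[dict]) -> list[dict]:
    """Sort candidate documents so the most likely authored come first."""
    model = train(positives, negatives)
    return sorted(candidates, key=lambda d: score(model, d), reverse=True)
```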

ACIS
This is the Academic Contribution Information System. It is generic software for building author registration services that are somewhat more general than RAS. Work on ACIS was sponsored by the Open Society Institute. The software was written by Ivan V. Kurmanov. It is verrrry complicated.

basic idea
A contribution is a relationship between document data records and personal records that a registrant can claim. Authorship and editorship are built-in contribution types, but others can be configured. The contribution system allows registrants to provide information about their contributions.

no document creation
Using ACIS, registrants cannot create document records. While many RAS registrants want to do this, it is considered out of scope for an ACIS installation. ACIS-based systems are not supposed to substitute for but to complement the work of publishers.

ACIS implementations and document services
An ACIS implementation service (AIS) can work with a document submission service (DSS). A DSS would typically run EPrints, DSpace or Fedora Commons. While such systems are distinct, running on different machines etc., they can be so interconnected that they appear integrated to a naive user.

interoperability
AIS and DSS interoperability comes in levels. With each level up, we have more (better) interoperability. We have levels 0 to 4. At level zero, an AIS and a DSS simply live side by side, and no interaction happens.

level 1
In level 1, a DSS provides metadata about its documents to an AIS.
–The data is stored in files
–in a compatible format; for ACIS this would be AMF or ReDIF.
The AIS processes the data periodically. It
–adds new records to the document dataset
–performs probationary associations between documents and authors
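
A level 1 run could look roughly like this. The `AIS` class is a toy stand-in with hypothetical fields; records are assumed to be already parsed from the AMF or ReDIF files, and a probationary association is modelled as an exact, case-insensitive match between an author name and a registrant's name variations.

```python
from dataclasses import dataclass, field


@dataclass
class AIS:
    """Toy stand-in for an ACIS implementation service (hypothetical fields)."""
    documents: dict = field(default_factory=dict)     # handle -> record
    variations: dict = field(default_factory=dict)    # registrant -> set of names
    probationary: set = field(default_factory=set)    # (registrant, handle) pairs

    def process_batch(self, records: list[dict]) -> int:
        """One periodic run over records exported by a DSS.

        Adds unseen records and creates probationary (unconfirmed)
        associations; returns the number of records added. Updates to
        already known records are ignored in this sketch.
        """
        added = 0
        for rec in records:
            handle = rec["handle"]
            if handle in self.documents:
                continue                              # already known: skip
            self.documents[handle] = rec
            added += 1
            names = {n.lower() for n in rec.get("authors", [])}
            for registrant, variants in self.variations.items():
                if names & {v.lower() for v in variants}:
                    self.probationary.add((registrant, handle))
        return added
```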

level 2
A DSS delivers to the AIS data for some of its authorships that point to data in the AIS. The AIS can accept any of the following three identification avenues:
–an identifier known to the AIS
–a shortID, previously generated by the AIS
–an e-mail address, known to the AIS as the login of a registrant.
This data has to be entered by a submitter.
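
Resolving an authorship pointer against the three avenues is a sequence of lookups. In this sketch the three tables, the order in which they are tried and the lower-casing of logins are all assumptions; `resolve_author` is a hypothetical name.

```python
from typing import Optional


def resolve_author(value: str, identifiers: dict, short_ids: dict,
                   logins: dict) -> Optional[str]:
    """Resolve one authorship pointer supplied by a DSS to a registrant.

    identifiers: full identifier -> registrant
    short_ids:   shortID -> registrant
    logins:      e-mail address (login) -> registrant, stored lower-cased
    Returns None when no avenue identifies a registrant.
    """
    value = value.strip()
    if value in identifiers:          # avenue 1: identifier known to the AIS
        return identifiers[value]
    if value in short_ids:            # avenue 2: shortID generated by the AIS
        return short_ids[value]
    return logins.get(value.lower())  # avenue 3: registrant's login address
```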

level 3
The DSS helps submitters to find the data required for level 2 interoperability. While submitters enter authorship data, the DSS performs searches in the AIS data. If matching records are found, the submitter is invited to select them. The document data is then exported to the AIS in the usual way.

implementing level 3
The AIS needs to expose registrants' data to the DSS. The data cannot be made publicly available if we want the e-mail address to be an avenue of identification. The DSS must search the AIS data, display optional matches in an unobtrusive way, and give submitters an easy way to choose an option.

level 4
The DSS immediately notifies the AIS about a document submission. The AIS processes the notification, and the document is added to the research profiles of its identified authors.

level dependency
There is level dependency:
–level 1 is really required for the other levels.
–level 2 is a basis for level 3.
–level 4 can be done without either level 2 or level 3.
Current ACIS code can implement all four levels. There is code written for EPrints 2.0 that implements the DSS side of the interoperability.
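
The dependencies on this slide are small enough to write down as data. A minimal sketch with hypothetical names; the `REQUIRES` table encodes exactly the three rules listed above.

```python
# Prerequisites per level: every level above 0 needs level 1,
# level 3 also needs level 2, level 4 needs only level 1.
REQUIRES = {1: set(), 2: {1}, 3: {1, 2}, 4: {1}}


def missing_prerequisites(levels: set[int]) -> dict[int, set[int]]:
    """For each requested level, report prerequisites that were not requested."""
    return {
        level: REQUIRES[level] - levels
        for level in sorted(levels)
        if level in REQUIRES and REQUIRES[level] - levels
    }
```

An empty result means the requested combination of levels is consistent.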

ACIS components
rid is a feeding daemon. It feeds records in files into a processor. It uses the Berkeley DB transactional database system.
ARDB is a software suite that implements relational bibliographic datasets.
There is a general web application layer. It fires up XSLT.

ACIS components, a few more
A shortID system associates shortIDs with documents and, more importantly, with registrants.
A userData system manages the data handled by users and feeds it back to the ARDB system.
A resources system deals with searches and suggestions.

ACIS functionality
Besides the association of documents with users, ACIS provides a range of functionality that complements or extends the basic functionality. I will review some now.

ACIS contact details
This is a set of trivial fields:
–e-mail address. This detail is required but not exported by default.
–homepage
–phone number
–postal address
We don't do pictures of the registrants' dogs etc.

affiliations profile
This is more complicated. Institutional data is kept as separate records, not as string data. Registrants can search for an existing institutional record to create an affiliation with, or they can propose a new record to be added by filling out a form.

research profile
This is a collection of metadata about research documents the registrant has written. Available functions include
–display a list of works in the profile
–search for new suggested works
–manual search for works by title
–display refused research documents
–change preferences for automatic updates

automatic updates
By default, when a document record quotes a person's shortID, the document is added to the profile. By default, a regular search using the name variations profile identifies a set of potential new documents and reports them to the user via e-mail. The registrant may choose to have exact matches from these searches added to the research profile directly.
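
The two default rules and the opt-in preference can be sketched as one update pass. `Profile`, `update_profile` and the document fields `author_short_ids` and `authors` are hypothetical; matching is reduced to exact, case-insensitive name equality.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    short_id: str
    variations: set                  # name variations profile
    add_exact_matches: bool = False  # opt-in preference, off by default
    works: set = field(default_factory=set)


def update_profile(profile: Profile, new_documents: dict) -> list[str]:
    """Apply the update rules to newly arrived documents.

    Returns the handles to report to the registrant (e.g. by e-mail).
    """
    wanted = {v.lower() for v in profile.variations}
    to_report = []
    for handle, doc in sorted(new_documents.items()):
        if profile.short_id in doc.get("author_short_ids", []):
            profile.works.add(handle)          # rule 1: shortID quoted, add directly
        elif wanted & {a.lower() for a in doc.get("authors", [])}:
            if profile.add_exact_matches:
                profile.works.add(handle)      # registrant opted in: add directly
            else:
                to_report.append(handle)       # default: only suggest
    return to_report
```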

document-to-document links
Authors can create document-to-document links to say that two documents in the profile are related. Links from documents to full-text files can be confirmed or rejected. Typically such full-text files would be found by an automated search external to the AIS.

citations profile
Within this profile, authors can partially manage citation information for items in the research profile. Just as a DSS may submit data to an AIS, a citation discovery service may give citation data to an AIS. Such data can be maintained in the citations profile.

references processing
References are processed to see if they may correspond to a document in the research profile. If a document in the profile has a potential citation, it is called an interesting document. Once reference processing is done, registrants can navigate the interesting documents in decreasing order of interest.
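
A crude version of reference processing: compare every reference string with every title in the profile and rank documents by how many potential citations they attract. Both the word-coverage measure and the 0.8 threshold are assumptions for illustration; `interesting_documents` is a hypothetical name.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def title_coverage(reference: str, title: str) -> float:
    """Share of the title's words that also occur in the reference string."""
    title_words = words(title)
    if not title_words:
        return 0.0
    return len(title_words & words(reference)) / len(title_words)


def interesting_documents(profile: dict[str, str], references: list[str],
                          threshold: float = 0.8) -> list[tuple[str, list[str]]]:
    """Return (handle, potential citations) pairs, most interesting first.

    profile maps document handle -> title. 'Interest' is taken to be the
    number of potential citations; 0.8 is an arbitrary cut-off.
    """
    hits: dict[str, list[str]] = {handle: [] for handle in profile}
    for ref in references:
        for handle, title in profile.items():
            if title_coverage(ref, title) >= threshold:
                hits[handle].append(ref)
    ranked = [(handle, refs) for handle, refs in hits.items() if refs]
    return sorted(ranked, key=lambda item: (-len(item[1]), item[0]))
```

Measuring coverage of the title, rather than overall string similarity, tolerates the author names, years and journal data that surround a title in a typical reference.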

suggestions processing
Registrants navigate the set of suggested citations to see if the reference string really matches the research profile item. If the registrant refuses a citation, there is a screen where she can later overturn such a decision.

automatic citation updates
If the reference is very close to the citation data, the registrant can have it added automatically. When a co-author has identified a citation to an item in her profile, the registrant can allow it to be added automatically.
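
The two automatic rules amount to a small decision function. `CitationPrefs`, `decide` and the 0.95 cut-off for "very close" are hypothetical; the similarity score is assumed to come from some earlier matching step.

```python
from dataclasses import dataclass


@dataclass
class CitationPrefs:
    auto_add_close_matches: bool = False  # rule 1: add very close references
    trust_coauthors: bool = False         # rule 2: accept co-author confirmations


def decide(similarity: float, confirmed_by_coauthor: bool,
           prefs: CitationPrefs, close: float = 0.95) -> str:
    """Return 'add' or 'suggest' for one potential citation."""
    if prefs.trust_coauthors and confirmed_by_coauthor:
        return "add"
    if prefs.auto_add_close_matches and similarity >= close:
        return "add"
    return "suggest"                      # default: leave it to the registrant
```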

thank you for your attention!