1
IDENTIFYING OPEN ACCESS ARTICLES: VALID AND INVALID METHODS
David Goodman, Palmer School of Library and Information Science, Long Island University
Kristin Antelman, Associate Director for Information Technology, NCSU Libraries
Nisa Bakkalbasi, General Science Librarian, Yale Library
XXV Charleston Conference, Nov. 4, 2005
2
We gratefully acknowledge
- the courtesy of ISI to Stevan Harnad in supplying their citation data
- the courtesy of Stevan Harnad in supplying his analyzed data to us
- his acceptance of our offer to carry out a manual evaluation
- the assistance of Chawki Hajjem and Stevan Harnad in explaining the details of their methodology, and their helpful comments on our measurements
3
Why do we want to identify OA?
- So users can find it (findability)
- So people can link to it (linkability)
- To measure the % of articles that are OA
- To measure the OA Advantage (OAA): the increase in citations from being published OA
4
Specific Fields
Lawrence 2001
- OA of conference proceedings in electrical engineering and computer science
- Location by Research Index using Google
- Matched pairs
Kurtz 2003-5
- Astronomy papers in ADS (Astrophysics Data System)
5
Specific Journals
Wren 2005
- References to articles in selected medical journals
Other individual journal studies...
6
Multiple fields
Antelman, 2004
- selected subject areas in many academic fields
- manual identification
- OAA = 15% - 40%, depending on subject
7
Multiple fields
Brody, Harnad et al., 2004-5
- selected subject areas in science
- automated identification in arXiv by algorithm
- automated citation check in WoS
- OAA = up to 300%, depending on subject
8
Multiple fields
Brody, Stamerjohanns, Harnad et al., 2004-5
- selected subject areas in social science
- automated identification in arXiv or web by algorithm
- automated citation check in WoS
- OAA = up to 200%, depending on subject
9
Multiple fields
Hajjem, Harnad et al., 2005-
- all subject areas in science & social science
- automated identification on web by algorithm
- automated citation check in WoS
- OAA = up to 200%, depending on subject
(This data has been posted by Hajjem at Soton, but is still unpublished)
10
Our Purpose
- to confirm the validity of algorithmic OA/non-OA determinations
- to verify measurements of OA and OAA
11
Our Technique
- Selected years and subject fields from Hajjem, Harnad, et al., with OA determination and citations
- Sample from the algorithm's OA and non-OA
- Manual check on the web to either confirm OA or not find OA (Google, authors' sites, etc.)
- Tabulation of ISI citations to determine OA Advantage
(more complete details forthcoming)
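The last step can be sketched in code. The slides do not give the formula used, so this assumes OAA is the percentage by which the mean citation count of OA articles exceeds that of non-OA articles. The function name `oa_advantage` and the citation counts are hypothetical, for illustration only.

```python
from statistics import mean


def oa_advantage(oa_citations, non_oa_citations):
    """Percent by which mean OA citations exceed mean non-OA citations.

    ASSUMPTION: this definition of OAA is not stated in the slides.
    """
    baseline = mean(non_oa_citations)
    if baseline == 0:
        raise ValueError("non-OA articles have no citations; OAA is undefined")
    return 100.0 * (mean(oa_citations) - baseline) / baseline


# Hypothetical citation counts, not data from the study.
example_oa = [4, 6, 8, 6]        # mean 6
example_non_oa = [3, 5, 4, 4]    # mean 4
example_oaa = oa_advantage(example_oa, example_non_oa)
print(f"OAA = {example_oaa:.0f}%")
```

Because the result depends on which articles land in each list, any misclassification of OA status feeds directly into the OAA figure.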
12
OA articles include
- Published authentic text in OA Journals
- Posted authentic text (publisher's PDF)
- Posted author's corrected manuscript
13
Dubious OA items include:
- Embargoed published articles, after embargo ends
- Embargoed author's manuscripts, after embargo ends
- Editorials, Letters, Review articles
- Abstract-only publication
14
Non-OA articles include:
- Published authentic text in subscription journals
- Abstracts on publisher's site
- Listing in: title pages, alerting services, blogs, course notes, references in other articles, links from a posting to publisher's site
15
Set One Examined
- Articles from year 2002
- (Classical) Biology (ISI category)
- 1% sample
16
Decision Table (biology)

                        Manual detection
Algorithm detection     OA      non-OA    TOTAL
OA                      106     160       266
non-OA                   32     239       271
TOTAL                   138     399       537
17
Interpretation (biology):
- Of the 266 items labeled "OA" by the algorithm, only 106 were actually OA
- Of the 138 items actually OA, the algorithm missed 32
- Of the total 537 items, the algorithm got 345 right and 192 wrong
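The three statements above correspond to the standard precision, recall, and accuracy of a 2x2 decision table. A minimal sketch using the biology counts from the slide; the class and field names are illustrative and not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionTable:
    both_oa: int        # algorithm says OA, manual check confirms OA
    algo_only_oa: int   # algorithm says OA, manual check finds no OA
    missed_oa: int      # algorithm says non-OA, manual check finds OA
    both_non_oa: int    # algorithm says non-OA, manual check agrees

    @property
    def total(self) -> int:
        return self.both_oa + self.algo_only_oa + self.missed_oa + self.both_non_oa

    def precision(self) -> float:
        """Share of the algorithm's 'OA' labels that were actually OA."""
        return self.both_oa / (self.both_oa + self.algo_only_oa)

    def recall(self) -> float:
        """Share of actually-OA items that the algorithm found."""
        return self.both_oa / (self.both_oa + self.missed_oa)

    def accuracy(self) -> float:
        """Share of all items the algorithm labeled correctly."""
        return (self.both_oa + self.both_non_oa) / self.total


# Counts taken from the biology decision table.
biology = DecisionTable(both_oa=106, algo_only_oa=160, missed_oa=32, both_non_oa=239)
print(f"precision {biology.precision():.1%}, "
      f"recall {biology.recall():.1%}, "
      f"accuracy {biology.accuracy():.1%}")
```

Precision is the weak point here: fewer than half of the items the algorithm labeled "OA" were confirmed by the manual check.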
18
Signal theory: Distributions (biology) [figure not reproduced]
19
Set Two Examined
- Articles from year 2000
- Sociology (ISI category)
- 8% sample
20
Decision Table (sociology)

                        Manual detection
Algorithm detection     OA      non-OA    TOTAL
OA                       29     148       177
non-OA                   25     151       176
TOTAL                    54     299       353
21
Interpretation (sociology):
- Of the 177 items labeled "OA" by the algorithm, only 29 were actually OA
- Of the 54 items actually OA, the algorithm missed 25
- Of the total 353 items, the algorithm got 180 right and 173 wrong
22
Signal theory: Distributions (sociology) [figure not reproduced]
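The distribution figures are not available in this transcript. One conventional signal-detection summary of a decision table is the sensitivity index d', the separation between the "OA" and "non-OA" distributions in standard-deviation units. The slides do not say whether the authors computed it, so this is only a sketch under the usual equal-variance Gaussian assumption, using the counts from the two decision tables.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Equal-variance Gaussian d' = z(hit rate) - z(false-alarm rate).

    ASSUMPTION: this model is an illustration, not the authors' stated method.
    Both rates must lie strictly between 0 and 1.
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# hits = both say OA; misses = OA the algorithm missed;
# false alarms = algorithm says OA but manual check does not.
biology_d = d_prime(hits=106, misses=32, false_alarms=160, correct_rejections=239)
sociology_d = d_prime(hits=29, misses=25, false_alarms=148, correct_rejections=151)
print(f"biology d' = {biology_d:.2f}, sociology d' = {sociology_d:.2f}")
```

A d' near zero means the algorithm's "OA" label carries almost no information about true OA status, which is the situation the sociology table describes.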
23
Our Determinations: Comparison
Note that the apparent similarity is due to compensating errors: there are many more non-OA articles than OA, so the small error on missed OA cancels out the big error on over-coded OA.

                     Biology 2002 %OA   Sociology 2000 %OA   Biology 2002 OAA
Algorithm's value    14%                23%                  51%
Our value            16%                15%                  63%

(Sociology OAA not measurable due to error in matching ISI data)
24
All Determinations (including ours): Unavoidable Systematic Errors
- Problematic material types
- Delayed OA articles
- Articles posted long after publication
- Variation in titles
- Different publications with same titles
- Articles removed from web
- Invisibility to the search engines
- Errors due to ISI inaccuracy
25
All Determinations based on Soton data (including ours): Source of possible confusion
- OA Journals consistently omitted (all journals with 100% OA)*
- Journals without OA consistently omitted (all journals with 0% OA)
* Thus, all of the Soton OA and OAA determinations are for "Green" self-archiving only, not including "Gold" OA Journals
26
At least some Soton Data: other known possible sources of confusion or error
- All journals given equal weight regardless of size
- Data averaged by journal
- Google not usually used in search
- Just arXiv used in some searches (whether or not appropriate)
- Inadequate testing of algorithms
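The first two items matter because averaging per-journal percentages can differ sharply from the percentage over all articles. A small sketch with made-up numbers; the two journals and their counts are hypothetical and are not drawn from the Soton data.

```python
# HYPOTHETICAL journals: name -> (articles published, articles that are OA).
journals = {
    "small journal": (20, 10),    # 50% OA
    "large journal": (980, 49),   # 5% OA
}


def mean_of_journal_rates(table):
    """Every journal counts equally, regardless of how many articles it has."""
    rates = [oa / published for published, oa in table.values()]
    return sum(rates) / len(rates)


def pooled_article_rate(table):
    """Every article counts equally: total OA articles over total articles."""
    total_published = sum(published for published, _ in table.values())
    total_oa = sum(oa for _, oa in table.values())
    return total_oa / total_published


by_journal = mean_of_journal_rates(journals)
by_article = pooled_article_rate(journals)
print(f"averaged by journal: {by_journal:.1%}; pooled over articles: {by_article:.1%}")
```

In this invented case the journal-averaged figure is several times the article-level figure, because the small journal carries as much weight as the large one.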
27
Determinations of % OA and OAA
- Depend on the accuracy of identification of individual items
- Therefore, algorithmic determinations are at best accurate only by accident
28
Conclusions:
I. Accuracy is currently possible only
   a. with manual determinations (which are too tedious for practical use), or
   b. with well-defined searches in well-defined fields (such as particular journals or repositories)
II. Generalized algorithmic search engines of acceptable accuracy have yet to be developed (and tested)