Published by Mason Leaman. Modified over 9 years ago.
1
AAG, Denver, 2005: Carvalho and Batty: The Geography of arXiv.org
The Geography of arXiv.org
Rui Carvalho and Michael Batty
University College London
rui.carvalho@ucl.ac.uk, m.batty@ucl.ac.uk
http://www.casa.ucl.ac.uk/secse/
2
What is arXiv.org?
Founded by Paul Ginsparg in ’91 at LANL, moved to Cornell in ’01;
Self-archive of physics, maths and computer science preprints since ’91;
Quantitative biology added Sep ’03;
Papers have a time stamp, so authors can claim ownership;
Typically, papers appear in refereed journals about 12 months after journal submission;
Some data for calendar year ’04:
– total number of submissions (Aug ’91 through Dec ’04): 303 614
– average submission rate (’04): 3644 papers/month
– 18 mirror-sites in 16 countries
– submission rates (’04): hep 20.5%, cond-mat 20.5%, astro-ph 18.9%, math 11.8%, quant-ph 4.8%, gr-qc 4.3%, nucl 3.9%, physics (other) 3.1%, nlin 2.3%, cs 1.5%, q-bio 0.2%
– submissions by country (’00-’04): US edu and gov (27.5%), Germany (9.9%), Italy (6.3%), United Kingdom (5.8%), Japan (5.7%), France (5.6%), Russian Federation (3.2%)
3
arXiv monthly submission rate stats (Dec ’04)
“hep” = High Energy Physics, “cond-mat” = Condensed Matter Physics, “astro-ph” = Astrophysics; cross-listings in clear
4
Why study the Geography of arXiv.org?
Papers are often submitted in LaTeX. LaTeX is a text-based document preparation system for high-quality typesetting (it’s not a word processor!);
In that case, the LaTeX source code is available for download from arXiv.org;
Typically (but not always!), the LaTeX source encodes author and address data in specific fields;
These fields can be parsed using custom scripts (e.g. written in Perl) to extract the geographical location of the authors;
Problem: can we parse author/address fields, extract papers with one or more US authors, and map the zip codes in their addresses?
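As an illustration of the field-parsing step, here is a minimal sketch in Python (the slides mention Perl; Python is swapped in here only for readability). It is not the authors' script. The macro names `\author`, `\address` and `\affiliation` are assumptions about which fields a paper uses, and `extract_fields` is a hypothetical helper. Braces are matched by counting depth so that nested commands inside a field survive intact.

```python
import re

# Assumed macro names; real papers may use other fields or none at all.
FIELD_NAMES = ("author", "address", "affiliation")


def extract_fields(tex, names=FIELD_NAMES):
    """Return (macro_name, content) pairs for each matching macro in tex."""
    # Match the macro name, an optional [..] argument, then the opening brace.
    opener = re.compile(
        r"\\(" + "|".join(names) + r")\s*(?:\[[^\]]*\])?\s*\{"
    )
    fields = []
    for match in opener.finditer(tex):
        depth = 1
        start = pos = match.end()
        while pos < len(tex) and depth > 0:
            ch = tex[pos]
            if ch == "\\":
                pos += 2  # skip an escaped character such as \{ or \}
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            pos += 1
        if depth == 0:  # ignore fields whose braces never close
            fields.append((match.group(1), tex[start:pos - 1]))
    return fields


sample = r"""
\author{A. N. Author}
\address{Physics Department, Northeastern University, Boston MA USA}
\affiliation{\textit{Theoretical Division}, Los Alamos, New Mexico~87545}
"""
for name, content in extract_fields(sample):
    print(name, "|", content)
```

Papers that carry no such fields at all yield an empty list from this sketch and would have to be counted as unprocessable.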
5
Problems with Zip Code extraction
Identifying zip code look-alikes:
– Easy: Kiev 03028, Ukraine; Roma 00185, Italy
– Not so easy: Iran 71454; Israel 84105
Could not process:
– Physics Department, Northeastern University, Boston MA USA
– address/author fields not found (as in PhD theses or commentaries)
Errors (found 6 in a random sample of 400 papers (1.5%)):
– Fargo 58105, ND
– Theoretical Division and Center for Nonlinear Studyes, Los Alamos, New Mexico~87545
– Zip not in database (found 1 in 400)
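One possible heuristic for separating US zip codes from look-alikes is sketched below in Python. It is an assumption for illustration, not the method used in the study: a five-digit number is accepted only when the word immediately before or after it is a US state abbreviation. The LaTeX tie `~` is normalised to a space first, and `STATE_NAMES` is a deliberately small, hypothetical sample of full state names. `find_us_zips` is a hypothetical helper name.

```python
import re

# Two-letter codes for the 50 states plus DC (matched case-sensitively).
US_STATES = set(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI "
    "MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT "
    "VA WA WV WI WY".split()
)

# Illustrative subset only; a real script would need every state name.
STATE_NAMES = {"New Mexico": "NM", "North Dakota": "ND", "Massachusetts": "MA"}

# Five digits, optional ZIP+4 suffix, not embedded in a longer number.
ZIP_RE = re.compile(r"(?<!\d)(\d{5})(?:-\d{4})?(?!\d)")


def find_us_zips(address):
    """Return five-digit codes that sit next to a US state abbreviation."""
    text = address.replace("~", " ")  # LaTeX non-breaking space
    for name, abbr in STATE_NAMES.items():
        text = text.replace(name, abbr)
    zips = []
    for m in ZIP_RE.finditer(text):
        before = re.findall(r"[A-Za-z]+", text[:m.start()])[-1:]
        after = re.findall(r"[A-Za-z]+", text[m.end():])[:1]
        if set(before + after) & US_STATES:
            zips.append(m.group(1))
    return zips


examples = [
    "Kiev 03028, Ukraine",
    "Iran 71454",
    "Fargo 58105, ND",
    "Los Alamos, New Mexico~87545",
    "Physics Department, Northeastern University, Boston MA USA",
]
for address in examples:
    print(address, "->", find_us_zips(address))
```

Under this heuristic the foreign look-alikes are rejected, while the reversed order in "Fargo 58105, ND" and the tie in "New Mexico~87545" are both handled. The Boston address still yields nothing, because it contains no zip code at all.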
6
Mapping cond-mat in 2004
Total: 7957; one or more US authors: 2326 (29.2%); couldn’t process: 517 (6.5%)
7
The Geography of cond-mat
8
The Geography of cond-mat
9
The Geography of cond-mat
10
Rank-order plot of paper output by zip (preliminary)
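The data behind a rank-order plot of this kind could be assembled roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: each paper is represented by the list of zip codes found in its address fields, a paper counts once per distinct zip code, and ties are broken by zip code. The function name `rank_order` and the toy input are hypothetical.

```python
from collections import Counter


def rank_order(paper_zips):
    """Return (rank, zip_code, paper_count) tuples, most productive zip first."""
    counts = Counter()
    for zips in paper_zips:
        counts.update(set(zips))  # a paper counts once per distinct zip code
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, z, n) for rank, (z, n) in enumerate(ranked, start=1)]


# Toy input: four hypothetical papers, each with the zip codes of its authors.
papers = [["87545"], ["87545", "58105"], ["58105", "58105"], ["87545"]]
for rank, zip_code, n_papers in rank_order(papers):
    print(rank, zip_code, n_papers)
```

Plotting the paper count against rank, often on logarithmic axes, then gives the rank-order curve.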
11
Next Steps
Extend study to a larger sample of arXiv.org;
Study spatial dynamics of arXiv papers for the period ’91-’05 (knowledge diffusion?);
Compare with NSF, ARPA, etc. data by state;
Extract geography of collaboration networks.
12
To find out more
http://www.casa.ucl.ac.uk/secse/
Spatially Embedded Complex Systems Engineering (SECSE): http://www.secse.net/
members: UCL, Leeds, Southampton, Sussex
rui.carvalho@ucl.ac.uk, m.batty@ucl.ac.uk