Analysis of URL References in ETDs: A Case Study at the University of North Texas Mark E. Phillips Assistant Dean for Digital Libraries.


1 Analysis of URL References in ETDs: A Case Study at the University of North Texas Mark E. Phillips mark.phillips@unt.edu Assistant Dean for Digital Libraries Daniel G. Alemneh daniel.alemneh@unt.edu Digital Curation Coordinator for Digital Libraries Brenda Reyes Ayala brenda.reyes@unt.edu Graduate Assistant, UNT Web Archiving Team

2 Outline
• Background: ETD at UNT
• Curating cited URLs
• URL references, linking patterns
• Methods: URL Extraction & Indexing
• Findings
• Summary

3 Background: ETD at UNT

4 UNT & ETD
• The University of North Texas (UNT) began accepting theses and dissertations in electronic format in 1999.
  ◦ UNT was one of the early adopters of what was to become the ETD movement in higher education.
  ◦ It was one of the first three American universities to require ETDs for graduation.

5 UNT & ETD
The UNT Libraries play an active role in facilitating access to UNT’s ETDs:
– The Digital Projects Unit took on a stewardship role
  ◦ Develop appropriate metadata
  ◦ Integrate value-added services into the ETDs
– Multiple formats (PDF, JPG)
– Integrate related content (datasets, video, audio, e.g. recitals)
– Started retrospective conversion projects: in-house digital retro-conversion of pre-1999 theses and dissertations previously available only in paper or microform.

6

7 Visits from 200+ Countries: http://digital.library.unt.edu/explore/collections/UNTETD/browse/

8 Curating Cited URLs

9 Why Case Study
The UNT Libraries carried out this research to better understand how the shift to the Web affected the use of Web resources as the research focus or primary citation target of theses and dissertations. To answer this question, the authors analyzed the extent to which ETDs reference Web resources, how that differs between academic degree levels, and how it has changed over the past twelve years at UNT.

10
Degree Level   Total # of Documents   # of Documents without URLs   # of ETDs with URLs   % of ETDs with URLs   Average URLs per Item
Doctoral       2,347                  745                           1,602                 68.2%                 14.02
Master’s       1,988                  877                           1,111                 55.8%                 11.12
Total          4,335                  1,622                         2,713                 62.6%                 12.83

11 UNT ETDs breakdown of several ranges of URLs per document
URL Range   # of ETDs   % of ETDs   Cumulative %
0           1,622       37.42%      37.42%
1           399         9.20%       46.62%
2-9         1,292       29.80%      76.42%
10-30       769         17.74%      94.16%
31+         253         5.84%       100.00%
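The percentage and cumulative columns follow directly from the counts per range. A minimal sketch that recomputes them, using only the counts shown on this slide (the variable names are illustrative):

```python
from itertools import accumulate

# Counts per URL range, copied from the slide's table.
ranges = ["0", "1", "2-9", "10-30", "31+"]
counts = [1622, 399, 1292, 769, 253]

total = sum(counts)  # all ETDs in the dataset

# Share of each range, and the running (cumulative) share, as percentages.
shares = [100 * c / total for c in counts]
cumulative = [100 * c / total for c in accumulate(counts)]

for label, count, share, cum in zip(ranges, counts, shares, cumulative):
    print(f"{label:>5}  {count:>5,}  {share:6.2f}%  {cum:7.2f}%")
```

The recomputed shares agree with the slide to two decimal places, and the counts sum to the 4,335 documents reported for the whole dataset.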

12 UNT ETDs breakdown of URLs per document
The average number of URLs per document in the overall UNT ETD dataset is 8.03, with a standard deviation of 21.6. These figures cover documents that contained from 0 to 809 URLs each. Removing the documents that did not contain URLs and recomputing the average changed it to 12.83 URLs per document, with a standard deviation of 26.19.
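The two averages are linked by the document counts: documents without URLs add nothing to the URL total, so the overall mean is the with-URLs mean scaled by the share of documents that contain URLs. A small check, assuming the document counts reported in the degree-level table (4,335 total, 2,713 with URLs):

```python
# Figures reported in the slides.
docs_total = 4335        # all ETDs
docs_with_urls = 2713    # ETDs containing at least one URL
mean_with_urls = 12.83   # average URLs per document, zero-URL documents excluded

# Approximate URL total implied by the rounded mean.
approx_total_urls = mean_with_urls * docs_with_urls

# Spreading the same URLs over every document gives the overall mean.
mean_overall = approx_total_urls / docs_total
print(round(mean_overall, 2))
```

The result rounds to the 8.03 reported for the full dataset. The URL total here is only an approximation because it is derived from a rounded mean.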

13 Top-Level Domain Reference
Top-Level Domain   # of Documents   % of Documents with URLs
com                1,814            66.86%
org                1,579            58.20%
edu                1,212            44.67%
gov                1,084            39.96%
net                445              16.40%
us                 361              13.31%
uk                 269              9.92%
ca                 184              6.78%
au                 112              4.13%
de                 100              3.69%

14 Ten Most Referenced Second-Level Sub-Domains
Second-Level Sub-Domain   # of Documents   % of Documents with URLs
ed.gov                    295              10.87%
state.tx.us               274              10.10%
unt.edu                   233              8.59%
census.gov                185              6.82%
cdc.gov                   128              4.72%
wikipedia.org             96               3.54%
nih.gov                   84               3.10%
utexas.edu                80               2.95%
microsoft.com             75               2.76%
nytimes.com               71               2.62%
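The slides do not describe how the domain tallies were produced. As a rough sketch of one way to count documents (not URLs) per top-level domain and per domain, the following uses Python's standard urllib.parse. The function names and sample data are hypothetical, and taking the last two hostname labels is a simplifying assumption: it would report tx.us rather than state.tx.us, so entries like that one need extra handling.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_labels(url):
    """Return the lowercase hostname labels of a URL, or [] if it has none."""
    host = urlsplit(url).hostname
    return host.lower().split(".") if host else []


def count_documents(doc_urls):
    """Count how many documents cite each TLD and each last-two-label domain."""
    tld_docs, domain_docs = Counter(), Counter()
    for urls in doc_urls.values():
        tlds, domains = set(), set()  # sets, so each document counts once
        for url in urls:
            labels = host_labels(url)
            if len(labels) >= 2:
                tlds.add(labels[-1])
                domains.add(".".join(labels[-2:]))
        tld_docs.update(tlds)
        domain_docs.update(domains)
    return tld_docs, domain_docs


# Hypothetical sample: document id -> URLs extracted from that document.
sample = {
    "etd-1": ["http://www.census.gov/data", "http://www.cdc.gov/", "https://www.nytimes.com/a"],
    "etd-2": ["http://www.census.gov/", "http://www.unt.edu/"],
}
tld_docs, domain_docs = count_documents(sample)
print(tld_docs.most_common())
print(domain_docs.most_common())
```

Counting each domain once per document matches the tables' unit (number of documents), so a thesis that cites census.gov fifty times still contributes one to its tally.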

15 No. of ETDs with URLs by domain names
Year   # of ETDs   # of ETDs with URLs   % of ETDs with URLs
1999   120         28                    23.33%
2000   315         127                   40.32%
2001   290         116                   40.00%
2002   298         146                   48.99%
2003   328         198                   60.37%
2004   304         181                   59.54%
2005   284         199                   70.07%
2006   326         235                   72.09%
2007   349         258                   73.93%
2008   336         242                   72.02%
2009   311         132*                  42.44%*
2010   366         286                   78.14%
2011   418         335                   80.14%
2012   290         230                   79.31%

16 Longitudinal Data For ETDs with URLs
Year   # of ETDs     .com            .org            .edu            .gov            .net
       with URLs     #     %         #     %         #     %         #     %         #     %
1999   28            14    42.9%     9     32.1%     11    39.3%     4     14.3%     4
2000   127           66    52.0%     70    55.1%     53    41.7%     41    32.3%     19    15.0%
2001   116           63    54.3%     67    57.8%     57    49.1%     39    33.6%     20    17.2%
2002   146           102   69.9%     73    50.0%     62    42.5%     47    32.2%     18    12.3%
2003   198           133   67.2%     96    48.5%     84    42.4%     70    35.4%     24    12.1%
2004   181           122   67.4%     89    49.2%     84    46.4%     66    36.5%     23    12.7%
2005   199           141   70.9%     112   56.3%     100   50.3%     83    41.7%     43    21.6%
2006   235           155   66.0%     143   60.9%     116   49.4%     98    41.7%     40    17.0%
2007   258           182   70.5%     157   60.9%     122   47.3%     116   45.0%     35    16.6%
2008   242           166   68.6%     140   57.9%     99    41.0%     91    37.6%     40    16.5%
2009   132           87    65.9%     83    62.9%     69    52.3%     50    37.9%     25    19.0%
2010   286           199   69.6%     170   59.4%     127   44.4%     129   45.1%     50    17.5%
2011   335           231   69.0%     215   64.2%     134   40.0%     146   43.6%     66    20.0%
2012   230           153   66.5%     155   67.4%     94    40.9%     104   45.2%     38    16.5%

17 Summary
• 62.6% of the publications analyzed in this work included URLs.
• Doctoral-level publications: 68.2%
• Master’s-level publications: 55.8%
• The percentage of ETDs that include URLs rose over the period, with some year-to-year dips (notably 2009).
• From 23% in 1999 to almost 80% in 2012

18 Summary …
• Across the years, more doctoral dissertations than master’s theses referenced URLs.
• The .gov domain is the fourth most referenced top-level domain; however, it accounts for nearly half (four) of the ten most referenced second-level domains.
• Further investigation at the domain or subdomain level could reveal additional patterns and more content-based information about the URL references.

19 Looking Ahead

20 Future Works
• The URLs referenced in a large corpus of ETDs may offer interesting insight into the subjects, disciplines, and patterns in these documents, which warrants further investigation.
• This research provides a preliminary framework of technical methods for future analysis of the data. A deeper investigation into the scope of the target URLs across an entire ETD corpus could provide a better understanding of content-based URL linking patterns.
• Additionally, an investigation into how specific disciplines or subject areas reference URLs in their ETDs would help identify areas with particularly high or low levels of URL linking.
• An analysis of URL inclusion in ETDs across institutions, and even nations, would be a logical follow-on investigation that would show whether higher-level trends exist in ETDs.

21 Future Works…
• Finally, further investigation into URL extraction from text would benefit the ETD community in several ways:
  ◦ It would allow libraries to extract URLs not only from born-digital ETDs but also from theses and dissertations that are being retrospectively digitized at institutions that have not had longstanding ETD policies.
  ◦ It would allow for investigation into ways of normalizing or completing malformed URLs, which may provide for better analysis of the content referenced and its availability in Web archives.
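To make the extraction and normalization ideas concrete, here is a minimal sketch of pulling URLs out of plain text and completing ones that lack a scheme. It is an illustration under stated assumptions, not the method used in the study: the regular expression, the trailing-punctuation rule, and the choice to prepend http:// to bare www. hosts are all hypothetical, and OCR text from digitized theses would need more forgiving rules.

```python
import re

# Assumed pattern: a URL starts with http://, https://, or www. and runs
# until whitespace, an angle bracket, or a quote character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_urls(text):
    """Return URLs found in text, trimmed and completed with a scheme."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop sentence punctuation and closing brackets stuck to the end.
        url = match.rstrip(".,;:!?)]}")
        # Complete scheme-less URLs so later parsing treats them uniformly.
        if url.lower().startswith("www."):
            url = "http://" + url
        urls.append(url)
    return urls


sample = "See http://www.unt.edu/ and (www.census.gov/data). Also https://www.cdc.gov."
print(extract_urls(sample))
# ['http://www.unt.edu/', 'http://www.census.gov/data', 'https://www.cdc.gov']
```

Stripping trailing punctuation is a trade-off: it repairs URLs that end a sentence but would also truncate the rare URL that legitimately ends in one of those characters.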

22 Thank You! Ameseginalehu! Gracias!

