Improving the ETD Landscape. ETD 2014: 17th Int’l Symposium on ETDs, Leicester, England. Edward A. Fox, Executive Director, NDLTD


1 Improving the ETD Landscape. ETD 2014: 17th Int’l Symposium on ETDs, Leicester, England. Edward A. Fox, Executive Director, NDLTD, www.ndltd.org, fox@vt.edu, http://fox.cs.vt.edu/talks/2014. Virginia Tech, Blacksburg, VA 24061 USA

2 Outline Acknowledgments Why, what, who, how Improving, quality Related technical contributions DLs and DL curriculum

3 Acknowledgments Family, mentors, teachers, students Dissertations: Sung Hee Park, Venkat Srinivasan, Seungwon Yang NSF: IIS-0535057, 0916733, 1319578 All those working with ETDs NDLTD, including its Members, Board, Committees, and Working Groups

4 Why, What, Who? Why? – enhance graduate education – expand global research collaboration What? – help students communicate more effectively – get ETDs for all TDs: next goal 5 million – help make ETDs open, accessible, preserved Who? – levels: students, faculty, staff, (grad) administrators – professions: CS, IT, LIS, librarians, archivists

5 How? Authoring systems, tools, methods Data and auxiliary information management aids Metadata creation software and techniques Submission, approval, refinement workflows Local access and information management Sharing, disseminating, discovering – OAI, data providers, harvesting – Regional/national, global institutions Services: access, preservation, adding value Add back files
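
The OAI harvesting step listed on this slide can be illustrated with a request to a data provider. This is a minimal sketch, not part of the talk: the base URL is a hypothetical endpoint, and the verb `ListRecords` and the `oai_dc` metadata prefix are taken from the OAI-PMH protocol rather than from the slides. It only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional selective harvesting by set
    if from_date:
        params["from"] = from_date    # optional lower bound on record datestamp
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, used for illustration only.
url = list_records_url("https://example.edu/oai", from_date="2014-01-01")
```

A harvester would fetch this URL, parse the returned XML, and follow any resumption token the provider returns until the list is complete.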

6 Improving – 1 of 2 Context: Quality frameworks, references on quality Guidelines and documentation for all of this Works – XML + PDF + raw/original representations – Multimedia, software, simulations, websites, dynamic content Data, auxiliary information, references/bibliographies – Reproducibility Metadata – Completeness: subject classification, faculty by role – Authority info

7 Improving – 2 of 2 Local services – Training, assistance – IR, archives, archival consortia Global services – Browse, faceted search, full-text search – Recommend, CLIR, CBIR, summaries, topics – Linked data, hyperlinks, citation linking – Alerts, notifications, RSS feeds, filtering

8 Information Life Cycle (adapted from Borgman et al. 1996, http://is.gseis.ucla.edu/research/dig_libraries/): Authoring, Modifying, Classifying, Tagging, Recommending, Indexing, Storing, Retrieving, Distributing, Networking, Retention / Mining, Filtering, Using, Downloading, Citing, Discovering

9 Quality and the Information Life Cycle

10 Quality Dimensions

11 Digital Library Service Taxonomy

12 Improve related movements Make related efforts work for graduate researchers, ETDs, and university ETD activities: Open access, institutional repositories Sharing references and citations: Zotero, … Sharing data, datasets, workflows; reproducible science: reproducibleresearch.net, … Building author profiles: ORCID, ISNI, … Digital libraries and DL education (DL2014)

13 Related technical contributions Broadly: new/better systems, user/usage studies, added services, improved practices Automatically assign topics or categories to ETDs or to portions (e.g., chapters) to aid browsing and (faceted) searching Build a union reference collection: by aiding authors (e.g., Hiberlink) and/or by automatic ETD text mining Enhanced information retrieval: cross language IR, content based IR (image/video/music) …

14 Topic determination Given a document, extract or generate generalized description of its topics Statistical approaches, e.g., LDA Knowledge based approaches, e.g., Xpantrac – Take a webpage or document – Use portions of it to build queries to a knowledge source (Web, Wikipedia, and ETD collection) – Combine, analyze, and summarize the results – Seungwon Yang, "Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach", Jan. 2014, Ph.D. dissertation, http://hdl.handle.net/10919/25111
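
The three knowledge-based steps on this slide (build queries from portions of the document, query a knowledge source, combine and summarize the results) can be sketched roughly as follows. This is an illustrative toy, not Xpantrac itself: the knowledge source is a small in-memory list standing in for the Web, Wikipedia, or an ETD collection, and the function names, stopword list, and term-counting step are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "are", "at"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_queries(text, size=3):
    """Expansion, step 1: use consecutive groups of words as queries."""
    words = tokenize(text)
    return [words[i:i + size] for i in range(0, len(words), size)]


def search(query, knowledge_source):
    """Stand-in retrieval: return entries that share at least one query term."""
    terms = set(query)
    return [doc for doc in knowledge_source if terms & set(tokenize(doc))]


def topic_tags(text, knowledge_source, k=3):
    """Extraction: count terms across all retrieved entries, return the top k."""
    counts = Counter()
    for query in build_queries(text):
        for doc in search(query, knowledge_source):
            counts.update(set(tokenize(doc)))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical knowledge source, for illustration only.
source = [
    "Digital libraries preserve theses",
    "Digital libraries support retrieval",
    "Cooking pasta at home",
]
tags = topic_tags("Electronic theses in digital libraries", source, k=2)
```

An entry retrieved by several queries is counted once per query, so terms from entries that match many portions of the document rise to the top.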

15 ETD Classification: Venkat Srinivasan Enhance metadata by adding subject categories Hierarchical classification of ETDs (and chapters thereof) using Library of Congress categories Training data – OCLC’s WorldCat: records from 1M books have good labels but little metadata; labels on ETDs not usable – Results coming from queries each designed to describe a category – Need to balance negative and positive examples throughout the LoC taxonomy

16 ETD Classification: Algorithm Pipeline. Category Tree: category label for each node used as query. Google: top 50 webpages (for each node in the tree). Document Sets: cleanup (stemming, stopword removal, etc.). Training Sets: training. Naïve Bayes Classifiers: level-wise categorization; ETD metadata (from the ETD Collection) used for categorization. Categorized ETDs: ETDs categorized into a node of the category tree (after classification). Web Interface: browsing.
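
The level-wise categorization with Naïve Bayes classifiers named on this slide can be sketched as one classifier per tree node, applied top-down. This is a minimal sketch under stated assumptions: the smoothing choice (add-one), the category names, and the training strings are hypothetical stand-ins for the cleaned webpages the pipeline collects, and the search-engine and cleanup steps are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            total = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_levelwise(text, tree, classifiers, root="root"):
    """Descend the category tree, choosing one child at each level."""
    path, node = [], root
    while tree.get(node):
        node = classifiers[node].classify(text)
        path.append(node)
    return path


# Hypothetical two-level category tree and toy training text.
tree = {"root": ["Science", "Arts"], "Science": ["Computer science", "Physics"]}
classifiers = {"root": NaiveBayes(), "Science": NaiveBayes()}
classifiers["root"].train("Science", "algorithm data computer quantum energy")
classifiers["root"].train("Arts", "painting music sculpture")
classifiers["Science"].train("Computer science", "algorithm data computer software")
classifiers["Science"].train("Physics", "quantum energy particle")

path = classify_levelwise("software algorithm for data retrieval", tree, classifiers)
```

Here the input string plays the role of ETD metadata (for example a title or abstract), and the returned path is the sequence of nodes the ETD is assigned to.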

17 Reference Extraction and Databasing 1. How can we implement metadata schema for bibliographic information? 2. What machine learning methods are effective to extract reference sections including footnotes and chapter references? Sung Hee Park, "Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web", June 2013, VT CS Ph.D. dissertation

18 Dataflow of Reference Section Extraction. Extraction path: ETD in PDF, Pdf2txt, Feature Extraction, Reference Section Extraction. Learning path: Training data, Tagged data, Feature Extraction, Learning.
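
The feature extraction stage in this dataflow can be illustrated per line of converted text. The slide does not list the features used, so the four below are assumptions chosen for illustration: a leading citation marker, a four-digit year, a section heading, and a comma count. A learner trained on tagged lines would consume dictionaries like these.

```python
import re

# Headings that commonly open a reference section (an assumed list).
HEADINGS = {"references", "bibliography", "works cited"}


def line_features(line):
    """Map one line of PDF-to-text output to simple, illustrative features."""
    stripped = line.strip()
    return {
        # Leading "[12] " or "12. " style citation marker
        "starts_with_marker": bool(re.match(r"^(\[\d+\]|\d+\.)\s", stripped)),
        # A plausible publication year such as 1996 or 2013
        "has_year": bool(re.search(r"\b(19|20)\d{2}\b", stripped)),
        # The line is exactly a reference-section heading
        "is_heading": stripped.lower() in HEADINGS,
        # Reference entries tend to contain several commas
        "comma_count": stripped.count(","),
    }


entry = "[12] E. Fox, M. Goncalves, and R. Shen. Theoretical Foundations for Digital Libraries, 2012."
features = line_features(entry)
```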

19 ETD References: System Architecture ETD Repository Users Web App (e.g., ETD-db) https://github.com/VTUL/etddb2 Metadata with References Searching, Browsing, Manipulating Extracting Reference Sections Union ETD References ?

20 Discovery, Search Engines, Info. Retrieval (to be extended for images, etc.) Documents Search Ranking Q D Query Results Best matches (Q with D) selected Quality of many systems is low, with recall and precision at only around 0.5, as opposed to the ideal of precision 1 at recall 1.
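
The recall and precision figures on this slide follow the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch, with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, four relevant, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5", "d6"])
```

In this example half of the results are relevant and half of the relevant documents are found, which matches the level of quality the slide describes as low.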

21 Search Module Detail (features can be about text, images, …) Query Q Feature vector Q Document D1 Feature vectors D1 Similarity Function S = Sim(Q,D1) In CBIR (Content Based Image Retrieval), search is based on visual content of images – Color – Shape – Texture …
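
The score S = Sim(Q,D1) on this slide compares a query feature vector with a document feature vector. The slide does not say which similarity function is used; the sketch below assumes cosine similarity, a common choice, and uses hypothetical three-bin color histograms as the image features.

```python
import math


def sim(q, d):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


# Hypothetical color histograms (for example, shares of red, green, blue).
query = [0.6, 0.3, 0.1]
images = {"img_a": [0.5, 0.4, 0.1], "img_b": [0.1, 0.2, 0.7]}

# Rank images by decreasing similarity to the query.
ranked = sorted(images, key=lambda name: sim(query, images[name]), reverse=True)
```

The same function applies to text if Q and D1 are term-weight vectors, which is why the slide notes that features can be about text or images.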

22 DL Definitions: Informal 5S DLs are complex systems that – help satisfy info needs of users (societies) – provide info services (scenarios) – organize info in usable ways (structures) – present info in usable ways (spaces) – communicate info with users (streams) Use this as: checklist, design guidelines, basis for formal description, specification for software implementation; e.g., Spaces help re GIS, VR

23 Digital Library Books
– Edward A. Fox and Jonathan P. Leidig, eds. Digital Library Applications: CBIR, Education, Social Networks, eScience/Simulation, and GIS. Morgan & Claypool Publishers, 2014, 175 p., http://dx.doi.org/10.2200/S00565ED1V01Y201401ICR032
– Edward A. Fox and Ricardo da Silva Torres, eds. Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security. Morgan & Claypool, 2014, 205 p., http://dx.doi.org/10.2200/S00566ED1V01Y201401ICR033
– Rao Shen, Marcos Andre Goncalves, and Edward A. Fox. Key Issues Regarding Digital Libraries: Evaluation and Integration. Morgan & Claypool, 2013, 110 p., http://dx.doi.org/10.2200/S00474ED1V01Y201301ICR026
– Edward A. Fox, Marcos Andre Goncalves, and Rao Shen. Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures, Streams) Approach. Morgan & Claypool, 2012, 180 p., http://dx.doi.org/10.2200/S00434ED1V01Y201207ICR022, supplementary website https://sites.google.com/a/morganclaypool.com/dlibrary/

24 DL Curriculum Project NSF awards to VT and UNC-CH: CS and LIS Project server: http://curric.dlib.vt.edu/ Wikiversity: http://en.wikiversity.org/wiki/Curriculum_on_Digital_Libraries Table 1: Core DL Curriculum Table 2: Information Retrieval Packages Table 3: LucidWorks Big Data Software Table 4: Multimedia Software

25 DL Curriculum Module Template
1. Module name
2. Scope
3. Learning objectives
4. 5S characteristics of the module (streams, structures, spaces, scenarios, society)
5. Level of effort required (in-class and out-of-class time required for students)
6. Relationships with other modules (flow between modules)
7. Prerequisite knowledge/skills required (what the students need to know prior to beginning the module; completion optional; complete only if prerequisite knowledge/skills are not included in other modules)
8. Introductory remedial instruction (the body of knowledge to be taught for the prerequisite knowledge/skills required; completion optional)
9. Body of knowledge (theory + practice; an outline that could be used as the basis for class lectures)
10. Resources (required readings for students; additional suggested readings for instructor and students)
11. Exercises / Learning activities
12. Evaluation of learning objective achievement (graded exercises or assignments)
13. Glossary
14. Additional useful links
15. Contributors (authors of module, reviewers of module)

26 DL Curriculum Framework

27 DL Curriculum Modules - examples Module 1-b: History of digital libraries and library automation; Module 2-c: File Formats, Transformation, and Migration; Module 3-b: Digitization; Module 4-b: Metadata; Module 5-a: Architecture overviews; …

28 Summary Scene

29 Conclusion: Improving together Who will help? What can we do? What knowledge and education is needed? What connections, integrations, collaborations can help with ETDs? Please comment and share! – Ed Fox (fox@vt.edu, http://fox.cs.vt.edu/talks/2014)

