“Information Research” by Edward A. Fox (fox@vt.edu, http://fox.cs.vt.edu), ENGR 1014: Engineering Research Seminar, 2 September 2016, Virginia Tech

ENGR 1014: Engineering Research Seminar, 2 September 2016, Virginia Tech. “Information Research” by Edward A. Fox, Dept. of Computer Science (www.cs.vt.edu), fox@vt.edu, http://fox.cs.vt.edu

Acknowledgements
Mentors: Licklider, Kessler, Salton
Virginia Tech, CS, Digital Library Research Laboratory (DLRL)
NSF and other sponsors
Students, colleagues, co-investigators (selected): Monika Akbar, Hamed Alhoori, Pranav Angara, Warren Bickel, Boots Cassel, Prashant Chandrasekar, Yinlin Chen, Kiran Chitturi, Lois Delcambre, Noha ElSherbiny, Alexandre Falcao, Eric Fouh, Chris Franck, Rick Furuta, Lee Giles, Marcos André Gonçalves, Doug Gorton, Islam Harb, Tarek Kanan, Andrea Kavanaugh, Nadia Kozievitch, Spencer Lee, Sunshin Lee, Jonathan Leidig, Lin Tzy Li, Yi Ma, Mohamed Magdy, Uma Murthy, Pranav Nakate, Sung Hee Park, Sagnik Ray Choudhury, Rao Shen, Clifford Shaffer, Steve Sheetz, Don Shoemaker, Venkat Srinivasan, Ricardo Torres, Zhiwu Xie, Xiaoyan Yu, Xuan Zhang, ...
DL Curriculum: Sanghee Oh, Jeffrey Pomerantz, Barbara Wildemuth, Seungwon Yang

Locating Digital Libraries in Computing and Communications Technology Space. [Figure: the Digital Libraries technology trajectory heads toward intellectual access to globally distributed information, plotted on three axes that each run from less to more: Communications (bandwidth, connectivity), Computing (flops), and Digital content.] Note: we should consider 4 dimensions: computing, communications, content, and community (people).

Asynchronous, Digital-Library-Mediated Scholarly Communication: different time and/or place

Digital Libraries Shorten the Chain. [Figure: the traditional chain from Author → Editor → Reviewer → Publisher → A&I → Consolidator → Library → Reader.]

DLs Shorten the Chain to Roles. [Figure: roles arranged around a Digital Library: Author, Teacher, User, Reader, Editor, Learner, Reviewer, Librarian.]

Information Life Cycle. [Figure: a life cycle with Active, Semi-Active, and Inactive phases and the stages Creation, Utilization, Searching, and Retention / Mining; the activities shown are authoring, modifying, organizing, indexing, storing, retrieving, distributing, networking, accessing, filtering, using, and creating, all set within a social context.] Note: this is a simplification of the previous slide.

Wordle from Fox CV

[Figure: map of INFORMATION research topics. Forms of information: text, WWW, data, knowledge. Design of information access, extraction, representation, and retrieval systems, spanning technology, theory, and visualization. Related areas: libraries, archives, hypermedia, multimedia, hypertext, images, and videos; search engines, crawling, webpages, and links; mining, analytics, machine learning, statistics, NLP, and AI; relational databases and tables.]

DL Curriculum Framework: Introduction

Informal 5S & DL Definitions. DLs are complex systems that help satisfy info needs of users (societies); provide info services (scenarios); organize info in usable ways (structures); present info in usable ways (spaces); and communicate info with users (streams).

5Ss: examples and objectives
Streams. Examples: text; video; audio; image. Objective: describes properties of the DL content, such as encoding and language for textual material or particular forms of multimedia data (see DL Book 4 Ch. 1).
Structures. Examples: collection; catalog; hypertext; document; metadata. Objective: specifies organizational aspects of the DL content; supports annotations, including of subdocuments (see DL Book 3 Ch. 2).
Spaces. Examples: measure; measurable, topological, vector, and probabilistic spaces. Objective: defines logical and presentational views of several DL components.
Scenarios. Examples: searching, browsing, recommending. Objective: details the behavior of DL services.
Societies. Examples: service managers, learners, teachers, etc. Objective: defines managers, who are responsible for running DL services; actors, who use those services; and the relationships among them.
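The 5S summary can be read as a checklist for describing a particular DL. The sketch below is an informal Python illustration only, assuming a hypothetical `DigitalLibrary5S` class that lists example components under each S and reports which S has nothing listed yet; the formal 5S framework defines each S mathematically, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalLibrary5S:
    """Hypothetical, informal 5S checklist for one digital library (not the formal 5S model)."""

    streams: list[str] = field(default_factory=list)     # e.g., text, video, audio, image
    structures: list[str] = field(default_factory=list)  # e.g., collection, catalog, metadata
    spaces: list[str] = field(default_factory=list)      # e.g., a vector space used for ranking
    scenarios: list[str] = field(default_factory=list)   # e.g., searching, browsing, recommending
    societies: list[str] = field(default_factory=list)   # e.g., service managers, learners, teachers

    def missing_aspects(self) -> list[str]:
        """Names of the Ss that have no components listed, in declaration order."""
        return [name for name, items in vars(self).items() if not items]


etd_dl = DigitalLibrary5S(
    streams=["text"],
    structures=["collection", "metadata"],
    scenarios=["searching", "browsing"],
    societies=["learners"],
)
print(etd_dl.missing_aspects())  # ['spaces']
```

Keeping the five Ss as separate fields makes an omission (here, no space has been specified for presentation or ranking) easy to spot.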

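ETANA-DL Architecture. [Figure: ETANA-DL services built with DigBase and DigKit (Search, Browse, Recommend, Note, Personalize, Review, Visualizations, archaeology-specific services, …) sit between the user interface and the ETANA-DL union catalog, which draws on database wrappers for the excavation sites Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, and new sites.]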
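The architecture merges several excavation-site collections (Lahav, Nimrin, and others) into one ETANA-DL union catalog that services such as Search then use. A minimal Python sketch of that idea follows, assuming each site needs a small wrapper that maps its own field names onto one shared schema. The site records, field names, wrapper functions, and `search` helper are all hypothetical; this is not ETANA-DL's actual schema or code.

```python
from typing import Callable

UnionRecord = dict[str, str]

# Hypothetical site records: each site stores its finds under its own field names.
LAHAV_RECORDS = [{"objId": "L-001", "kind": "pottery", "stratum": "S4"}]
NIMRIN_RECORDS = [{"id": "N-17", "object_type": "bone", "phase": "P2"}]


def lahav_wrapper(rec: dict) -> UnionRecord:
    """Map a Lahav-style record onto the shared (hypothetical) union-catalog schema."""
    return {"site": "Lahav", "id": rec["objId"], "type": rec["kind"], "context": rec["stratum"]}


def nimrin_wrapper(rec: dict) -> UnionRecord:
    """Map a Nimrin-style record onto the same shared schema."""
    return {"site": "Nimrin", "id": rec["id"], "type": rec["object_type"], "context": rec["phase"]}


def build_union_catalog(
    sources: list[tuple[list[dict], Callable[[dict], UnionRecord]]],
) -> list[UnionRecord]:
    """Run every site's records through that site's wrapper and merge the results."""
    catalog: list[UnionRecord] = []
    for records, wrapper in sources:
        catalog.extend(wrapper(rec) for rec in records)
    return catalog


def search(catalog: list[UnionRecord], **criteria: str) -> list[UnionRecord]:
    """A Search service over the union catalog: exact match on every given field."""
    return [r for r in catalog if all(r.get(k) == v for k, v in criteria.items())]


catalog = build_union_catalog([(LAHAV_RECORDS, lahav_wrapper), (NIMRIN_RECORDS, nimrin_wrapper)])
print(search(catalog, type="bone"))
```

Adding a new site then means writing one more wrapper. The services above the catalog do not change.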

Data Mapping Framework in a Digital Library with Computational Epidemiology Datasets. S. M. Shamimul Hasan, Sandeep Gupta, Edward A. Fox, Keith Bisset, Madhav Marathe, Virginia Tech (CS, BI)

ETD Classification: Algorithm Pipeline (Venkat Srinivasan)
Training: the category label of each node in the category tree is used as a query to Google; the top 50 webpages for each node form that node's document set; cleanup (stemming, stopword removal, etc.) turns the document sets into training sets for Naïve Bayes classifiers.
Categorization: ETD metadata from the ETD collection is used for level-wise categorization, so that after classification each ETD is assigned to a node of the category tree.
Browsing: the categorized ETDs are browsed through a web interface.
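The pipeline trains naïve Bayes classifiers from cleaned per-node training sets and then categorizes ETD metadata level by level down the category tree. A minimal Python sketch of that flow follows, with two stated assumptions: a tiny hypothetical tree and hand-written training snippets stand in for the top 50 webpages per node, and a crude suffix stripper stands in for a real stemmer. The names `clean`, `MultinomialNB`, `train_tree`, and `categorize` are illustrative, not the project's code.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "on", "to", "with", "is", "are"}


def clean(text: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords, and strip a few suffixes (a crude stand-in for stemming)."""
    words = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        for suffix in ("ing", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        words.append(token)
    return words


class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, training: dict[str, list[str]]):
        # training maps each class label to its list of raw training documents
        self.labels = list(training)
        self.counts = {c: Counter(w for doc in docs for w in clean(doc)) for c, docs in training.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        n_docs = sum(len(docs) for docs in training.values())
        self.log_prior = {c: math.log(len(docs) / n_docs) for c, docs in training.items()}
        self.vocab_size = len(set().union(*self.counts.values()))

    def log_score(self, words: list[str], label: str) -> float:
        denom = self.totals[label] + self.vocab_size
        return self.log_prior[label] + sum(math.log((self.counts[label][w] + 1) / denom) for w in words)

    def predict(self, text: str) -> str:
        words = clean(text)
        return max(self.labels, key=lambda c: self.log_score(words, c))


def train_tree(tree: dict[str, list[str]], node_docs: dict[str, list[str]]) -> dict[str, MultinomialNB]:
    """Train one classifier per internal node that chooses among that node's children."""
    return {parent: MultinomialNB({child: node_docs[child] for child in children})
            for parent, children in tree.items()}


def categorize(text: str, tree, classifiers, root: str = "root") -> list[str]:
    """Level-wise categorization: descend from the root, picking the best child at each level."""
    path, node = [], root
    while node in tree:  # stop once we reach a leaf
        node = classifiers[node].predict(text)
        path.append(node)
    return path


# Hypothetical category tree and training snippets (stand-ins for per-node webpages).
TREE = {"root": ["Computer Science", "Biology"],
        "Computer Science": ["Information Retrieval", "Networking"]}
NODE_DOCS = {
    "Computer Science": ["algorithms software computing programs", "computer systems and software engineering"],
    "Biology": ["cells genes organisms evolution", "protein expression in living cells"],
    "Information Retrieval": ["search engines ranking documents queries", "indexing and retrieving text documents"],
    "Networking": ["routers packets protocols bandwidth", "wireless network protocols and routing"],
}

classifiers = train_tree(TREE, NODE_DOCS)
print(categorize("Improving search ranking of text documents with computer software", TREE, classifiers))
# ['Computer Science', 'Information Retrieval']
```

Training one small classifier per internal node keeps each decision to a handful of sibling categories, which is the point of level-wise categorization.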

Funded Grants
NSF CRISP: Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd); PI Pamela Murray-Tuite, co-PIs Edward Fox, Kris Wernstedt; U. Mich. Ann Arbor, PI Seth Guikema
NSF IIS: Global Event and Trend Archive Research (GETAR); PI Fox, co-PIs Alla Rozovskaya, Andrea L. Kavanaugh, Donald J. Shoemaker; Internet Archive, PI Jefferson Bailey
IMLS LG: Developing Library Cyberinfrastructure Strategy for Big Data Sharing and Reuse; Zhiwu Xie (PI), Tyler Walters, Edward Fox (20%), Pablo Tarazaga; with evaluation from University of North Texas
NSF CREST: Building Capacity in Information Management through a Partnership with Virginia Tech's Digital Library Technology Center; PI Fox (with main grant to UTEP)
VT ARC: VT-Rnet: A 10-Gbps Research Network for Virginia Tech; in-kind support to connect the Digital Library Research Laboratory Hadoop cluster to VT's 10 gigabit-per-second network
NEH EH: Veterans in Society Summer Institute for College Teachers; PI James M. Dubinsky, co-PI Bruce E. Pencek, Investigator Fox
NIH: The Social Interactome of Recovery: Social Media as Therapy Development; PI Warren K. Bickel (VTCRI), Fox as co-PI
NSF IIS: Integrated Digital Event Archiving and Library (IDEAL); PI Fox, with co-PIs Donald Shoemaker, Andrea Kavanaugh, Steven Sheetz, and Kristine Hanna (Internet Archive)

IMLS: Developing Library Cyberinfrastructure Strategy for Big Data Sharing and Reuse 3 patterns for Library Big Data Services

Communication Analysis in the Social Interactome
Abigail Bartolome, advised by Dr. Edward A. Fox. Virginia Tech CS 4994, April 2016. NIH Grant 1R01DA039456-01: The Social Interactome of Recovery: Social Media as Therapy Development. Acknowledgements to Dr. Chris Franck, Prashant Chandrasekar, and Lexie Mellis.
Text Classification: multinomial naïve Bayes classification uses the count of each feature (e.g., word) when making classifications. Training the classifier: built a corpus of 150 documents, 75 of which were sentences clearly indicative of belonging to a success story and 75 of which were sentences not indicative of a success story. Acknowledgements to Victoria Worrall for her efforts on this classifier last semester.
Samples of story classification: "Since being in recovery I have not been around any drugs or alcohol but if I had to, such as a wedding or something I wouldn't have a problem saying that I don't drink or I'm in recovery." => success; 'Drove very drunk.' => not_success
Network Structures: lattice network and small-world network; 128 participants; most connected component: 22 users; 4 users. The Friendica database was queried to see whom the participants wrote text to and whom they received text from, and a graph of the private-messaging communication in the lattice social network was generated. Figures: lattice network with administrator removed; small-world network with administrator removed.
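The network analysis (querying who messaged whom, then examining the most connected component with the administrator removed) amounts to a connected-components computation. A minimal Python sketch follows, assuming the query results arrive as (sender, recipient) pairs and treating the graph as undirected. The pairs, user names, and `largest_component_size` are hypothetical, not the Friendica schema or the project's code.

```python
from collections import defaultdict, deque


def largest_component_size(messages, exclude=frozenset()):
    """Size of the largest connected component of the undirected graph built from
    (sender, recipient) pairs, ignoring any user in `exclude` (e.g., an administrator)."""
    graph = defaultdict(set)
    for sender, recipient in messages:
        if sender in exclude or recipient in exclude or sender == recipient:
            continue
        graph[sender].add(recipient)
        graph[recipient].add(sender)

    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its members
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


# Hypothetical messages: an administrator writes to everyone, and users mostly write in pairs.
MESSAGES = [("admin", "u1"), ("admin", "u2"), ("admin", "u3"), ("admin", "u4"),
            ("u1", "u2"), ("u3", "u4"), ("u4", "u3")]

print(largest_component_size(MESSAGES))                      # 5
print(largest_component_size(MESSAGES, exclude={"admin"}))   # 2
```

Removing the administrator matters because one account that messages everyone links the whole graph, hiding how connected the participants are among themselves. Users who never sent or received a message do not appear in this edge-based graph.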

IDEAL stakeholders. Goals: help affected communities recover more quickly and effectively; provide a global network with relevant information and resources; support the research community, emergency personnel, decision makers, and the public in reacting to and recovering from crises.

Archiving and Analyzing Using a Big Data Hadoop Cluster

What Causes Water Main Breaks? Earthquakes? (USGS data, Mar. 1 to Apr. 5, 2012.) Searching for "earthquake" gives a histogram highlighting March 2014 and May 2015, i.e., not winter. Location names: Fullerton, CA; La Habra, CA; Brea, CA.
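The histogram step (search a keyword, then see which months the matching tweets fall in) can be sketched in a few lines. This is a minimal Python illustration, assuming tweets are available as (timestamp, text) pairs and that "winter" means December through February. The toy tweets and the functions `monthly_histogram` and `winter_share` are hypothetical, not IDEAL data or code.

```python
from collections import Counter
from datetime import datetime

WINTER_MONTHS = {12, 1, 2}  # Northern Hemisphere winter (an assumption for this sketch)


def monthly_histogram(tweets, keyword):
    """Count tweets whose text mentions `keyword`, bucketed by (year, month)."""
    counts = Counter()
    for created_at, text in tweets:
        if keyword.lower() in text.lower():
            counts[(created_at.year, created_at.month)] += 1
    return counts


def winter_share(histogram):
    """Fraction of the counted tweets that fall in winter months (0.0 if there are none)."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    return sum(n for (_, month), n in histogram.items() if month in WINTER_MONTHS) / total


# Hypothetical toy tweets: (timestamp, text)
TWEETS = [
    (datetime(2014, 3, 10), "Earthquake shook the area, water main break downtown"),
    (datetime(2014, 3, 11), "Another water main break after the earthquake"),
    (datetime(2015, 1, 20), "Frozen pipes caused a water main break"),
    (datetime(2015, 5, 2), "Small earthquake, crews checking for a water main break"),
]

quake_hist = monthly_histogram(TWEETS, "earthquake")
print(sorted(quake_hist.items()))  # [((2014, 3), 2), ((2015, 5), 1)]
print(winter_share(quake_hist))    # 0.0
```

Comparing the winter share for an "earthquake" search against the share for all water-main tweets is one simple way to separate a seasonal cause from a non-seasonal one.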

Who Is Involved in a Water Main Break (WMB)? Fixing the water pipe: water utility, city/town utility. Traffic: traffic police. Affected: affected citizens. Others … Clicking "NewYork" in the user_city_s facet shows organizations (FDNY, MTA (Metropolitan Transportation Authority), NYU), person names (De Blasio), hashtags, and mentions. Example events: Lakewood, NJ, June 2014; West Philadelphia, PA, June 2015.
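Clicking a value such as "NewYork" in the user_city_s facet filters the collection and then recounts the other facets (organizations, hashtags) over what remains. A minimal in-memory Python sketch of that faceted drill-down follows. Only the field name user_city_s comes from the slide; its `_s` suffix suggests a Solr-style string field, but that is an assumption. The other field names (`org_ss`, `hashtags_ss`), the documents, and the functions `facet_counts` and `drill_down` are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count the values of `field` across docs (multi-valued fields hold lists)."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        counts.update(value if isinstance(value, list) else [value])
    return counts


def drill_down(docs, field, value):
    """Keep docs whose `field` equals `value` (or contains it, for list fields),
    like clicking one facet value."""
    def matches(doc):
        v = doc.get(field)
        return value in v if isinstance(v, list) else v == value
    return [doc for doc in docs if matches(doc)]


# Hypothetical tweet documents
DOCS = [
    {"user_city_s": "NewYork", "org_ss": ["FDNY", "MTA"], "hashtags_ss": ["#watermain"]},
    {"user_city_s": "NewYork", "org_ss": ["NYU"], "hashtags_ss": []},
    {"user_city_s": "Lakewood", "org_ss": [], "hashtags_ss": ["#watermain"]},
]

new_york = drill_down(DOCS, "user_city_s", "NewYork")
print(facet_counts(new_york, "org_ss"))  # Counter({'FDNY': 1, 'MTA': 1, 'NYU': 1})
```

A search engine computes the same counts inside its index. The sketch only shows the filter-then-recount logic behind the click.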

GETAR Architecture - 1

GETAR Architecture - 2

GETAR: Areas, Investigators, Courses

Where Can You Fit in CS? CS looking outward: Interaction (games, graphics, HCI, VR/AR); Programming (algorithms, languages, problem solving, workflows); Simulation (agents, modeling, epidemiology); KID: Knowledge, Information, Data (AI, machine learning). CS looking inside: HPC <-> PC <-> GPU; networking; programming (algorithms, languages, problem solving, workflows); systems; theory.