Event Focused URL Extraction from Tweets

Event Focused URL Extraction from Tweets
By: Chris Bridges, Carter Tat, David Chun
CS 4624: Multimedia, Hypertext, and Information Access
Instructor: Edward A. Fox
Client: Liuqing Li
April 24, 2018
Virginia Tech, Blacksburg, VA 24061
Slide Owner: Chris

Outline
- Project Goal
- Overall Design
- Testing/Evaluation
- Demo
- References
- Acknowledgements

Project Goal
- Link existing Twitter collections with the Event Focused Crawler (EFC)
- Classify and rank the relevance of URLs in tweets to a collection, using deep learning and natural language processing techniques
- Provide the client with a program that ties it all together
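Before URLs can be classified or ranked, they have to be pulled out of the tweet text. The slides do not show how the team did this, so the following is only a minimal sketch using Python's standard `re` module. The pattern `URL_PATTERN` and the function `extract_urls` are hypothetical names chosen for illustration, and the sample tweet is invented.

```python
import re
from typing import List

# Assumption: a deliberately simple pattern for http/https links.
# It is not the project's actual extraction rule.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(tweet_text: str) -> List[str]:
    """Return the distinct URLs in a tweet, in order of first appearance."""
    urls: List[str] = []
    for match in URL_PATTERN.findall(tweet_text):
        # Drop punctuation that commonly trails a link inside a sentence.
        cleaned = match.rstrip(".,;:!?)")
        if cleaned and cleaned not in urls:
            urls.append(cleaned)
    return urls


# Invented sample tweet, for illustration only.
sample = "Breaking: https://t.co/abc123, more at http://example.com/story."
found = extract_urls(sample)
print(found)
```

Tweets usually carry shortened `t.co` links, so a real pipeline would probably also need to resolve redirects before fetching page content; that step is left out here.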

Overall Design

Testing/Evaluation
- 80% training and 20% testing split
- Classifiers:
  - Decision Tree
  - Random Decision Forest
  - Support Vector Classifier (SVC)
  - Gaussian NB
- Cross-validated using 10 subsamples
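The protocol on this slide (an 80/20 split, then cross-validation over 10 subsamples) can be sketched in plain Python. This is a minimal illustration, not the team's code: `split_80_20`, `k_folds`, `cross_validate`, and the majority-label baseline `fit_majority` are hypothetical names, and the toy data is invented. In the actual project a real classifier, such as the scikit-learn SVC cited in the references, would presumably take the place of the baseline.

```python
import random
from collections import Counter
from typing import Callable, Iterator, List, Tuple

Example = Tuple[Tuple[int, ...], str]        # (features, label)
Predictor = Callable[[Tuple[int, ...]], str]


def split_80_20(items: List[Example], seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the data and split it 80% train / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def k_folds(items: List[Example], k: int = 10) -> Iterator[Tuple[List[Example], List[Example]]]:
    """Yield (train, validation) pairs; every item is validated exactly once."""
    for i in range(k):
        validation = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, validation


def fit_majority(train: List[Example]) -> Predictor:
    """Stand-in model (assumption): always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority


def cross_validate(items: List[Example],
                   fit: Callable[[List[Example]], Predictor],
                   k: int = 10) -> List[float]:
    """Return one accuracy score per fold."""
    scores = []
    for train, validation in k_folds(items, k):
        predict = fit(train)
        correct = sum(1 for features, label in validation if predict(features) == label)
        scores.append(correct / len(validation))
    return scores


# Invented toy data: 100 examples, 70 labelled "relevant".
data = [((i,), "relevant" if i % 10 < 7 else "irrelevant") for i in range(100)]
train, test = split_80_20(data)
scores = cross_validate(train, fit_majority, k=10)
print(len(train), len(test), len(scores))
```

Cross-validating on the training portion only keeps the 20% test set untouched for the final accuracy figure. Whether the team did it this way or cross-validated on the full data is not stated on the slide.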

Results

Classifier              Test Accuracy
Decision Tree           0.970967
Random Forest           0.974193
Support Vector (SVC)    0.969354
GaussianNB              0.790322

Cross Validation Accuracy: 0.94 (+/- 0.06), 0.95 (+/- 0.06), 0.75 (+/- 0.29)
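The slide does not say how the "(+/- ...)" figures were computed. One common convention, used in the scikit-learn cross-validation documentation of that period, reports the mean of the fold scores plus or minus two standard deviations. The sketch below assumes that convention. `summarize` is a hypothetical helper, and the fold scores are invented rather than taken from the project.

```python
from statistics import mean, pstdev
from typing import Sequence


def summarize(fold_scores: Sequence[float]) -> str:
    """Format fold accuracies as 'mean (+/- 2 * std)'.

    Assumption: population standard deviation, which matches NumPy's default.
    """
    return "%0.2f (+/- %0.2f)" % (mean(fold_scores), 2 * pstdev(fold_scores))


# Invented fold scores, for illustration only.
example_scores = [0.9, 1.0]
summary = summarize(example_scores)
print(summary)
```

Under this reading, a wide interval such as 0.29 means the accuracy varied a lot from one fold to the next.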

Optimal Parameters

Demo Slide Owner: David

Demo “Future Florida Gators Softball Prodigy Is the Youngest NCAA Commit of All Time”

Demo “Kentucky school shooting: 2 students killed, 18 injured”

References
"Sklearn.svm.SVC." Sklearn.svm.SVC - Scikit-Learn 0.19.1 Documentation. Web. www.scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC. Accessed 23 Apr. 2018.
Moreira, Gabriel. "Discovering User's Topics of Interest in Recommender Systems." LinkedIn SlideShare, 7 July 2016. Web. www.slideshare.net/gabrielspmoreira/discovering-users-topics-of-interest-in-recommender-systems-tdc-sp-2016. Accessed 23 Apr. 2018.
TextMiner. "Dive Into NLTK, Part IV: Stemming and Lemmatization." Text Mining Online, 18 July 2014. Web. www.textminingonline.com/dive-into-nltk-part-iv-stemming-and-lemmatization. Accessed 23 Apr. 2018.
"Events Archive (GETAR)." Events Archive. Web. https://www.arc.vt.edu/vt-rnet/edfox/. Accessed 23 Apr. 2018.
"Software Stanford Named Entity Recognizer (NER)." The Stanford Natural Language Processing Group. Web. https://nlp.stanford.edu/software/CRF-NER.shtml. Accessed 23 Apr. 2018.
"Natural Language Toolkit." Natural Language Toolkit - NLTK 3.2.5 Documentation. Web. https://www.nltk.org/. Accessed 23 Apr. 2018.
"Gensim: Topic Modelling for Humans." Radim Řehůřek: Machine Learning Consulting. Web. https://radimrehurek.com/gensim/. Accessed 23 Apr. 2018.

Acknowledgements
- Project Client: Liuqing Li
- Instructor: Edward A. Fox
- Global Event and Trend Archive Research (GETAR) is supported by NSF (IIS-1619028 and 1619371)

Questions?