Teaching Big Data Through Problem-Based Learning Richard Gruss, Business Information Technology, Virginia Tech Tarek Kanan Software Engineering Department.

Presentation transcript:

Teaching Big Data Through Problem-Based Learning
Richard Gruss, Business Information Technology, Virginia Tech
Tarek Kanan, Software Engineering Department, Al Zaytonah University of Jordan
Xuan Zhang, Mohamed Farag, Edward A. Fox, Computer Science, Virginia Tech
Mary C. English, The Center for Advancing Teaching and Learning Through Research, Northeastern University
Courses in Computational Linguistics and Information Retrieval

Big Data
Gartner: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
IBM: 2.5 quintillion bytes of data are created every day, and 90% of the world's data was created in the past two years.
Merrill Lynch and Gartner: 85% of data is unstructured.

Who we are
Richard Gruss: PhD student, Business Information Technology, Pamplin College of Business, Virginia Tech
Ed Fox: Professor, Computer Science, College of Engineering, Virginia Tech
Tarek Kanan, Xuan Zhang, Mohamed Farag: PhD students of Dr. Fox
Mary English: PhD in Educational Psychology; Associate Director, Center for Advancing Teaching & Learning through Research, Northeastern University (formerly Virginia Tech)

Problem-Based Learning (PBL) Theory
John Dewey: “experiential learning” (1910s)
Lev Vygotsky: “zone of proximal development” (1930s)
Benjamin Bloom: “active learning” (1950s)
Bloom’s Taxonomy (New Version)
Listening != learning
Students in lectures are twice as likely to leave engineering, 3 times as likely to drop out

Problem-Based Learning (PBL) A single question drives and organizes the learning activities. Learning is done “Just-In-Time.” Emphasis is placed on the process rather than the product. The task of the instructor is to provide a relevant and authentic question and serve as a facilitator.

Integrated Digital Event Archiving and Library (IDEAL)

Integrated Digital Event Archiving and Library (IDEAL)
Over 11 terabytes of webpages and about 1 billion tweets, covering:
- Natural disasters (earthquakes, storms, floods)
- Man-made disasters (protests, terrorism, conflicts)
- Community events

Two courses:
- Computational Linguistics (senior undergraduate)
- Information Retrieval (intro graduate)
Driving question
Course structure
Concepts and technologies
Evaluation: technology artifacts, student feedback

CS 4984: Computational Linguistics
Undergraduate capstone course
Driving Question: “What is the best summary that can be automatically generated for your type of event?”
7 teams, all performing the same analysis on different collections of text

CS 4984: Computational Linguistics Course Structure: Scaffolding

CS 4984: Computational Linguistics
Concepts:
- Linguistics concepts: morphology, semantics, inflection, meronymy, hypernymy
- Tokenization, stemming, lemmatization
- Word sense disambiguation
- Part-of-speech tagging, deep parsing
- Named Entity Recognition, Topic Allocation
- Information extraction
- Natural language generation
- Machine learning: clustering, classification
Technologies:
- Python, Natural Language Toolkit (NLTK)
- Natural Language Processing tools: Stanford NER, OpenNLP
- Hadoop Streaming
- HDFS
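As a rough illustration of how tokenization and Hadoop Streaming fit together, here is a minimal word-count sketch in plain Python. It is not the course's code: the function names and the regular-expression tokenizer are assumptions made for this example, and a real assignment would more likely use NLTK's tokenizers. Under Hadoop Streaming, the mapper and reducer would read lines from standard input and print tab-separated records; here they are written as generators so the map, sort, and reduce steps can be simulated locally.

```python
import re
from itertools import groupby

# Assumption: a crude tokenizer (lowercase letters, digits, apostrophes).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def map_lines(lines):
    """Mapper step: emit one 'token<TAB>1' record per token."""
    for line in lines:
        for token in tokenize(line):
            yield f"{token}\t1"


def reduce_lines(sorted_lines):
    """Reducer step: sum the counts per token. Input must be sorted by key,
    which is what the shuffle phase guarantees between map and reduce."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for token, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{token}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> shuffle/sort -> reduce on two toy sentences.
sample = ["Floods hit the town", "the floods receded"]
word_counts = list(reduce_lines(sorted(map_lines(sample))))
```

Sorting the mapper output before reducing stands in for the shuffle phase; on a cluster the same two functions would run as separate scripts passed to the streaming job.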

CS 4984: Computational Linguistics Evaluation: Technology Artifacts

CS 4984: Computational Linguistics VTechWorks (

CS 4984: Computational Linguistics
Evaluation: Student Feedback
Question (% agree)
- I have a deeper understanding of the subject matter: 75
- My interest in the subject matter was stimulated by this course: 88
- Overall, the instructor's teaching was effective: 88
“The instructor stimulated and encouraged independent thinking and questioning. This inspired us to research and come up with our own techniques to solve problems.”
“I loved the free reign that we got to attack the problem on our own and read on our own. I think this is the best way to learn. A+”

CS 5604: Information Retrieval
Introductory graduate-level course
Driving Question: “How can we best build a state-of-the-art IR system in support of a large digital library project?”
7 teams, all performing different tasks along a processing pipeline

CS 5604: Information Retrieval Course Structure: The Goal

CS 5604: Information Retrieval Course Structure: The Architecture

CS 5604: Information Retrieval
Concepts:
- Indexing: inverted, in-memory, distributed, dynamic
- Vector Space Model: doc representation, TF-IDF, length normalization
- Result evaluation: precision, recall, F-Score
- Probabilistic Language Modeling
- Text classification and clustering
- Social Network Analysis
- Latent Semantic Analysis
Technologies:
- Hadoop: HDFS, MapReduce, HBase, Avro
- Apache Mahout, Weka
- Solr, Velocity, Carrot2
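To make the indexing and Vector Space Model concepts concrete, here is a small sketch in plain Python, assuming a toy in-memory collection of pre-tokenized documents. It uses one common TF-IDF variant (raw term frequency times log(N/df)) with Euclidean length normalization; the course systems relied on Solr and Hadoop, so this illustrates the ideas only, and all names and documents below are invented for the example.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def tfidf_vectors(docs):
    """Return length-normalized TF-IDF vectors, one sparse dict per document."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        weights = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors[doc_id] = {t: (w / norm if norm else 0.0) for t, w in weights.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


# Hypothetical toy collection, already tokenized.
docs = {
    "d1": ["flood", "warning", "river"],
    "d2": ["flood", "rescue"],
    "d3": ["election", "results"],
}
index = build_inverted_index(docs)
vectors = tfidf_vectors(docs)
```

Because every vector is scaled to unit length, cosine similarity reduces to a dot product, so a long document does not outrank a short one merely by repeating terms.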

Evaluation: Technology Artifacts
Performance of Information Retrieval System
[Table with columns Query, Time (sec), Number of Results, Precision, for the queries election, revolution, uprising, storm, ebola, disease, shooting]
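The precision figures in such a table, and the recall and F-score measures listed among the course concepts, can be computed from a set of retrieved documents and a set of documents judged relevant. The sketch below uses the standard set-based definitions; the document ids are made up, and nothing here reproduces the course's actual measurements.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical query: 4 documents retrieved, 3 judged relevant, 2 in common.
precision, recall, f1 = precision_recall_f1([1, 2, 3, 4], [2, 4, 5])
```

In this example precision is 2/4, recall is 2/3, and F1, the harmonic mean of the two, is 4/7.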

CS 5604: Information Retrieval VTechWorks (

CS 5604: Information Retrieval
Student Response
20-question poll, rated 1-5: “Rate how well this approach helped you to…”
Question (score)
- Think independently: 4.4
- Consider alternative solutions to problems: 4.3
- Identify gaps in your knowledge
% said they would recommend this approach for future classes.

Acknowledgements
US National Science Foundation, DUE
US National Science Foundation, IIS

Supplementary Materials

CS 4984: Computational Linguistics Scholar Site

CS 4984: Computational Linguistics Piazza Site

CS 4984: Computational Linguistics
Sample Results
{'date': ' ', 'source': ' ', 'cases': '810', 'location': 'Sierra Leone', 'deaths': '348'}
{'date': ' ', 'source': ' ', 'cases': '127', 'location': 'West Africa', 'deaths': 0}
{'date': ' ', 'source': ' ', 'cases': '784', 'location': 'Sierra Leone', 'deaths': 0}
{'date': ' ', 'source': ' ', 'cases': '53', 'location': 'Liberia', 'deaths': 0}
{'date': ' ', 'source': ' ', 'cases': '127', 'location': 'Guinea', 'deaths': 0}
{'date': ' ', 'source': ' ', 'cases': 0, 'location': 'Guinea', 'deaths': '1400'}
{'date': ' ', 'source': ' ', 'cases': 0, 'location': 'Liberia', 'deaths': '1400'}
{'date': ' ', 'source': ' ', 'cases': '293', 'location': 'Sierra Leone', 'deaths': 0}

CS 4984: Computational Linguistics Sample Results

CS 4984: Computational Linguistics Sample Results There has been an outbreak of Ebola reported in the following locations: Liberia, West Africa, Nigeria, Guinea, and Sierra Leone. In January 2014, there were between 425 and 3052 cases of Ebola in Liberia, with between 2296 and 2917 deaths. Additionally, In January 2014, there were between 425 and 4500 cases of Ebola in West Africa, with between 2296 and 2917 deaths. Also, In January 2014, there were between 425 and 3000 cases of Ebola in Nigeria, with between 2296 and 2917 deaths. Furthermore, In January 2014, there were between 425 and 3052 cases of Ebola in Guinea, with between 2296 and 2917 deaths. In addition, In January 2014, there were between 425 and 3052 cases of Ebola in Sierra Leone, with between 2296 and 2917 deaths. There were previous Ebola outbreaks in these areas. Ebola was found in 1989 in Liberia. As well, Ebola was found in 1989 in West Africa. Likewise, Ebola was found in 1989 in Nigeria. Additionally, Ebola was found in 1989 in Guinea. Also, Ebola was found in 1989 in Sierra Leone.
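A summary of this kind can be produced by filling a sentence template from extracted records. The sketch below is one plausible way to do that and is not the students' implementation: it takes a subset of the sample records (leaving out the blank date and source fields), treats a count of 0 as "not extracted" (an assumption), and reports the range of values seen per location. All function names are invented for the example.

```python
from collections import defaultdict

# Subset of the extracted records shown in the sample results.
records = [
    {"cases": "810", "location": "Sierra Leone", "deaths": "348"},
    {"cases": "127", "location": "West Africa", "deaths": 0},
    {"cases": "53", "location": "Liberia", "deaths": 0},
    {"cases": 0, "location": "Liberia", "deaths": "1400"},
    {"cases": "293", "location": "Sierra Leone", "deaths": 0},
]


def value_ranges(records, field):
    """Collect (min, max) of the nonzero values of `field` per location."""
    values = defaultdict(list)
    for record in records:
        number = int(record[field])  # counts appear as strings or as ints
        if number > 0:               # assumption: 0 means "not extracted"
            values[record["location"]].append(number)
    return {location: (min(v), max(v)) for location, v in values.items()}


def describe(low, high, noun):
    """Phrase a single value or a range, e.g. 'between 293 and 810 cases'."""
    return f"{low} {noun}" if low == high else f"between {low} and {high} {noun}"


def summarize(records, disease="Ebola"):
    """Fill one sentence template per location, in alphabetical order."""
    cases = value_ranges(records, "cases")
    deaths = value_ranges(records, "deaths")
    sentences = []
    for location in sorted(cases):
        sentence = f"There were {describe(*cases[location], 'cases')} of {disease} in {location}"
        if location in deaths:
            sentence += f", with {describe(*deaths[location], 'deaths')}"
        sentences.append(sentence + ".")
    return " ".join(sentences)


summary = summarize(records)
```

Keeping the ranges per location, rather than pooling all values, avoids repeating the same death counts for every location as the sample summary does.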

CS 5604: Information Retrieval Team Responsibilities

CS 5604: Information Retrieval Search Performance (first 1000 results)

CS 5604: Information Retrieval
Custom Solr Search
- Field weights
- Custom result list processing
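Field weights in Solr are commonly expressed through the `qf` parameter of the Extended DisMax (`edismax`) query parser, where each field can carry a boost such as `title^2.0`. The sketch below only builds the query string for a request; the field names, weights, and the `select` handler path are assumptions for illustration, and the slides do not say which parser or fields the team actually used.

```python
from urllib.parse import urlencode


def build_solr_query(user_query, field_weights, rows=10):
    """Build an edismax query string that boosts fields by the given weights."""
    # qf takes space-separated entries of the form field^boost.
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    params = {"defType": "edismax", "q": user_query, "qf": qf, "rows": rows}
    return "select?" + urlencode(params)


# Hypothetical fields: matches in the title count twice as much as the body text.
query_url = build_solr_query("ebola", {"title": 2.0, "text": 1.0})
```

The resulting string would be appended to a collection's base URL; custom result list processing would then happen on the response, which this sketch does not cover.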