The Mutually Beneficial Relationship between Big Data Science and Talent Development. Education for Big Data and Big Data for Education: Towards Integration of Big Data and Education. ChengXiang Zhai (翟成祥) Department of Computer.

Presentation transcript:

The Mutually Beneficial Relationship between Big Data Science and Talent Development
Education for Big Data and Big Data for Education: Towards Integration of Big Data and Education
ChengXiang Zhai (翟成祥), Department of Computer Science, University of Illinois at Urbana-Champaign, USA
BDSE2016, May 25, 2016, Guiyang, China

The Big Data revolution: “DataScope” (数据镜, "data lens") enhances human perception
[Figure labels: Microscope, Telescope]

DataScope enables prediction & optimal decision making
[Diagram: Sensor 1 … Sensor k observe the real world and produce non-text data and text data. Joint mining of non-text and text data yields multiple predictors (features) for a predictive model, which outputs predicted values of real-world variables. Those predictions support decisions that change the world. Other labels shown: Teacher, Student.]

Big Data creates both challenges and opportunities for education
- Challenges for education: educate many data scientists & engineers quickly and affordably
- Opportunities for education: leverage Big Data technology to scale up and improve education
- Big Data and education are mutually beneficial → Integration!
  - Education for Big Data: education supplies the workforce for developing innovative Big Data technology and applications
  - Big Data for Education: Big Data supplies technology for scaling up education and improving its quality

Rest of the talk
- Education for Big Data
- Big Data for Education
- Integration of Big Data and Education

Part 1: Education for Big Data
"… (in the next few years) we project a need for 1.5 million additional analysts in the United States who can analyze data effectively …" -- McKinsey Big Data Study, 2012
The need is global …

Educating the workforce for Big Data
- Question 1: What to teach in Big Data? (PhD, MS, BS in Data Science)
- Question 2: How to teach Big Data effectively at large scale and low cost? (Massive Open Online Courses, MOOCs)

What to teach? New degrees in Data Science?
Highly interdisciplinary!
- Acquisition: sensor networks, Internet of Things, statistical sampling, …
- Aggregation: databases, information retrieval, NLP, computer vision, …
- Analysis: data mining, machine learning, statistical modeling, scalable systems, …
- Application: cloud computing, artificial intelligence, operations research, human-computer interaction, … + Health, Medicine, Finance, Smart City, Education, …

How to teach? Emergence of Massive Open Online Courses (MOOCs)
- Many platforms: Coursera, edX, Udacity, Tsinghua University's MOOC platform (清华大学慕课平台), …
- Characteristics
  - Free/affordable education at large scale on all kinds of topics
  - Limited assessment support, but strong online community support
  - Partnership with universities
- Early stage of an “education revolution” enabled by IT & Big Data (more later)

My experience with MOOCs
- Taught 2 MOOCs in 2015 (together = CS410 Text Info Systems at UIUC)
  - Text Retrieval and Search Engines
  - Text Mining and Analytics
- Coordinated the Data Mining Specialization: 5 courses + Capstone
  - Pattern Discovery, Cluster Analysis, Text Retrieval, Text Mining, Visualization, Capstone Project

Text Retrieval & Text Mining MOOCs
- Each lasted 4 weeks
  - Modularized video lectures
  - Weekly quizzes
  - Programming assignment (open challenge with a leaderboard) with auto grading
- Enrollment
  - ~50,000 signed up
  - More than 10,000 seriously watched lecture videos
  - 1,000 to 1,500 completed the course
  - 700 to 900 did programming assignments

Students are from all over the world!
64,651 learners, 181 countries

The majority of learners are 25 to 44 years old

The US, India, and China have most of the learners
[Chart labels: United States, India, China]

Most learners have a full-time job and a BS or MS degree

Challenges in teaching “big data” at large scale
- General challenges in MOOCs
  - Variable student background
  - Variable student needs
  - Reliability of assessment
- Special challenges for “big data”
  - Programming assignments are essential, but student resources & background vary
  - Availability of interesting real-world data sets
  - Automated grading of programming assignments

Programming Assignments for Text Retrieval & Text Mining MOOCs
- Coursera provides no computing resources, so students must work on programming assignments on their own computers
  - Students download assignments to their own computers
  - Auto grading is necessary: grade the output of a program, not the code
- Help students learn complex algorithms with minimum effort
  - We provide a sophisticated toolkit to every student (through a virtual box image)
  - Students only touch the key algorithm components in the toolkit
  - Students can experiment with existing algorithms and explore new algorithms
- Leverage students to create data sets
  - Crowdsource annotation of data sets to the students in the course
  - Open competition with a public leaderboard to encourage creative exploration

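Self-Sustaining Data Set Annotations & Open Challenge
[Diagram: a raw data set is distributed through annotation assignments; the collected annotations turn it into a test collection; the test collection backs the open challenge competition assignments, whose submissions are scored by an auto grader and posted to a leaderboard (e.g. #1 Team1 0.81, #2 Team 2 0.75, …).]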
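To make the "grade the output, not the code" loop concrete, here is a minimal sketch of an auto grader feeding a leaderboard. It is an illustration only: the slides do not say which metric the course used, so mean average precision is an assumption, and every name (`average_precision`, `grade_submission`, `update_leaderboard`, the toy query and document ids) is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the annotated relevant set."""
    if not relevant:
        return 0.0
    hits, total, seen = 0, 0.0, set()
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant and doc_id not in seen:
            seen.add(doc_id)  # count each relevant document at most once
            hits += 1
            total += hits / rank
    return total / len(relevant)


def grade_submission(run: Dict[str, List[str]],
                     annotations: Dict[str, Set[str]]) -> float:
    """Assumed metric: mean average precision; unanswered queries score 0."""
    if not annotations:
        return 0.0
    scores = [average_precision(run.get(query, []), relevant)
              for query, relevant in annotations.items()]
    return sum(scores) / len(scores)


def update_leaderboard(board: Dict[str, float], team: str,
                       score: float) -> List[Tuple[str, float]]:
    """Keep each team's best score and return the board sorted best-first."""
    board[team] = max(score, board.get(team, 0.0))
    return sorted(board.items(), key=lambda item: item[1], reverse=True)


# Toy annotations (query -> relevant documents) and two submitted runs.
annotations = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run_team1 = {"q1": ["d1", "d3", "d2"], "q2": ["d2", "d1"]}
run_team2 = {"q1": ["d2", "d1", "d3"], "q2": ["d1", "d2"]}

board: Dict[str, float] = {}
update_leaderboard(board, "Team1", grade_submission(run_team1, annotations))
ranking = update_leaderboard(board, "Team2", grade_submission(run_team2, annotations))
```

Because only the submitted rankings are scored, students can run any code on their own machines, which fits a platform that supplies no computing resources.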

Example of a new data set (for online course retrieval)
High grades → more reliable annotations
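One way to act on the link between grades and annotation reliability is to weight each student's relevance judgment by that student's course grade. The sketch below is an assumption about how such aggregation could work, not a description of what the course did; `aggregate_labels` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_labels(votes: List[Tuple[str, str, int]],
                     grades: Dict[str, float]) -> Dict[str, int]:
    """Grade-weighted vote over binary labels.

    votes:  (student, item, label) triples with label in {0, 1}
    grades: student -> course grade in [0, 1], used as the vote weight
    """
    weight_for = defaultdict(float)    # weight behind label 1, per item
    weight_total = defaultdict(float)  # total weight cast, per item
    for student, item, label in votes:
        weight = grades.get(student, 0.0)  # unknown students get no weight
        weight_total[item] += weight
        if label == 1:
            weight_for[item] += weight
    # An item is labeled relevant only if more than half the weight says so.
    return {item: int(weight_for[item] > 0.5 * weight_total[item])
            for item in weight_total}


grades = {"a": 0.95, "b": 0.40, "c": 0.35}
votes = [("a", "doc1", 1), ("b", "doc1", 0), ("c", "doc1", 0),
         ("a", "doc2", 0), ("b", "doc2", 1)]
labels = aggregate_labels(votes, grades)
```

In this toy data a plain majority vote would mark `doc1` as not relevant, while the grade-weighted vote sides with the high-scoring annotator.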

Search Engine Contest: Leaderboard

Overall lessons from the MOOCs
- MOOC learners are a different crowd from on-campus students
  - Practical mindset, self-motivated, but less background and less time
  - A pre-quiz is necessary for such technical courses (to set realistic expectations)
  - Learners form self-supporting online communities
- Short modularized lecture videos are preferred
- Programming assignments are very much appreciated
- Crowdsourcing annotations and open competition worked well → MOOCs go beyond education to support research!
- Limitations of current MOOCs
  - Lack of “individual care” (students don’t all get the help they need)
  - Sole reliance on peer grading of sophisticated assignments (unreliable grading & ineffective feedback to students)

Current Trend: Integration of MOOCs and Traditional Education
[Diagram: offerings arranged by Quality and Scalability]
- Campus Degree (traditional classrooms): HIGH cost
- Online Degree (MOOC + high-engagement component): LOW cost
- Specialization Certificate (MOOC): MINIMUM cost
- Course Certificate (MOOC): MINIMUM cost
- No Certificate (MOOC): FREE
- Flipped/blended classroom: MOOC + traditional classrooms

A new online MOOC-based program: MCS-DS at UIUC
- MCS-DS = Master of Computer Science in Data Science
- Tuition = $20,000
- Courses = MOOCs + high-engagement components
- Interdisciplinary
  - Courses mostly offered by the Computer Science Department: Data Mining Specialization, Cloud Computing Specialization, Machine Learning
  - Other units include the School of Information Science & the Statistics Department

Part 2: Big Data for Education
Towards Intelligent MOOC
[Diagram: Small Classrooms, MOOC, and Intelligent MOOC placed on Quality and Scalability axes; “Big Data Technology” moves MOOC toward a scalable Intelligent MOOC]
- Automate grading with machine learning
- Automate question answering on forums

Traditional manual grading: Submitted Assignments → Graded Assignments (Grade: 93, 85, …)
Proposed automated grading: Submitted Assignments → Multi-dimensional Grade Predictor → Grade Verification → Graded Assignments with Detailed Grading Results
[Other diagram labels: Batch grading, Clustering, Improvement, Performance & Behavior Analysis]
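The predictor-plus-verification idea can be sketched with a deliberately simple stand-in: copy the per-dimension grades of the most similar already-graded submission, and flag the result for human verification when the match is weak. This nearest-neighbor approach is an assumption chosen for brevity, not the method of the cited work; `predict_grades`, the `verify_below` threshold, and the rubric dimensions are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict_grades(submission: str,
                   graded: List[Tuple[str, Dict[str, int]]],
                   verify_below: float = 0.5) -> Tuple[Dict[str, int], bool]:
    """Return (per-dimension grades, needs_human_verification)."""
    query = bag_of_words(submission)
    best_sim, best_grades = -1.0, {}
    for text, grades in graded:
        sim = cosine(query, bag_of_words(text))
        if sim > best_sim:
            best_sim, best_grades = sim, grades
    # Weak matches are routed to a human grader instead of being trusted.
    return dict(best_grades), best_sim < verify_below


graded = [
    ("patient shows fever and cough likely infection",
     {"diagnosis": 4, "reasoning": 3}),
    ("no findings reported", {"diagnosis": 1, "reasoning": 1}),
]
grades, needs_check = predict_grades("fever and cough suggest infection", graded)
_, unrelated_needs_check = predict_grades("completely unrelated essay text", graded)
```

The verification flag is the part that matters for reliability: the predictor handles the confident cases in bulk and leaves the uncertain ones to instructors.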

Preliminary results on grading medical case assignments are promising [Geigle et al. 2016]
Chase Geigle, ChengXiang Zhai, Duncan Ferguson. An Exploration of Automated Grading of Complex Assignments. ACM Learning at Scale 2016.

Towards Intelligent MOOC: Limitations of Current MOOCs
- Instruction materials are limited to those pre-defined by an instructor → can’t take advantage of useful materials on the Web
- Limited search capability inside a course → can’t easily find the most relevant video clip or discussion posts about a topic
- No understanding of students → can’t personalize the instruction and learning experience
- Limited support for collaborative learning → can’t leverage massive student behavior data to recommend materials for individual students
- Limited support for interactions with students → can’t engage students in a natural dialogue

Novel Features of an Intelligent MOOC
- Seamless integration of MOOC and Web search → enable students to learn from the Web
- Concept/topic search, navigation, and summarization → enable students to quickly find all materials about a concept or topic
- Dynamic and adaptive student modeling → enable deep understanding of a student's state of knowledge
- Lifetime learning from student behavior data → enable effective support of collaborative learning
- Interactive personalized teaching → enable personalized natural conversations between students and the system

Current MOOC
[Diagram: Traditional MOOC Platform, with Student Record, MOOC Course Content, …, MOOC Activity Log]

An Intelligent MOOC
[Diagram components: Open Web; MOOC Course Content; MOOC Activity Log; Student Model and Modeler; Open Web Concept Recommender; Personalized Search agent; Concept/Topic Search agent; Topic/Concept Graph generator; Concept Navigator; Interactive Teaching Interface; …]
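As a rough illustration of what a concept/topic search agent over course content might do, the sketch below ranks lecture segments and forum posts against a query with a basic TF-IDF score. The scoring formula, the function names `build_index` and `search`, and the segment ids are assumptions for illustration; a real system would add length normalization, stemming, and richer retrieval models.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def build_index(segments: Dict[str, str]) -> Tuple[Dict[str, Counter], Dict[str, float]]:
    """Return per-segment term counts and smoothed IDF weights."""
    counts = {sid: Counter(text.lower().split()) for sid, text in segments.items()}
    doc_freq: Counter = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(segments)
    idf = {t: math.log((n + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}
    return counts, idf


def search(query: str, counts: Dict[str, Counter],
           idf: Dict[str, float], k: int = 3) -> List[str]:
    """Rank segments by summed term-frequency * IDF over the query terms."""
    terms = query.lower().split()
    scored = []
    for sid, term_counts in counts.items():
        score = sum(term_counts[t] * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scored.append((score, sid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by id
    return [sid for _, sid in scored[:k]]


# Hypothetical course materials: two video transcripts and one forum post.
segments = {
    "week1-video2": "the vector space model ranks documents by similarity",
    "week2-video1": "probabilistic retrieval model and query likelihood",
    "forum-post-17": "question about smoothing in query likelihood",
}
counts, idf = build_index(segments)
likelihood_hits = search("query likelihood", counts, idf)
vector_hits = search("vector space", counts, idf)
```

Indexing videos and discussion posts together is what lets a single query surface both the lecture segment and the forum thread on the same concept.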

Many existing technologies can be applied
- Algorithms for intelligent information retrieval
- Interactive personalized search technologies
- Algorithms for dynamic topic map generation
- Algorithms for topic discovery, summarization, and analysis
- Algorithms for search log analysis
- Algorithms for opinion integration and summarization
- Algorithms for collaborative filtering and recommender systems
- …
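Of the technologies listed, collaborative filtering is the easiest to show in a few lines. The sketch below recommends course materials to a student from the viewing behavior of students with overlapping histories. The log format, the overlap-count scoring, and the name `recommend` are assumptions made for illustration, not a description of any existing MOOC platform.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def recommend(log: Iterable[Tuple[str, str]], student: str, k: int = 2) -> List[str]:
    """Recommend unseen materials viewed by students with overlapping history.

    log: (student, material) view events from an activity log
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for who, material in log:
        seen[who].add(material)
    mine = seen.get(student, set())
    scores: Counter = Counter()
    for other, materials in seen.items():
        if other == student:
            continue
        overlap = len(mine & materials)  # shared views act as similarity
        if overlap == 0:
            continue
        for material in materials - mine:
            scores[material] += overlap
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [material for material, _ in ranked[:k]]


log = [("ann", "v1"), ("ann", "v2"),
       ("bob", "v1"), ("bob", "v2"), ("bob", "v3"),
       ("cat", "v2"), ("cat", "v4"),
       ("dan", "v5")]
for_ann = recommend(log, "ann")
for_dan = recommend(log, "dan")
```

A student with no overlapping history gets no recommendations here, which is the usual cold-start gap that content-based search would need to cover.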

Part 3: Integration of Big Data and Education
[Diagram: a cycle linking Education, Big Data, and Big Data Technology through an Intelligent MOOC Platform. Labels: Educate; Research & Develop; Applied to; MOOC Log; Improve Scalability & Quality; “?”]

Toward a Cloud-based Big Data Virtual Lab
[Diagram: a Big Data Education System containing App Data 1 … App Data N, Big Data Tool 1, Big Data Tool 2, …, Log Data, and leaderboards (e.g. #1 Team1 0.81, #2 Team 2 0.75; #1 Team1 0.5, #2 Team 2 0.3)]

Unification of education, research, and applications!
1. Directly work on industry data sets and problems → remove the gap between education & applications
2. Encourage open exploration (research) → remove the gap between education & research
3. Well-archived interaction history → reproducibility of research
4. Industry data sets not released to students & researchers → privacy-preserving Big Data education & research

Final Thoughts: Education Revolution & Automation
- Big Data and IT enable an education revolution and automation toward more affordable high-quality education
  - IT enables one teacher to teach many more students than before (efficiency)
  - Big Data technology would enable an “automated” TA/instructor (scalability)
  - Intelligent MOOCs would improve the quality of education at low cost
- Implications: many traditional boundaries will likely disappear!
  - No strict distinction between a teacher and a student (everyone learns from each other)
  - No strict distinction between grade levels or age groups (learn at your own pace)
  - No inherent boundaries between different courses (due to high modularization)
  - No boundaries between subject areas (due to high modularization)
  - No boundaries between institutions (MOOCs unify all institutions!)

Thank You! Questions/Comments?