CS410: Text Information Systems (Spring 2018)
Instructor: ChengXiang “Cheng” Zhai
Full-Time TAs: Bingjie Jiang, Qihao Shao
Part-Time TAs: Chase Geigle, Eddie Huang, Dominic Seyler, Sheng Wang
Motivation: Harnessing Big Text Data
- Text data is ubiquitous and growing rapidly
- Internet, blogs, news, email, literature, Twitter, …
- [Figure: big text data turned into knowledge that serves many applications]
Humans as Subjective & Intelligent “Sensors”
[Figure: sensors sense the real world and report data; a “human sensor” perceives the real world and expresses what it perceives]
- Weather: thermometer reports 3C, 15F, …
- Locations: geo sensor reports 41°N and 120°W, …
- Networks: network sensor reports 01000100011100
Unique Value of Text Data
- Useful to all big data applications
- Especially useful for mining knowledge about people’s behavior, attitude, and opinions
- Directly express knowledge about our world
- Small text data are also useful!
- [Figure: Data → Information → Knowledge; Text Data]
Main Techniques for Harnessing Big Text Data: Text Retrieval + Text Mining
[Figure: Big Text Data → Small Relevant Data → Knowledge → Many Applications]
Design of CS410: Overview
- Online Videos + High Engagement
- MOOC 1: Text Retrieval
- MOOC 2: Text Mining
- Course Project
- [Figure: MOOC 1, MOOC 2, and the Course Project laid over the pipeline Big Text Data → Small Relevant Data → Knowledge → Many Applications]
Design of CS410: Goals
- Emphasize both theory and practice
  - Theory: basic concepts and general principles are applicable to all applications (Lectures + Quizzes)
  - Practice: specific practical skills are immediately useful (Programming assignments)
  - Integration of theory and practice (Course projects)
Design of CS410: Goals
- Personalized learning: self-paced + choice of project
- Collaborative learning: forum-based interactions and collaboration + group projects
Prerequisites
- Required: proficiency in programming (needed for assignments and projects)
  - Comfortable with programming (ideally C++ or Java, but Python would also be okay)
- Optional: knowledge of basic probability & statistics (helpful for understanding algorithms deeply)
- Contact the instructor if you aren’t sure
Textbook & Readings
- Textbook (available online): Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining, by ChengXiang Zhai and Sean Massung, ACM and Morgan & Claypool Publishers, 2016
- Notes and additional readings will be available online as needed
Design of CS410: Format & Grading
- Synchronous: weekly class meetings & office hours
- Asynchronous: question answering & discussion via forums
- Extra credit: +5%
- [Figure: grading breakdown. MOOC 1 (Text Retrieval) and MOOC 2 (Text Mining) each have lecture videos, quizzes, and an MP assignment; the Course Project has a proposal, presentation, and report. Percentages as they appear in the figure: 25%, 15%, 25%, 60%, 5%, 45%, 25%]
You have Complete Control over Your Grade!
- B+: [80, 84]
- B: [75, 79]
- B-: [70, 74]
- C: [60, 69]
- D: [55, 59]
- F: <55
- 5% Extra Credit would help move your grade up by one bracket
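The bracket rule above can be sketched in code. This is a minimal illustration, not the course's grading script: `letter_grade` and `LISTED_BRACKETS` are hypothetical names. It assumes extra credit is added to the total as percentage points and that a fractional total belongs to the bracket whose lower bound it meets. Brackets above B+ are not listed here, so the sketch does not guess them.

```python
# Lower bound of each bracket listed on the slide, highest first.
# Brackets above B+ are not listed in this text, so they are not guessed.
LISTED_BRACKETS = [
    (80, "B+"),
    (75, "B"),
    (70, "B-"),
    (60, "C"),
    (55, "D"),
]


def letter_grade(total, extra_credit=0.0):
    """Map a percentage total plus extra credit (0 to 5 points) to a listed letter.

    Assumption: extra credit is simply added to the total.
    """
    if not 0 <= extra_credit <= 5:
        raise ValueError("extra credit is limited to 5 points")
    score = total + extra_credit
    if score >= 85:
        return "above B+ (bracket not listed)"
    for lower_bound, letter in LISTED_BRACKETS:
        if score >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    print(letter_grade(77))     # total of 77 with no extra credit
    print(letter_grade(77, 5))  # same total with the full extra credit
    print(letter_grade(62, 5))  # the C bracket is ten points wide
```

Under these assumptions, a total of 77 falls in B and reaches B+ with the full 5 points. A total of 62 stays in C even with 5 points, because the C bracket is ten points wide, which is why the extra credit helps rather than guarantees a move.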
Format
- Lecture videos + in-class discussions & problem solving
  - No class meetings on most Thursdays, but we meet and give a quiz on almost every Tuesday
  - Watch the videos before Thursday
  - Submit a summary (questions/quiz questions) before the next Tuesday
  - Get questions answered in class the next Tuesday
  - Take the quiz the Tuesday after
- Assignments: 4~5 assignments (experiments + coding)
- No exam
- Course project
- Literature review (only required for 4 credit hours)
Weekly Schedule
- You watch videos before Thursday
- You submit a summary of what you’ve learned over the following weekend:
  - The difficult lecture segments + specific questions (if you have difficulty understanding some part), or
  - A quiz question to test knowledge about the lecture (if you have mastered the material well)
- Summaries will be published to help all of you prepare for the quiz
- We’ll review the difficult concepts and answer your questions in the class meeting the next Tuesday, and give a quiz on topics from a previous lecture
- We’ll also discuss assignments and projects to help you finish them
Forum
- The discussion forum (Piazza) is the primary means of interaction and engagement
- Asynchronous discussion lets everyone participate at any time
- Enables faster question answering, without waiting for a class meeting or office hour
- Helps identify difficult concepts to cover in class meetings and office hours
Protocol of Question Answering
- As soon as you have a question or issue to discuss, post it on the Forum
- If the question is not answered in a timely manner on the Forum, or is not addressed adequately, email it to all of us (i.e., the instructor and 4 TAs) with a subject line containing the keyword “CS410S18”
- If you don’t receive an email reply from us in a timely manner, come to an office hour
Format of Office Hours
- TAs and the instructor will hold weekly office hours at published time slots
- Special office hour by the instructor: Thursdays, 11am-12:15pm, in 1404 SC (classroom)
- Priority list in descending order:
  - High: issues posted on the Forum but still unresolved after email communication with the TAs/instructor
  - Medium: other unresolved issues on the Forum
  - Low: any questions or issues not posted on the Forum, brought by a student joining an office hour (first come, first served)
How to Get the Most out of CS410?
- Watch every lecture video in a timely manner!
  - Identify and ask questions before the Tuesday meeting
  - Read in advance if possible
- Collaborative learning: help each other to get an “A”!
  - Actively participate in forum discussions (you’ll learn from reading posts on the Forum)
  - Earn up to 5% extra credit by making an effort to answer others’ questions on the Forum (your effort will be logged on the Forum)
  - Post questions on the Forum immediately whenever you have difficulty understanding any part of the course materials
Your Work Load
[Figure: semester timeline from Jan to May, with markers for 1/19, Spring Break, the last day of instruction, and Final Week. Tracks shown: Video Watching & Q&A & Quiz; Assign #1, Assign #2, …, Assign #k; Project; Literature Review]
Questions?
- Course website: https://courses.engr.illinois.edu/cs410/sp2018/
- Course Piazza: https://piazza.com/illinois/spring2018/cs410
- Course Compass space: https://compass2g.illinois.edu/webapps/blackboard/content/listContentEditable.jsp?content_id=_3006262_1&course_id=_36397_1