1 Introduction LING 572 Fei Xia, Dan Jinguji Week 1: 1/08/08.



Outline General course information Course contents Reading assignment #1: due 1/10 Take-home exam #1: due 1/10 2

3 General info Course url: –Syllabus (incl. slides, assignments, and papers): updated every week. –GoPost: –Collect it: Please check your email and GoPost at least once per day.

4 Slides The slides will be online before class. The final version will be uploaded a few hours after class. “Additional slides” are not required and not covered in class.

5 Prerequisites CS 326 (Data Structures) or equivalent: –Ex: hash table, array, tree, … Stat 391 (Prob. and Stats for CS) or equivalent: Basic concepts in probability and statistics –Ex: random variables, chain rule, Bayes’ rule Programming in Perl, C, C++, Java, or Python Basic unix/linux commands (e.g., ls, cd, ln, sort, head): tutorials on unix LING570: If you don’t meet all the prerequisites above, you need to get permission from Fei before taking LING572.

LING570 prerequisites If you did not take LING570 last quarter, you need to understand all the material covered in that class. Especially, –the material in Weeks #9-#10 –hw #10, Quiz #4 –the Mallet package – 6

7 Topics covered in Ling570 Unit #1: –Formal languages and formal grammars –FSA, FST –Morphological analysis Unit #2: LM, ngram, and smoothing Unit #3: HMM and POS tagging Unit #4: Classification and sequence labeling tasks.

8 Grades for LING572 No midterm or final exams. Graded: Assignments (9): 65-75% Take-home exams (3-5): 15-25% Not graded: Reading assignments (5-9): 5-10% Class participation: 5%

9 Office hour Fei: –Email: the subject line should include “ling572”. The 48-hour rule: it works both ways. –Office hour: Time: Fr 10:30-11:30am. Location: Padelford A-210G

10 TA hour Dan Jinguji – –Time: T: ?? –Location: Art 337

Assignments / Exams Assignments: the same as in ling570 Take-home exams: to replace in-class quizzes Reading assignments: some papers should be read before class. When there are take-home exams or reading assignments for the same period, the amount of assignments will be reduced accordingly. The total amount of time spent on Ling572 will be about hours. 11

Assignments Nine assignments Programming languages: C, C++, Java, Perl, or Python. Please follow the instructions in the assignments, including –command line format –file format –the probability model –… 12

The Mallet package Several assignments will use the Mallet package. If you don’t know how to use Mallet, you should get familiar with the package ASAP. You can start with the hw10 from LING570. The Mallet slides are at The LING570 hw10 is at

14 Assignment submission Use “Collect it”: submit the tar file. –E.g., tar -cvf hw1.tar hw1_dir Due date: every Thurs at 1pm unless specified otherwise. The submission area is closed 4 days after the due date. There is a 1% penalty for every hour after the due date.
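As a rough illustration of the late policy on this slide, the sketch below computes the deduction for a late submission. The function name, rounding partial hours up, and treating the 4-day window as exactly 96 hours are assumptions for illustration, not stated course policy.

```python
import math
from typing import Optional


def late_penalty_percent(hours_late: float, window_hours: int = 96) -> Optional[int]:
    """Return the late penalty in percentage points.

    Assumptions (not from the slides): a partial hour counts as a full hour,
    and the submission area closes exactly 96 hours (4 days) after the due date.
    Returns None when the submission area would already be closed.
    """
    if hours_late <= 0:
        return 0  # on time: no penalty
    if hours_late > window_hours:
        return None  # submission area closed
    return math.ceil(hours_late)  # 1% per (started) hour
```

Under these assumptions, a submission that is 2.5 hours late would lose 3 percentage points.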

15 Homework Submission (cont) Each submission includes –a note file: hw1.(txt|doc|pdf) for hw1. If your code does not work, explain in the note file what you have implemented so far. –a set of shell scripts: e.g., kNN.sh –source code: e.g., kNN.C –binary code (for C/C++/Java): kNN.out –data files if any. –The TA will NOT compile or debug your code. Time spent on an assignment: hours/week → I would appreciate it if you could tell me the time you spent on the homework.

Take-home exams Normally, an exam is handed out on Tuesday and due on Thursday. Bring a hardcopy of your solution to class. The dates on the tentative schedule are subject to change. Extensions will be granted for the exams ONLY under extremely unusual circumstances. There are no makeup exams. If you know that you will miss a class in which an exam could be given, you need to inform me at least two hours before the class starts. 16

Take-home exams (cont) You should complete the exams on your own. No discussion among students is allowed. You can refer to anything that is available on the course url and on patas, but please don’t search the Web for answers. If you have any questions about the exams, please email Fei. 17

Reading assignments You will answer some questions about the papers that will be discussed in the next class. The questions are on the teaching slides; there are no separate documents for them. Your answers should be concise, no more than a few lines. Your answers are due before the next class. Bring a hardcopy of your answers to class. 18

Summary of assignments and exams 19
Columns: Assignments (hw) | Take-home exams | Reading assignments
Num:
Distribution: Download from the course url | Distributed in class | Download from the course url
Discussion: Allowed | Disallowed | Allowed
Submission: Collect It | Bring to class | Not graded
Due date: 1pm every Thurs | Before next class
Extension: 1% penalty per hour | Normally disallowed | Disallowed
Estimate of hours: 10-20 hours | 1-5 hours | 2-5 hours
Solution files: On Patas | Discussed in class

Patas If you need a patas account, request one right away. The directory for LING572: ~/dropbox/07-08/572/ –hw1/, hw2/, …: assignments and solutions –misc_slides/: solutions to exams and misc slides that are not on the course url. For jobs that run more than 5 minutes, use the cluster submission commands: see Hw1. 20

21 Course plan

Types of ML problems Classification problem Estimation problem Clustering Discovery … → A learning method can be applied to one or more types of ML problems. → We will focus on the classification problem. 22

Course objectives Covering basic statistical methods that produce state-of-the-art results Focusing on classification and sequence labeling problems Some ML algorithms are complex. We will focus on basic ideas, not theoretical proofs. 23

Main units Unit #1 (2 weeks): simple classification algorithms –kNN –Decision tree –Naïve Bayes Unit #2 (3 weeks): advanced classification algorithms –MaxEnt* –SVM** 24

Main units (cont) Unit #3 (2 weeks): sequence labeling algorithms –TBL* (if time permits) –CRF** Unit #4 (1 week): system combination Unit #5 (if time permits) –Introduction to semi-supervised learning ** –Introduction to EM ** 25

Other topics Information theory Feature selection Converting multi-class task to binary classification task 26
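To make the last topic concrete, here is a minimal sketch of one common way to convert a multi-class task into binary tasks, the one-vs-rest scheme. The slides do not say which conversion the course uses, so the scheme and both function names are assumptions for illustration.

```python
def one_vs_rest_labels(labels):
    """Build one binary labeling per class: 1 if the instance has that class, else 0."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_rest_decode(scores):
    """Pick the class whose binary classifier gave the highest score.

    `scores` maps each class name to the score its binary classifier
    assigned to a single test instance.
    """
    return max(scores, key=scores.get)
```

For example, the labels N, V, N, ADJ yield three binary problems (ADJ vs. rest, N vs. rest, V vs. rest), and at test time the class with the highest binary score is chosen.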

Three levels of discussion The default (i.e., unmarked): We will discuss the model, training, decoding, etc. –kNN, Decision tree, Naïve Bayes. *: We will discuss the model, but not the training and other implementation issues: –MaxEnt, TBL **: We will only go over the main intuition about the algorithms: –SVM, CRF, semi-supervised learning, EM 27

Questions for each ML method Modeling: –What is the model? –What kind of assumption is made by the model? –How many types of model parameters are there? –How many “internal” (or non-model) parameters are there? –… 28

Questions for each method (cont) Training: how to estimate parameters? Decoding: how to find the “best” solution? Weaknesses and strengths? –Is the algorithm robust (e.g., in handling outliers)? Scalable? Prone to overfitting? Efficient in training time? In test time? –How much data is needed? Labeled data Unlabeled data 29
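The modeling, training, and decoding questions can be illustrated with a small Naïve Bayes sketch. It is not the course's implementation: the function names, the data format, the add-delta smoothing, and the choice to ignore unseen features at test time are all assumptions. The model has two types of parameters, P(c) and P(f|c); `delta` plays the role of an “internal” (non-model) parameter.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, delta=1.0):
    """Training: estimate log P(c) and log P(f|c) from (label, features) pairs.

    `delta` is an add-delta smoothing constant (an assumed, non-model parameter).
    """
    class_counts = Counter(label for label, _ in docs)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for label, feats in docs:
        feat_counts[label].update(feats)
        vocab.update(feats)
    n = len(docs)
    log_prior = {c: math.log(class_counts[c] / n) for c in class_counts}
    log_cond = {}
    for c in class_counts:
        total = sum(feat_counts[c].values()) + delta * len(vocab)
        log_cond[c] = {f: math.log((feat_counts[c][f] + delta) / total) for f in vocab}
    return log_prior, log_cond


def decode_nb(model, feats):
    """Decoding: return the class c maximizing log P(c) + sum of log P(f|c).

    Features never seen in training are skipped (an assumption of this sketch).
    """
    log_prior, log_cond = model

    def score(c):
        return log_prior[c] + sum(log_cond[c][f] for f in feats if f in log_cond[c])

    return max(log_prior, key=score)
```

A toy usage: train on a few labeled token lists with `train_nb`, then call `decode_nb(model, ["cheap"])` to get the most probable class for a new instance.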

Reading assignment #1 30

Reading assignment #1 Read M&S 2.2: Essential Information Theory. Questions: Let p(x) and q(x) be two distributions for a random variable X, and assume p is the true distribution. –p(X=a)=p(X=b)=1/8, p(X=c)=1/4, p(X=d)=1/2 –q(X=a)=q(X=b)=q(X=c)=q(X=d)=1/4 (a) What is H(X)? (b) What is the cross entropy H(X,q)? (c) What is the KL divergence D(p||q)? (d) What is D(q||p)? 31
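The three quantities in these questions can be computed with a few lines of code. The sketch below uses the standard definitions with base-2 logarithms (bits), which is an assumption about the intended units; the function names are illustrative, and the code assumes q(x) > 0 wherever p(x) > 0.

```python
import math


def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def cross_entropy(p, q):
    """H(X, q) = -sum_x p(x) log2 q(x), with p as the true distribution."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)


def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2 (p(x) / q(x)). Not symmetric in general."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


# The two distributions from the reading assignment.
p = {"a": 1 / 8, "b": 1 / 8, "c": 1 / 4, "d": 1 / 2}
q = {"a": 1 / 4, "b": 1 / 4, "c": 1 / 4, "d": 1 / 4}
```

A useful sanity check on any answers is the identity H(X,q) = H(X) + D(p||q), which `cross_entropy(p, q)`, `entropy(p)`, and `kl_divergence(p, q)` should satisfy up to floating-point error.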

Next time Both Reading #1 and Exam #1 are due before class. Bring the hardcopy to class. Topics: –Information theory and hw #1 –Solution to Exam #1 (if time permits) –Recap on the classification problem (if time permits) 32