1
CS 8803 CVL: Vision and Language
Devi Parikh
School of Interactive Computing
2
Welcome!
3
Plan for today
- Topic overview
- Introductions
- Course overview: logistics, requirements, lecture format
Please interrupt at any time with questions or comments
4
Computer Vision
Automatic understanding of images and video
- Computing properties of the 3D world from visual data (measurement)
- Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)
- Algorithms to mine, search, and interact with visual data (search and organization)
Kristen Grauman
5
What does recognition involve?
Fei-Fei Li
6
Detection: Are there people?
7
Activity: What are they doing?
8
Object categorization
mountain, tree, building, banner, street lamp, vendor, people
9
Instance recognition
Potala Palace
A particular sign
10
Scene and context categorization
outdoor city …
11
Attribute recognition
gray, made of fabric, crowded, flat
12
13
People coloring a street on a college campus
14
It was a great event! It brought families out, and the whole community together.
15
16
Q. What are they coloring the street with? A. Chalk
17
AI: What a nice picture! What event was this?
User: “Color College Avenue”. It was a lot of fun!
AI: I am sure it was! Do they do this every year?
User: I wish they would. I don’t think they’ve organized it again since 2012.
…
18
Why Words and Pictures? 1
- Pictures are everywhere
- Words are how we communicate
19
Why Words and Pictures? 1 Applications
20
Why Words and Pictures? 1
Applications
- Interact with, organize, and navigate visual data
21
Why Words and Pictures? 1
Applications
- Leverage multi-modal information on the web
22
Why Words and Pictures? 1
Applications
- Aid visually-impaired users
Microsoft
23
Why Words and Pictures? 1
Applications
- Aid visually-impaired users
24
Why Words and Pictures? 1
Applications
- Summarize visual data for analysts
25
Why Words and Pictures? 2
Measuring and demonstrating AI capabilities
- Image understanding
- Language understanding
26
Why Words and Pictures? 3
Beyond “bucket” recognition
Language is compositional
“A steam engine is coming out of a fireplace.”
René Magritte (1938)
27
Why Words and Pictures? 4
“Vision is our best sensor, and language is our best invention.” -- Viraj Prabhu
28
My goals (for you)
- Be well-versed in the latest in vision + language
- Critique research papers in vision + language
- Identify interesting open questions and applications
- Execute a research project in vision + language
29
Introductions
Devi Parikh
- Ph.D., Carnegie Mellon University, 2009
- Research Assistant Professor, TTI-Chicago, 2013
- Assistant Professor, ECE, Virginia Tech, 2016
- Assistant Professor, School of Interactive Computing, Georgia Tech (currently)
- Research Scientist, Facebook AI Research (currently)
30
Introductions
Arjun Chandrasekaran (your TA)
- CS Ph.D. Student, Georgia Tech
- CV, ML, NLP, AI
- language and vision
- making human-AI interaction more natural and efficient
31
Introductions
Larry He (your second TA)
- CS MS Student, Georgia Tech
32
Introductions
- Which program are you in? How far along?
- Have you taken a computer vision course before?
- Have you taken a machine learning course before?
- Do you know how CNNs and LSTMs work?
- Have you used a deep learning package before?
- What are you hoping to get out of this class?
33
This course
- CS 8803 CVL
- Klaus 2456, TR 1:30 pm to 2:45 pm
- Course webpage:
- Piazza:
- Focus on topics at the intersection of vision and language
- Cutting edge research
34
Requirements
- Paper reviews each class [30%]
- Leading discussion (~once) on papers [10%]
- Project [60%]
No “Assignments”, Exams, etc.
35
Prerequisites
- Course in computer vision
- Course in machine learning
- Basic knowledge of deep learning
36
Paper reviews
- For each class, review one paper
- Submit by midnight before class
- Submission workflow: TBD
- Skip the review for the class in which you are leading discussion
- Late reviews will not be accepted
- Will drop three lowest grades on reviews
37
Paper review guidelines
- One page
- Detailed review:
  - Brief (2-3 sentences) summary
  - Main contribution
  - Strengths? Weaknesses?
  - How convincing are the experiments? Suggestions to improve them?
  - Extensions? Applications?
  - Additional comments, unclear points
- Relationships observed between the papers we are reading
- Pull out most interesting thought
- Look at class webpage
- Write in your own words
- Write well, proofread
38
Leading Discussion
- ~ One of you will be assigned to argue for the paper
- ~ One of you will be assigned to argue against the paper
- Come prepared with 5 points
- Sign up here by August 29th:
39
Projects
First few lectures: introductory talks
- Image captioning
- Visual question answering
- Visual dialog
By lead authors of representative works in this space
40
Projects
Possibilities:
- Design and evaluate a novel approach
- A novel application, use case
- Extension of a technique studied in class
Be creative!
Think: research paper at a good conference
Work in teams of ~4 (at most 15 teams in the class)
Sign up for teams by September 8th
41
Project timeline
Four in-class presentations (see class schedule)
- Project ideas / proposal [10%]
- Update 1 [10%]
- Update 2 [10%]
- Final presentation [15%]
Project video (1 minute) [15%] December 5th
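As a rough illustration of how the weights on the Requirements, Paper reviews, and Project timeline slides add up, here is a minimal Python sketch. The weights (reviews 30%, discussion 10%, project components 10/10/10/15/15) and the rule of dropping the three lowest review grades come from the slides. The 0-100 score scale, the simple mean over the remaining reviews, and all function and key names are assumptions made for illustration, not the instructor's actual grading procedure.

```python
# Hypothetical sketch of the grade arithmetic; names and the 0-100 scale are assumptions.

# Project components and their share of the overall grade (from the slides).
PROJECT_WEIGHTS = {
    "proposal": 10,
    "update_1": 10,
    "update_2": 10,
    "final_presentation": 15,
    "video": 15,
}

REVIEW_WEIGHT = 30      # paper reviews, percent of overall grade
DISCUSSION_WEIGHT = 10  # leading discussion, percent of overall grade


def review_score(scores, n_drop=3):
    """Mean review score after dropping the n_drop lowest (assumed: simple mean)."""
    kept = sorted(scores)[n_drop:]
    if not kept:
        raise ValueError("need more review scores than the number dropped")
    return sum(kept) / len(kept)


def course_grade(review_scores, discussion, project):
    """Combine component scores (each assumed to be 0-100) into a 0-100 grade."""
    project_points = sum(
        weight * project[name] / 100 for name, weight in PROJECT_WEIGHTS.items()
    )
    review_points = REVIEW_WEIGHT * review_score(review_scores) / 100
    discussion_points = DISCUSSION_WEIGHT * discussion / 100
    return review_points + discussion_points + project_points
```

The five project components sum to the 60% allotted to the project, so the three top-level weights cover the full grade.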
42
Tips
- Make sure you are saying everything we need to know to understand what you are saying.
- Make sure you know what you are talking about.
- Think about your audience.
- Make your talks visual, animated (images, video, not lots of text).
- Stick to the time limit!
43
Tips
- Clearly define the problem statement (input, output)
- Place your work in the context of existing work you know of
- Lay out the set of experiments you’ll conduct to demonstrate the efficacy of your approach
- Present a timeline
  - Concrete goals for next update in ~2.5 weeks
  - Long shots
- Present updates along this plan
- See more details on class webpage
- Stick to the time limit!
44
Implementation
- Use any language / platform / package you like
- No support for code / implementation issues will be provided
- Possibility of consulting with lead authors who gave the introductory talks
45
Miscellaneous
- Best presentation, best project and best discussion prizes! We will vote
- Feedback welcome and useful
46
Context
- Deep Learning (CS 7643)
- This course is complementary to it
47
Coming up
- Read the class webpage
- Schedule is up
- Select 6 dates (topics) you would like to lead the discussion on (by August 29th)
  - Sign up sheet shows how many people have already signed up for a topic
  - Select those that have fewer selections
- Probability of dropping class?
- Start thinking about project teams
- Pointers to good presentations, reviews, etc. are on the class webpage.
48
Moving forward
- No class on Thursday
- Three lectures after that
  - No paper reading, no review, no discussion
  - Introductory talks covering spectrum of vision + language tasks
49
Each lecture after that
- You will have read and summarized a paper the night before
- ~15 minute discussion on paper we read, led by two students: “for” and “against”
- 10-minute presentation by 3 teams on projects
- 10-minute discussion on each presentation
50
Last two lectures
- Final project presentations
51
Final comments
- Goal: submit a paper to a good AI conference (CV, NLP, ML)
- Read the class webpage
52
Questions? See you next week!