Advanced Topics in Computer Vision Devi Parikh Electrical and Computer Engineering
Plan for today Topic overview: What does the visual recognition problem entail? Why are these hard problems? What works today? Introductions Course overview: Logistics, Requirements Please interrupt at any time with questions or comments
Computer Vision Automatic understanding of images and video Computing properties of the 3D world from visual data (measurement) Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation) Algorithms to mine, search, and interact with visual data (search and organization) Kristen Grauman
What does recognition involve? Fei-Fei Li
Detection: are there people?
Activity: What are they doing?
Object categorization mountain tree building banner street lamp vendor people
Instance recognition Potala Palace A particular sign
Scene and context categorization outdoor city …
Attribute recognition gray made of fabric crowded flat
Object Categorization Task Description “Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label.” Which categories are feasible visually? German shepherd animal dog living being “Fido” K. Grauman, B. Leibe
Visual Object Categories Basic Level Categories in human categorization [Rosch 76, Lakoff 87] The highest level at which category members have similar perceived shape The highest level at which a single mental image reflects the entire category The level at which human subjects are usually fastest at identifying category members The first level named and understood by children The highest level at which a person uses similar motor actions for interaction with category members K. Grauman, B. Leibe
Visual Object Categories Basic-level categories in humans seem to be defined predominantly visually. There is evidence that humans (usually) start with basic-level categorization before doing identification. Basic-level categorization is easier and faster for humans than object identification! How does this transfer to automatic classification algorithms? K. Grauman, B. Leibe
Visual Object Categories … animal Abstract levels … … four-legged … Basic level dog cat cow German shepherd Doberman Individual level … “Fido” … K. Grauman, B. Leibe
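The hierarchy on this slide can be sketched as a small parent map. This is only an illustration: the `PARENT` dictionary and both helper functions are hypothetical names, the levels the slide elides with "…" are skipped, and placing "Fido" under "German shepherd" is an assumption rather than something the slide states.

```python
# Hypothetical encoding of the slide's hierarchy: each label maps to its parent.
# Levels elided on the slide ("…") are skipped here.
PARENT = {
    "Fido": "German shepherd",   # individual level (assumed breed)
    "German shepherd": "dog",
    "Doberman": "dog",
    "dog": "four-legged",        # basic level: dog, cat, cow
    "cat": "four-legged",
    "cow": "four-legged",
    "four-legged": "animal",     # abstract levels
    "animal": None,
}


def ancestors(label):
    """Return the chain from label up to the root, most specific first."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def shared_level(a, b):
    """Return the most specific label that covers both a and b, or None."""
    b_chain = set(ancestors(b))
    for label in ancestors(a):
        if label in b_chain:
            return label
    return None


print(ancestors("Fido"))
print(shared_level("Fido", "Doberman"))  # dog
print(shared_level("Fido", "cat"))       # four-legged
```

Read this way, identification means landing on a leaf such as "Fido", while basic-level categorization only needs to land on the "dog" node.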
How many object categories are there? ~10,000 to 30,000 Biederman 1987 Source: Fei-Fei Li, Rob Fergus, Antonio Torralba.
Other Types of Categories Functional Categories e.g. chairs = “something you can sit on” K. Grauman, B. Leibe
Why recognition? Recognition a fundamental part of perception e.g., robots, autonomous agents Organize and give access to visual content Connect to information Detect trends and themes Where are we now? Kristen Grauman
Computer Vision “spend the summer linking a camera to a computer and getting the computer to describe what it saw” - Marvin Minsky (1966), MIT … 45 years later
Computer Vision OR Vision is HARD!
We’ve come a long way…
Posing visual queries Yeh et al., MIT Belhumeur et al. Kooaba, Bay & Quack et al. Kristen Grauman
Finding visually similar objects Kristen Grauman
Discovering visual patterns Objects (Sivic & Zisserman) Categories (Lee & Grauman) Actions (Wang et al.) Kristen Grauman
Auto-annotation Gammeter et al. T. Berg et al. Kristen Grauman
Exploring community photo collections Snavely et al. Simon & Seitz Kristen Grauman
Autonomous agents able to detect objects Kristen Grauman http://www.darpa.mil/grandchallenge/gallery.asp
We’ve come a long way… Fischler and Elschlager, 1973
We’ve come a long way… Dollar et al., BMVC 2009
Still a long way to go… Dollar et al., BMVC 2009
Challenges
Challenges: robustness Illumination Object pose Clutter Intra-class appearance Occlusions Viewpoint Kristen Grauman
Challenges: context and human experience Context cues Kristen Grauman
Challenges: context and human experience Context cues Function Dynamics Kristen Grauman Video credit: J. Davis
Challenges: scale, efficiency Half of the cerebral cortex in primates is devoted to processing visual information ~20 hours of video added to YouTube per minute ~5,000 new tagged photos added to Flickr per minute Thousands to millions of pixels in an image 30+ degrees of freedom in the pose of articulated objects (humans) 3,000-30,000 human recognizable object categories Kristen Grauman
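To get a feel for these rates, the per-minute figures on the slide can be converted to per-day totals. This is a back-of-the-envelope sketch: the variable names are illustrative, and the only inputs are the two approximate rates quoted on the slide.

```python
# Per-minute rates quoted on the slide (approximate figures).
VIDEO_HOURS_PER_MINUTE = 20   # hours of video added to YouTube per minute
PHOTOS_PER_MINUTE = 5_000     # new tagged photos added to Flickr per minute

MINUTES_PER_DAY = 24 * 60

video_hours_per_day = VIDEO_HOURS_PER_MINUTE * MINUTES_PER_DAY
photos_per_day = PHOTOS_PER_MINUTE * MINUTES_PER_DAY

# Days of round-the-clock viewing needed to watch a single day's uploads.
viewing_days_per_upload_day = video_hours_per_day // 24

print(video_hours_per_day)          # 28800
print(photos_per_day)               # 7200000
print(viewing_days_per_upload_day)  # 1200
```

At these rates, one day of uploads would take more than three years of continuous viewing, which is why efficiency is listed as a challenge in its own right.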
Challenges: learning with minimal supervision (from less to more supervision) Unlabeled, multiple objects; Classes labeled, some clutter; Cropped to object, parts and classes labeled Kristen Grauman
Slide from Pietro Perona, 2004 Object Recognition workshop
What kinds of things work best today? Frontal face detection Reading license plates, zip codes, checks Recognizing flat, textured objects (like books, CD covers, posters) Fingerprint recognition Kristen Grauman
Inputs in 1963… L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963. Kristen Grauman
… and inputs today Personal photo albums Movies, news, sports Surveillance and security Medical and scientific images Slide credit: L. Lazebnik
… and inputs today Images on the Web Movies, news, sports 350 mil. photos, 1 mil. added daily; 1.6 bil. images indexed as of summer 2005; 916,271 titles; 10 mil. videos, 65,000 added daily Satellite imagery City streets Understand and organize and index all this data!! Slide credit: L. Lazebnik
Introductions What is your name? Which program are you in? How far along? What is your research area and current project about? Take a minute to explain it to us In a way that we can all follow Have you taken a computer vision course before? Machine learning or pattern recognition? Do you know what classifiers are / do? What are you hoping to get out of this class?
This course ECE 6504 TR 5:00 pm to 6:15 pm McBryde Hall (MCB) 216 Office hours: by appointment (email) Course webpage: http://filebox.ece.vt.edu/~S13ECE6504/ (Google me → My homepage → Teaching)
This course Focus on current research in computer vision High-level recognition problems, innovative applications.
Goals Understand current approaches Analyze and critique Identify interesting research questions Present clearly and methodically
Expectations Discussions will center on recent papers in the field [15%] Paper reviews each class [25%] Can have 3 late days over the course of the semester Presentations (2-3 times) [25%] Papers and background reading Experiments Project [35%]
Prerequisites Ability to analyze high-level conference papers Courses in computer vision and machine learning are a plus
Paper reviews For each class, review two of the assigned papers. Email me by 9:00 pm the day before class (MW). Skip reviews for the classes in which you are presenting.
Paper review guidelines Less than a page (½ – ¾) Brief (2-3 sentences) summary Main contribution Strengths? Weaknesses? How convincing are the experiments? Suggestions to improve them? Extensions? Applications? Additional comments, unclear points Relationships observed between the papers we are reading
Paper presentation guidelines Papers Experiments
Papers Read selected papers in topic area and background papers as necessary Well-organized talk, 45 minutes What to cover? Problem overview, motivation Algorithm explanation, technical details Any commonalities, important differences between techniques covered in the papers. See class webpage for more details.
Experiments Implement/download code for a main idea in the paper and show us toy examples: Experiment with different types of (mini) training/testing data sets Evaluate sensitivity to important parameter settings Show (on a small scale) an example to analyze a strength/weakness of the approach Share links to any tools or data.
Tips Look up papers and authors. Their webpages may have data, code, slides, videos, etc. Make sure the talk flows well and makes sense as a whole. Cite ALL sources. Don’t forget the high-level picture. Give a clear, well-organized, and well-thought-out talk.
Timetable for presenters Meet me three days before your presentation to do a dry run: Fridays and Mondays. Email me to set up an appointment at least a couple of days ahead of time. This is a hard deadline.
Projects Possibilities: Extend a technique studied in class Analysis and empirical evaluation of an existing technique Comparison between two approaches Design and evaluate a novel approach Talk to me if you want to work with a partner Talk to me if you need help with ideas
Project timeline Project proposals (1 page) [10%] February 12th Mid-semester presentations (10 minutes) [20%] March 19th and 21st (after Spring break) Final presentations (20 minutes) [35%] April 25th to May 7th Project reports (8 pages) [35%] May 15th
Implementation Use any language / platform you like No support for code / implementation issues will be provided
Miscellaneous Best presentation, best project and best discussion prizes! We will vote Dinner Feedback welcome and useful No laptops, phones, etc. in class please I will interrupt if something in your talk is not clear.
Tips Make sure you are saying everything we need to know to understand what you are saying. Make sure you know what you are talking about. Imagine your audience, better yet, imagine talking to a complete stranger. Make your talks visual (images, video, not lots of text). Look up examples.
Other courses Introduction to Machine Learning and Perception Dhruv Batra Offered again in the Fall (?) Advanced Machine Learning Spring semesters (?) Computer Vision Fall semesters (?)
Coming up Read the class webpage Schedule is up Tour of schedule Select 6 dates (topics) you would like to present Email me by Thursday Whoever signs up for Tuesday next week: Need not do an experimental section for that day; Meet me on Friday with at least a rough outline of slides; Meet me on Monday with the final version First come, first served Overview of my research on Thursday
Questions? See you Thursday!