Announcements
  - Homework 5 due tonight (11:59pm) via Carmen
  - Homework 6 out now: due 12/1 @ 11:59pm
Today’s learning goals
  At the end of today, you should be able to:
  - Describe AI applications to image classification and gestural control
  - Describe AI applications to problems in robotics
Information retrieval
  - Flip side of information extraction
  - Input: query (structured data or unstructured text)
  - Output: list of records (documents), ranked by relevance to query
  - Example query passed to the IR system: “Information about platypi”
Example
IR is more than just language
  - Information retrieval can be mostly structured
    - E.g., PageRank (Google’s early search algorithm)
    - Uses page connections via hyperlinks to define “important” pages
    - Does some keyword-based matching
  - Or mostly unstructured
    - Example medical document retrieval system:
      - Query: string of disease names (“emphysema, breast cancer, diabetes”)
      - Method: find documents that frequently mention these words
      - Rank by date entered (recent first)
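The link-based idea behind PageRank can be sketched with a few lines of power iteration. This is a simplified illustration, not Google's production algorithm: the function name `pagerank`, the damping value 0.85, the iteration count, and the three-page toy graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iters=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key (assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iters):
        # Every page gets a small baseline share ("random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outs if outs else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A and B both link to C, and C links back to A.
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"]}
toy_ranks = pagerank(toy_links)
most_important = max(toy_ranks, key=toy_ranks.get)
```

In this toy graph, page C collects links from two pages and ends up with the highest rank, which is the sense in which hyperlinks define “important” pages.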
IR example
  Query: “emphysema AND diabetes”
  Extra knowledge base info:
  - “chronic cough” is related to “emphysema”
  - “insulin” is related to “diabetes”

  Document  F(emphysema)  F(diabetes)  F(cough)  F(insulin)
  D1        14            0            2         0
  D2        0             3            1         4
  D3        0             0            0         0
  D4        10            7            12        6
IR example
  Query: “emphysema AND diabetes”
  Linear ranking model:
  $\mathrm{Rank}(D_i) = 2F(\mathrm{emph}) + F(\mathrm{diab}) + 0.3F(\mathrm{cough}) + 0.7F(\mathrm{insulin})$

  Document  F(emphysema)  F(diabetes)  F(cough)  F(insulin)  Score
  D1        14            0            2         0           28.6
  D2        0             3            1         4           6.1
  D3        0             0            0         0           0.0
  D4        10            7            12        6           34.8

  Return documents in ranked order: [D4, D1, D2, D3]
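The linear ranking model above can be written as a short scoring function. This is a minimal sketch: the weights come from the slide, while the names `FREQS`, `WEIGHTS`, `score`, and `rank` are made up for the example, and table cells that are blank on the slide are assumed to be 0.

```python
# Term frequencies per document (blank slide cells are assumed to be 0).
FREQS = {
    "D1": {"emphysema": 14, "diabetes": 0, "cough": 2, "insulin": 0},
    "D2": {"emphysema": 0, "diabetes": 3, "cough": 1, "insulin": 4},
    "D3": {"emphysema": 0, "diabetes": 0, "cough": 0, "insulin": 0},
    "D4": {"emphysema": 10, "diabetes": 7, "cough": 12, "insulin": 6},
}

# Weights of the linear model: query terms count more than related terms.
WEIGHTS = {"emphysema": 2.0, "diabetes": 1.0, "cough": 0.3, "insulin": 0.7}


def score(freqs, weights=WEIGHTS):
    """Weighted sum of term frequencies for one document."""
    return sum(w * freqs.get(term, 0) for term, w in weights.items())


def rank(docs):
    """Return document names sorted from highest to lowest score."""
    return sorted(docs, key=lambda name: score(docs[name]), reverse=True)


scores = {name: round(score(f), 1) for name, f in FREQS.items()}
ranking = rank(FREQS)
```

Here `ranking` comes out as `['D4', 'D1', 'D2', 'D3']`, the order returned on the slide. Changing a weight in `WEIGHTS` is a quick way to see how sensitive the ranking is to the knowledge-base terms.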
Sample IR experimental questions
  - How does a unigram (1-word) frequency model compare to a bigram (2-word) model?
  - How important are hyperlinks vs text matches?
  - Does my new way of representing documents give me better IR generalization to different kinds of documents?
  - How do I efficiently rank tens of millions of documents?
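To make the unigram vs bigram question concrete, here is a small sketch of the two kinds of frequency counts. The function name `ngram_counts`, the whitespace tokenization, and the sample sentence are assumptions for illustration; a real system would use a proper tokenizer.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count n-word sequences in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical document snippet.
doc = "chronic cough and chronic fatigue"
unigrams = ngram_counts(doc, 1)  # counts single words such as ("chronic",)
bigrams = ngram_counts(doc, 2)   # counts word pairs such as ("chronic", "cough")
```

The unigram model sees “chronic” twice but cannot tell the two uses apart, while the bigram model distinguishes “chronic cough” from “chronic fatigue” at the cost of many more distinct features.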
Computer vision Video from Joshua Mosley
Vision data
  - Data come in different forms
    - Single images
    - Video (image sequence)
    - Grayscale/RGB/CMYK/etc.
    - Multiple cameras
  - Other factors in data
    - Lighting
    - Lens shape
    - Distance to subject
    - etc.
Vision applications
  - Image classification
  - Motion tracking
  - Motion capture
  - Gestural control
  - Sports highlights/analysis
  - …
Image classification
  - Input: single image
  - Output: label(s) describing image content
  - Pipeline: image, then classifier, then label (e.g., “Llama”)
Feature extraction – edge detection
  - Use pixel value gradients to find sharp changes
  - Thresholding for a certain value gives you edges
  Figure panels: Original, Edges, Overlaid (images from Jim Davis)
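The gradient-then-threshold recipe can be sketched in plain Python on a small grayscale image stored as a list of rows. The function names, the central-difference gradient, and the threshold of 100 are assumptions for the example; practical edge detectors (e.g., Sobel or Canny) add smoothing and other refinements.

```python
def gradient_magnitude(img):
    """Per-pixel gradient magnitude using central differences."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbor indices at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def detect_edges(img, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in gradient_magnitude(img)]


# Hypothetical image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
edges = detect_edges(image, threshold=100)
```

Only the two columns on either side of the dark-to-bright boundary are marked as edges; raising or lowering the threshold changes how sharp a change must be to count.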
Feature extraction – region segmentation
  - Break the image into contiguous regions
  - Can use clustering methods like k-means
  Figure panels: Original, Segmented with k-means (k=16)
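As a rough sketch of clustering-based segmentation, the code below runs k-means on grayscale pixel intensities and labels each pixel with its nearest cluster. The names `kmeans_1d` and `segment`, the evenly spaced initial centers, and the tiny image are assumptions; note that clustering on intensity alone does not by itself guarantee contiguous regions.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalar values into k groups; returns the cluster centers."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def segment(img, k):
    """Replace each pixel with the index of its nearest cluster center."""
    centers = kmeans_1d([p for row in img for p in row], k)
    return [[min(range(k), key=lambda c: abs(p - centers[c])) for p in row]
            for row in img]


# Hypothetical 2x3 grayscale image with dark and bright pixels.
image = [[10, 12, 200],
         [11, 198, 202]]
labels = segment(image, k=2)
```

With k=2 the dark pixels share one label and the bright pixels the other; the slide's k=16 works the same way with more clusters, typically over color values rather than a single intensity.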
Convolutional neural networks
  - Say we want to learn to extract useful features for classification
    - May be edges, faces, color patterns, etc.
  - Use a neural network to get arbitrary image features
Convolutional neural networks
  - Normal fully-connected neural net
    - Linear combination of every single pixel in the image
    - Way too many parameters!
Convolutional neural networks
  - Use local connections instead
    - Important information w.r.t. one pixel is usually nearby
  - Local statistics can be similar at different locations
    - A nose is a nose
    - Translation invariance
  - So slide local “windows” around the image
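A quick count shows why “way too many parameters” is not an exaggeration. The sizes below (a 224x224 RGB image, 1000 hidden units, 64 filters of size 3x3) are assumptions picked only to make the arithmetic concrete, as are the helper names.

```python
def dense_params(height, width, channels, hidden_units):
    """Weights plus biases for one fully-connected layer over all pixels."""
    inputs = height * width * channels
    return inputs * hidden_units + hidden_units


def conv_params(kernel_size, channels, num_filters):
    """Weights plus biases for one convolutional layer with shared filters."""
    return kernel_size * kernel_size * channels * num_filters + num_filters


# Hypothetical layer sizes, for illustration only.
dense_count = dense_params(224, 224, 3, hidden_units=1000)
conv_count = conv_params(3, 3, num_filters=64)
```

Under these assumptions the fully-connected layer needs about 150 million parameters, while the convolutional layer needs under two thousand, because each small filter is reused at every image location.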
Convolutional operation (single 3x3 filter) Images from Jim Davis
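The sliding-window operation can be sketched in plain Python. This is a minimal version under stated assumptions: no padding, stride 1, and no kernel flip (the cross-correlation form that CNN layers commonly use); the function name and the example filter are made up for illustration.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image; each output is a weighted sum of a window."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Same kernel weights at every position (shared parameters).
            out[y][x] = sum(kernel[i][j] * img[y + i][x + j]
                            for i in range(kh) for j in range(kw))
    return out


# Hypothetical 3x3 filter that responds to dark-to-bright vertical boundaries.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Hypothetical 3x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(3)]
response = conv2d_valid(image, vertical_edge)
```

The output is smaller than the input (1x3 here) because the window must fit entirely inside the image; the response is zero over the flat region and positive where the window covers the boundary.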
Different filters for different features
  - Use many different filters over the same image
  - Each filter is just a linear combination of pixels
    - Parameters are the same no matter where you apply the filter
  - Ideally, each learns different features
    - Llama faces, color shifts, noses, etc.
Actually doing image classification
  General process:
  1. Extract features from labeled images
  2. Plug them into some classification model
     - Logistic regression
     - Neural network
     - Support Vector Machine, etc.
  3. Profit
  Pipeline: image, then features, then classifier, then label (e.g., “Llama”)
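The features-then-classifier process can be sketched end to end with a tiny logistic regression trained by gradient descent. Everything specific here is an assumption for illustration: the day/night task, the two hand-picked features (mean brightness and contrast), the four toy 2x2 training images, and the learning rate and epoch count.

```python
import math


def extract_features(img):
    """Two hand-picked features in [0, 1]: mean brightness and contrast."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels) / 255.0
    contrast = (max(pixels) - min(pixels)) / 255.0
    return [mean, contrast]


def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression weights with stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "day"
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(img, w, b):
    """Label an image by the sign of the linear score."""
    z = sum(wj * xj for wj, xj in zip(w, extract_features(img))) + b
    return "day" if z > 0 else "night"


# Hypothetical labeled 2x2 grayscale images: 1 = day, 0 = night.
train_images = [
    ([[200, 220], [210, 230]], 1),
    ([[10, 20], [30, 15]], 0),
    ([[180, 250], [240, 190]], 1),
    ([[40, 5], [25, 35]], 0),
]
X = [extract_features(img) for img, _ in train_images]
y = [label for _, label in train_images]
weights, bias = train_logistic(X, y)
```

Calling `predict([[220, 240], [230, 250]], weights, bias)` on a new bright image yields `"day"`. Swapping in a different classifier from the list above only changes the training and prediction functions; the feature extraction step stays the same.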
Experimental questions for image classification
  - Which features work best for different classification tasks?
    - Labeling animals, night vs day, city where image was taken, etc.
  - Can I automatically find the best threshold for edge detection on pictures of faces?
  - …
Gestural control
  - Main idea: use hand gesture input to control computer applications
    - Track user’s hand in 3D
    - Pre-defined gestures act as discrete computer commands
  - One or more cameras
  - Need real-time processing!
  Slides adapted from Jim Davis
Simple example
  - 3 kinds of gestures: Point, Reach, and Click
  - Recognize using edges of hand against background
  - Track how edge shape changes between video frames
  Example from Kumar and Segen (1999)
  Slides adapted from Jim Davis
Application 1: Flight simulator Example from Kumar and Segen (1999) Slides adapted from Jim Davis
Application 2: Doom
Modern application
  - Toshiba concept demo (2009)
  - Combination of intelligent heuristics (tracking the hand) and learning (classifying gestures)
Robotics More Youtube (new robot dog)
Robotics
  What kinds of problems do you have to deal with?
  - Physical environment: partially observable, full of stuff
  - Physical components: unreliable, may fail
  - Sensors: limited ability to perceive environment
  Two examples of very different robotics settings: industrial robotics and swarm robotics
Industrial robotics
Industrial environments
  - Controlled environment
  - Agent typically has limited or no movement
  - Know what other agents (robots, humans) will be around
  - Know what physical conditions should be
Industrial applications
  Applications/autonomy vary:
  - Highly specialized for a single task with hard input/output constraints
    - Pre-defined routines, no agency
    - Bottling, welding, etc.
  - Multiple tasks or variable environment; multi-agent
    - Some conditional behavior, possibly learning routines
    - Warehouse robots
  (Image: Reuters)
Swarm robotics
  - Hundreds of small robots operating together
  - Each agent has
    - Limited (and noisy) sensors
    - Very limited compute capability
    - Communication with other agents
  - General idea: support complex environment interaction with minimal resources
Swarm robotics Demo
Very different problems from industrial
  Industrial robotics             Swarm robotics
  Highly controlled environment   Open environment
  Complex actuators               Very simple actuators
  Special-purpose sensors         Simple, general sensors
Multi-agent systems
  - For each agent, the environment includes many other agents
  - All agents act concurrently and must predict what the others will do
  - One solution: single overall controller
    - Problem: inflexible, hard to maintain
  - Better solution: minimax-like reasoning about other agents
    - Problem: too many other agents! And the setting is not turn-based.
  - Best solution: make each agent robust to unexpected outcomes (bumps)
Application notes: Problems mix together
  - Information retrieval gets easier and faster when you have good information extraction
  - Errors in POS tagging mean errors farther down the line in information extraction or other applications
  - Speech recognition requires good (text-based) language modeling
Many problems span multiple areas
  - One popular task: image captioning
  - Requires both good vision and language components
Robotics: combining everything
  A general-purpose, learning robot needs to combine many tasks:
  - Perception: vision, speech recognition
  - Understanding/answering: information retrieval, language generation
  - Acting: reinforcement learning, probabilistic modeling
Next time
  - AI philosophy: Chinese room and society of mind