OverLay: Practical Mobile Augmented Reality Hello everyone, good morning! It is my great pleasure to speak here at XXX about my work. My name is Puneet Jain. Thank you for inviting me and for attending my talk. I look forward to meeting many of you later today. Puneet Jain Duke University/UIUC Justin Manweiler IBM Research Romit Roy Choudhury UIUC
Idea: Mobile Augmented Reality Allow arbitrary indoor objects to be tagged; others should be able to retrieve those tags (e.g., “Last year’s tax statements”, “Faulty monitor”, “Wish Mom a happy birthday”, “Return CDs”). Mobile AR refers to one’s ability to scan the surroundings via the smartphone’s camera and see virtual information associated with an object on the phone’s screen. The information could be anything from text annotations and web URLs to audio or video. This vision is certainly not new and has existed for a while now. Many designers, science fiction writers, and researchers have imagined applications that could exist if it were realized as a holistic system.
Introduction Going forward, I would like to set expectations for Mobile AR. I have a video demonstration of an AR system that helps in understanding our objective better.
Why not a solved problem? We need to understand today’s approaches: Vision and Sensing. The obvious question is why. To answer that, we need to look at current-generation approaches. Mobile AR is currently done in two ways: vision/image-based AR or sensing-based AR. Both of these approaches are necessary, but neither is sufficient on its own.
Accurate Algorithms are Slow Vision: Feature Extraction, Feature Matching. Note that accuracy is most important in the case of Mobile AR. Unlike Google image search, where any match similar to a given image is acceptable, Mobile AR requires an exact match. Also, no two similar-looking objects are interchangeable: one exit sign is different from another exit sign in the same building, since they can indicate different things.
Matching latency too high for real-time Vision: Offloading + GPU on Cloud. For a 100-image DB: Extraction ≈ 29 ms, Network ≈ 302 ms, Matching ≈ 1 s, roughly 1.3 s end-to-end per frame. Matching latency is too high for real-time use.
Sensing: Requires user location, precise orientation, and object location (example: Brunelleschi's dome). Talk about how new objects would be added and how inaccuracies in sensing quickly derail this. Not possible indoors.
Vision vs. Sensing: Accurate/Slow vs. Quick/Inaccurate. Clearly there are tradeoffs between accuracy and latency, between sensing and computer vision, between offloading and not offloading; that is the primary agenda of today’s talk. Indoor location can accelerate vision and is a prerequisite for sensing. But indoor localization is not always available.
Location-free AR Natural pauses, turns, and walks indicate spatial relationships between tags (figure: a walking path past tags A, B, C, D with segments of 7, 5, and 10 seconds and turns of 80° and 110°). Let’s look at how people would use this in a museum scenario. A user walks across the museum and looks at paintings on the way. A few natural usage patterns emerge here, possibly indicating separation between the tags. Sensors can help in building such geometric layouts, as sketched below. Geometry, instead of location, can be used to reduce the computation burden on vision.
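A minimal sketch, assuming a Python-like server-side representation, of how such a geometric layout could be accumulated from sensor traces: each time a tag is recognized, record the timestamp and the cumulative gyroscope heading, and derive pairwise time/rotation separations. The class and method names are illustrative, not OverLay's actual code.

```python
import time

class LayoutRecorder:
    """Builds a rough geometric layout from sensor traces.

    Records (timestamp, cumulative gyro heading) each time a tag is
    recognized, then derives pairwise time and rotation separations.
    """
    def __init__(self):
        self.sightings = {}   # tag_id -> (timestamp, heading_deg)
        self.pairwise = {}    # (tag_a, tag_b) -> (dt_sec, dtheta_deg)

    def on_tag_seen(self, tag_id, heading_deg, timestamp=None):
        timestamp = timestamp if timestamp is not None else time.time()
        for other, (t0, h0) in self.sightings.items():
            if other != tag_id:
                self.pairwise[(other, tag_id)] = (timestamp - t0,
                                                  heading_deg - h0)
        self.sightings[tag_id] = (timestamp, heading_deg)

# Usage: heading_deg would come from integrating the gyroscope on the phone.
rec = LayoutRecorder()
rec.on_tag_seen("A", heading_deg=0.0, timestamp=0.0)
rec.on_tag_seen("B", heading_deg=20.0, timestamp=7.0)
rec.on_tag_seen("C", heading_deg=130.0, timestamp=15.0)
print(rec.pairwise[("A", "C")])   # (15.0, 130.0)
```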
Primary Challenge: Matching Latency Temporal Relationships Rotational Relationships
Temporal Relationships The user sees A at T=0, B at T=7, C at T=15, and E at T=21. Temporal separations can be captured on the cloud as constraints: T_AB ≤ 7 + E_TAB and T_AB ≥ 7 − E_TAB; T_AC ≤ 15 + E_TAC and T_AC ≥ 15 − E_TAC; with E_TAB, E_TAC, T_AB, T_AC ≥ 0.
Solving for Typical Time
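A minimal sketch of how a typical inter-tag time could be estimated from several noisy observations by minimizing slack variables, in the spirit of the constraints above. The use of scipy's linear-program solver and the variable names are my assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def typical_time(observations):
    """Estimate the typical time T between two tags from noisy
    observations obs_i by solving:
        minimize   sum_i e_i
        subject to T <= obs_i + e_i,  T >= obs_i - e_i,  T, e_i >= 0
    Variables: x = [T, e_1, ..., e_n].
    """
    n = len(observations)
    c = np.concatenate(([0.0], np.ones(n)))        # minimize total slack
    A_ub, b_ub = [], []
    for i, obs in enumerate(observations):
        row = np.zeros(n + 1)
        row[0], row[i + 1] = 1.0, -1.0             # T - e_i <= obs_i
        A_ub.append(row); b_ub.append(obs)
        row = np.zeros(n + 1)
        row[0], row[i + 1] = -1.0, -1.0            # -T - e_i <= -obs_i
        A_ub.append(row); b_ub.append(-obs)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]

print(typical_time([7.0, 6.5, 8.2]))   # ~7.0 (the L1 median of the observations)
```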
Using Temporal Relationships Let T_A be the time when object A was viewed and T_CURRENT be the current time. If (T_CURRENT − T_A) + E_TAB > T_AB, shortlist B as a matching candidate, as in the sketch below.
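A small sketch of that shortlisting rule, assuming hypothetical dictionaries `typical` and `slack` holding the learned T and E values; the data structures are illustrative, not OverLay's.

```python
def temporal_shortlist(last_tag, t_last, t_current, typical, slack):
    """Shortlist tags reachable given the elapsed time since the last sighting.

    typical[(a, b)] is the learned typical time T_AB between tags and
    slack[(a, b)] its error bound E_TAB; names are illustrative.
    """
    elapsed = t_current - t_last
    candidates = []
    for (a, b), t_ab in typical.items():
        if a != last_tag:
            continue
        if elapsed + slack[(a, b)] > t_ab:   # the user could have reached b by now
            candidates.append(b)
    return candidates

typical = {("A", "B"): 7.0, ("A", "C"): 15.0}
slack   = {("A", "B"): 2.0, ("A", "C"): 3.0}
print(temporal_shortlist("A", t_last=0.0, t_current=10.0,
                         typical=typical, slack=slack))   # ['B']
```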
Rotational Relationships The gyroscope captures angular changes as the user turns between tags (e.g., 20° anti-clockwise from A to B, another 110° anti-clockwise toward C, 90° clockwise further along the path). When the phone is moving toward C, C can be prioritized for matching; similarly, when the phone turns away from D, D can be removed from the candidate set. The rotational separations become constraints: R_B − R_A ≤ 20° + E_RBA and R_B − R_A ≥ 20° − E_RBA; R_C − R_A ≤ 130° + E_RCA and R_C − R_A ≥ 130° − E_RCA; with E_RBA, E_RCA, R_A, R_B, R_C ≥ 0.
Using Rotational Relationships The current orientation is tracked as R_CURRENT = R_A + Gyro (the gyroscope rotation accumulated since A was seen). B's rotational distance = R_B − R_CURRENT + E_RB/2. Pick tags closer in rotational distance, as in the sketch below.
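A sketch of ranking candidates by rotational distance, assuming illustrative dictionaries for each tag's learned orientation and error bound (an absolute difference is used here as the distance):

```python
def rank_by_rotation(r_current, tag_rotation, rotation_slack):
    """Rank candidate tags by rotational distance from the current heading.

    tag_rotation[b] is the learned orientation R_B of tag b relative to the
    reference tag and rotation_slack[b] its error bound E_RB; both names are
    illustrative. Tags the phone is pointing closer to are matched first.
    """
    def distance(b):
        return abs(tag_rotation[b] - r_current) + rotation_slack[b] / 2.0
    return sorted(tag_rotation, key=distance)

tag_rotation   = {"B": 20.0, "C": 130.0, "D": -40.0}
rotation_slack = {"B": 10.0, "C": 15.0, "D": 10.0}
# Phone has turned 25° anti-clockwise since seeing A:
print(rank_by_rotation(25.0, tag_rotation, rotation_slack))  # ['B', 'D', 'C']
```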
OverLay: Converged Architecture A frame passes phone-side checks (Blur? Hand motion? Frame diff?) and is uploaded over the network together with sensor readings. On the cloud, a macro-trajectory linear program over sensory geometry (time, orientation) and micro-trajectory spatial reasoning over visual geometry select candidates; the selected candidates go through a GPU-optimized pipeline (SURF, refine, match) against the annotation DB; learning modules update the DB (frames, sensors); the matched annotation (image, "Botanist") is retrieved to annotate the frame. This talk covers the candidate-selection and matching path.
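The slide names the phone-side checks but not how they are implemented. A plausible OpenCV sketch of such frame filtering, with made-up thresholds, is below; OverLay's actual tests and values may differ.

```python
import cv2
import numpy as np

# Illustrative thresholds; not OverLay's actual values.
BLUR_THRESHOLD = 100.0   # variance of Laplacian below this => blurry
DIFF_THRESHOLD = 10.0    # mean absolute pixel change below this => redundant

def should_upload(frame_bgr, prev_gray):
    """Decide whether a camera frame is worth sending to the cloud:
    skip blurry frames and near-duplicates of the last uploaded frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Blur check: low high-frequency content suggests motion blur.
    if cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:
        return False, gray
    # Frame-diff check: nearly identical to the previously uploaded frame.
    if prev_gray is not None and \
       np.mean(cv2.absdiff(gray, prev_gray)) < DIFF_THRESHOLD:
        return False, gray
    return True, gray
```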
Evaluation Android app on a Samsung Galaxy S4; Server: GPU on cloud (12 cores, 16 GB RAM, 6 GB NVIDIA GPU); 11 volunteers, 100+ tags, 4200 frame uploads.
System Variants Approximate (quick computer vision): matching using approximate schemes, e.g., KD-tree. Conservative (slow computer vision): matching using brute-force schemes. OverLay: Conservative + optimizations. The sketch below contrasts the two matching styles.
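A hedged OpenCV sketch of the two styles, approximate KD-tree matching versus exhaustive brute-force matching; SIFT stands in for SURF here because SURF requires a non-free OpenCV contrib build, and the ratio-test threshold is an assumption.

```python
import cv2

detector = cv2.SIFT_create()   # SURF stand-in; needs opencv-contrib "nonfree" for SURF

def extract(img_gray):
    return detector.detectAndCompute(img_gray, None)

def match_count(desc_query, desc_db, approximate=True, ratio=0.7):
    """Count good matches between a query frame and one DB image.
    approximate=True  -> FLANN with randomized KD-trees (the "Approximate" variant)
    approximate=False -> exhaustive brute-force matching (the "Conservative" variant)
    """
    if approximate:
        matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),  # KD-tree index
                                        dict(checks=50))
    else:
        matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = 0
    for pair in matcher.knnMatch(desc_query, desc_db, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1                                # Lowe's ratio test
    return good
```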
Latency Optimizations lead to a 4-fold improvement
Accuracy: Precision OverLay ≈ Bruteforce
Accuracy: Recall Approximate < OverLay < Bruteforce
Conclusion Vision and Sensing based ARs Geometric layouts: Accelerated Vision OverLay: Practical Mobile AR
Thank you synrg.csl.illinois.edu/projects/MobileAR Puneet Jain Duke University/UIUC Justin Manweiler IBM Research Romit Roy Choudhury UIUC
3D-OBJECTS
Handling 3D Objects: Learning Tagged from a particular angle, retrieved from a different angle; a hedged sketch of the learning step follows.
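A hypothetical sketch of how the learning module might add new viewpoints: when a frame matches an existing tag confidently but from a new angle, its descriptors are folded into that tag's entry so later retrievals from that angle succeed. The data structure, scoring, and threshold are assumptions, not the paper's stated policy.

```python
def learn_new_viewpoint(tag_db, tag_id, new_descriptors,
                        match_score, accept_threshold=0.8):
    """Add a confidently matched frame's descriptors as a new learned view.

    tag_db maps tag_id -> list of descriptor sets (one per learned view);
    structure and threshold are illustrative assumptions.
    """
    if match_score >= accept_threshold:
        tag_db.setdefault(tag_id, []).append(new_descriptors)
        return True
    return False
```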
Accuracy: After Learning Recall > Bruteforce and Precision ≈ Bruteforce