LOGAN: Unpaired Shape Transform in Latent Overcomplete Space

Our deep neural network, LOGAN, learns shape transforms from unpaired domains. The same network can transform skeletons into shapes, cross-sectional profiles into surfaces, and chairs into tables or tables into chairs; it can also add or remove parts, such as the armrests of a chair.

The key challenge in learning shape transforms from unpaired domains: it is unknown which features should be preserved and which should be altered.

Overview of our network architecture
Our network is designed with two novel features to overcome the challenge:
• Our autoencoder is jointly trained on the two domains. It encodes shape features into an overcomplete latent code. Our motivation is that an overcomplete code would leave more degrees of freedom to the translator network, facilitating an implicit disentangling of the preserved and altered shape features.
• In addition, our chair-to-table translator is trained not only to turn a chair code into a table code, but also to turn a table code into the same table code, as shown in Figure (b). Our motivation for this second translator loss, which we refer to as the feature preservation loss, is that it helps the translator preserve table features (in an input chair code) during chair-to-table translation.
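The feature preservation loss described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not state which distance is used, so the L1 distance here is an assumption, and the names `feature_preservation_loss`, `identity_translator`, and `shifting_translator` are hypothetical. The translators are toy functions standing in for a trained network.

```python
from typing import Callable, List, Sequence

Code = List[float]


def feature_preservation_loss(
    translate: Callable[[Sequence[float]], Code],
    target_codes: Sequence[Sequence[float]],
) -> float:
    """Mean L1 distance between each target-domain code and its translation.

    Assumption: L1 distance. The slides only say that a table code should be
    mapped to the same table code; the exact norm is not given there.
    """
    total = 0.0
    for z in target_codes:
        out = translate(z)
        total += sum(abs(o - v) for o, v in zip(out, z))
    return total / len(target_codes)


# Toy stand-ins for a chair-to-table translator (hypothetical).
def identity_translator(z: Sequence[float]) -> Code:
    return list(z)  # leaves a table code unchanged: the ideal behaviour


def shifting_translator(z: Sequence[float]) -> Code:
    return [v + 0.5 for v in z]  # alters every dimension of a table code


table_codes = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
loss_identity = feature_preservation_loss(identity_translator, table_codes)
loss_shift = feature_preservation_loss(shifting_translator, table_codes)
print(loss_identity, loss_shift)
```

A translator that leaves table codes untouched incurs zero loss, while one that changes them is penalized. In training, this term would be combined with the loss that pushes translated chair codes toward the table domain.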

Result: Table ↔ Chair
[Figure columns: input table, output chair, retrieved shape; input chair, output table, retrieved shape]

Either content transfer or style transfer, depending on the training domains
Depending on the input (unpaired) domains, our network LOGAN can learn both content transfer (G ↔ R) and style transfer (thick ↔ thin), without any change to the network architecture.
[Figure panels: G → R, R → G, thick → thin, thin → thick]

Ablation study on datasets of P2P-NET
We compare our results to those of P2P-NET, as well as to results from different network configurations. Note that these shape transforms, e.g., cross-sectional profiles to shape surfaces, are of a completely different nature from table-chair translations. Our network produces satisfactory results, as shown in column (b), which are visually comparable to the results obtained by P2P-NET, a supervised method.

Visualization 1. Joint embeddings of table and chair latent codes
[Figure legend, first plot: red = original chair code; blue = original table code]
[Figure legend, second plot: magenta = chair code after translation; blue = original table code]
We visualize the common latent spaces constructed by our autoencoder and two baselines by jointly embedding the chair and table latent codes. We can observe that, compared to the two baselines, our default autoencoder brings the chairs and tables closer together in the latent space, since it is better able to discover their common features.

Visualization 2. Latent code preservation/alteration
Visualizing "disentanglement" in latent code preservation (blue) and alteration (orange) during translation, where each bar represents one dimension of the latent code.
• Orange bars: the top 64 latent code dimensions with the largest average changes.
• Blue bars: the dimensions with the smallest code changes.
We can observe that our network automatically learns the right features to preserve (e.g., more global or coarser-scale features for skeleton→surface and more local features for G→R), based solely on the input domain pairs. For the chair→table translation, coarse- and fine-level features are both impacted.
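The selection of "altered" and "preserved" dimensions in this visualization can be sketched as below. This is a hedged illustration only: it assumes that "average change" means the mean absolute difference between a code before and after translation, and it treats every dimension outside the top k as preserved, which the slides do not spell out. The function names are hypothetical, and the toy data uses 4 dimensions with k = 2 rather than the 64 of the slide.

```python
from typing import List, Sequence, Tuple


def mean_abs_change(
    before: Sequence[Sequence[float]], after: Sequence[Sequence[float]]
) -> List[float]:
    """Per-dimension mean absolute change across all shapes (assumed metric)."""
    n = len(before)
    dims = len(before[0])
    return [
        sum(abs(after[i][d] - before[i][d]) for i in range(n)) / n
        for d in range(dims)
    ]


def split_dimensions(
    before: Sequence[Sequence[float]], after: Sequence[Sequence[float]], k: int
) -> Tuple[List[int], List[int]]:
    """Return (altered, preserved) dimension indices.

    altered   = the k dimensions with the largest mean change (orange bars)
    preserved = all remaining dimensions (assumption: the blue bars)
    """
    change = mean_abs_change(before, after)
    order = sorted(range(len(change)), key=lambda d: change[d], reverse=True)
    return sorted(order[:k]), sorted(order[k:])


# Toy latent codes for two shapes, before and after translation.
codes_before = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
codes_after = [[0.0, 2.0, 0.1, 1.0], [1.0, 3.0, 1.1, 0.0]]

altered, preserved = split_dimensions(codes_before, codes_after, k=2)
print("altered:", altered, "preserved:", preserved)
```

With real latent codes, calling `split_dimensions(..., k=64)` would give the index sets to color orange and blue in a bar plot like the one on this slide.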