LOGAN: Unpaired Shape Transform in Latent Overcomplete Space

Our deep neural network, LOGAN, learns shape transforms from unpaired domains. The same network can transform from skeletons to shapes, from cross-sectional profiles to surfaces, and between chairs and tables; it can also add or remove parts, such as the armrests of a chair.

The key challenge in learning shape transforms from unpaired domains: it is unknown which features should be preserved and which should be altered.

Overview of our network architecture
Our network is designed with two novel features to overcome the challenge:
• Our autoencoder is trained jointly on the two domains. It encodes shape features into an overcomplete latent code. Our motivation is that an overcomplete code leaves more degrees of freedom to the translator network, which facilitates an implicit disentangling of the preserved and altered shape features.
• In addition, our chair-to-table translator is trained not only to turn a chair code into a table code, but also to turn a table code into the same table code, as shown in Figure (b). Our motivation for this second translator loss, which we refer to as the feature preservation loss, is that it helps the translator preserve table features (in an input chair code) during chair-to-table translation.
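
The feature preservation idea in the second bullet can be sketched as a small loss function. This is only an illustration under stated assumptions: the slides do not specify the distance measure, the code dimension, or how the chair-to-table objective itself is computed, so the L1 distance, the 256-dimensional codes, and the `translate` callable below are all hypothetical stand-ins.

```python
import numpy as np

def feature_preservation_loss(translate, table_codes):
    """Mean absolute difference between table codes and their translations.

    `translate` is a hypothetical stand-in for the chair-to-table translator.
    Using an L1 distance is an assumption; the slides do not name the metric.
    """
    translated = translate(table_codes)
    return float(np.mean(np.abs(translated - table_codes)))

# Toy data: 8 table codes with 256 dimensions (illustrative size only).
rng = np.random.default_rng(0)
table_codes = rng.normal(size=(8, 256))

# A translator that leaves table codes untouched incurs no penalty,
# while one that shifts every dimension is penalized.
loss_identity = feature_preservation_loss(lambda z: z, table_codes)
loss_shifted = feature_preservation_loss(lambda z: z + 0.5, table_codes)
```

A translator that maps table codes to themselves drives this term to zero, which matches the stated goal of leaving table features in the input untouched.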

Result: Table ↔ Chair
Figure columns: input table, output chair, retrieved shape; input chair, output table, retrieved shape.

Either content transfer or style transfer, depending on the training domains
Depending on the input (unpaired) domains, our network LOGAN can learn both content transfer (G ↔ R) and style transfer (thick ↔ thin), without any change to the network architecture.
Figure panels: G → R, R → G, thick → thin, thin → thick.

Ablation study on datasets of P2P-NET
We compare our results to those of P2P-NET, as well as to results from different network configurations. Note that these shape transforms, e.g., cross-sectional profiles to shape surfaces, are of a completely different nature from table-chair translations. Our network produces satisfactory results, as shown in column (b), which are visually comparable to those obtained by P2P-NET, a supervised method.

Visualization 1. Joint embeddings of table and chair latent codes
Legend: red = original chair code; blue = original table code; magenta = chair code after translation.
We visualize the common latent spaces constructed by our autoencoder and two baselines by jointly embedding the chair and table latent codes. Compared to the two baselines, our default autoencoder brings the chairs and tables closer together in the latent space, since it is better able to discover their common features.

Visualization 2. Latent code preservation/alteration
Visualizing "disentanglement" in latent code preservation (blue) and alteration (orange) during translation, where each bar represents one dimension of the latent code. Orange bars: the top 64 latent code dimensions with the largest average changes. Blue bars: the dimensions with the smallest code changes.
We can observe that our network automatically learns the right features to preserve (e.g., more global or coarser-scale features for skeleton→surface and more local features for G→R), based solely on the input domain pairs. For the chair→table translation, coarse- and fine-level features are both impacted.
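
The ranking behind the orange and blue bars can be approximated with a few lines of NumPy. This is a rough sketch, not the authors' procedure: it assumes "change" means the absolute per-dimension difference between a code before and after translation, averaged over shapes, and it uses the same count `k` for the preserved dimensions, which the slides do not state. The function name and the toy data are hypothetical.

```python
import numpy as np

def rank_code_changes(codes_in, codes_out, k=64):
    """Rank latent dimensions by average absolute change over all shapes.

    Returns (most-changed indices, least-changed indices, per-dimension change).
    Absolute difference as the change measure is an assumption.
    """
    avg_change = np.mean(np.abs(codes_out - codes_in), axis=0)
    order = np.argsort(avg_change)        # ascending: least changed first
    altered = order[::-1][:k]             # k largest average changes
    preserved = order[:k]                 # k smallest average changes
    return altered, preserved, avg_change

# Toy data: 100 codes of 256 dimensions; only the first 64 are altered.
rng = np.random.default_rng(0)
codes_in = rng.normal(size=(100, 256))
codes_out = codes_in.copy()
codes_out[:, :64] += 1.0

altered, preserved, avg_change = rank_code_changes(codes_in, codes_out, k=64)
```

On this toy input the `altered` indices are exactly the 64 shifted dimensions, and every `preserved` index has zero average change.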