Artificial Intelligence Project 2: Cross-modal Generation between Image and Text using Hypernetworks
2009. 11. 9.
Prepared by Kim, Byoung-Hee and Ko, Younggil
Presented by Heo, Min-Oh
Biointelligence Laboratory
(C) 2009, SNU Biointelligence Laboratory
Contents
  Overview
  Theme of the Project: MMG
  Task Description
  Data set
  Guide to Writing Reports
    Style, mandatory contents, optional contents
  Submission guide / Marking scheme
  Brief guide to the MMG tool
Overview
  Goal
    Gain a deeper understanding of hypernetworks and AI
    Practice research and technical writing
  Multimodal Memory Game (MMG)
    Simulation of human recall memory & cross-modal matching
    Image-to-text (I2T) & text-to-image (T2I) generation
  Data Set
    (Screenshot, sentence) pairs from ‘Lost’ (American TV drama)
Ultimate Goal: Human Level AI
  Creative, Adaptive, Many-Talented, Friendly / Social, Uncertain, Careless, Emotional, Non-logical
  1 + 2 = 5 !   100 < 10 ?
  To reach ‘Human-Level Intelligence’, we need to imitate/reproduce various human attributes.
© 2009, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Molecular Self-assembly and Cognitive Associative Memory
  DNA Hypernetworks: Self-assembly based cognitive memory
  [Figure labels: Molecular Structure (DNA), Molecular Recognition, Self Assembly]
  Byoung-Tak Zhang, Hypernetworks: A Molecular Evolutionary Architecture for Cognitive Learning and Memory, IEEE Computational Intelligence Magazine, 3(3): 49-63, August 2008.
Review of the Lecture on Hypernetworks
Theme of the Project: Multimodal Memory Game (MMG)
  The game involves one machine learner and two or more human learners in a digital cinema.
  All participants, including the machine, watch the movies.
  After watching, the humans play the game by asking and answering questions about the movie scenes and dialogues.
  There are two human players, called I2T and T2I.
    The task of player I2T (for image-to-text) is to generate a text given a movie cut (image).
    The task of player T2I (for text-to-image) is to generate an image given a text from the movie captions.
  The machine learner learns by viewing: it watches the movies in the digital cinema.
    Its goal is to perform crossmodal translation, e.g., generating a sentence given an image from the movie, or vice versa.
    It gets hints from the human learners, who play the game by asking questions and answering them in different modalities.
MMG: Text-to-Image Case
One of the Motivations of MMG: Video Search based on Vision-Word Crossmodal Information
Example: Text-to-Image Search
  [Figure panels: Query, Extracted images, Extracted Video Patch, Selected images]
Example: Image-to-Text Search
  [Figure panels: Query, Extracted Texts, Extracted Video Patch]
Another Motivation of MMG
  From the point of view of cognitive science: imitating the recall memory of the human brain
  In the study of memory, recall is the act of retrieving a specific incident, fact, or other item from long-term memory.
  Three types of recall
    Free recall: no clues are given to assist retrieval
    Serial recall: items are recalled in a particular order
    Cued recall: some clues are given to assist retrieval
Training Hypernetworks for MMG
  Step A. Initialization by sampling (see the next slide)
  Step B. For each (sentence, image) pair in the training set:
    Generation
      In the case of image-to-text (I2T): generate a sentence based on the given image
      In the case of text-to-image (T2I): generate an image based on the given text
    Evaluation of the generation ability as a recall memory: compare the generated result with the original
      Correctly matched hyperedge: increase its weight
      Incorrectly matched hyperedge: remove it and add a newly sampled hyperedge
  Step C. Iteration: repeat Step B ‘epoch’ times
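The training procedure on this slide can be sketched in Python for the text-to-image case. This is a minimal sketch under stated assumptions, not the code of the course program: the function names, the dictionary layout of a hyperedge, the weighted-vote generation rule, and the rule that a hyperedge "matches" a pair when all of its words occur in the sentence are all hypothetical choices made for illustration.

```python
import random


def sample_hyperedge(words, image, text_order, image_order, rng):
    """Sample one hyperedge from a (sentence, image) pair (assumed layout)."""
    vocab = sorted(set(words))
    chosen_words = frozenset(rng.sample(vocab, min(text_order, len(vocab))))
    chosen_idx = sorted(rng.sample(range(len(image)), min(image_order, len(image))))
    return {
        "words": chosen_words,                          # word ids in this hyperedge
        "pixels": [(i, image[i]) for i in chosen_idx],  # (pixel index, value) pairs
        "weight": 1.0,                                  # weights start at 1
    }


def generate_image(edges, words, n_pixels):
    """T2I recall: hyperedges whose words all occur in the text vote on pixels."""
    present = set(words)
    votes = [0.0] * n_pixels
    for edge in edges:
        if edge["words"] <= present:
            for i, value in edge["pixels"]:
                votes[i] += edge["weight"] if value == 1 else -edge["weight"]
    # Assumption: pixels with no positive vote default to 0.
    return [1 if v > 0 else 0 for v in votes]


def train(pairs, epochs, text_order=3, image_order=30,
          sampling_count=10, rate=0.1, seed=0):
    rng = random.Random(seed)
    # Initialization: sample `sampling_count` hyperedges from every pair.
    edges = [sample_hyperedge(w, img, text_order, image_order, rng)
             for w, img in pairs for _ in range(sampling_count)]
    # Iteration: repeat the per-pair update `epochs` times.
    for _ in range(epochs):
        for words, image in pairs:
            present = set(words)
            for k, edge in enumerate(edges):
                if not edge["words"] <= present:
                    continue  # this hyperedge is not triggered by the text
                if all(image[i] == v for i, v in edge["pixels"]):
                    edge["weight"] += rate  # correct match: reinforce
                else:
                    # incorrect match: replace with a newly sampled hyperedge
                    edges[k] = sample_hyperedge(words, image, text_order,
                                                image_order, rng)
    return edges
```

For example, `edges = train(pairs, epochs=3)` followed by `generate_image(edges, pairs[0][0], len(pairs[0][1]))` returns a list of 0/1 pixel values for the first training sentence. An image-to-text version would follow the same pattern with the roles of pixels and words swapped.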
Initialization of Hypernetworks by Random Sampling from Dataset
  [Figure: the i-th training pair drawn as binary variables x_i1, ..., x_in (e.g. x_i1=1, x_i2=0, x_i3=1, x_i4=1, ..., x_i(n-3)=0, x_i(n-2)=1, x_i(n-1)=0, x_in=0) with an Image part and a Text part (words such as "can you we help to house the ……"). Hyperedge 1, hyperedge 2, and hyperedge 3 are each built from randomly selected pixels and words together with their values. Labels: Randomly selected pixels, text order, image order.]
Tasks for the Project
  Build hypernetworks that perform T2I/I2T generation using the given dataset
  Check the effects of the parameters for training hypernetworks
    Order of hyperedges
    Learning rate
    Sampling
Data Set
  Dataset preparation
    349 pairs of image & sentence from ‘Lost’
    Each sentence was translated into integer form based on the dictionary file (text.txt, dic.txt)
      “This is not even a date” → “33,34,35,36,27,37”
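The sentence-to-integer translation can be illustrated with a short Python sketch. The six word ids below come from the example on this slide; the layout of dic.txt is not described here, so the in-memory dictionary, the function names, and the lowercasing and whitespace tokenization are assumptions made for illustration.

```python
# Word ids taken from the slide's example sentence. The real mapping lives in
# dic.txt, whose file layout is not described in these slides.
EXAMPLE_DICT = {"this": 33, "is": 34, "not": 35, "even": 36, "a": 27, "date": 37}


def encode(sentence, word_to_id):
    """Map each word to its integer id (assumes lowercase, space-separated words)."""
    return [word_to_id[word] for word in sentence.lower().split()]


def to_line(ids):
    """Format ids as one comma-separated line, as in the slide's example."""
    return ",".join(str(i) for i in ids)


print(to_line(encode("This is not even a date", EXAMPLE_DICT)))
# prints 33,34,35,36,27,37
```

A word that is missing from the dictionary raises `KeyError` in this sketch, which makes gaps in the vocabulary visible early.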
Data Set (cont’d)
  Each screenshot has been converted to an 80 by 60 black-and-white bitmap image.
  One image per line in the data file (image.txt)
    1,0,1,0,0,0,0,1,1,1,1,1,1,0,0,0,……
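Reading one line of the image data can be sketched as follows. The slide gives the size (80 by 60, so 4800 values per line) and the comma-separated 0/1 format; that 80 is the width, that 60 is the height, and that pixels are stored row by row are assumptions, and the function names are hypothetical.

```python
WIDTH, HEIGHT = 80, 60  # assumption: 80 columns by 60 rows, stored row by row


def parse_image_line(line, width=WIDTH, height=HEIGHT):
    """Turn one comma-separated line of 0/1 values into a list of pixel rows."""
    bits = [int(token) for token in line.strip().split(",") if token.strip()]
    if len(bits) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(bits)}")
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("pixel values must be 0 or 1")
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def to_ascii(rows):
    """Render the bitmap as text for a quick visual check ('#' for 1, '.' for 0)."""
    return "\n".join("".join("#" if bit else "." for bit in row) for row in rows)
```

Calling `print(to_ascii(parse_image_line(line)))` on a line from image.txt gives a rough preview without any image library. If the preview looks sheared, the width and height assumption is probably the wrong way round.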
Report Contents – Mandatory
  System description
    Software used and running environment
  Basic experiments
    Text2Image: try various ‘epoch’ values and check the (subjective or objective) quality of the generated images
    Image2Text: set ‘epoch=1’ and check the quality of the generated sentences while increasing ‘sampling count’
  Analysis & discussion
    Text2Image: analysis & discussion of the relation between ‘epoch’ and the resulting image set
Report Contents – Optional
  Analysis of the effect of various parameters
  Ideas/suggestions on how to learn for cross-modal generation
  Ideas/suggestions on applications of MMG
Report Style
  English only, scientific journal style
  How to Write A Paper in Scientific Journal Style and Format
    http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html

  Experimental process             Section of Paper
  What did I do in a nutshell?     Abstract
  What is the problem?             Introduction
  How did I solve the problem?     Materials and Methods
  What did I find out?             Results
  What does it mean?               Discussion
  Who helped me out?               Acknowledgments (optional)
  Whose work did I refer to?       Literature Cited
  Extra Information                Appendices (optional)
Submission Guide
  Due date: December 2, 18:00
  Submit both ‘hardcopy’ and ‘email’
    Hardcopy submission to the office (302-314-1)
    E-mail submission to ykko@bi.snu.ac.kr
      Subject: [AI Project2 Report] Student number, Name
  Length: the report should be no longer than 12 pages.
  If you write a program yourself, submit the source code with comments.
  Objective: NOT accuracy or your programming skill, but your creativity and research ability.
  Individual project! You have to do it by yourself.
Marking Scheme
  40 points for experiment & analysis
  Extra 3 points per additional experiment
  20 points for the report
  6 points for overall organization
  Late work: -10% per day (-8 points)
  Maximum 7 days
Demo – How to Start
  Unzip the 2009_AI_Project2.zip file
    MultiModal Game program
    Generating program for Image and Sentence
MMG Program
  Inside the Hypernetwork folder, there are four files.
    Execution file
    Configuration file
    Data file
MMG Program (Configuration & Run)
  Set the parameters in the “configure.txt” file
  Then execute “Hypernetwork.exe”
Explanation of Parameters
  Text order: number of words in one hyperedge (default: 3, suggested values: 2~3, integer)
  Image order: number of pixels in one hyperedge (default: 30, suggested values: 10~50, integer)
  Max epoch: number of learning iterations (integer)
  Weight update rate: weights are initially set to 1; during learning they are updated by this value (+0.1 or -0.1)
  Sampling count: number of hyperedges generated from one data pair (default: 10, suggested values: 5~30, integer)
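When planning parameter experiments, it may help to keep the settings in a small Python structure and check them against the suggested ranges before editing the configuration file. The sketch below is hypothetical: the class name and field names are invented for illustration, the layout of configure.txt is not shown in these slides, and 0.1 is used as the weight update rate only because it is the value quoted on the slide.

```python
from dataclasses import dataclass

# Suggested ranges as listed on the slide (inclusive bounds).
SUGGESTED = {
    "text_order": (2, 3),
    "image_order": (10, 50),
    "sampling_count": (5, 30),
}


@dataclass
class HypernetConfig:
    max_epoch: int                   # no default is given on the slide
    text_order: int = 3              # words per hyperedge
    image_order: int = 30            # pixels per hyperedge
    weight_update_rate: float = 0.1  # assumption: the value quoted on the slide
    sampling_count: int = 10         # hyperedges generated per data pair

    def outside_suggested_range(self):
        """Return the names of parameters that fall outside the suggested ranges."""
        return [name for name, (low, high) in SUGGESTED.items()
                if not low <= getattr(self, name) <= high]


settings = HypernetConfig(max_epoch=5, image_order=80)
print(settings.outside_suggested_range())  # prints ['image_order']
```

Values outside the suggested ranges are reported rather than rejected, since trying them can itself be one of the additional experiments.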
Image Generation
  The MMG program outputs a binary file, not an image file.
  This step converts the binary file into an image file.
  Write down the binary file name.
  Finally, execute “ImageGenearation.exe”!
Sentence Generation
  The text result is also not in sentence form.
  This program converts it into sentences.
  Write down the InputFile name and the OutputFile name.
  Finally, execute “SentenceGenearation.exe”!
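The conversion back to sentence form can be pictured as a dictionary lookup in the reverse direction. This is only a sketch of the idea, not the behaviour of the provided program: it assumes the text result is a comma-separated list of word ids like the training data, and the function name, the `<unk>` placeholder, and the in-memory dictionary (filled with the ids from the data set example) are hypothetical.

```python
# Ids from the data set example; the real mapping comes from the dictionary file.
ID_TO_WORD = {33: "this", 34: "is", 35: "not", 36: "even", 27: "a", 37: "date"}


def decode(line, id_to_word, unknown="<unk>"):
    """Turn a comma-separated line of word ids back into a sentence."""
    ids = [int(token) for token in line.split(",") if token.strip()]
    return " ".join(id_to_word.get(i, unknown) for i in ids)


print(decode("33,34,35,36,27,37", ID_TO_WORD))
# prints this is not even a date
```

Ids that are not in the dictionary are shown as `<unk>` instead of raising an error, so a partly wrong generated sentence can still be read and judged.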
FAQ
  Parameters that strongly affect the running time
    Epoch, sampling count
  The current program does not allow making new training files
    The dictionary file is fixed. If you want to use new training data, you must also make a new dictionary file.
  If you have any questions about the program, visit the office
    302-314-1 (Tel. 880-1835)
    Youngkil, Ko (ykko@bi.snu.ac.kr)