Content-Based Image Retrieval: Reading One’s Mind and Making People Share Oral defense by Sia Ka Cheung Supervisor: Prof. Irwin King 31 July 2003.

Flow of Presentation
- Content-Based Image Retrieval
- Reading One’s Mind
  - Relevance Feedback Based on Parameter Estimation of Target Distribution
- Making People Share
  - P2P Information Retrieval
  - DIStributed COntent-based Visual Information Retrieval

Content-Based Image Retrieval
How to represent and retrieve images?
- By annotation (manual)
  - Text retrieval
  - Semantic level (good for pictures of people and architecture)
- By content (automatic)
  - Color, texture, shape
  - Vague description of the picture (good for scenery and for pictures with pattern and texture)

Feature Extraction
[Figure labels: R, G, B]

Indexing and Retrieval
- Images are represented as high-dimensional data points (feature vectors)
- Similar images are “close” in the feature vector space
  - Euclidean distance is used
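
The retrieval step described on this slide can be sketched as a nearest-neighbor search under Euclidean distance. This is a minimal illustration, not the system's implementation: the image names, the 3-d feature vectors and the `retrieve` helper are all made up for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, database, k=3):
    """Return the names of the k images whose feature vectors are closest to the query."""
    ranked = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical database: image name -> feature vector (illustrative values only).
database = {
    "sunset_1": [0.9, 0.4, 0.1],
    "sunset_2": [0.8, 0.5, 0.2],
    "forest_1": [0.1, 0.7, 0.2],
    "ocean_1": [0.1, 0.3, 0.9],
}

print(retrieve([0.9, 0.4, 0.15], database, k=2))
```

A linear scan like this is enough to show the idea; a real index over high-dimensional data would normally avoid comparing the query against every image.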

Typical Flow of CBIR
[Diagram labels: Images, Database, Index and Storage, Feature Extraction, Query Image, Lookup, Query Result]

Reading One’s Mind: Relevance Feedback

Why Relevance Feedback?
- There is a gap between semantic meaning and low-level features, so the retrieved results are not good enough
[Diagram labels: Images, Database, Index and Storage, Feature Extraction, Query Image, Lookup, Result, Feedback, Better Result]

[Diagram labels: 1st iteration (Display, User Feedback); 2nd iteration (Display, User Feedback); Estimation & Display selection; Feedback to system]

Problem Statement
- Assumption: images of the same semantic meaning/category form a cluster in feature vector space
- Given a set of positive examples, learn the user’s preference and find better results in the next iteration

Former Approaches
- Multimedia Analysis and Retrieval System (MARS)
  - IEEE Trans. CSVT 1998
  - Weight updating, modification of distance function
- PicHunter
  - IEEE Trans. IP 2000
  - Probability based, updated by Bayes’ rule
  - Maximum entropy display

Comparisons
Aspect                     Model         Description
Modeling of user’s target  MARS          Weighted Euclidean distance
Modeling of user’s target  PicHunter     Probability associated with each image
Modeling of user’s target  Our approach  User’s target data points follow a Gaussian distribution
Learning method            MARS          Weight updating, modification of distance function
Learning method            PicHunter     Bayes’ rule
Learning method            Our approach  Parameter estimation
Display selection          MARS          K-NN neighborhood search
Display selection          PicHunter     Maximum entropy principle
Display selection          Our approach  Simulated maximum entropy principle

Estimation of Target Distribution
- Assume the user’s target follows a Gaussian distribution
- Construct a distribution that best fits the relevant data points into some “specific” region
[Figure label: data points selected as relevant]

Expectation Function
- Best fit the relevant data points to the medium-likelihood region
- The estimated distribution represents the user’s target

Updating Parameters
After each feedback loop, the parameters are updated:
- New estimated mean = mean of the relevant data points
- New estimated variance: found by differentiation, using an iterative approach

Display Selection
- Why the maximum entropy principle?
  - K-NN is not a good way to learn the user’s preference
  - The novelty of the result set is increased, allowing the user to browse more of the DB
- How to use maximum entropy?
  - PicHunter: select the subset of images whose entropy is maximized
  - Our approach: select data points inside the boundary region (medium likelihood)

Simulating Maximum Entropy Display
- Data points in the region about 1.18 δ away from μ are selected
- Why 1.18?
  - 2P(μ+1.18 δ) = P(μ)
[Figure labels: query target (cluster center), selected by k-NN search, selected by max. entropy, P(μ+1.18 δ), P(μ)]
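
The constant 1.18 follows from the Gaussian density: requiring 2P(μ+kδ) = P(μ) gives exp(-k²/2) = 1/2, so k = sqrt(2 ln 2) ≈ 1.18, reading δ as the standard deviation. The sketch below checks that value and picks the points nearest to that boundary. It rests on assumptions not stated in the slides: a diagonal Gaussian, hypothetical helper names, and made-up points. The variance update from the previous slide is not reproduced here because the transcript does not give its formula.

```python
import math

# Normalized distance at which a Gaussian density drops to half its peak:
# exp(-k^2 / 2) = 1/2  =>  k = sqrt(2 ln 2), roughly 1.18.
HALF_PEAK_RADIUS = math.sqrt(2 * math.log(2))

def estimate_mean(relevant):
    """New estimated mean = mean of the relevant data points."""
    n = len(relevant)
    return [sum(p[d] for p in relevant) / n for d in range(len(relevant[0]))]

def normalized_distance(x, mean, std):
    """Distance from the mean in units of the per-dimension deviation (diagonal model assumed)."""
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mean, std)))

def select_display(points, mean, std, k):
    """Pick the k points whose normalized distance is closest to the half-peak radius."""
    def gap(p):
        return abs(normalized_distance(p, mean, std) - HALF_PEAK_RADIUS)
    return sorted(points, key=gap)[:k]

# Illustrative 2-d points around a target with mean (0, 0) and deviation (1, 1).
points = [(0.0, 0.0), (1.2, 0.0), (3.0, 0.0), (0.0, -1.1), (0.1, 0.1)]
print(round(HALF_PEAK_RADIUS, 4))
print(select_display(points, mean=[0.0, 0.0], std=[1.0, 1.0], k=2))
```

Unlike a k-NN display, which would return the points nearest the mean, this selection favors points on the medium-likelihood boundary, which is where the user's feedback is most informative.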

Experiments
- Synthetic data forming a mixture of Gaussians are generated
- Feedback is generated based on ground truth (class membership of the synthetic data)
- Investigation
  - Do the estimated parameters converge?
  - Does the approach perform better?
Dimension  No. of classes  No. of data points in each class  Range of μ   Range of δ
4          50              ?                                 [-1,1]       [0.2,0.6]
6          70              50                                [-1.5,1.5]   [0.2,0.6]
8          85              50                                [-1.5,1.5]   [0.15,0.45]
(?: the first row preserves only one value, 50, for the two middle columns; which column it belongs to is uncertain)

Convergence of Estimated Parameters
- As more feedback is given, the estimated parameters converge to the original parameters used to generate the mixtures

Precision-Recall
[Plot legend: red = PE, blue = MARS]
More experiments in a later section

Precision-Recall
[Plot not preserved]

Problems
- What if the user’s target distribution forms several clusters?
  - Indicated in Qcluster (SIGMOD’03)
  - Parameter estimation fails because a single cluster is assumed
- Qcluster solves this by using multi-point queries
- Merge different clusters into one cluster!

The Use of Inter-Query Feedback
- The relevance feedback given by users in each query process often implies a similar semantic meaning (images under the same category)
- The feature vector space can be re-organized
  - Relevant images are moved towards the estimated target
  - Similar images no longer span different clusters
- The parameter estimation method can be improved

1st Stage of SOM Training
- Large number of data points
- SOM is used to reduce the data size
- Each neuron represents a group of similar images
- The original feature space is not changed directly

Procedure of Inter-query Feedback Updating
- The user marks a set of images as relevant or non-relevant in a particular retrieval process
- The corresponding relevant neurons are moved towards the estimated target, where
  - M’R: set of relevant neurons
  - c: estimated target
  - αR: learning rate
- The corresponding non-relevant neurons are moved away from the estimated target
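
The update equation itself is not preserved in this transcript. The sketch below assumes a common SOM-style rule: each relevant neuron moves a fraction α_R of the way towards the estimated target c, and each non-relevant neuron moves a smaller fraction away from it. The rule, the non-relevant rate `alpha_n` and all values are assumptions for illustration, not the equation from the slides.

```python
def move_neurons(neurons, target, rate):
    """Move each neuron a fraction `rate` of the way to the target.

    A negative rate pushes the neurons away from the target instead.
    """
    return [[m + rate * (c - m) for m, c in zip(neuron, target)] for neuron in neurons]

def inter_query_update(relevant, non_relevant, target, alpha_r=0.5, alpha_n=0.1):
    """Assumed update: relevant neurons approach the target, non-relevant ones recede."""
    moved_relevant = move_neurons(relevant, target, alpha_r)
    moved_non_relevant = move_neurons(non_relevant, target, -alpha_n)
    return moved_relevant, moved_non_relevant

# Illustrative 2-d neurons and estimated target c.
rel, non = inter_query_update([[0.0, 0.0]], [[2.0, 2.0]], target=[1.0, 1.0])
print(rel, non)
```

Because only the neurons move, the original feature vectors stay untouched, in line with the point that the original feature space is not changed directly.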

SOM-based Approach (figure sequence)
1. Neurons of class 1, class 2 and class 3 in the feature vector space
2. After each query process, neurons are marked as relevant or non-relevant
3. The target is estimated
4. Relevant neurons are moved towards the estimated target
5. The feature vector space is re-organized
6. After several iterations (users’ queries), similar images cluster together instead of spanning different clusters in the new, re-organized feature vector space

Experiments
- Real data from the Corel image collection
  - 4000 images from 40 different categories
- Feature extraction methods
  - RGB color moment (9-d)
  - Grey-scale co-occurrence matrix (20-d)
- 80 queries are generated evenly among the 40 classes
- Evaluations
  - MARS
  - PE without SOM-based inter-query feedback training
  - PE with SOM-based inter-query feedback training
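
As an illustration of the 9-d RGB color moment feature, the sketch below computes three moments per channel. It assumes the common definition (mean, standard deviation and the signed cube root of the third central moment); the slides do not say which three moments were used. The function name and pixel format are hypothetical.

```python
import math

def color_moments(pixels):
    """Return a 9-d feature: (mean, std, skewness) for each of the R, G, B channels.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    """
    n = len(pixels)
    feature = []
    for channel in range(3):
        values = [p[channel] / 255.0 for p in pixels]  # scale to [0, 1]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the skewness term in the same units as the others.
        skewness = math.copysign(abs(third) ** (1.0 / 3.0), third)
        feature.extend([mean, math.sqrt(variance), skewness])
    return feature

# Tiny illustrative "image": one black pixel and one white pixel.
print(color_moments([(0, 0, 0), (255, 255, 255)]))
```

The resulting vector can be compared with Euclidean distance like any other feature vector in the deck.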

Precision vs Recall
[Plot not preserved]

Conclusion
- We propose a parameter estimation approach for capturing the user’s target as a distribution
- A display set selection scheme similar to maximum entropy display is used to capture more of the user’s feedback information
- A SOM-based inter-query feedback is proposed
  - It overcomes the single-cluster assumption of most intra-query feedback approaches

Making People Share: DIStributed COntent-based Visual Information Retrieval

P2P Information Retrieval
[Diagram labels: Images, Peer databases, Feature Extraction, Query Image, Lookup, Query Result]
How to locate relevant images in an efficient manner?

Contributions
- Migrate the centralized architecture of CBIR to a distributed architecture
- Improve the existing query scheme in P2P applications
- A novel algorithm for efficient information retrieval over P2P
  - Peer Clustering
  - Firework Query Model (FQM)

Existing P2P Architecture: Centralized
- Napster, SETI (Berkeley), ezPeer (Taiwan)
- Easy implementation
- Bottleneck, single point of failure
- Legal problems
[Diagram labels: update, query, answer, transfer]

Existing P2P Architecture: Decentralized Unstructured
- Gnutella (AOL, Nullsoft), Freenet (Europe)
- Self-evolving, robust
- Query flooding
[Diagram labels: peer, TCP connection]

Existing P2P Architecture: Decentralized Structured
- Chord (SIGCOMM’01), CAN (SIGCOMM’01), Tapestry (Berkeley)
- Efficient retrieval and robust
- Penalty in join and leave
[Diagram labels: Distributed Hash Table (DHT), CAN model, peer in the network, files shared by peers, TCP connection]

DISCOVIR Approach: Decentralized Quasi-structured
- DISCOVIR (CUHK)
- Self-organized, clustered, efficient retrieval
[Diagram labels: attractive connections, random connections]

Design Goal and Algorithms used in DISCOVIR
- Peers sharing “similar” images are interconnected
- Reduce flooding of query messages
- Construction of self-organizing network
  1. Signature calculation
  2. Neighborhood discovery
  3. Attractive connection establishment
- Content-based query routing
  1. Route selection
  2. Shared file lookup

Construction of Self-Organizing Network
- Signature calculation
- Signature discovery of neighborhoods
- Comparison of signatures
- Attractive connection establishment

Signatures Calculation (figure sequence)
1. Feature vector space
2. Peer A and Peer B, each shown with the centroid of the peer
3. Peer A with sub-clusters A1, A2, A3 and Peer B with sub-clusters B1, B2, B3, each shown with the centroids of the sub-clusters and the centroid of the peer
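
A signature, as drawn in these figures, is a centroid: either one for the whole peer or one per sub-cluster of the peer's images. The sketch below computes both. It uses a small k-means loop seeded with the first k points to find the sub-clusters, which is an assumption: the slides do not name the clustering method. All function names and data are illustrative.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sub_cluster_signatures(points, k, iterations=10):
    """Return k sub-cluster centroids using plain k-means (assumed method)."""
    centers = [list(p) for p in points[:k]]  # deterministic seeding for the example
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

# Illustrative feature vectors shared by one peer: two visually distinct groups.
peer_a = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [4.8, 5.2]]
print(centroid(peer_a))                   # single signature for the peer
print(sub_cluster_signatures(peer_a, 2))  # one signature per sub-cluster
```

With a single signature the peer above would be summarized by a point that lies between its two groups of images, which is the motivation for using several signatures per peer.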

[Figure: a new peer joins an existing clustered P2P network through a new attractive connection and a random connection. Peers are labeled with (x,y) signature values: (1.5,1.2), (1.3,2.4), (1.6,1.8), (5.2,8.5), (5.6,8.8), (4.9,9.7), (4.7,9.3), (2.9,6.5), (2.7,6.0), (2.6,5.9), (2.0,5.8). Legend: random connection, attractive connection]
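
Attractive connection establishment can be sketched as picking the existing peer whose signature is closest to the new peer's. The signature values below are the ones shown in the figure; the peer names, the new peer's signature (2.8, 6.2) and the nearest-signature rule are assumptions made for the example.

```python
import math

# Signature values from the figure; the peer identifiers are hypothetical.
existing_peers = {
    "p01": (1.5, 1.2), "p02": (1.3, 2.4), "p03": (1.6, 1.8),
    "p04": (5.2, 8.5), "p05": (5.6, 8.8), "p06": (4.9, 9.7), "p07": (4.7, 9.3),
    "p08": (2.9, 6.5), "p09": (2.7, 6.0), "p10": (2.6, 5.9), "p11": (2.0, 5.8),
}

def signature_distance(a, b):
    """Euclidean distance between two signature values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_attractive_peer(new_signature, peers):
    """Return the name of the peer whose signature is closest to the new peer's."""
    return min(peers, key=lambda name: signature_distance(new_signature, peers[name]))

# Hypothetical newcomer sharing images similar to the (2.x, 6.x) group.
print(pick_attractive_peer((2.8, 6.2), existing_peers))
```

Connecting newcomers this way is what keeps peers with similar images interconnected, while the random connection keeps the clusters reachable from one another.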

Content-based Query Routing
[Flowchart labels: Incoming query; “Signature value left?” (Y/N); “Similarity < threshold?” (Y/N); Forward to random link if not forwarded before; Forward to attractive link; End]

Content-based Query Routing
[Figure legend: similarity < threshold, similarity > threshold, random connection, attractive connection]
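
One reading of the flowchart is sketched below: a peer compares the incoming query with each of its signature values, forwards the query over attractive links as soon as one is similar enough, and otherwise forwards it over a random link unless it has already done so. The branch order is inferred from the labels, and the similarity function, threshold and return values are assumptions.

```python
import math

def similarity(a, b):
    """Hypothetical similarity in (0, 1]: 1 for identical vectors, lower when far apart."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)

def choose_link(query, signatures, threshold, forwarded_before):
    """Return "attractive", "random" or None (stop) for an incoming query."""
    for signature in signatures:  # "Signature value left?"
        if similarity(query, signature) >= threshold:
            return "attractive"   # query has reached a relevant cluster
    if not forwarded_before:
        return "random"           # keep searching for the target cluster
    return None                   # already forwarded once: end

peer_signatures = [(5.2, 8.5), (5.6, 8.8)]
print(choose_link((5.3, 8.6), peer_signatures, threshold=0.5, forwarded_before=False))
print(choose_link((1.5, 1.2), peer_signatures, threshold=0.5, forwarded_before=False))
```

The effect is that a query wanders over random links only until it lands in a cluster of similar content, after which it spreads along attractive links instead of flooding the whole network.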

Comparison of Content-based Query Routing (CBR) and Address-based Query Routing (ABR)
Aspect       Scheme  Description
Application  ABR     Internet Protocol (IP), Domain Name System (DNS)
Application  CBR     Wide Area Information Servers (WAIS), DISCOVIR
Problem      ABR     We know where to go, but not the path
Problem      CBR     We don’t know where to go, nor the path
Emphasis     ABR     Correctness, speed
Emphasis     CBR     Relevance of result retrieved
Goal         ABR     Avoid unnecessary traffic
Goal         CBR     Avoid unnecessary traffic

Experiments
- Dataset
  - RGB color moment, 9-d, images from the Corel database, 100 classes
  - Synthetic data points, 9-d, 100 classes
- Operation
  - Distribute data points among peers (1 class per peer)
  - Simulate network setup and queries (averaged over 50 queries)
- Investigation
  - Scalability (against number of peers)
  - Property (against TTL of query message)
  - Data resolution (different numbers of signatures per peer)

Network Model
- Small-world characteristic, power-law distribution
  - Few peers are connected with many peers
  - Many peers are connected with few peers

Performance Metrics
- Relevance of retrieved result: Recall = number of retrieved relevant results / total number of relevant results
- Amount of query traffic generated: Query scope = number of peers visited by the query message / total number of peers
- Effectiveness of query routing scheme: Query efficiency
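
The two ratios on this slide translate directly into code. The transcript does not preserve a formula for query efficiency; the sketch below assumes it is recall divided by query scope, that is, how much of the relevant material is found per fraction of the network visited. That definition and the numbers are assumptions for illustration.

```python
def recall(retrieved_relevant, total_relevant):
    """Fraction of all relevant results that the query retrieved."""
    return retrieved_relevant / total_relevant

def query_scope(peers_visited, total_peers):
    """Fraction of peers in the network visited by the query message."""
    return peers_visited / total_peers

def query_efficiency(recall_value, scope_value):
    """Assumed definition: recall obtained per unit of query scope."""
    return recall_value / scope_value

# Illustrative run: 30 of 100 relevant images found while visiting 50 of 1000 peers.
r = recall(30, 100)
s = query_scope(50, 1000)
print(r, s, query_efficiency(r, s))
```

Under this reading, a routing scheme improves efficiency either by finding more relevant images with the same traffic or by finding the same images while visiting fewer peers.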

[Plots not preserved: Recall vs Peers; Recall vs TTL; Query Scope vs Peers; Query Scope vs TTL; Query Efficiency vs Peers; Query Efficiency vs TTL]

Difference Between Synthetic Data and Real Data
[Table: max, min and avg of the inter-cluster distance and of the mean of variances, for real and synthetic data; values not preserved]

Effects of Data Resolution
- Assign 2-4 classes of images to each peer
  - High data resolution (use 3 signatures)
  - Low data resolution (use 1 signature)

Conclusion
- CBIR is migrated from a centralized server approach to a peer-to-peer architecture
- Efficient retrieval is achieved by
  - Constructing a self-organizing network
  - Content-based query routing
- Scalability, property and the effects of data resolution are investigated
- Query efficiency is at least doubled under the proposed architecture

Questions and Answers

Precision, Recall, Novelty
- Precision: range [0,1]
- Recall: range [0,1]
[Figure labels: set of retrieved results, set of relevant results, set of retrieved results known to the user beforehand]
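
The formulas on this slide are not preserved, so the sketch below uses the standard set-based definitions of precision and recall. For novelty it assumes the usual ratio: the share of retrieved relevant results that the user did not know beforehand. The image identifiers are made up.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved results that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant results that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def novelty(retrieved, relevant, known):
    """Assumed definition: fraction of retrieved relevant results new to the user."""
    hits = retrieved & relevant
    return len(hits - known) / len(hits) if hits else 0.0

retrieved = {"a", "b", "c", "d"}
relevant = {"b", "c", "d", "e", "f"}
known = {"b"}  # results the user had already seen
print(precision(retrieved, relevant), recall(retrieved, relevant), novelty(retrieved, relevant, known))
```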

Update Equation, Learning Rate
- Update equation of non-relevant neurons
- Update equation of neighboring neurons

Retrieval process with SOM to capture feedback information
[Diagram labels: original SOM M; modified SOM M’; modified model vector space, data size reduced to |M’|; original feature vector space, data size |I|; 1-1 mapping function f^-1 (table lookup)]

Multiple Clusters Version

DISCOVIR – System Architecture
- Built on LimeWire, Java-based
- Plug-in architecture for the feature extraction module
- Query by example, sketch, thumbnail previewing
[Diagram labels: DISCOVIR User Interface, DISCOVIR Core, Connection Manager, Packet Router, Plug-in Manager, HTTP Agent, Feature Extractor, Image Indexer, Image Manager, DISCOVIR Network, Shared Collection, WWW]

DISCOVIR – Screen Capture

DISCOVIR Protocol Modification
- Image Query (0x…): Minimum Speed | Feature Name | 0 | Feature Vector | 0
- Image Query Hit (0x…): Number of Hits | Port | IP Address | Speed | Result Set | Servant Identifier (n to n+16)
  - Result set entry: File Index | File Size | File Name | Thumbnail information, similarity
- DISCOVIR Signature Query (0x…): Minimum Speed | DISCOVIRSIGNATURE | 0 | 0
- DISCOVIR Signature Query Hit (0x…): Number of Hits | Port | IP Address | Speed | Result Set | Servant Identifier (n to n+16)
  - Result set entry: Dummy | Feature Extraction name | Signature value

Query Message Utilization

Average Reply Path Length