Slide 1: Graph-Based Semi-Supervised Learning with a Generative Model
Speaker: Jingrui He. Advisor: Jaime Carbonell. Machine Learning Department, 04-10-2008.
Slide 2: Semi-Supervised Learning
[Figure: very few labeled examples (+ and -) alongside abundant unlabeled examples.]
Slide 3: Outline
► Background
► Existing Methods
► Proposed Method: Ideal Case; General Case
► Experimental Results
► Conclusion
Slide 4: Overview
Semi-Supervised Learning
► Feature based
  - Gradually generate class labels: Self-Training [Yarowsky, ACL95]; Co-Training [Blum, COLT98]
  - Collectively generate class labels: TSVMs [Joachims, ICML99]; EM-based [Nigam, ML00]
► Graph based: Mincut [Blum, ICML01]; Gaussian Random Fields [Zhu, ICML03]; Local and Global Consistency [Zhou, NIPS04]; Generative Model [He, IJCAI07]
Slide 5: Self-Training [Yarowsky, ACL95]
[Figure: a classifier trained on the few labeled points (+/-) iteratively labels nearby unlabeled points and retrains; a sketch of the generic loop follows.]
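The algorithm itself is not spelled out on the slide; the following is a minimal sketch of the generic self-training loop. The classifier choice, confidence threshold, and all names are illustrative, not taken from the deck:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    """Repeatedly fit a classifier, move the most confident unlabeled
    predictions into the labeled set, and refit."""
    for _ in range(max_rounds):
        clf = LogisticRegression().fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold
        if not keep.any():
            break  # no confident predictions left; stop growing the labeled set
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
        X_unlab = X_unlab[~keep]
    return clf
```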
Slide 6: Co-Training [Blum, COLT98]
Assumes two feature views of each example, each sufficient to train a good classifier on its own, and conditionally independent given the class.
Slide 7: Transductive SVMs [Joachims, ICML99]
[Figure: inductive vs. transductive SVM decision boundaries.] The transductive classification boundary is pushed away from the dense regions of unlabeled data.
Slide 8: EM-based Method [Nigam, ML00]
[Figure: a text corpus partitioned into classes such as Computer Science, Medicine, and Politics.]
Slide 9: Graph-Based Semi-Supervised Learning
[Figure: a graph over the data in which the few labeled nodes (+/-) propagate their labels to neighboring unlabeled nodes.]
Slide 10: Graph-Based Methods
► G = {V, E}: nodes are the examples, edges are weighted by similarity
► Estimate a function f on the graph:
  - f should be close to the given labels on the labeled nodes
  - f should be smooth over the whole graph
► Regularization trades off these two requirements, as sketched below.
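The slide's own formula is not transcribed, but the standard quadratic objective behind this regularization view is

\min_f \; \sum_{i \in \text{labeled}} (f_i - y_i)^2 \;+\; \lambda\, f^\top (D - W)\, f,

where D - W is the graph Laplacian; the second term is small exactly when f varies little across heavily weighted edges, i.e., when f is smooth on the graph.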
Slide 11: Graph-Based Methods cont.
► Mincut [Blum, ICML01]
► Gaussian Random Fields [Zhu, ICML03]
► Local and Global Consistency [Zhou, NIPS04]
► All are discriminative in nature!
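For reference, assuming the standard formulations: Gaussian Random Fields clamps f to the given labels on the labeled nodes and minimizes f^\top (D - W) f; the solution is the harmonic function

f_U = (D_{UU} - W_{UU})^{-1} W_{UL}\, f_L.

Local and Global Consistency instead iterates

F(t+1) = \alpha S F(t) + (1-\alpha) Y, \qquad S = D^{-1/2} W D^{-1/2},

which converges to F^* = (1-\alpha)(I - \alpha S)^{-1} Y. Both return a label function directly, with no model of p(x \mid y), hence "discriminative in nature".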
Slide 12: Outline (repeated)
► Background
► Existing Methods
► Proposed Method: Ideal Case; General Case
► Experimental Results
► Conclusion
Slide 13: Motivation
► Existing graph-based methods: [construction shown on the slide, not transcribed] has NO justification. Discriminative: an inaccurate class proportion in the labeled set greatly AFFECTS performance.
► Proposed method: [the same construction] is WELL justified. Generative: the estimated class priors COMPENSATE for an inaccurate class proportion in the labeled set.
Slide 14: Notation
► n training examples: x_1, ..., x_n
► n_l labeled examples, n_u unlabeled examples
► Affinity matrix W: W_{ij} is the similarity between x_i and x_j
► Diagonal matrix D: D_{ii} = \sum_j W_{ij}
► Initial vector [symbol not transcribed]: entries set to 1 for labeled examples
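The slide's exact definitions did not survive transcription; a common convention, consistent with the methods cited above and with the matrix S referenced on Slide 18, is a Gaussian affinity with symmetric normalization (an assumption, not the deck's stated choice):

W_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \quad W_{ii} = 0; \qquad D_{ii} = \sum_j W_{ij}; \qquad S = D^{-1/2} W D^{-1/2}.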
Slide 15: Ideal Case
► The two classes are far apart, so the affinity matrix is (up to permutation) block diagonal; cf. Slide 21.
Slide 16: Derivation Sketch
A chain of relations connects the class-conditional probabilities to the eigenvectors of S, and the eigenvectors back to the propagated estimates. [The specific equations on the slide were not transcribed.]
Slide 17: Class Conditional Probability
► Theorem 1: in the appropriate limit [statement not fully transcribed], the estimated class-conditional probability is similar to a kernel density estimate on the labeled examples.
► What about the unlabeled data?
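For comparison, a kernel density estimate of the positive class-conditional built from the labeled positives alone has the familiar form

\hat p(x \mid +) = \frac{1}{n_+} \sum_{i:\, y_i = +1} k(x, x_i),

with k a normalized kernel such as the Gaussian affinity sketched after Slide 14. The iteration on Slide 19 extends this so the estimate also draws on the unlabeled points.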
Slide 18: Class Conditional Probability cont.
► Expressed through the eigenvectors of S [equations not transcribed]
► Written element-wise [equations not transcribed]
Slide 19: Class Conditional Probability cont.
► To get the class-conditional estimates, iterate an update [equations not transcribed]
► Upon convergence [equations not transcribed]
► After normalization, the result is a class-conditional probability estimate [equations not transcribed]
A sketch of one plausible such iteration follows.
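The update equations did not survive transcription; the following is a minimal sketch of what such a propagation could look like, assuming an LGC-style update f ← Sf with the labeled entries re-clamped each step. The function name and details are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def propagate_class_conditional(W, labeled_idx, n_iter=50):
    """Propagate mass from the labeled examples of one class through
    S = D^{-1/2} W D^{-1/2}; normalize at the end so the result can be
    read as a class-conditional estimate over all n points.
    Assumes every node has nonzero degree."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization
    f = np.zeros(W.shape[0])
    f[labeled_idx] = 1.0             # "set to 1 for labeled examples" (Slide 14)
    for _ in range(n_iter):
        f = S @ f                    # spread mass along the graph
        f[labeled_idx] = 1.0         # re-clamp the labeled entries
    return f / f.sum()               # normalize to a distribution
```

In the ideal (block-diagonal) case this can safely run to convergence; the general case below requires stopping early.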
Slide 20: Example of the Ideal Case [figure]
Slide 21: General Case
► Two classes not far apart
► S is not block diagonal: upon convergence the propagation no longer separates the classes, so the iteration must not be run all the way to convergence [illustration not transcribed].
Slide 22: Class Conditional Probability
► Iteration process: the labeled examples gradually spread their information to nearby points.
► Solution: stop the iteration once a suitable criterion is satisfied.
Slide 23: Stopping Criterion
► The average probability of the negative labeled examples under the positive class [formula not transcribed].
Slide 24: Stopping Criterion cont.
[Figure: the criterion over iterations, trading off premature stopping against excessive propagation.]
Slide 25: Stopping Criterion cont.
► Symmetrically: the average probability of the positive labeled examples under the negative class. A thresholded sketch of the criterion follows.
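The exact stopping rule is not transcribed; a plausible thresholded version of the criterion on Slides 23 and 25, with an illustrative threshold value and names, is:

```python
import numpy as np

def propagate_with_stopping(W, pos_idx, neg_idx, max_iter=200, thresh=0.01):
    """Run the positive-class propagation of Slide 19 but stop once the
    negative labeled examples start absorbing positive-class mass,
    guarding against excessive propagation (Slide 24)."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    f = np.zeros(W.shape[0])
    f[pos_idx] = 1.0
    p = f / f.sum()
    for _ in range(max_iter):
        f = S @ f
        f[pos_idx] = 1.0
        p = f / f.sum()                 # current class-conditional estimate
        if p[neg_idx].mean() > thresh:  # criterion of Slide 23
            break                       # stop before excessive propagation
    return p
```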
Slide 26: Example of the General Case [figure]
Slide 27: Estimating Class Priors
► Theorem 2: characterizes the limiting behavior in the general case [statement not transcribed].
► This yields estimates of the class priors [equations not transcribed].
Slide 28: Prediction
► To classify a new example: calculate its class-conditional probabilities, then combine them with the estimated class priors via Bayes' rule.
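In symbols (standard Bayes' rule; the slide's own formulas are not transcribed): with estimated priors \hat\pi_y and class-conditionals \hat p(x \mid y),

\hat p(y \mid x) = \frac{\hat\pi_y\, \hat p(x \mid y)}{\sum_{y'} \hat\pi_{y'}\, \hat p(x \mid y')}, \qquad \hat y = \arg\max_y\, \hat\pi_y\, \hat p(x \mid y).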
Slide 29: Outline (repeated)
► Background
► Existing Methods
► Proposed Method: Ideal Case; General Case
► Experimental Results
► Conclusion
Slide 30: Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]
► Balanced classification. [Figures: results for "1 vs 2" and "odd vs even"; methods compared: our method, Gaussian Random Fields, Local and Global Consistency.]
Slide 31: Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]
► Unbalanced classification. [Figures: "1 vs 2" and "odd vs even"; same three methods compared.]
Slide 32: Genre Data Set [Liu, ECML03]
► Classification between random partitions. [Figures: balanced and unbalanced settings; methods compared: our method, Gaussian Random Fields, Local and Global Consistency.]
Slide 33: Genre Data Set [Liu, ECML03]
► Unbalanced classification. [Figures: "newspapers vs. other" and "biographies vs. other"; same three methods compared.]
Slide 34: Conclusion
► A new graph-based semi-supervised learning method:
  - Generative in nature
  - Ideal case: theoretical guarantee
  - General case: reasonable estimates
  - Prediction: easy and intuitive
Slide 35: Questions?