
1 A New Iteration Algorithm for Maximum Mutual Information Classifications on Factor Spaces
——Based on a Semantic Information Theory. Chenguang Lu. Homepage:

2 Shannon’s Mutual Information Formula
The classical information formula: I(xi;yj) = log[P(xi|yj)/P(xi)]. Shannon's channel: the transition probability matrix linking source, channel, and destination. Shannon's mutual information formula: I(X;Y) = ΣiΣj P(xi,yj) log[P(xi|yj)/P(xi)]. It is the average coding length for X that is saved because of the prediction Y.
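As a concrete companion to these formulas, here is a minimal sketch (my own illustration, not code from the slides) that computes I(X;Y) from a joint distribution; the matrix P_xy and its values are assumed for the example.

```python
import numpy as np

def shannon_mutual_information(P_xy):
    """I(X;Y) in bits from a joint table P_xy[i, j] = P(x_i, y_j)."""
    P_x = P_xy.sum(axis=1, keepdims=True)        # marginal P(x_i)
    P_y = P_xy.sum(axis=0, keepdims=True)        # marginal P(y_j)
    ratio = P_xy / (P_x * P_y)                   # equals P(x_i|y_j) / P(x_i)
    mask = P_xy > 0                              # treat 0 * log 0 as 0
    return float(np.sum(P_xy[mask] * np.log2(ratio[mask])))

P_xy = np.array([[0.4, 0.1],                     # an assumed joint distribution P(x_i, y_j)
                 [0.1, 0.4]])
print(shannon_mutual_information(P_xy))          # about 0.278 bits
```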

3 Maximum Mutual Information (MMI) Classifications on Factor Spaces
(Figures: a one-dimensional factor space Z and a two-dimensional factor space Z, where Z is a laboratory datum or a feature vector.) This is a model for medical tests, signal detection, watermelon classification, junk classification, and so on. We need to optimize the partitioning boundaries z'. We call the Z-space a factor space, as proposed by Peizhuang Wang; the Z-space is also called the feature space. We use "factor space" to emphasize that not every set of attributes can be used as the feature space.

4 Maximum Mutual Information (MMI) Classifications of Unseen Instances ——A Most Difficult Problem Left by Shannon We can only see Z, not X. Shannon used the distortion criterion instead of the MMI criterion. Why? The problem in optimizing z' is that without z' we cannot express the mutual information I(X;Y), and without an expression for I(X;Y) we cannot optimize z'. The partition z' and I(X;Y) are interdependent.

5 The Popular Methods for MMI Classifications and Estimations
The popular approach: use parameters to construct the boundaries, write out I(X;Y), and then optimize the parameters by gradient descent or Newton's method. Disadvantages: it is complicated, it is slow, and convergence is not reliable.
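For comparison with the CM algorithm introduced later, here is a minimal sketch of that parametric route under assumptions of my own: a two-Gaussian example for P(Z|X), a soft sigmoid boundary of width tau so that I(X;Y) is differentiable in the single parameter z', and a crude finite-difference gradient ascent. None of these specifics come from the slides.

```python
import numpy as np

Z = np.linspace(-3.0, 3.0, 601)                                  # discretized 1-D feature Z
P_x = np.array([0.5, 0.5])                                       # prior P(X) for the two classes
dens = np.vstack([np.exp(-(Z + 1.0) ** 2 / 2),                   # assumed Gaussian P(Z|X)
                  np.exp(-(Z - 1.0) ** 2 / 2)])
P_z_given_x = dens / dens.sum(axis=1, keepdims=True)

def mutual_information(z_prime, tau=0.05):
    """I(X;Y) for a soft boundary: P(y_1|z) = sigmoid((z - z') / tau)."""
    P_y1_given_z = 1.0 / (1.0 + np.exp(-(Z - z_prime) / tau))
    P_y_given_z = np.vstack([1.0 - P_y1_given_z, P_y1_given_z])
    P_xy = P_x[:, None] * (P_z_given_x @ P_y_given_z.T)          # joint P(x_i, y_j)
    P_y = P_xy.sum(axis=0)
    mask = P_xy > 0
    return np.sum(P_xy[mask] * np.log2(P_xy[mask] / (P_x[:, None] * P_y)[mask]))

z_prime, step, eps = 1.5, 1.0, 1e-3                              # crude finite-difference ascent
for _ in range(200):
    grad = (mutual_information(z_prime + eps) - mutual_information(z_prime - eps)) / (2 * eps)
    z_prime += step * grad
print(z_prime, mutual_information(z_prime))                      # z' near 0 for this symmetric case
```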

6 My Story about Looking for z*: Similar to Catching a Cricket
When I tried to optimize z', for any starting z', my Excel file told me: the best dividing point for MMI is the next one! After I used the next one, it still said: the best point is the next one! … Fortunately, z' converged! It is similar to catching … Can this method converge in every case? Let's prove the convergence with my semantic information theory.

7 My Semantic Information Theory is a Natural Generalization of Shannon's Information Theory
Several semantic information theories: Carnap and Bar-Hillel's, Floridi's, Yixin Zhong's, and mine. Mine: a Chinese book published in 1993, and an English paper published in 1999: Lu C., A generalization of Shannon's information theory, Int. J. of General Systems, 1999, 28(6):

8 The New Method: the Channel Matching Algorithm Based on the Semantic Information Theory
It uses two types of channels. Shannon's channel: a transition probability matrix, which consists of a set of transition probability functions P(yj|X). Semantic channel: a truth value matrix, which consists of a set of truth functions T(θj|X).
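A minimal sketch, with made-up numbers, of the structural difference between the two channel types: the Shannon channel's rows are probability distributions over Y, while the semantic channel's entries are truth values in [0, 1] that need not sum to 1.

```python
import numpy as np

# Shannon channel: transition probability matrix P(y_j|x_i); each row is a distribution over Y
P_y_given_x = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
assert np.allclose(P_y_given_x.sum(axis=1), 1.0)

# Semantic channel: truth-value matrix T(theta_j|x_i); entries lie in [0, 1], no normalization
T_theta_given_x = np.array([[1.0, 0.1],
                            [0.2, 1.0]])
assert ((T_theta_given_x >= 0.0) & (T_theta_given_x <= 1.0)).all()
```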

9 Bayes' Theorem Can Be Distinguished into Three Types
Bayes' Theorem I, between two logical probabilities, proposed by Bayes: T(B|A)=T(A|B)T(B)/T(A). Bayes' Theorem II, between two statistical probabilities, used by Shannon: P(x|y)=P(y|x)P(x)/P(y). Bayes' Theorem III, between a statistical probability and a logical probability, links the likelihood function and the truth function. (The slide also shows the membership or truth function, the semantic likelihood function, and the logical probability.)
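A sketch of Bayes' Theorems II and III in code. Bayes II is the standard statistical form; for Bayes III I assume the form P(x|θj) = P(x)T(θj|x)/T(θj) with the logical probability T(θj) = Σi P(xi)T(θj|xi), which fits the truth-function setting described here but is my reconstruction rather than the slide's own formula. All numbers are illustrative.

```python
import numpy as np

def bayes_II(P_y_given_x, P_x):
    """Bayes' Theorem II (statistical): P(x_i|y_j) = P(y_j|x_i) P(x_i) / P(y_j)."""
    P_xy = P_x[:, None] * P_y_given_x                 # joint P(x_i, y_j)
    return P_xy / P_xy.sum(axis=0, keepdims=True)

def bayes_III(T_theta_given_x, P_x):
    """Bayes' Theorem III (assumed form): P(x_i|theta_j) = P(x_i) T(theta_j|x_i) / T(theta_j)."""
    numerator = P_x[:, None] * T_theta_given_x
    T_theta = numerator.sum(axis=0, keepdims=True)    # logical probability T(theta_j)
    return numerator / T_theta                        # columns are semantic likelihoods

P_x = np.array([0.7, 0.3])
P_y_given_x = np.array([[0.9, 0.1],                   # assumed Shannon channel
                        [0.2, 0.8]])
T = np.array([[1.0, 0.1],                             # assumed truth-value matrix
              [0.2, 1.0]])
print(bayes_II(P_y_given_x, P_x))
print(bayes_III(T, P_x))
```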

10 Semantic Information Measure
The classical information formula: I(xi;yj) = log[P(xi|yj)/P(xi)]. The semantic information of yj about xi is defined with the log-normalized likelihood. The less the logical probability is, the more information there is. The larger the truth value is, which means the hypothesis can survive tests, the more information there is. A tautology or a contradiction conveys no information. This reflects Popper's falsification thought. If T(θj|x)≡1, it becomes Carnap and Bar-Hillel's formula I = log[1/T(θj)].
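A minimal sketch assuming the semantic information measure I(xi;θj) = log[T(θj|xi)/T(θj)], i.e. a log-normalized likelihood. This form is consistent with the slide's remark that the measure reduces to Carnap and Bar-Hillel's log[1/T(θj)] when T(θj|x)≡1, but the exact formula and the example values are my reconstruction.

```python
import numpy as np

def semantic_information(T_theta_j_given_x, P_x, i):
    """Semantic information of label y_j about instance x_i, in bits (assumed formula)."""
    T_theta_j = float(np.sum(P_x * T_theta_j_given_x))      # logical probability T(theta_j)
    return np.log2(T_theta_j_given_x[i] / T_theta_j)

P_x = np.array([0.7, 0.3])
T_j = np.array([0.1, 1.0])                                   # truth values of y_j for x_0, x_1
print(semantic_information(T_j, P_x, i=1))                   # true and specific: about 1.43 bits
print(semantic_information(np.ones(2), P_x, i=1))            # tautology: 0 bits
print(semantic_information(T_j, P_x, i=0))                   # nearly false for x_0: negative
```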

11 Semantic Mutual Information Formula
Averaging I(xi;θj) over the joint distribution, we have the semantic mutual information. If T(θj|X) = exp[-k(X-xj)²], a Gaussian function without the coefficient, then it is easy to find that the maximum semantic information criterion is a special Regularized Least Squares (RLS) criterion, with the cross-entropy as the regularizer.
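Continuing the sketch above, and assuming the semantic mutual information is the average of I(xi;θj) = log[T(θj|xi)/T(θj)] weighted by the joint P(xi,yj), a small implementation might look like this; all numbers are illustrative.

```python
import numpy as np

def semantic_mutual_information(P_xy, T_theta_given_x, P_x):
    """Average semantic information I(X; Theta) in bits, weighted by the joint P(x_i, y_j)."""
    T_theta = (P_x[:, None] * T_theta_given_x).sum(axis=0)   # logical probabilities T(theta_j)
    with np.errstate(divide="ignore"):
        info = np.log2(T_theta_given_x / T_theta)            # pointwise I(x_i; theta_j)
    mask = P_xy > 0
    return float(np.sum(P_xy[mask] * info[mask]))

P_x = np.array([0.7, 0.3])
P_xy = np.array([[0.63, 0.07],                               # an assumed joint P(x_i, y_j)
                 [0.06, 0.24]])
T = np.array([[1.0, 0.1],                                    # an assumed truth-value matrix
              [0.2, 1.0]])
print(semantic_mutual_information(P_xy, T, P_x))             # about 0.35 bits
```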

12 R(G) Function: the Matching Function of Shannon's Mutual Information and Semantic Mutual Information
By extending the rate-distortion function R(D), we obtain the function R(G). R(G) means the minimum of R = I(X;Y) for a given G = I(X;θ); G(R) means the maximum of G for a given R. (In the figure, Rmax is the MMI, reached at the matching point.)

13 Channels' Matching
Matching I: the semantic channel matches Shannon's channel (label learning, or training): obtain the optimized semantic channel T(θj|X), j=1,2,…, from Shannon's channel P(yj|X) by T*(θj|X)=P(yj|X)/max[P(yj|X)], or by another formula. Matching II: Shannon's channel matches the semantic channel (classification, or reasoning) by the classifier. It encourages us to select a compound label with the least denotation.
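A minimal sketch of Matching I using the normalization T*(θj|X) = P(yj|X)/max[P(yj|X)] given on this slide; the example Shannon channel is made up.

```python
import numpy as np

def matching_I(P_y_given_x):
    """Matching I: T*(theta_j|X) = P(y_j|X) / max_X P(y_j|X); rows are x_i, columns are y_j."""
    return P_y_given_x / P_y_given_x.max(axis=0, keepdims=True)

P_y_given_x = np.array([[0.9, 0.1],        # an assumed Shannon channel P(y_j|x_i)
                        [0.2, 0.8]])
print(matching_I(P_y_given_x))             # [[1.0, 0.125], [0.2222..., 1.0]]
```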

14 Channels’ Matching (CM) Iteration Algorithm for MMI Classifications of Unseen Instances
Given P(X), P(Z|X), and a starting dividing point z', repeat two steps. Matching I: T(θj|X) matches P(yj|X). Matching II: for the given z', there are information lines I(X;θj|Z), j=1,2,…, which give the classifier for the new z'. If z' is unchanged, end; otherwise go to Matching I. Convergence is fast, needing 3-5 iterations. Convergence proof: see the next slide. (Figure: optimizing z' for a medical test, with classes "infected" and "not infected", labels "test-positive" and "test-negative", and information lines I(X;θ0|Z) and I(X;θ1|Z).)
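A minimal sketch of the CM iteration on a one-dimensional example of my own. It assumes the information line I(X;θj|Z) = Σi P(xi|Z) log[T(θj|xi)/T(θj)] and that the classifier assigns each Z to the label with the larger information line; Matching I uses the normalization from slide 13. These assumptions are my reading of the slide, not its literal content, and the two-Gaussian P(Z|X) is invented for illustration.

```python
import numpy as np

Z = np.linspace(-3.0, 3.0, 601)                                 # discretized 1-D feature Z
P_x = np.array([0.5, 0.5])                                      # P(X): not infected, infected
dens = np.vstack([np.exp(-(Z + 1.0) ** 2 / 2),                  # assumed class-conditional densities
                  np.exp(-(Z - 1.0) ** 2 / 2)])
P_z_given_x = dens / dens.sum(axis=1, keepdims=True)            # P(Z|X)
P_xz = P_x[:, None] * P_z_given_x                               # joint P(X, Z)
P_x_given_z = P_xz / P_xz.sum(axis=0, keepdims=True)            # P(X|Z)

labels = (Z >= 1.5).astype(int)                                 # a bad starting partition z' = 1.5
for _ in range(30):
    # Matching I: Shannon channel P(y_j|x_i) of the current partition, then T*(theta_j|X)
    P_y_given_x = np.vstack([P_z_given_x[:, labels == j].sum(axis=1) for j in (0, 1)]).T
    T = P_y_given_x / P_y_given_x.max(axis=0, keepdims=True)
    T_theta = (P_x[:, None] * T).sum(axis=0)                    # logical probabilities T(theta_j)
    # Matching II: information lines I(X; theta_j|Z); each Z goes to the label with the larger one
    info_lines = P_x_given_z.T @ np.log2(T / T_theta)           # shape (len(Z), 2)
    new_labels = info_lines.argmax(axis=1)
    if np.array_equal(new_labels, labels):
        break                                                   # z' unchanged: stop
    labels = new_labels
print("final dividing point z' ~", Z[labels.argmax()])          # near 0 for this symmetric example
```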

15 Using R(G) Function to Prove the CM Algorithm’s Convergence
Iterative steps and the reasons for convergence: 1) Matching I: for each Shannon channel, there is a matched semantic channel that maximizes the average log-likelihood; 2) Matching II: for the given P(X) and semantic channel, we can find a better Shannon channel; 3) repeating the two steps obtains the Shannon channel that maximizes the Shannon mutual information and the average log-likelihood. The R(G) function serves as a ladder letting R climb up; we then find a better semantic channel, and hence a better ladder.

16 An Example Shows the Speed and Reliability
Two iterations can make I(X;Y) reach 99.9% of the MMI.

17 An Example of MMI Classification with a Bad Initial Partition

18 Comparison For MMI classifications on high-dimensional feature spaces, we need to combine the CM algorithm with neural networks.

19 Summary
The MMI classification is a difficult problem left by Shannon; it can be solved by the semantic information method. The Channel Matching algorithm: Matching I, improving the semantic channel by T(θ|X)∝P(Y|X) or by another formula; Matching II, improving the Shannon channel by the classifier. Repeat the above two steps until R = Rmax. The End. Thank you for listening! You are welcome to exchange ideas. For more papers about semantic information theory and machine learning, see … or …

