Overview of our studies
  Ryo Yamada
  April, 2017
Data set
  Records are discrete
  Values: {0,1}, {0,1,2,…}, R, {A,B,C,…}
  Dimensions
  No. of samples
Not homogeneous values
  We care about “values” that are heterogeneous
N = 1
  One value
  A value set
  Space
  Voxel type
  Point set type
  Essentially the same: discrete observation of a “distribution”
N = 1, discrete observation of a distribution
  Spectral decomposition
  Distribution -> point
  Dimension reduction from ∞ -> k
  Moments
  Fourier, wavelets, deep learning
  Partition
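A minimal sketch of the "distribution -> point" idea using moments: one discrete observation of a distribution is summarized by its first k raw sample moments, so it becomes a point in R^k. The function name `moment_point`, the choice of raw rather than central moments, and the sample values are assumptions made for illustration only.

```python
def moment_point(sample, k):
    """Map a discrete observation of a distribution to a point in R^k.

    Coordinate j (1-based) is the j-th raw sample moment, mean(x**j).
    """
    if not sample:
        raise ValueError("sample must not be empty")
    n = len(sample)
    return [sum(x ** j for x in sample) / n for j in range(1, k + 1)]


# Hypothetical data: a small set of observed values.
observed = [0.0, 1.0, 1.0, 2.0]
point = moment_point(observed, k=3)  # one distribution -> one point in R^3
```

Fourier or wavelet coefficients could play the same role as the moments here; only the basis used for the reduction changes.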
N > 1
  We care about non-homogeneity among the N
  2^N - (N+1) combinations
  We separate information into 2 parts
    Common part
    Difference part
  N(N-1)/2 pairs in particular
  Distance/divergence cares about the “difference part” only
  MDS: dimension reduction to ~(N-1) dimensions
  N distributions -> N points = 1 sample of an N-point distribution
  Go back to slide 4
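The pairwise part of this slide can be sketched as follows, under stated assumptions: the N distributions are discrete and share one support, and the Hellinger distance stands in for whatever distance/divergence is actually used. The count 2^N - (N+1) is read here as the number of subsets with at least two members (all subsets minus the empty set and the N singletons). All names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def pairwise_distances(dists):
    """Return {(i, j): distance} for all N(N-1)/2 unordered pairs."""
    return {
        (i, j): hellinger(dists[i], dists[j])
        for i, j in combinations(range(len(dists)), 2)
    }


def count_multi_member_subsets(n):
    """Number of subsets with at least 2 members: 2^N - (N + 1)."""
    return 2 ** n - (n + 1)


# Hypothetical data: N = 3 distributions over the same 3 categories.
dists = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
D = pairwise_distances(dists)
```

The distance is zero for the identical pair, which matches the point that a distance sees only the difference part. A matrix like `D` is what MDS would take as input; the embedding step itself is not shown.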
Partition
  The whole can be separated into multiple parts
  The way of partitioning means something from the heterogeneity standpoint
  A value: integer partition, real-value partition, …
  Space: subsets
  Classification, clustering, segregation
  How to classify parts
    Single part: go to slide 5
    A set of parts: go to slide 6
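For the "a value: integer partition" case, a small sketch that enumerates every way to split a positive integer into parts. The generator name `integer_partitions` and the non-increasing ordering of parts are choices made for this illustration.

```python
def integer_partitions(n, max_part=None):
    """Yield each partition of n as a non-increasing list of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []  # the empty partition closes one branch of the recursion
        return
    for first in range(min(n, max_part), 0, -1):
        # Remaining parts may not exceed `first`, which avoids duplicates.
        for rest in integer_partitions(n - first, first):
            yield [first] + rest


parts_of_4 = list(integer_partitions(4))
```

Each list is one way of separating the whole (here, the value 4) into parts; comparing such lists is one way to talk about how heterogeneous a partition is.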
Parts in the whole
  Subsets are embedded in the whole set
  Submanifold
  Subsets/submanifolds themselves are lower-dimensional objects but are recorded with the full dimension
  Treat subsets/submanifolds in their own dimension
  A curve as a curve
  When they are parameterized in the original space, go to slide 5 (Spectral decomposition)
  When not, a new parameterization is necessary
Subsets/submanifolds
  Parameterization is complex
  Simplification/standardization
  A unit segment, a unit circle, a unit sphere surface, a unit disc, …
  Once the space is standardized with a parameterization:
    N = 1 -> go to slide 5
    N > 1 -> go to slide 6
Subsets/submanifolds
  Subsets/submanifolds themselves are lower-dimensional objects but are recorded with the full dimension
  This dimension reduction is “geometric”.
Submanifolds
  Information geometry treats all distributions as points in an infinite-dimensional space
  Particular types of distributions, such as the normal distributions, form a subset/submanifold of the whole space
  Submanifolds have parameterized expressions
  Then, subsets/submanifolds, incl. cell shapes and FACS distributions, correspond to “distributions parameterized with a finite number of parameters”??
Time series analysis
  Where does time come in???
  Time is (usually) a special dimensional axis: independent of the others and unidirectional.
  Therefore, parameterization with time is almost always possible, and “manifold-like” complex parameterization does not come in.
  Points, distributions, and subsets/submanifolds in space can be traced along time.
Spaces in general
  1-d
  n-d
  Simplex: {A,B,C,…}, categorical
  {0,1}
  R -> discrete records: 1-dimensional lattice
  n-d discrete records: n-dimensional lattice
  Simplex: {A,B,C,…}, categorical
  Network: a substructure of the simplex
  Space structures take the shape of a “graph”, which determines “adjacency/neighbors”; the “weights” of its edges give information on “distance/diversity”.
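A hedged sketch of the last point: if a space is stored as a weighted graph, adjacency is the set of edges, and a distance between any two elements can be taken as the cheapest path between them. Using shortest-path length (Dijkstra's algorithm) as the distance is an assumption of this example, as are the node labels and weights.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm on a weighted graph given as {node: {neighbor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


# Hypothetical categorical space {A, B, C, D} with weighted adjacency.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"C": 1.0},
}
dist_from_A = shortest_distances(graph, "A")
```

A 1-dimensional or n-dimensional lattice fits the same representation: each lattice site is a node and each neighboring pair is an edge.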
STAN group
  P(theta|x) = P(x|theta) P(theta) / P(x)
  P(theta|x1,x2) = P(x1,x2|theta) P(theta) / P(x1,x2)
  P(x1,x2): joint distribution
  When P(x1,x2) = P(x1) P(x2) and P(x1,x2|theta) = P(x1|theta) P(x2|theta),
    P(theta|x1,x2) = P(x1|theta) P(x2|theta) P(theta) / (P(x1) P(x2))
                   = [P(x1|theta)/P(x1)] [P(x2|theta)/P(x2)] P(theta)
  In this case, STAN does not need to deal with joint distributions
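A small numeric check of the factorized case, written in plain Python rather than STAN. It assumes a discrete grid of three theta values, a uniform prior, and Bernoulli observations, all of which are hypothetical choices for illustration. Only the factorization P(x1,x2|theta) = P(x1|theta) P(x2|theta) is used; the denominator is obtained by normalizing rather than assumed to factor.

```python
from math import isclose

# Hypothetical discrete parameter grid with a uniform prior P(theta).
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]


def bernoulli(x, theta):
    """P(x | theta) for a single 0/1 observation."""
    return theta if x == 1 else 1.0 - theta


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def posterior_joint(x1, x2):
    """P(theta | x1, x2) from the factorized likelihood P(x1|theta) P(x2|theta)."""
    return normalize(
        [bernoulli(x1, t) * bernoulli(x2, t) * p for t, p in zip(thetas, prior)]
    )


def posterior_sequential(x1, x2):
    """Update on x1 first, then reuse that posterior as the prior for x2."""
    after_x1 = normalize([bernoulli(x1, t) * p for t, p in zip(thetas, prior)])
    return normalize([bernoulli(x2, t) * p for t, p in zip(thetas, after_x1)])


joint = posterior_joint(1, 0)
sequential = posterior_sequential(1, 0)
```

The two posteriors agree, so each observation can be handled on its own when the likelihood factorizes. When it does not factorize, the joint term has to be modeled explicitly.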
When P(x1,x2) != P(x1)P(x2) or P(x1,x2|theta) != P(x1|theta)P(x2|theta)
  STAN cares about joint distributions, which are the target of slide 5
  The STAN model determines the “joint distribution”
  Conditioned sampled data records are available, and STAN estimates the posterior distribution(s)
  Meta-analysis project
    Same sample set and multiple observations
      One joint distribution with multiple projections onto different planes
    Different sample sets and multiple observations
      One joint distribution with multiple projections onto similar planes on “some-independent” occasions
Decision theory
  Parameterized stochastic phenomena
  Each stochastic event depends on the past, with heavy memory
  These should be submanifolds in the information theory space, but they are not easily defined without numerical labor
  The difficulty is similar to “self-avoiding” path simulation.
  Some stochastic rules are known to automatically generate self-avoiding paths… which is related to the “curve”… therefore… ????
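To make the "heavy memory" comparison concrete, here is a minimal sketch of a self-avoiding walk on the 2-d square lattice, where every step depends on the entire past path. The growth rule (choose uniformly among unvisited neighbors, stop when trapped) is a simple assumption for illustration; it does not sample uniformly from all self-avoiding paths of a given length.

```python
import random


def self_avoiding_walk(max_steps, seed=0):
    """Grow a walk on Z^2 that never revisits a site; stop early if trapped."""
    rng = random.Random(seed)
    path = [(0, 0)]
    visited = {(0, 0)}
    for _ in range(max_steps):
        x, y = path[-1]
        # The allowed moves depend on the whole history, not just the last site.
        free = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (x + dx, y + dy) not in visited
        ]
        if not free:
            break  # trapped: every neighbor has already been visited
        nxt = rng.choice(free)
        path.append(nxt)
        visited.add(nxt)
    return path


walk = self_avoiding_walk(50, seed=1)
```

The probability of any single step cannot be written down without the full `visited` set, which is the same kind of difficulty the slide points to.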