Overview of our studies

Presentation transcript:

Overview of our studies
Ryo Yamada, April 2017

Data set
- Records are discrete.
- Values: {0,1}, {0,1,2,…}, R, {A,B,C,…}
- Dimensions
- Number of samples

Values are not homogeneous
- We care about the heterogeneity of the "values".

N = 1
- One value, a value set, a space.
- Voxel type or point-set type.
- Essentially the same: a discrete observation of a "distribution" (see the sketch below).
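As a toy illustration (not part of the original slides), the same draws can be recorded as a point set (the raw values) or as a voxel-type record (counts on a fixed grid of bins); both are discrete observations of the same underlying distribution. The sample size, bin grid, and distribution below are arbitrary choices.

```python
# Toy sketch: one sample, two equivalent discrete records of a "distribution".
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(loc=0.0, scale=1.0, size=200)   # point-set type: the raw observed values

# Voxel type: the same observations recorded as counts on a fixed grid of bins.
edges = np.linspace(-4, 4, 17)                       # 16 voxels covering the range
counts, _ = np.histogram(points, bins=edges)

print("point-set record:", points[:5], "...")
print("voxel record    :", counts)
```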

N = 1: discrete observation of a distribution
- Spectral decomposition: distribution -> point.
- Dimension reduction from ∞ to k (see the sketch below).
- Moments, Fourier, wavelets, deep learning.
- Partition.
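A minimal sketch of the "distribution -> point" reduction, under my own assumptions: an empirical distribution, infinite-dimensional in principle, is summarized by its first k moments, giving a k-dimensional point. Fourier or wavelet coefficients, or learned features, could play the same role; the gamma sample and k = 4 are illustrative only.

```python
# Sketch: reduce an observed distribution (infinite-dimensional) to k moment coordinates.
import numpy as np

def first_k_moments(sample, k=4):
    """Map a 1-d sample to a k-dimensional point: mean plus central moments 2..k."""
    m = sample.mean()
    feats = [m] + [np.mean((sample - m) ** j) for j in range(2, k + 1)]
    return np.array(feats)

rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=1.5, size=1000)   # one observed distribution
point = first_k_moments(sample, k=4)                   # its k-dimensional representation
print(point)
```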

N > 1
- We care about non-homogeneity among the N distributions: 2^N - (N+1) subset combinations.
- We separate the information into two parts: a common part and a difference part.
- In particular, the N(N-1)/2 pairs: distance/divergence captures the "difference part" only.
- MDS: dimension reduction to roughly (N-1) dimensions.
- N distributions -> N points = one sample of an N-point distribution (see the sketch below). Go back to slide 4.
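A hedged sketch of the "N distributions -> N points" step: estimate a divergence for each of the N(N-1)/2 pairs (here a symmetrized KL on shared bins, one of many possible choices) and place the N distributions as points with classical MDS in about N - 1 dimensions. The binning, smoothing constant, and sample sizes are assumptions of this toy example.

```python
# Sketch: N distributions -> pairwise divergences -> classical MDS -> N points.
import numpy as np

def hist_probs(sample, edges):
    counts, _ = np.histogram(sample, bins=edges)
    p = counts + 1e-9                      # small smoothing to avoid log(0)
    return p / p.sum()

def sym_kl(p, q):
    return 0.5 * np.sum(p * np.log(p / q)) + 0.5 * np.sum(q * np.log(q / p))

def classical_mds(D, dim):
    """Classical MDS: double-center the squared distances, keep the top eigenvectors."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

rng = np.random.default_rng(2)
samples = [rng.normal(mu, 1.0, size=500) for mu in (0.0, 0.5, 2.0, 2.2)]  # N = 4 distributions
edges = np.linspace(-5, 7, 41)
P = [hist_probs(s, edges) for s in samples]
N = len(P)
D = np.array([[np.sqrt(sym_kl(P[i], P[j])) for j in range(N)] for i in range(N)])
X = classical_mds(D, dim=N - 1)            # each distribution is now a point in (N-1) dimensions
print(X)
```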

Partition
- The whole can be separated into multiple parts.
- The way of partitioning means something from the heterogeneity standpoint.
- A value: integer partition, real-value partition, ...
- A space: subsets; classification, clustering, segregation (a toy clustering sketch follows).
- How to classify?
- Parts:
  - A single part: go to slide 5.
  - A set of parts: go to slide 6.
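One concrete and entirely illustrative way to partition a space of observed points: a bare-bones k-means clustering. Integer or real-value partitions of a single value would be handled differently; the data, k, and iteration count below are arbitrary.

```python
# Sketch: partition a point set into k parts with a bare-bones k-means.
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center, then move centers to their parts' means.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (2, 2), (0, 3))])
labels, centers = kmeans(X, k=3)
print(np.bincount(labels))                 # sizes of the three parts
```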

Parts in the whole
- Subsets are embedded in the whole set: submanifolds.
- Subsets/submanifolds are intrinsically lower-dimensional but are recorded in the full dimension.
- Treat subsets/submanifolds in their own dimension: a curve as a curve.
- When they are parameterized in the original space, go to slide 3 (spectral decomposition).
- When not, a new parameterization is necessary.

Subsets/submanifolds
- Parameterization is complex.
- Simplification/standardization: a unit segment, a unit circle, a unit sphere surface, a unit disc, ... (a standardization sketch follows).
- Once the space is standardized with a parameterization:
  - N = 1 -> go to slide 5
  - N > 1 -> go to slide 6
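A small sketch of the standardization step, under my own assumptions: an ordered point sequence along a closed curve, recorded in the full 2-d space, is re-parameterized by normalized arc length and mapped onto the unit circle. The ellipse and the helper name standardize_closed_curve are illustrative, not from the slides.

```python
# Sketch: re-parameterize an ordered closed curve by arc length onto the unit circle.
import numpy as np

def standardize_closed_curve(points):
    """points: (n, 2) array of ordered samples along a closed curve (full-dimension record)."""
    closed = np.vstack([points, points[:1]])            # close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])[:-1]    # arc length at each point
    t = 2 * np.pi * s / (s[-1] + seg[-1])               # normalized parameter in [0, 2*pi)
    unit_circle = np.column_stack([np.cos(t), np.sin(t)])
    return t, unit_circle

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ellipse = np.column_stack([3 * np.cos(theta), np.sin(theta)])   # a curve in the ambient 2-d space
t, std = standardize_closed_curve(ellipse)
print(t[:5])
```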

Subsets/submanifolds
- Subsets/submanifolds are intrinsically lower-dimensional but are recorded in the full dimension.
- This dimension reduction is "geometric".

Submanifolds
- Information geometry treats all distributions as points in an infinite-dimensional space.
- Particular types of distributions, such as the normal distributions, form a subset/submanifold of the whole space.
- Submanifolds have parameterized expressions.
- Then, do subsets/submanifolds, including cell shapes and FACS distributions, correspond to "distributions parameterized with a finite number of parameters"??

Time series analysis
- Where does time come in?
- Time is (usually) a special dimensional axis: independent of the others and unidirectional.
- Therefore, parameterization by time is almost always possible, and "manifold-like" complex parameterization does not come in.
- Points, distributions, and subsets/submanifolds in the space can be traced along time.

Spaces in general
- 1-d: simplex ({A,B,C,…}, categorical), {0,1}, R -> discrete records: a 1-dimensional lattice.
- n-d: discrete records: an n-dimensional lattice; simplex ({A,B,C,…}, categorical); network: a substructure of the simplex.
- Space structures take the shape of a "graph": it determines "adjacency/neighborhood", and the "weights" of its edges give information on "distance/diversity" (see the sketch below).
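A small sketch (with illustrative node names and edge weights) of how a graph-shaped space yields distances: the adjacency structure defines neighbors, the edge weights define lengths, and shortest-path lengths give the "distance/diversity" between states.

```python
# Sketch: a weighted graph as a space; shortest-path length as the distance between nodes.
import heapq

graph = {                                   # adjacency with edge weights (illustrative)
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}

def dijkstra(graph, source):
    """Shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

print(dijkstra(graph, "A"))                 # e.g. distance A -> D is 4.0, via B and C
```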

STAN group
- P(theta|x) = P(x|theta) P(theta) / P(x)
- P(theta|x1,x2) = P(x1,x2|theta) P(theta) / P(x1,x2), where P(x1,x2) is the joint distribution.
- When P(x1,x2) = P(x1) P(x2) and P(x1,x2|theta) = P(x1|theta) P(x2|theta):
  P(theta|x1,x2) = P(x1|theta) P(x2|theta) P(theta) / (P(x1) P(x2))
                 = [P(x1|theta)/P(x1)] [P(x2|theta)/P(x2)] P(theta)
- In that case there is no need to care about joint distributions in STAN (a numerical check follows).
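A hedged numerical check of the identity above on a discrete grid; the coin-flip likelihood, flat prior, and grid are my own toy assumptions. When the two observations are conditionally independent, the posterior built from the joint likelihood and the factorized form of the slide agree once both are renormalized.

```python
# Toy check: P(theta | x1, x2) from the joint likelihood vs. the factorized form,
# for conditionally independent observations (two coin flips, grid prior on theta).
import numpy as np

theta = np.linspace(0.01, 0.99, 99)          # grid over the parameter
prior = np.ones_like(theta) / len(theta)     # flat prior P(theta)

x1, x2 = 1, 0                                # two observed flips
lik1 = theta ** x1 * (1 - theta) ** (1 - x1)   # P(x1 | theta)
lik2 = theta ** x2 * (1 - theta) ** (1 - x2)   # P(x2 | theta)

# Joint route: P(x1, x2 | theta) = P(x1|theta) P(x2|theta) under conditional independence.
post_joint = lik1 * lik2 * prior
post_joint /= post_joint.sum()               # normalization plays the role of dividing by P(x1, x2)

# Factorized route: [P(x1|theta)/P(x1)] [P(x2|theta)/P(x2)] P(theta), then renormalize.
p_x1 = np.sum(lik1 * prior)
p_x2 = np.sum(lik2 * prior)
post_factored = (lik1 / p_x1) * (lik2 / p_x2) * prior
post_factored /= post_factored.sum()

print(np.allclose(post_joint, post_factored))   # True
```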

When P(x1,x2) != P(x1) P(x2) or P(x1,x2|theta) != P(x1|theta) P(x2|theta)
- STAN has to care about the joint distribution, which is the target of slide 5.
- The STAN model determines the "joint distribution".
- The sampled data records it is conditioned on are available, and STAN estimates the posterior distribution(s).
- Meta-analysis project:
  - Same sample set, multiple observations: one joint distribution with multiple projections onto different planes.
  - Different sample sets, multiple observations: one joint distribution with multiple projections onto similar planes on "somewhat independent" occasions.

Decision theory
- Parameterized stochastic phenomena.
- Each stochastic event depends on the past, with heavy memory.
- These should be submanifolds in the information-geometry space, but they are not easily defined without numerical labor.
- The difficulty is similar to "self-avoiding" path simulation (a toy sampler follows).
- Some stochastic rules are known to automatically generate self-avoiding paths... which is related to the "curve"... therefore... ????
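A minimal sketch of the self-avoiding-path difficulty mentioned above (my own toy code, not the group's method): step-by-step simulation on a 2-d lattice can trap itself because every step depends on the whole past, so this naive sampler simply restarts on dead ends.

```python
# Toy sketch: sample a self-avoiding path on the 2-d lattice by restarting on dead ends.
import random

def self_avoiding_walk(n_steps, seed=0, max_tries=10000):
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(max_tries):
        path = [(0, 0)]
        visited = {(0, 0)}
        while len(path) <= n_steps:
            x, y = path[-1]
            options = [(x + dx, y + dy) for dx, dy in moves
                       if (x + dx, y + dy) not in visited]
            if not options:                  # dead end: the past ("heavy memory") trapped the walk
                break
            nxt = rng.choice(options)
            path.append(nxt)
            visited.add(nxt)
        else:
            return path                      # success: n_steps self-avoiding steps
    raise RuntimeError("no self-avoiding path found within max_tries")

print(self_avoiding_walk(30))
```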