A Few Projects To Share
Javad Azimi, May 2015

Data Clustering
Separating data into similar groups without any supervision. My master's thesis project.
- Clustering ensemble: aggregate the results of different clustering algorithms into one single result (a minimal sketch follows below).
- Constrained clustering: clustering based on must-link and cannot-link information.
Several publications, including IJCAI 2009, IDEAL 2007, CSICC 2006, and others.
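A minimal sketch of the clustering-ensemble idea in Python: build a co-association matrix over repeated k-means runs, then extract a consensus partition with hierarchical clustering. The data and base clusterers are toy stand-ins, not the thesis methods.

```python
# Sketch of a clustering ensemble via a co-association matrix.
# The data and base clusterers are placeholders, not the thesis setup.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

n = len(X)
coassoc = np.zeros((n, n))
n_runs = 20
for seed in range(n_runs):
    # Base partitions: k-means with varying k and random initialization.
    labels = KMeans(n_clusters=int(rng.integers(2, 6)), n_init=1,
                    random_state=seed).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs

# Consensus partition: hierarchical clustering on 1 - co-association.
dist = squareform(1.0 - coassoc, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(consensus)
```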

Unsupervised Anomaly Detection
Summer intern project at Biotronik MSE.
- The test system identified some devices (pacemakers) as safe while they were not (10 out of 20k).
- More than 2000 tests are run on each device.
- How can we find those bad devices based on the test results?
- Key: some tests are significantly correlated, so a device whose results break those correlations is suspect (see the sketch below).
Implemented in Statistica and Visual Basic. A US patent has been submitted.
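A minimal sketch of that idea in Python, using the Mahalanobis distance to flag devices whose test results violate the usual correlation structure. The data is synthetic and this stands in for, rather than reproduces, the patented method.

```python
# Sketch: flag devices whose test results break the usual correlations.
# Mahalanobis distance stands in for the actual (patented) method;
# the test-result matrix is synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_devices, n_tests = 1000, 50   # real setting: ~20k devices, >2000 tests
results = rng.normal(size=(n_devices, n_tests))
# Make two tests strongly correlated, as on the slide.
results[:, 1] = results[:, 0] + rng.normal(scale=0.1, size=n_devices)

mu = results.mean(axis=0)
cov = np.cov(results, rowvar=False)
prec = np.linalg.pinv(cov)                 # pseudo-inverse for stability
delta = results - mu
d2 = np.einsum("ij,jk,ik->i", delta, prec, delta)  # squared Mahalanobis distance

# Devices far from the bulk of the distribution are candidate anomalies.
suspects = np.argsort(d2)[-10:]
print(suspects)
```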

Visual Appearance of Display Ads and Its Effect on Click-Through Rate
Summer intern project at Yahoo! Labs.
- Is it possible to predict the CTR from the creative design?
- We developed 43 visual features. Based on these features we are able to:
  - Predict CTR up to 3 times better than a weighted sampling method.
  - Give ad designers a set of recommendations for optimizing their designs.
- Model: Support Vector Regression (SVR); a sketch follows below.
- 3 papers (WWW 2012, KDD 2012, and CIKM 2012) and one US patent.
- MATLAB and C++ implementation (also used LibSVM, CVX, and NCut).
[Figure: log inverse CTR histogram]
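A rough illustration of the modeling step: SVR in Python with scikit-learn. The 43 visual features and the target here are random placeholders, not the paper's data or its MATLAB/LibSVM implementation.

```python
# Sketch: predict (log inverse) CTR from visual features with SVR.
# The feature matrix is random; the paper's 43 features are not reproduced.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 43))          # 43 visual features per ad creative
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)  # stand-in target

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean())
```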

A Few More
- The sensitivity of CTR to when the user arrives at the website: the earlier they arrive, the higher the CTR is likely to be.
- Advertiser (EA and insurance) email targeting: to whom should we send email?
- Estimating income based on zip code, browser, OS, and other browsing features: Porsche or Hyundai? Which ad should we place?

Keyword Transformation (1)
Cold-start listings usually have uncommon keywords that are hard to find in searched queries. Keywords are scraped using the Bing search engine.
Algorithm (the first two stages are sketched below):
- N-gram extraction
- N-gram frequency filtration
- Entity detection
- POS filtration
- DSSM filtration
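A minimal Python sketch of the first two stages; entity detection, POS filtration, and DSSM filtration need external models and are omitted.

```python
# Sketch of the first two pipeline stages: n-gram extraction and
# frequency filtration. The scraped pages are made-up examples.
from collections import Counter

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

scraped_pages = [
    "cheap christmas vacations 2014",
    "cheap christmas vacations and breaks",
    "christmas vacations for families",
]

counts = Counter()
for page in scraped_pages:
    tokens = page.lower().split()
    for n in (1, 2, 3):
        counts.update(ngrams(tokens, n))

min_freq = 2                    # drop n-grams that occur too rarely
candidates = [g for g, c in counts.items() if c >= min_freq]
print(candidates)
```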

Keyword Transformation (2)
Examples (original keyword → generated NenaKey, alphabetically sorted; the sorting step is sketched below):
- 35fcread20bk → card memory reader
- mig50q7csa0x → intelligent power toshiba
- canine Iris melanoma → cancer dogs eye
- boots size 70 mark nason → mark nason shoes
- break 2014 xmas bargain → 2014 cheap christmas vacations
- casinoroulettegame → casino game roulette
- buy www.seatgeek.com show ticket → buy seatgeek show ticket
- 3m gold privacy filters gpfmr13 - notebook privacy filter → 3m filter gold privacy
- 56 harbour breeze low profile ceiling fans → breeze ceiling fan harbor
- a4 hammered and linen brilliant white paper suppliers → a4 linen paper white
- apply for parents private loan for school for kids → loans parent student
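The right-hand column looks like a normalized, deduplicated, alphabetically sorted token set. This toy snippet reproduces only that final canonicalization step; the synonym and entity mappings visible in the table (e.g., boots → shoes) come from the earlier pipeline stages and are not modeled here.

```python
# Toy reproduction of the final canonicalization step only:
# lowercase, deduplicate, and alphabetically sort the tokens.
def nena_key(keyword):
    return " ".join(sorted(set(keyword.lower().split())))

print(nena_key("casino roulette game"))       # -> "casino game roulette"
print(nena_key("buy seatgeek show ticket"))   # -> "buy seatgeek show ticket"
```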

Bayesian Optimization: Motivating Application
[Figure: schematic of how an MFC (microbial fuel cell) works: bacteria at the anode oxidize fuel (organic matter) to CO2, releasing electrons (e-) and H+; at the cathode, O2 is reduced to H2O. Inset: SEM image of bacteria sp. on Ni nanoparticle-enhanced carbon fibers.]
The nano-structure of the anode significantly impacts electricity production. We want to optimize the anode nano-structure to maximize power by selecting a set of experiments.

Parameter Tuning
Suppose you have n different learning algorithms which generate n different predictions (p1, p2, ..., pn) for a given input query. The final prediction is pf = a1*p1 + a2*p2 + ... + an*pn, where a1, a2, ..., an are constants.
Challenge: what should the weights (a1, a2, ..., an) be? Exhaustive search is not possible, since every evaluation takes considerable time. What is the best way to set a1, a2, ..., an? A sketch of the setup follows below.
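A Python sketch of that setup: synthetic base-model predictions, a stand-in loss that would be expensive to evaluate in reality, and random search over the weight simplex as a placeholder for the Bayesian optimizer.

```python
# Sketch: tune ensemble weights (a1..an) when each evaluation is costly.
# Random search stands in for Bayesian optimization; the data is synthetic.
import numpy as np

rng = np.random.default_rng(3)
n_models, n_points = 4, 200
preds = rng.normal(size=(n_models, n_points))            # p1..pn per query
target = preds.mean(axis=0) + rng.normal(scale=0.1, size=n_points)

def loss(weights):
    # Expensive in reality (e.g., a full validation run); cheap stand-in here.
    pf = weights @ preds
    return np.mean((pf - target) ** 2)

best_w, best_loss = None, np.inf
for _ in range(100):                                     # limited budget
    w = rng.dirichlet(np.ones(n_models))                 # candidate weights
    l = loss(w)
    if l < best_loss:
        best_w, best_loss = w, l
print(best_w, best_loss)
```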

Other Applications
- Financial investment
- Reinforcement learning
- Drug testing
- Mechanical engineering
- And others

Bayesian Optimization: Setting
- We have a black-box function; its distribution is unknown.
- We can sample the function, but sampling is very expensive, so dense sampling (e.g., gradient-descent-style approaches) is out; we must choose our points smartly.
- We want to find the maximizer (or minimizer) of the function.
- Assumption: Lipschitz continuity. If the function is not Lipschitz continuous, all bets are off.

Bayesian Optimization: Big Picture
The loop: current experiments → fit posterior model → select experiment(s) → run experiment(s) → add the results to the current experiments and repeat. A skeleton of this loop is sketched below.
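A minimal Python sketch of that loop, assuming a GP surrogate and expected-improvement selection on a 1-D toy function; in the real application, f would be a costly physical experiment.

```python
# Skeleton of the loop in the picture: fit posterior -> select -> run.
# GP surrogate + expected improvement; the black-box f is a toy stand-in.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):                      # expensive black-box experiment (toy stand-in)
    return -(x - 0.6) ** 2

X = np.array([[0.1], [0.9]])   # current experiments
y = f(X).ravel()
candidates = np.linspace(0, 1, 201).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)   # posterior model
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)        # select experiment
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])                                  # run experiment
    y = np.append(y, f(x_next[0]))
print(X[np.argmax(y)], y.max())
```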

Bayesian Optimization: Main Steps
- Surrogate function (response surface, posterior model): builds a posterior over unobserved points based on the prior; its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
- Acquisition criterion (selection function): decides which sample should be selected next.

Surrogate Function
Simulates the unknown function's distribution based on the prior.
- Deterministic (classical linear regression, ...): there is a single deterministic prediction for each point x in the input space.
- Stochastic (Bayesian regression, Gaussian process, ...): there is a distribution over the prediction for each point x in the input space (e.g., a normal distribution).
Example:
- Deterministic: f(x1) = y1, f(x2) = y2
- Stochastic: f(x1) = N(y1, 0.1), f(x2) = N(y2, 5)

Gaussian Process (GP)
A Gaussian process is used to build the posterior model. The prediction at any point is a normal random variable, and its variance is independent of the observed outputs y (see the equations below).
[Figure: GP posterior over the input space, highlighting points with high output expectation and points with high output variance.]
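For reference, the standard GP posterior at a test point x, given training inputs X, observations y, kernel k, and noise variance sigma_n^2 (standard textbook equations, not taken from the slide). Note that the variance in the second line does not involve y, which is exactly the slide's point.

```latex
% Standard GP posterior at a test point x.
\begin{aligned}
  \mu(x)      &= k(x, X)\,\bigl[K(X, X) + \sigma_n^2 I\bigr]^{-1} y \\
  \sigma^2(x) &= k(x, x) - k(x, X)\,\bigl[K(X, X) + \sigma_n^2 I\bigr]^{-1} k(X, x)
\end{aligned}
```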

Selection Criterion
- Maximum Mean (MM): selects the point with the highest posterior mean. Purely exploitative.
- Maximum Upper-bound Interval (MUI): selects the point with the highest 95% upper confidence bound. Purely explorative.
- Maximum Probability of Improvement (MPI): computes the probability that the output exceeds (1+m) times the best current observation, m > 0. Both explorative and exploitative.
- Maximum Expected Improvement (MEI): similar to MPI but parameter-free; it simply computes the expected amount of improvement after sampling at any point.
A sketch of all four criteria follows below.
[Figure: the points each criterion (MM, MUI, MPI, MEI) would select on the same GP posterior.]
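The four criteria can all be written in terms of the GP posterior mean mu and standard deviation sigma at candidate points. A Python sketch, where ybest is the best observation so far and m is the MPI margin parameter from the slide:

```python
# The four criteria as functions of the GP posterior mean mu and std sigma
# (arrays over candidate points); ybest is the best observation so far.
import numpy as np
from scipy.stats import norm

def mm(mu, sigma):                     # Maximum Mean: pure exploitation
    return mu

def mui(mu, sigma):                    # 95% upper confidence bound: exploration
    return mu + 1.96 * sigma

def mpi(mu, sigma, ybest, m=0.1):      # P[f(x) > (1 + m) * ybest]
    return norm.sf(((1 + m) * ybest - mu) / sigma)

def mei(mu, sigma, ybest):             # E[max(f(x) - ybest, 0)], parameter-free
    z = (mu - ybest) / sigma
    return (mu - ybest) * norm.cdf(z) + sigma * norm.pdf(z)

# Next experiment: the candidate maximizing the chosen criterion, e.g.
# x_next = candidates[np.argmax(mei(mu, sigma, y.max()))]
```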

Bayesian Optimization: Results

Questions? ja_azimi@yahoo.com