A Few Projects To Share
Javad Azimi
May 2015
Data Clustering
- Separating the data into similar groups without any supervision; my master's thesis project.
- Clustering ensembles: aggregate the results of different clustering algorithms into one single result (a sketch follows below).
- Constrained clustering: clustering based on must-link and cannot-link information.
- Several publications: IJCAI 2009, IDEAL 2007, CSICC 2006, and …
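To illustrate the clustering-ensemble idea, here is a minimal sketch of the common co-association (evidence-accumulation) approach; the KMeans base learners, the hyperparameters, and this particular construction are illustrative assumptions, not necessarily the exact method from the thesis (a recent scikit-learn is assumed):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def ensemble_cluster(X, n_clusters, n_base=10, seed=0):
    """Co-association clustering ensemble: count how often each pair of
    points lands in the same cluster across base clusterings, then
    cluster that similarity matrix."""
    n = len(X)
    co = np.zeros((n, n))
    for s in range(n_base):
        # Each base clustering uses a different random initialization.
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=seed + s).fit_predict(X)
        co += labels[:, None] == labels[None, :]
    co /= n_base  # fraction of runs in which each pair was co-clustered
    final = AgglomerativeClustering(n_clusters=n_clusters,
                                    metric="precomputed", linkage="average")
    return final.fit_predict(1.0 - co)  # 1 - co acts as a distance
```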
Unsupervised Anomaly Detection
- Summer intern project at Biotronik MSE.
- The test system identified some devices (pacemakers) as safe while they were not (10 out of 20k).
- More than 2,000 tests on each device.
- How can we find those bad devices based on the test results?
- Key: some tests are significantly correlated with each other (a sketch exploiting this follows below).
- Implemented in Statistica and Visual Basic.
- A US patent application has been submitted.
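One minimal sketch of how correlated tests can expose bad devices, using Mahalanobis-distance scoring; this is an illustrative choice, not necessarily the patented method. A device whose test results break the typical cross-test correlations receives a high score:

```python
import numpy as np

def anomaly_scores(T):
    """Mahalanobis-distance anomaly scores for a (n_devices, n_tests)
    matrix of test results T."""
    mu = T.mean(axis=0)
    # Regularize the covariance so it stays invertible when tests are
    # highly correlated (the key property exploited here).
    cov = np.cov(T, rowvar=False) + 1e-6 * np.eye(T.shape[1])
    prec = np.linalg.inv(cov)
    D = T - mu
    return np.einsum("ij,jk,ik->i", D, prec, D)  # squared distances

# Usage: flag the highest-scoring devices for manual review.
# scores = anomaly_scores(test_matrix)
# suspects = np.argsort(scores)[-10:]
```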
Visual Appearance of Display Ads and Its Effect on Click-Through Rate
- Summer intern project at Yahoo! Labs.
- Is it possible to predict the CTR from the creative design?
- We developed 43 visual features.
- Based on the generated features, we are able to:
  - predict CTR up to 3 times better than the weighted sampling method;
  - provide a set of recommendations to ad designers to help optimize their designs.
- Model: Support Vector Regression (SVR); a sketch follows below.
- 3 papers (WWW 2012, KDD 2012, and CIKM 2012) and one US patent.
- MATLAB and C++ implementation (also used LibSVM, CVX, and NCut).
- [Figure: log-inverse CTR histogram.]
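A minimal sketch of the SVR setup on 43 visual features; the synthetic data, RBF kernel, and hyperparameters are placeholders (the original work used LibSVM from MATLAB/C++, and the real features and CTR data are not reproduced here):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: (n_ads, 43) matrix of visual features; y: log CTR (a log transform
# tames the heavy tail). Both are hypothetical stand-ins for real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 43))
y = rng.normal(size=500)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("CV MAE:", -scores.mean())
```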
A Few More
- The sensitivity of CTR to the user's entrance time to the website: the earlier they come, the higher the CTR is likely to be.
- Advertiser (EA and insurance) email targeting: to whom should we send email?
- Estimating income based on zip code, browser, OS, and other browsing features: Porsche or Hyundai, which one should we place?
Keyword Transformation (1)
- Cold-start listings usually have uncommon keywords that are hard to find in searched queries.
- Keywords are scraped using the Bing search engine.
- Algorithm (the first two steps are sketched below):
  - N-gram extraction
  - N-gram frequency filtration
  - Entity detection
  - POS filtration
  - DSSM filtration
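A minimal sketch of the first two pipeline steps, n-gram extraction and frequency filtration; the function names, threshold, and toy documents are illustrative assumptions:

```python
from collections import Counter
from itertools import chain

def extract_ngrams(text, n_max=3):
    """All word n-grams up to length n_max from one piece of text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def frequent_ngrams(documents, min_count=2):
    """Keep only n-grams appearing at least min_count times across the
    scraped documents (the frequency-filtration step)."""
    counts = Counter(chain.from_iterable(extract_ngrams(d) for d in documents))
    return {g for g, c in counts.items() if c >= min_count}

# Usage with toy snippets (placeholders for scraped Bing results):
docs = ["memory card reader", "usb memory card reader", "card reader cable"]
print(frequent_ngrams(docs))
```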
Keyword Transformation (2)

Generated keyword                                          | NenaKey (alphabetically sorted)
-----------------------------------------------------------|--------------------------------
35fcread20bk                                               | card memory reader
mig50q7csa0x                                               | intelligent power toshiba
canine Iris melanoma                                       | cancer dogs eye
boots size 70 mark nason                                   | mark nason shoes
break 2014 xmas bargain                                    | 2014 cheap christmas vacations
casinoroulettegame                                         | casino game roulette
buy www.seatgeek.com show ticket                           | buy seatgeek show ticket
3m gold privacy filters gpfmr13 - notebook privacy filter  | 3m filter gold privacy
56 harbour breeze low profile ceiling fans                 | breeze ceiling fan harbor
a4 hammered and linen brilliant white paper suppliers      | a4 linen paper white
apply for parents private loan for school for kids         | loans parent student
Bayesian Optimization: Motivating Application
- This is how a microbial fuel cell (MFC) works. [Figure: MFC schematic showing fuel (organic matter), bacteria, electron flow (e-), H+, O2, H2O, and oxidation products (CO2) between anode and cathode; SEM image of bacteria sp. on Ni-nanoparticle-enhanced carbon fibers.]
- The nano-structure of the anode significantly impacts electricity production.
- We should optimize the anode nano-structure to maximize power by selecting a set of experiments.
Parameter Tuning
- Suppose you have n different learning algorithms that generate n different predictions (p1, p2, …, pn) for a given input query.
- The final prediction is pf = (a1*p1) + (a2*p2) + … + (an*pn), where a1, a2, …, an are constants.
- Challenge: what should the set (a1, a2, …, an) be?
- Exhaustive search is not possible, since every evaluation takes some time.
- What is the best way to set a1, a2, …, an? (A sketch of the resulting black-box objective follows below.)
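This is exactly the expensive black-box setting Bayesian optimization targets: each candidate weight vector must be scored by evaluating the blended prediction. A minimal sketch of that objective, with hypothetical held-out predictions standing in for the real models:

```python
import numpy as np

def blend_loss(weights, preds, y_true):
    """Black-box objective: mean squared error of the weighted ensemble
    prediction pf = sum_i a_i * p_i.
    preds: (n_models, n_samples) predictions; weights: (n_models,)."""
    pf = weights @ preds
    return float(np.mean((pf - y_true) ** 2))

# Toy usage with hypothetical held-out predictions from three models:
rng = np.random.default_rng(0)
preds = rng.normal(size=(3, 100))                 # p1, p2, p3 on 100 queries
y_true = preds.mean(axis=0) + 0.1 * rng.normal(size=100)
print(blend_loss(np.array([1/3, 1/3, 1/3]), preds, y_true))
```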
Other Applications
- Financial investment
- Reinforcement learning
- Drug testing
- Mechanical engineering
- And …
Bayesian Optimization: Steps
- We have a black-box function, and we don't know anything about its distribution.
- We are able to sample the function, but each sample is very expensive.
- We are interested in finding the maximizer (or minimizer) of the function.
- Since sampling is costly, forget about gradient-descent approaches: no dense sampling. We need to choose our points smartly.
- Assumption: Lipschitz continuity. If the function is not Lipschitz continuous, forget about the rest.
Bayesian Optimization: Big Picture
- The loop: current experiments → posterior model → select experiment(s) → run experiment(s) → back to current experiments. (A runnable skeleton of this loop follows below.)
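A minimal, self-contained skeleton of this loop, assuming a scikit-learn GP as the posterior model and expected improvement as the selection rule (both are introduced on the following slides); the toy objective and all hyperparameters are illustrative:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bayesian_optimization(run_experiment, candidates, n_rounds, n_init=5, seed=0):
    """Loop: current experiments -> posterior model -> select -> run."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([run_experiment(x) for x in X])
    for _ in range(n_rounds):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # posterior model
        mu, sigma = gp.predict(candidates, return_std=True)
        # Expected-improvement selection (MEI; see the criteria slide).
        z = (mu - y.max()) / np.maximum(sigma, 1e-9)
        ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = candidates[int(np.argmax(ei))]
        X = np.vstack([X, x_next])
        y = np.append(y, run_experiment(x_next))  # the expensive step
    return X[int(np.argmax(y))], float(y.max())

# Toy usage: maximize a 1-D function over a grid of candidate experiments.
grid = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
best_x, best_y = bayesian_optimization(
    lambda x: float(np.exp(-(x[0] - 3.0) ** 2)), grid, n_rounds=10)
```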
Bayesian Optimization: Main Steps
- Surrogate function (response surface, posterior model): builds a posterior over unobserved points based on the prior; its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
- Acquisition criterion (selection function): decides which sample should be selected next.
Surrogate Function
- Simulates the unknown function distribution based on the prior.
- Deterministic (classical linear regression, …): there is a deterministic prediction for each point x in the input space.
- Stochastic (Bayesian regression, Gaussian process, …): there is a distribution over the prediction for each point x in the input space (e.g., a normal distribution).
- Example:
  - Deterministic: f(x1) = y1, f(x2) = y2
  - Stochastic: f(x1) = N(y1, 0.1), f(x2) = N(y2, 5)
Gaussian Process (GP)
- A Gaussian process is used to build the posterior model.
- The prediction at any point is a normal random variable (a sketch of the posterior equations follows below).
- The predictive variance is independent of the observations y.
- [Figure: points with high output expectation vs. points with high output variance.]
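A minimal sketch of the GP posterior with a squared-exponential kernel (the kernel choice and hyperparameters are illustrative assumptions); note that the covariance line never involves y, matching the independence claim above:

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, y, Xs, noise=1e-6, length=1.0):
    """GP posterior mean and std at test points Xs, given data (X, y)."""
    K = rbf(X, X, length) + noise * np.eye(len(X))
    Ks = rbf(X, Xs, length)
    Kss = rbf(Xs, Xs, length)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y             # the mean depends on the observations y
    cov = Kss - Ks.T @ K_inv @ Ks     # the variance does NOT depend on y
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```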
Selection Criteria
- Maximum Mean (MM): selects the point with the highest posterior mean; purely exploitative.
- Maximum Upper-bound Interval (MUI): selects the point with the highest 95% upper confidence bound; purely explorative.
- Maximum Probability of Improvement (MPI): computes the probability that the output exceeds (1 + m) times the best current observation, for m > 0; both explorative and exploitative.
- Maximum Expected Improvement (MEI): similar to MPI but parameter-free; it simply computes the expected amount of improvement after sampling at any point.
- (All four criteria are sketched below.)
- [Figure: MM, MUI, MPI, and MEI selections compared.]
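A minimal sketch computing all four criteria from a GP posterior (mu, sigma) over candidate points; it assumes maximization and, for the MPI target, a positive best observation:

```python
import numpy as np
from scipy.stats import norm

def acquisitions(mu, sigma, y_best, m=0.1):
    """MM, MUI, MPI, and MEI from the GP posterior at candidate points.
    y_best is the best observation so far (assumed > 0 for MPI)."""
    sigma = np.maximum(sigma, 1e-9)
    mm = mu                                     # Maximum Mean
    mui = mu + 1.96 * sigma                     # 95% upper confidence bound
    target = (1 + m) * y_best                   # MPI improvement target
    mpi = norm.cdf((mu - target) / sigma)       # P(output > target)
    z = (mu - y_best) / sigma
    mei = (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    return mm, mui, mpi, mei

# Each criterion is maximized over the candidates to pick the next experiment:
# x_next = candidates[np.argmax(mei)]
```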
Bayesian Optimization: Results
- [Results figures not reproduced.]
Questions
ja_azimi@yahoo.com