PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014.

Presentation transcript:


Agenda
Introduction
Motivation
Model Structure
Progressive Learning
Use Cases
– Automate MS Annotation (Multi-label Classification)
– Latent Semantic Discovery
Conclusion

Introduction
Probabilistic graphical models (PGMs) consist of a structural model and a set of conditional probabilities.
Graphical models can be classified into two major categories:
– (1) directed graphical models (Bayesian networks)
– (2) undirected graphical models (Markov random fields)

Motivation
[Figure: hierarchy diagram with node labels GOG1, GOG2, …, Frag1, Frag2, … and levels MS1, MS2, MS3. Surviving figures: 2,979,334 and the product "… * 2,979,334 = 3,873,134,200"; the first factor was lost in extraction.]

Model Structure
[Figure: example model with nodes GOG1, GOG2 and F1 to F11, arranged in levels MS1, MS2, MS3.]
P(GOG1 | F1, F3, F7) = P(GOG1 | F1) * P(GOG1 | F3) * P(F3 | F7) = 50/50 * 20/60 * 10/25
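The product on this slide can be checked with exact fractions. The sketch below is only an illustration: the function name `path_score` and the list-of-pairs layout are assumptions made here, not part of PGMHD, and the three ratios are copied from the slide as written.

```python
from fractions import Fraction
from math import prod


def path_score(ratios):
    """Multiply (numerator, denominator) count ratios into one exact score."""
    return prod((Fraction(n, d) for n, d in ratios), start=Fraction(1))


# Ratios as written on the slide: P(GOG1|F1), P(GOG1|F3), P(F3|F7).
slide_ratios = [(50, 50), (20, 60), (10, 25)]

score = path_score(slide_ratios)
print(score)  # exact value of 50/50 * 20/60 * 10/25
```

With these numbers the score reduces to 2/15, roughly 0.13.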

Progressive Learning
This learning technique is very attractive in the big data age for the following reasons:
– Training the model does not require processing all data upfront.
– The model can easily learn from new data without re-including the previous training data.
– Training can be distributed across sessions instead of running as one long session.
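One way to picture these three properties is a model whose parameters are plain co-occurrence counts. The slides do not say how PGMHD stores its parameters, so everything below is an assumption: the `CountModel` class, its method names, and the choice to estimate a probability as edge count divided by child count (suggested by the fractions on the Model Structure slide) are hypothetical.

```python
from collections import Counter


class CountModel:
    """Hypothetical count store: edge counts plus per-child totals."""

    def __init__(self):
        self.edge = Counter()   # (parent, child) -> co-occurrence count
        self.child = Counter()  # child -> total occurrences

    def update(self, observations):
        """Fold a new batch of (parent, child) pairs into the counts."""
        for parent, child in observations:
            self.edge[(parent, child)] += 1
            self.child[child] += 1

    def merge(self, other):
        """Add counts learned in a separate session or on another worker."""
        self.edge.update(other.edge)
        self.child.update(other.child)

    def prob(self, parent, child):
        """Estimate P(parent | child) as edge count / child count (assumed)."""
        total = self.child[child]
        return self.edge[(parent, child)] / total if total else 0.0


# Made-up observations, split into two batches.
batch1 = [("GOG1", "F1"), ("GOG1", "F3")]
batch2 = [("GOG2", "F3"), ("GOG1", "F3")]

# Incremental: learn batch2 later without revisiting batch1.
sequential = CountModel()
sequential.update(batch1)
sequential.update(batch2)

# Distributed: two workers train separately, then their counts are merged.
worker_a, worker_b = CountModel(), CountModel()
worker_a.update(batch1)
worker_b.update(batch2)
worker_a.merge(worker_b)
```

Because counts simply add, the incremental and the merged model end up with identical parameters, which is the property the bullets above rely on.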

Automate MS Annotation (Multi-label Classification)
Data set includes:

Item                 Count
Scan                 1974
Peak                 (count missing in transcript)
Edges                10743
Root                 450
MS2 Fragment Node    5983
MS3 Fragment Node    201

Results
[Three slides of result charts; the figures are not preserved in this transcript.]

Latent Semantic Discovery
[Figure labels: Java Developer, .NET Developer, Nurse, Health Care; Java, J2EE, C#, Care giver, RN, Senior Home.]
P(Java, J2EE | Java Developer) = P(Java | Java Developer) * P(J2EE | Java Developer) = 5/7 * 10/10
P(Java, C# | Java Dev, .NET Dev) = P(Java | Java Dev) * P(Java | .NET Dev) * P(C# | Java Dev) * P(C# | .NET Dev)
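Both formulas on this slide multiply one ratio per (term, job class) pair. The sketch below shows that pattern under stated assumptions: the `joint` function and the `COUNTS` table are hypothetical names, only the two Java Developer ratios (5/7 and 10/10) come from the slide, and the remaining entries are made-up placeholders because the slide gives no numbers for them.

```python
from fractions import Fraction
from itertools import product

# (term, job class) -> (co-occurrence count, normalizing count).
COUNTS = {
    ("Java", "Java Developer"): (5, 7),     # from the slide
    ("J2EE", "Java Developer"): (10, 10),   # from the slide
    ("C#", "Java Developer"): (1, 10),      # hypothetical placeholder
    ("Java", ".NET Developer"): (2, 7),     # hypothetical placeholder
    ("C#", ".NET Developer"): (9, 10),      # hypothetical placeholder
}


def joint(terms, classes, counts=COUNTS):
    """Multiply the ratio for every (term, class) pair, as on the slide."""
    result = Fraction(1)
    for term, cls in product(terms, classes):
        num, den = counts[(term, cls)]
        result *= Fraction(num, den)
    return result


# First formula: two terms, one job class.
first = joint(["Java", "J2EE"], ["Java Developer"])

# Second formula: two terms, two job classes (four factors).
second = joint(["Java", "C#"], ["Java Developer", ".NET Developer"])
```

The first call reproduces the slide's 5/7 * 10/10 = 5/7; the second value depends entirely on the placeholder counts and only illustrates the four-factor shape of the formula.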

Results

Conclusion
We propose an efficient and scalable probabilistic graphical model for massive hierarchical data (PGMHD).
We successfully applied PGMHD to the bioinformatics domain to automatically classify and annotate high-throughput mass spectrometry data.
We successfully applied this model to large-scale latent semantic discovery, using 1.6 billion search log entries provided by CareerBuilder.com within a Hadoop Map/Reduce framework.

Questions