PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014.


1 PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems IEEE Big Data 2014

2 Agenda
– Introduction
– Motivation
– Model Structure
– Progressive Learning
– Use Cases
  – Automate MS Annotation (Multi-label Classification)
  – Latent Semantic Discovery
– Conclusion

3 Introduction
Probabilistic graphical models (PGM) consist of a structural model and a set of conditional probabilities. Graphical models can be classified into two major categories:
– (1) directed graphical models (Bayesian networks)
– (2) undirected graphical models (Markov Random Fields)

4 Motivation
[Figure: level labels MS1, MS2, MS3; node labels GOG1, GOG2, … and Frag1, Frag2, …; counts 1300 and 2,979,334]
1300 * 2,979,334 = 3,873,134,200

5 Model Structure
[Figure: graph with level labels MS1, MS2, MS3; node labels GOG1, GOG2 and F1 to F11; edge labels 50, 20, 40, 50, 30, 50, 10, 5, 20, 15]
P(GOG1 | F1, F3, F7) = P(GOG1 | F1) * P(GOG1 | F3) * P(F3 | F7) = 50/50 * 20/60 * 10/25
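The product on this slide can be reproduced with a small sketch. It assumes that each factor P(parent | child) is the count on the parent-to-child edge divided by the total count on all edges entering that child, which matches the fractions 50/50, 20/60 and 10/25. The figure's full topology is not recoverable from the transcript, so the edges `("GOG2", "F3")` and `("F4", "F7")` and the names `edge_counts`, `parent_given_child` and `path_score` are hypothetical, chosen only so the denominators come out as on the slide.

```python
from fractions import Fraction

# (parent, child) -> count. The three edges used in the slide's formula carry
# the slide's numerators; the other two edges are ASSUMED so that the incoming
# totals of F3 and F7 equal the slide's denominators (60 and 25).
edge_counts = {
    ("GOG1", "F1"): 50,
    ("GOG1", "F3"): 20,
    ("GOG2", "F3"): 40,  # assumed edge
    ("F3", "F7"): 10,
    ("F4", "F7"): 15,    # assumed edge
}


def parent_given_child(parent, child, counts):
    """Edge count divided by the total count entering `child`."""
    total_in = sum(c for (_, ch), c in counts.items() if ch == child)
    if total_in == 0:
        return Fraction(0)
    return Fraction(counts.get((parent, child), 0), total_in)


def path_score(pairs, counts):
    """Multiply P(parent | child) over the given (parent, child) pairs."""
    score = Fraction(1)
    for parent, child in pairs:
        score *= parent_given_child(parent, child, counts)
    return score


score = path_score([("GOG1", "F1"), ("GOG1", "F3"), ("F3", "F7")], edge_counts)
print(score)         # 2/15
print(float(score))  # roughly 0.133
```

Exact fractions are used here so the result can be compared directly with the slide's 50/50 * 20/60 * 10/25.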

6 Progressive Learning
This learning technique is very attractive in the big data age for the following reasons:
– Training the model does not require processing all data upfront.
– It can easily learn from new data without the need to re-include the previous training data in the learning.
– The training session can be distributed instead of doing it in one long-running session.
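The slides do not show the learning procedure itself. The sketch below assumes that learning amounts to accumulating edge counts, which is consistent with the count-based fractions on the Model Structure slide. Under that assumption the three properties listed above follow from counts being additive. The class `CountModel` and its methods are hypothetical names, not the authors' implementation.

```python
from collections import Counter


class CountModel:
    """Hypothetical count-based model: edge counts are the only learned state."""

    def __init__(self):
        self.edge_counts = Counter()

    def update(self, observations):
        """Fold a batch of (parent, child) observations into the counts."""
        self.edge_counts.update(observations)

    def merge(self, other):
        """Add the counts learned in a separate training session."""
        self.edge_counts.update(other.edge_counts)


batch_1 = [("GOG1", "F1"), ("GOG1", "F3"), ("GOG2", "F3")]
batch_2 = [("GOG2", "F3"), ("F3", "F7")]

# Incremental: new data is added without revisiting batch_1.
incremental = CountModel()
incremental.update(batch_1)
incremental.update(batch_2)

# Distributed: two sessions train independently, then merge.
worker_a, worker_b = CountModel(), CountModel()
worker_a.update(batch_1)
worker_b.update(batch_2)
worker_a.merge(worker_b)

# Reference: everything processed in one session.
one_shot = CountModel()
one_shot.update(batch_1 + batch_2)

print(incremental.edge_counts == one_shot.edge_counts)  # True
print(worker_a.edge_counts == one_shot.edge_counts)     # True
```

Because addition of counts is order-independent, all three training schedules end in the same state in this sketch.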

7 Automate MS Annotation (Multi-label Classification)
Data Set Includes:
Item                 Count
Scan                 1974
Peak                 266571
Edges                10743
Root                 450
MS2 Fragment Node    5983
MS3 Fragment Node    201

8 Results

9 Results

10 Results

11

12 Latent Semantic Discovery
[Figure: graph with node labels Java Developer, .NET Developer, Nurse, Health Care, Java, J2EE, C#, Care giver, RN, Senior Home; edge labels 5 10 3 50 5050 100 10 15 1]
P(Java, J2EE | Java Developer) = P(Java | Java Developer) * P(J2EE | Java Developer) = 5/7 * 10/10
P(Java, C# | Java Dev, .NET Dev) = P(Java | Java Dev) * P(Java | .NET Dev) * P(C# | Java Dev) * P(C# | .NET Dev)
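Both formulas on this slide have the same shape: a product of P(term | class) over every pairing of a term with a class. A minimal sketch of that product follows. Only the two values 5/7 and 10/10 come from the slide; the other table entries are made-up placeholders, and the names `cond_prob` and `joint_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# (term, class) -> P(term | class).
cond_prob = {
    ("Java", "Java Developer"): Fraction(5, 7),     # from the slide
    ("J2EE", "Java Developer"): Fraction(10, 10),   # from the slide
    ("Java", ".NET Developer"): Fraction(1, 10),    # assumed placeholder
    ("C#", "Java Developer"): Fraction(1, 20),      # assumed placeholder
    ("C#", ".NET Developer"): Fraction(9, 10),      # assumed placeholder
}


def joint_probability(terms, classes, table):
    """Product of P(term | class) over all (term, class) pairs.

    Pairs missing from the table are treated as probability 0.
    """
    result = Fraction(1)
    for term, cls in product(terms, classes):
        result *= table.get((term, cls), Fraction(0))
    return result


# First formula: two terms, one class.
single = joint_probability(["Java", "J2EE"], ["Java Developer"], cond_prob)
print(single)  # 5/7

# Second formula: two terms, two classes (uses the placeholder values).
multi = joint_probability(
    ["Java", "C#"], ["Java Developer", ".NET Developer"], cond_prob
)
print(multi)
```

Treating a missing pair as probability 0 is a design choice of this sketch; a smoothed estimate would avoid zeroing the whole product.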

13 Results

14 Conclusion
– We propose an efficient and scalable probabilistic graphical model for massive hierarchical data (PGMHD).
– We successfully applied PGMHD to the bioinformatics domain to automatically classify and annotate high-throughput mass spectrometry data.
– We successfully applied this model to large-scale latent semantic discovery, using 1.6 billion search log entries provided by CareerBuilder.com within a Hadoop Map/Reduce framework.

15 Questions

