What Are They Talking About These Days?

Slides:



Advertisements
Similar presentations
July 2010 D2.1 Upgrading strategy Javier Soto Catalog Release 3. Communities.
Advertisements

© InLoox GmbH InLoox Web App product presentation The web client for project management on the Internet.
Spring PowerPoint Basics Why is it a good tool to use? Easier than 3 X 5 Note Cards Organized presentations are “heard” better.
Progress Report 11/1/01 Matt Bridges. Overview Data collection and analysis tool for web site traffic Lets website administrators know who is on their.
Google Confidential and Proprietary 1 Intro to Docs Google Apps Apps.
Welcome to the Minnesota SharePoint User Group. Introductions / Overview Project Tracking / Management / Collaboration via SharePoint Multiple Audiences.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Crystal Hoyer Program Manager IIS Team Preview of features that will be announced at MIX09 Please do not blog, take pictures or video of session.
1Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8 Reporting from Contract.
4 Copyright © 2009, Oracle. All rights reserved. Designing Mappings with the Oracle Data Integration Enterprise Edition License.
David Eastwood Maddox Ford Ltd.
NCIP Hub Webinar Series: Collections April 24, 2014.
Search Engine Optimization 101 What is SEM? SEO? How can I use SEO on my blogs and/or my personal web space?
COORENOR COORENOR Web Portal COORENOR Agenda Where we are? (Summarize features of the COORENOR web portal.) Where are we going? (Show how to.
Introduction to Morpho RCN Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
After FactFinder: The future of data dissemination at Census Bureau December 17,
Topical Analysis and Visualization of (Network) Data Using Sci2 Ted Polley Research & Editorial Assistant Cyberinfrastructure for Network Science Center.
Copyright © 2015 Oracle and/or its affiliates. All rights reserved. How Can RDF and OWL Coexist with Property Graph Zhe Wu Architect Oracle Spatial and.
Data Visualization with Tableau
GraalVM Scott Lynn Director of Product Strategy, Oracle Linux
Fundamental of Databases
SharePoint 101 – An Overview of SharePoint 2010, 2013 and Office 365
Building Enterprise Applications Using Visual Studio®
Patrick Desbrow, CIO & VP of Engineering October 29, 2014
Oracle Tools for SQL Server?
Implementing Cloud-based Agile Team Development - Lessons Learned
Develop in the Cloud, Accelerate Software Evolution
Welcome: Hands-On Lab Plug in to the network.
This is a Safe Harbor Front slide, one of two Safe Harbor Statement slides included in this template. One of the Safe Harbor slides must be used if your.
Application Workload Performance Validation for EPM Cloud
GraalVM Scott Lynn Director of Product Strategy, Oracle Linux
Building Regression Tests With PeopleSoft Test Framework
Machine Learning Turbo-Charges the Ops Portion of DevOps
This is a Safe Harbor Front slide, one of two Safe Harbor Statement slides included in this template. One of the Safe Harbor slides must be used if your.
NCIP Hub Webinar Series: Collections
What's There and What's Coming with BICS & Data Viz
Oracle JavaOne 2017 – Hands-On Labs (HOL) Get Started on Oracle Cloud: Java Apps with Containers and DevOps Plug in to the network Connect via WiFi. Connect.
The importance of being Connected
Mark V. Janikas Marjean Pobuda
Spark Presentation.
Oracle and CERN openlab
HISTORY Of API.
Steering Group Member, Link Digital
Abstract/Session Description:
VMware és KVM környezetek változtatás nélkül a felhőben
PRG 421 Competitive Success-- snaptutorial.com
PRG 421 Education for Service-- snaptutorial.com
PRG 421 Teaching Effectively-- snaptutorial.com
Optimize your research performance using SciVal
Oracle Analytic Views Enhance BI Applications and Simplify Development
FORCES HCM Cloud Overview
How to Cure Those Digital Adoption Blues: Oracle Guided Learning
Your Finance Cloud End User Adoption and Enablement Starts Here
How Social Technologies Connect Learners at Qualcomm
Enhance BI Applications and Simplify Development
Discover what’s new and what’s coming to SharePoint Modern Team sites
Web Site Analytics with Google Analytics
Social Media Marketing Strategy Template
Your code is not just…your code
Node.js Test Automation using Oracle Developer Cloud- Simplified
Managing CPQ Performance Proactively
INSTRUCTOR NOTES/LINKS
JD Edwards Update HCM SIG
Links Launch Outlook Launch Skype Place Skype on Do Not Disturb.
Solution Demonstrations
Citation databases and social networks for researchers: measuring research impact and disseminating results - exercise Elisavet Koutzamani
Social Media Marketing Strategy Template
Your code is not just…your code
Palestinian Central Bureau of Statistics
Presentation transcript:

What Are They Talking About These Days? Analyzing Topics with Graphs Davide Basilio Bartolini <davide.bartolini@oracle.com> Oracle Labs Zürich Damien Hilloulin <damien.hilloulin@oracle.com> Oracle Labs Zürich

This is a Safe Harbor Front slide, one of two Safe Harbor Statement slides included in this template. One of the Safe Harbor slides must be used if your presentation covers material affected by Oracle’s Revenue Recognition Policy To learn more about this policy, e-mail: Revrec-americasiebc_us@oracle.com For internal communication, Safe Harbor Statements are not required. However, there is an applicable disclaimer (Exhibit E) that should be used, found in the Oracle Revenue Recognition Policy for Future Product Communications. Copy and paste this link into a web browser, to find out more information.   http://my.oracle.com/site/fin/gfo/GlobalProcesses/cnt452504.pdf For all external communications such as press release, roadmaps, PowerPoint presentations, Safe Harbor Statements are required. You can refer to the link mentioned above to find out additional information/disclaimers required depending on your audience. Confidential – Oracle Internal/Restricted/Highly Restricted

In a Nutshell What are the topics people are discussing and how do they evolve? time Who are the experts in a given topic? politics

Session Agenda Dataset and Objectives Lightning Primer on Topic Modeling Looking at This as a Graph Tools Demo time

Dataset and Objectives Dataset info Anonymized dump of all user-contributed content on the Stack Exchange network [1] The data contains posts, comments, user profiles, tags, ... In the demo, will look at the politics site from the beginning until September 2016 Objectives Topic modeling Identify broader discussion topics from tags and posts Analyze how topics change over time (topic evolution) and their importance (topic strength) User analysis Find who are the experts in a given topic [1] The data is available at https://archive.org/details/stackexchange

Lightning Primer on Topic Modeling “A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.” [1] Classic approach (e.g., LDA) quick overview: Collection of documents cat banana …. Model Doc1 100% Doc2 30% 70% Topic 1 Topic 2 …… Topic K cat banana broccoli mouse Parameter: K (Desired # of topics) [1] https://en.wikipedia.org/wiki/Topic_model

Our Dataset has More Structure than a Doc. Collection Stackexchange dump original data in xml format <row Id=“1" AcceptedAnswerId=“42" CreationDate=“…” Body=“…" OwnerUserId=“37" Title=“…" Tags=“t1;t2;…" /> Posts.xml <row Id="1" TagName="electoral" Count="2" /> Tags.xml <row Id="15" DisplayName="..." /> Users.xml

Graph for This Demo User Tag Post hasTag wrote acceptedAnswerOf Preprocessed the data to create the property graph structure from the xml files (won’t focus on this step in this demo) The result is a graph in “EDGE_LIST” format, that we can load into PGX [1] Also add “reverse edges” for Tag <-> Post and Post <-> Post, to highlight community structure [1] https://docs.oracle.com/cd/E56133_01/latest/reference/loader/index.html

Topic Analysis Approach Find communities of vertices in a graph via the “map equation” [1] Metric: minimize entropy in vertex labeling using communities as prefixes Using Relaxmap [2], a parallel version of the algorithm Each community can be interpreted as a topic We find “clear-cut” topics, not a probabilistic mixture Posts / tags are assigned to one and only one topic [1] http://www.mapequation.org/ [2] http://uwescience.github.io/RelaxMap/

Oracle PGX A graph analysis package developed at Oracle Labs Fast and easy application of graph analysis to the dataset Already integrated with Oracle products Big Data Spatial and Graph (BDSG) Spatial and Graph (RDBMS 12.2c) Advanced Analytics (OAA) – upcoming Confidential – Oracle Internal

Property Graph Analysis Strategies Two major approaches Computational graph analytics Iterate the graph and compute properties / stats Graph pattern matching Query the graph to find sub-graphs that match a pattern PGX supports both approaches Green Marl / Java API PGQL And offers an environment to mix them Groovy shell, OL Data Studio (under development) Why Graphs: “Graph analysis is the true killer app for Big Data.” -- Forrester “Forrester estimates that over 25% of enterprises will be using graph databases by 2017.” -- Forrester “Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.” -- Gartner “By the end of 2018, 70% of leading organizations will have one or more pilot or proof-of-concept efforts underway utilizing graph databases. ” – Gartner spouse friend friend Notes are Confidential -Oracle Internal

Demo Time! The demo is based on the Oracle Labs Data Studio frontend A notebook-like environment supporting %pgx and %pgql (and others) Not yet released (but it’s “just a frontend”, can do the same analysis with PGX alone) We are planning to release this demo in an upcoming tech preview Contact me [1] or poll the PGX OTN website [2] for updates Demo focuses on the “politics” forum from stackexchange User Post Tag hasTag wrote acceptedAnswerOf [1] <davide.bartolini@oracle.com> [2] http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytics/overview/index.html