Benchmarking Interactive Social Networking Actions Shahram Ghandeharizadeh Director of Database Lab Computer Science Department University of Southern.


Benchmarking Interactive Social Networking Actions Shahram Ghandeharizadeh Director of Database Lab Computer Science Department University of Southern California

Outline
Motivation; Research questions; Survey use cases; BG Benchmark; FORSEE; Future research

Motivation
Data Stores; Cloud Services; Person-to-person cloud services

Research Questions
What is the tradeoff between alternative data models? E.g., is JSON superior to the relational data model?
How do alternative architectures compare with one another? E.g., is cache-augmented SQL as good as a document/extensible store?
Do NewSQL data stores scale as well as NoSQL data stores?

Survey Use Case
S. Barahmand and S. Ghandeharizadeh. BG: A Benchmark to Evaluate Interactive Social Networking Actions. CIDR '13, Asilomar, CA.

Data Model
[Diagram labels: Accounts, Friend, Members, Pages, Follow, Resources, Own, Share, News Feed, Displays]

BG Architecture
Scalable; Emulates User Behavior; Service Level Agreement; Quick and Efficient Rating; Visualization Tool
S. Barahmand and S. Ghandeharizadeh. Expedited Benchmarking of Social Networking Actions with Agile Data Loading Techniques. CIKM '13, SF, CA.

Good Benchmark = FORSEE
F: Focus on an important debate and provide relevant metrics to facilitate progress.
O: One number to describe alternative designs/solutions.
R: Runs in a reasonable amount of time.
S: Scalable.
E: Effective abstraction with meaningful requests.
E: Extendible.

Good Benchmark = FORSEE (continued)
+ Unpredictable data
SoAR
4 months to rate =
1 week to rate =
Only when two members are NOT friends!
FORSEE = PREDICT
A good benchmark helps settle debates quickly to enable its discipline to make rapid progress.
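
The "one number" criterion and the SoAR label on the FORSEE slides can be illustrated with a small sketch. It assumes that a rating is the highest observed throughput among runs that still satisfy a service level agreement (SLA). The `Run` record, the SLA thresholds, and the sample measurements are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """One benchmark run at a fixed load (all fields are illustrative)."""
    throughput: float          # actions per second observed in this run
    pct_within_latency: float  # % of actions that met the latency bound
    pct_unpredictable: float   # % of actions that observed unpredictable data


def satisfies_sla(run: Run, min_pct_fast: float = 95.0,
                  max_pct_unpredictable: float = 0.01) -> bool:
    """Hypothetical SLA: enough fast actions and little unpredictable data."""
    return (run.pct_within_latency >= min_pct_fast
            and run.pct_unpredictable <= max_pct_unpredictable)


def rate(runs: Iterable[Run]) -> Optional[float]:
    """Return the highest throughput among SLA-compliant runs, or None."""
    compliant = [r.throughput for r in runs if satisfies_sla(r)]
    return max(compliant) if compliant else None


# Made-up measurements: the third run is fastest but violates the SLA.
runs = [
    Run(throughput=1_000, pct_within_latency=99.9, pct_unpredictable=0.0),
    Run(throughput=5_000, pct_within_latency=97.0, pct_unpredictable=0.005),
    Run(throughput=9_000, pct_within_latency=80.0, pct_unpredictable=0.2),
]
print(rate(runs))
```

Collapsing a set of runs into one SLA-constrained number is what lets two data stores be compared directly, at the cost of hiding how each behaves beyond the SLA boundary.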

Future Research: Data Sciences
Challenge: Wide variety of science applications with diverse debates.
Hypothesis: A benchmark generator.
[Diagram labels: Benchmark Generator; ER diagram; Actions & their dependencies; Key Metrics; Application (data science) Specific Benchmark]

Future Research
Evaluate the hypothesis using BG.
Extend to other data science applications.
[Diagram labels: Benchmark Generator; Unpredictable data]

Big Data: Operations
[Chart labels: Simple, Complex; Off-line, Interactive; Ad-hoc, Pre-specified]

Big Data: Google Analytics
1. Gather click stream data: Optimized for writes.
2. Compute aggregated data: MapReduce/Hadoop.
3. Enable users to view aggregated data.
Objective:
1. Advertising ROI
2. Frequency of access to pages
[Chart labels: Simple, Complex; Off-line, Interactive; Ad-hoc, Pre-specified]
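
The "compute aggregated data" step above can be sketched in the MapReduce style. This is a single-process illustration of the map, shuffle, and reduce phases that computes the frequency of access to pages. It is not Hadoop code, and the click-record layout and sample log are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Click = Tuple[str, str]  # (visitor id, page URL); hypothetical record layout


def map_phase(clicks: Iterable[Click]) -> Iterator[Tuple[str, int]]:
    """Emit (page, 1) for every click in the stream."""
    for _visitor, page in clicks:
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts for each page to get its access frequency."""
    return {page: sum(counts) for page, counts in groups.items()}


# Made-up click stream, gathered earlier by a write-optimized store.
click_log = [("u1", "/home"), ("u2", "/home"), ("u1", "/pricing"), ("u3", "/home")]
page_views = reduce_phase(shuffle(map_phase(click_log)))
print(page_views)
```

The aggregation is pre-specified and runs off-line over the whole log, which is why users only ever view its stored result.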

Big Data: Facebook
Show profile page of Farrah Fawcett; Follow Barack Obama; Friend Lady Gaga
[Chart labels: Simple, Complex; Off-line, Interactive; Ad-hoc, Pre-specified]

3 Vs: Facebook
High Volume: 1.2 billion user profiles, 150 billion friend connections, 1.13 trillion likes, 17 billion tagged locations, 240 billion photos, …
High Velocity: 700 million active users daily, 4.5 billion likes daily, 350 million photos uploaded daily, …
High Variety: Mix of data types: structured records, multimedia content, text.
Source: posted on Oct 6,

Expertise/Contributions
BG: a benchmark to evaluate the performance of alternative data stores: SQL, NoSQL, NewSQL.
A high-performance CASQL solution that minimizes the software development life cycle.
KOSAR: a prototype of a CASQL solution.
[Chart labels: Simple, Complex; Off-line, Interactive; Ad-hoc, Pre-specified]

BG
Joint work with Sumita Barahmand.
Benchmark for interactive social networking actions.
Consists of 11 actions:

CASQL
Joint work with Jason Yap.
Key insight: Query result lookup is faster than query processing.
Contribution is physical data independence in CASQL systems:
Transparent caching
Serial schedules
Detection of race conditions and prevention of inconsistent states.
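
The key insight above, that looking up a query result is cheaper than processing the query, can be sketched with a cache placed in front of a SQL store. This is a minimal single-threaded illustration that uses SQLite and an in-process dictionary. The `members` table, the function names, and the invalidate-on-write policy are assumptions for the example and do not describe the authors' CASQL design.

```python
import sqlite3
from typing import Dict, Optional, Tuple

# In-memory SQL store with one hypothetical table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO members VALUES (1, 'alice')")

cache: Dict[int, Tuple[int, str]] = {}   # query results keyed by member id
stats = {"hits": 0, "misses": 0}


def view_profile(member_id: int) -> Optional[Tuple[int, str]]:
    """Serve from the cache when possible; otherwise run the query and cache it."""
    if member_id in cache:
        stats["hits"] += 1
        return cache[member_id]
    stats["misses"] += 1
    row = db.execute(
        "SELECT id, name FROM members WHERE id = ?", (member_id,)
    ).fetchone()
    if row is not None:
        cache[member_id] = row
    return row


def rename_member(member_id: int, name: str) -> None:
    """Update the store, then drop the stale cached result."""
    db.execute("UPDATE members SET name = ? WHERE id = ?", (name, member_id))
    cache.pop(member_id, None)


view_profile(1)           # miss: runs the SQL query
view_profile(1)           # hit: served from the cache
rename_member(1, "bob")   # write invalidates the cached entry
print(view_profile(1), stats)
```

With concurrent clients, a read that overlaps the update could reinstall a stale result after the invalidation. This sketch does not handle that case, which is the kind of race condition the slide lists as something a CASQL system must detect and prevent.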

KOSAR
Joint work with Reihane Boghrati, Lakshmy Mohanan and Neeraj Narang.
A software prototype of CASQL: scalable, highly available, elastic.
Boosts the performance of a leading vendor's industrial-strength RDBMS from 2 actions per second to more than 300,000 actions per second.

[Architecture diagram labels: BG Coordinator; Delta Analyzer; BGClient 1, BGClient 2, …, BGClient N; Data Store Server …; Load; Experiment; Agile Data Loading Techniques]