Building BI App on Cloud Rohit Chatter Sr.

Presentation transcript:


Yahoo! BigData Scale

Yahoo is the most visited site on the Internet
– 600M+ Unique Visitors per Month
– Billions of Page Views per Day
– Billions of Searches per Month
– Billions of s per Month
– Terabytes of Data per Day!

And we crawl the Web
– 100+ Billion Pages
– 5+ Trillion Links
– Petabytes of data

Reading 100 Terabytes could be overwhelming

Yahoo! Search Scale

– Types in a search query on Yahoo or affiliate site (aka the Publisher)
– Passes search query to the ad platform for servable ad listings
– Manages campaigns, creates ad listings, bids for keywords
– Ad serving returns relevant & available ads matching the search query
– Clicks on Ad
– Shows ads returned by ad serving

Business Model

– Daily, Weekly, Monthly & Yearly
– Daily, Hourly, Weekly, Monthly & Yearly
– Daily, Weekly, Monthly & Yearly
– Daily, Hourly, Weekly, Monthly & Yearly

– Performance, Credit Summary
– Performance, Budget Headroom, AM performance, competitive analysis
– Performance, Feature Adoption
– Competitive analysis, cross sell, upsell, performance

Hour Glass Model – A Perspective

– Business Performance monitoring
– RDBMS Facts
– Home Grown App

– Level 1 & 2 analysis
– Granular aggregates
– Home Grown App

– What if analysis and deep dive data analysis
– Most granular data – event level model

– Tactical & Operational reporting
– Improvement & Alignment
– Excellence & Strategic

BI on Cloud [1000ft view]

Functional View
– Data – 100+ Gigabytes/Day
– Hadoop Grid + PIG Cloud
– Aggregates & Metadata layer
– App Server – BI layer

– Data Source
– Dimension & Fact
– Utility Computing
– Build Aggregates
– Oracle RDBMS
– BI Aggregates (H,D,W,M)
– BI Tool/Home Grown

What is computed where
– Metrics: Impressions, Revenue, Clicks, Conversions, Quality Score, Top keywords
– Rollups, Type 2 Dimension, Alerts & Messaging
– Load balanced web
– Apache Web Server
– Derived Metrics – CTR, Depth, RPM, Coverage
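
The slide separates base metrics (Impressions, Revenue, Clicks) from derived metrics (CTR, Depth, RPM, Coverage) and names hourly, daily, weekly and monthly aggregates. A minimal Python sketch of that split follows. The slides do not define the derived metrics, so the definitions here are assumptions based on common ad-reporting usage: CTR is clicks per impression, RPM is revenue per 1,000 searches, Coverage is the share of searches that showed at least one ad, and Depth is ads shown per search that showed ads. The field names, dates and numbers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fact:
    """Additive base metrics for one hour (field names are hypothetical)."""
    day: str
    hour: int
    searches: int
    searches_with_ads: int
    impressions: int  # ads shown
    clicks: int
    revenue: float


BASE_METRICS = ("searches", "searches_with_ads", "impressions", "clicks", "revenue")


def rollup_daily(hourly):
    """Sum additive base metrics from hourly grain up to daily grain."""
    totals = defaultdict(lambda: defaultdict(float))
    for fact in hourly:
        for name in BASE_METRICS:
            totals[fact.day][name] += getattr(fact, name)
    return {day: dict(metrics) for day, metrics in totals.items()}


def _ratio(numerator, denominator):
    """Safe division: an empty period yields 0.0 instead of an error."""
    return numerator / denominator if denominator else 0.0


def derive(m):
    """Recompute derived metrics from rolled-up sums (assumed definitions)."""
    return {
        "ctr": _ratio(m["clicks"], m["impressions"]),
        "rpm": 1000.0 * _ratio(m["revenue"], m["searches"]),
        "coverage": _ratio(m["searches_with_ads"], m["searches"]),
        "depth": _ratio(m["impressions"], m["searches_with_ads"]),
    }


# Made-up sample data: two hourly facts for the same day.
hourly = [
    Fact("2012-03-01", 0, 1000, 600, 1800, 90, 45.0),
    Fact("2012-03-01", 1, 3000, 1400, 4200, 110, 55.0),
]
daily = rollup_daily(hourly)
metrics = derive(daily["2012-03-01"])
```

The design point is that only the additive metrics are stored and rolled up; ratios such as CTR are recomputed at each grain, because averaging hourly ratios would weight a quiet hour the same as a busy one.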

BI on Cloud – Screen Shots

CUBE on Hadoop?

Oracle ETL/ Aggregation I-CUBE HADOOP MicroStrategy Home Grown Tools ART Tradition APOLLO FEEDS

Game Changer – Hbase & Schema

– I-CUBE
– HADOOP
– BI Tool
– Home Grown Tools
– ART
– HBASE
– Aggregation in HIVE
– Hiveserver
– JDBC/ODBC

How do we do it?

Htable – Schema Less
– RowKey: OrderId-MMYY
– Day Metrics: D1, D2, ……, Dn
– Week Metrics: Wx, Wx+1, ……, Wy
– MTD Metrics: Imp, Clicks
– SCD Info: Name
– Offer Stats: …

– Use Hbase Incrementor – incrementColumnValue for Weekly & MTD
– Hive Windowing UDF to generate flattened daily row
– Carefully choose Rowkey
– SCD – Comes free
– Performance – Physical file Hfile by table & Column Family

Number Game
– Size – 360GB
– Format – RCFile
– Rows – 14.7 Billion
– Mappers – 562
– Reducers – 436
– Elapsed Time <= 30 mins
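
The row design on this slide can be sketched without a cluster. The Python below is an in-memory stand-in, not HBase client code: `WideTable` and its `increment` method are hypothetical names that only mimic what the slide uses incrementColumnValue for (add a delta to one cell and get the new value back). The `OrderId-MMYY` row key follows the slide; the column qualifiers (`D01`, `W09`, `imp`), the ISO week numbering and the sample order are assumptions.

```python
from collections import defaultdict
from datetime import date


class WideTable:
    """In-memory stand-in for a schema-less table: row key -> 'family:qualifier' -> counter."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment(self, row_key, family, qualifier, amount):
        """Add `amount` to one cell and return the new value (mimics a counter increment)."""
        cell = f"{family}:{qualifier}"
        self.rows[row_key][cell] += amount
        return self.rows[row_key][cell]


def make_row_key(order_id, day):
    """One row per order per month, e.g. 'ORD42-0312' for March 2012."""
    return f"{order_id}-{day:%m%y}"


def record_daily(table, order_id, day, impressions):
    """Write the daily cell, then bump the weekly and month-to-date counters."""
    key = make_row_key(order_id, day)
    table.increment(key, "day", f"D{day.day:02d}", impressions)
    week = day.isocalendar()[1]  # ISO week number (an assumption, not from the slide)
    table.increment(key, "week", f"W{week:02d}", impressions)
    return table.increment(key, "mtd", "imp", impressions)


# Hypothetical order with three days of impressions.
table = WideTable()
record_daily(table, "ORD42", date(2012, 3, 1), 100)
record_daily(table, "ORD42", date(2012, 3, 2), 250)
mtd = record_daily(table, "ORD42", date(2012, 3, 5), 50)
row = dict(table.rows["ORD42-0312"])
```

Because new qualifiers appear only when they are first incremented, the row grows sideways without a schema change. One limitation of this sketch: a week that spans a month boundary is split across two rows, which is one reason the slide's advice to choose the row key carefully matters.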

Hadoop/RDBMS BIG DATA SLA

What users love? – Excel & Pivot

What if I need to pivot a few million records? Or maybe a billion records? But “hang” on a minute – BIG DATA?

Our Answer – Hadoop Pivot

Number Game
– Size – 360GB
– Format – RCFile
– Rows – 14.7 Billion
– Mappers – 670
– Reducers – 30
– Elapsed Time – 251 secs [< 5 mins]

Voila – Back to Excel
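
The slides do not show how the Hadoop pivot is implemented. As an illustration only, here is one way a pivot can be expressed as a map/reduce job, written in plain Python over a tiny in-memory sample: the map step keys each record by its row label, the shuffle groups values by that key, and the reduce step sums values into one wide row. All function names and the sample records are hypothetical.

```python
from collections import defaultdict


def map_record(record, row_field, col_field, value_field):
    """Map step: key each record by its pivot row label."""
    return record[row_field], (record[col_field], record[value_field])


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_row(row_label, cells, columns):
    """Reduce step: sum values per column and emit one wide row."""
    sums = defaultdict(int)
    for col, value in cells:
        sums[col] += value
    return [row_label] + [sums[c] for c in columns]


def pivot(records, row_field, col_field, value_field):
    """Run map, shuffle and reduce in-process and return a header plus wide rows."""
    columns = sorted({r[col_field] for r in records})
    groups = shuffle(map_record(r, row_field, col_field, value_field) for r in records)
    body = [reduce_row(key, groups[key], columns) for key in sorted(groups)]
    return [[row_field] + columns] + body


# Made-up click records: pivot keywords (rows) by day (columns).
records = [
    {"keyword": "shoes", "day": "Mon", "clicks": 5},
    {"keyword": "shoes", "day": "Tue", "clicks": 7},
    {"keyword": "boots", "day": "Mon", "clicks": 2},
    {"keyword": "shoes", "day": "Mon", "clicks": 1},
]
result = pivot(records, "keyword", "day", "clicks")
```

In this sketch the set of column labels is collected in a first pass over the data; on a real grid that set would have to be known in advance or computed by a separate job. The output is already in the row-by-column shape that a spreadsheet pivot would display.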

Questions?

– Hadoop HDFS – Hourly Feeds
– Hadoop HDFS Grid – Daily Feeds & Aggregates
– Oracle RAC 8 Node 60TB
– Oracle ETL Server
– BI App Server
– BI Web Server
– App Server, Grid Launcher Box
– GRID Based Report Web Server
– Metadata
– Unified Web BI Portal
– Web Services
– Data Access Layer [ODBC/PL/SQL API]
– Dimensions HBase
– Facts on HDFS [RCFile]
– Other Tools
– TRADITIONAL
– GRID
– Hive + PIG – Query Engine
– Scheduler