PLATFORM FOR BIG DATA, NOSQL AND RELATIONAL DATA. WHAT MAKES SENSE FOR ME? (+AZURE)
WHAT IS BIG DATA?
RoadDesignator | DrivingStatus
A1 | Difficulties
                    | Batch Processing | Interactive Analysis           | Stream Processing
Query runtime       | Minutes to hours | Milliseconds to minutes        | Never-ending
Data volume         | TBs to PBs       | GBs to PBs                     | Continuous stream
Programming model   | MapReduce        | Queries                        | DAG
Users               | Developers       | Analysts and developers        | Developers
Originating project | Google MapReduce | Google Dremel                  | Twitter Storm
Open source project | Hadoop / Spark   | Drill / Shark / Impala / HBase | Storm / Apache S4 / Kafka
A NEW SET OF QUESTIONS
SOCIAL & WEB ANALYTICS: What's the social sentiment for my brand or products?
LIVE DATA FEEDS: How do I optimize my fleet based on weather and traffic patterns?
ADVANCED ANALYTICS: How do I better predict future outcomes?
COMMON BIG DATA CUSTOMER SCENARIOS
GAIN COMPETITIVE ADVANTAGE BY MOVING FIRST AND FAST IN YOUR INDUSTRY
Web app optimization
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting
Natural resource exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure optimization
Legal discovery
persistent | distributed
In memory
Efficient at random reads/writes
Distributed, large-scale data store
Utilizes Hadoop for persistence
Both HBase and Hadoop are distributed
MANAGE ANY DATA, ANY SIZE, ANYWHERE
HADOOP INTEGRATED INTO THE DATA PLATFORM
Hadoop architecture
Distributed Storage (HDFS)
Distributed Processing (MapReduce)
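To make the MapReduce programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word count. It is an illustration only: it does not use Hadoop or HDFS, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every split, shuffle by key, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input splits; in Hadoop these would be blocks stored in HDFS.
splits = ["big data on Azure", "big data on Hadoop"]
counts = word_count(splits)
print(counts)
```

In a real cluster the map and reduce calls run in parallel on the nodes that hold the data; only the structure of the computation carries over from this sketch.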
INSIGHTS FOR ALL USERS THROUGH FAMILIAR TOOLS
Data sizes: GB, TB, PB
Orders_federation
CREATE FEDERATION fed_name (fed_key_label fed_key_type distribution_type)
Orders_federation
Federation Key: the key used for data distribution. Supported types: int, bigint, guid, varbinary.
Atomic Unit: represents a single instance of a federation key, that is, all rows in all federated tables with the same federation key value.
Federated Table: contains only the atomic units within the member's key range.
Reference Table: a non-federated table.
SalesDB Orders_federation
Orders_Fed [5000, 10000)
ALTER FEDERATION Orders_Fed SPLIT AT (tenant_id=7500)
Orders_Fed [5000, 7500) & [7500, 10000)
Dynamic Partitioning
SPLIT members to spread workloads over more nodes
DROP members to shrink back to fewer nodes
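The effect of SPLIT on the member ranges can be modeled in a few lines. This is an in-memory sketch of the range arithmetic only, not of how the service moves rows between databases; split_member is a hypothetical helper introduced for this example.

```python
def split_member(members, split_key):
    """Split the half-open range that contains split_key into two ranges.

    members is a sorted list of (low, high) tuples, each meaning [low, high).
    Ranges that do not strictly contain split_key are left unchanged.
    """
    result = []
    for low, high in members:
        if low < split_key < high:
            result.append((low, split_key))
            result.append((split_key, high))
        else:
            result.append((low, high))
    return result


# One member covering tenant_id values in [5000, 10000), as on the slide.
members = [(5000, 10000)]
members = split_member(members, 7500)
print(members)
```

After the split, every tenant_id still belongs to exactly one member, because the ranges stay contiguous and half-open.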
SalesDB Orders_federation
Orders_Fed [5000, 7500) & [7500, 10000)
USE FEDERATION Orders_Fed (tenant_id=7509)
Built-in Data-Dependent Routing (DDR)
Ensures apps can discover where the data is just-in-time
No shard map caching
Guaranteed member routing
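Data-dependent routing boils down to finding the member whose range contains a given key. The sketch below shows that lookup with a binary search over the range boundaries. It is a conceptual model, not the service's implementation; the FederationMap class and its method names are hypothetical.

```python
import bisect


class FederationMap:
    """Maps a federation key value to the member range that holds it."""

    def __init__(self, boundaries):
        # boundaries: sorted low keys of every member plus the final upper
        # bound, e.g. [5000, 7500, 10000] for [5000, 7500) and [7500, 10000).
        self.boundaries = list(boundaries)

    def member_for(self, key):
        """Return the (low, high) half-open range that contains key."""
        if not (self.boundaries[0] <= key < self.boundaries[-1]):
            raise KeyError(f"key {key} is outside the federation range")
        index = bisect.bisect_right(self.boundaries, key) - 1
        return self.boundaries[index], self.boundaries[index + 1]


orders_fed = FederationMap([5000, 7500, 10000])
member = orders_fed.member_for(7509)
print(member)
```

With built-in routing the application never keeps such a map itself; the lookup happens on the service side at connection time, which is why no shard map caching is needed.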
Account > Table > Entity
Account: contoso
Table: customers
Entity: Name =…, … = …
Entity: Name =…, Add =…
Table: photos
Entity: Photo ID =…, Date =…
Entity: Photo ID =…, Date =…
Table Details
Not an RDBMS!
Table
Create, Query, Delete
Tables can have metadata
Entities
Insert
Update
Merge: partial update
Replace: update entire entity
Upsert
Delete
Query
Entity Group Transactions: multiple CUD operations in a single atomic transaction
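The difference between Merge and Replace is easiest to see on a small in-memory model. The sketch below treats a table as a dictionary keyed by (PartitionKey, RowKey). It does not call any Azure SDK; the function names and sample values are hypothetical, and upsert is modeled as insert-or-replace, which is one assumed reading of the slide.

```python
def insert(table, partition_key, row_key, properties):
    """Insert a new entity; fail if the key already exists."""
    key = (partition_key, row_key)
    if key in table:
        raise KeyError(f"entity {key} already exists")
    table[key] = dict(properties)


def merge(table, partition_key, row_key, properties):
    """Partial update: change or add only the given properties."""
    table[(partition_key, row_key)].update(properties)


def replace(table, partition_key, row_key, properties):
    """Full update: the stored entity becomes exactly the given properties."""
    key = (partition_key, row_key)
    if key not in table:
        raise KeyError(f"entity {key} does not exist")
    table[key] = dict(properties)


def upsert(table, partition_key, row_key, properties):
    """Insert-or-replace (assumption): write whether or not the key exists."""
    table[(partition_key, row_key)] = dict(properties)


customers = {}
insert(customers, "p1", "r1", {"Name": "A", "City": "X"})
merge(customers, "p1", "r1", {"City": "Y"})      # Name is kept
after_merge = dict(customers[("p1", "r1")])
replace(customers, "p1", "r1", {"City": "Z"})    # Name is dropped
after_replace = dict(customers[("p1", "r1")])
print(after_merge, after_replace)
```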
FIRST  | LAST   | BIRTHDATE
Wade   | Wegner | 2/2/1981
Nathan | Totten | 3/15/1965
Nick   | Harris | May 1, 1976
FAV SPORT: Canoeing
FIRST  | LAST   | BIRTHDATE
Wade   | Wegner | 2/2/1981
Nathan | Totten | 3/15/1965
Nick   | Harris | May 1, 1976
?$filter=Last eq 'Wegner'
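What the filter does can be sketched locally: it keeps the entities whose Last property equals the given string. The code below handles only the single form "Property eq 'value'" and is not an OData parser; apply_filter and FILTER_PATTERN are hypothetical names for this example.

```python
import re

# The three entities from the slide, stored as schema-less dictionaries.
entities = [
    {"First": "Wade", "Last": "Wegner", "Birthdate": "2/2/1981"},
    {"First": "Nathan", "Last": "Totten", "Birthdate": "3/15/1965"},
    {"First": "Nick", "Last": "Harris", "Birthdate": "May 1, 1976"},
]

# Matches only expressions of the form: <Property> eq '<string value>'
FILTER_PATTERN = re.compile(r"^(\w+) eq '([^']*)'$")


def apply_filter(rows, expression):
    """Return the rows whose named property equals the quoted value."""
    match = FILTER_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported filter: {expression}")
    prop, value = match.groups()
    # Entities without the property simply do not match.
    return [row for row in rows if row.get(prop) == value]


matches = apply_filter(entities, "Last eq 'Wegner'")
print(matches)
```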
PARTITIONKEY (CATEGORY) | ROWKEY (TITLE)         | TIMESTAMP | MODELYEAR
Bikes                   | Super Duper Cycle      | …         | 2009
Bikes                   | Quick Cycle 200 Deluxe | …         | 2007
…                       | …                      | …         | …
Canoes                  | Whitewater             | …         | 2009
Canoes                  | Flatwater              | …         | 2006

PARTITIONKEY (CATEGORY) | ROWKEY (TITLE)         | TIMESTAMP | MODELYEAR
Rafts                   | 14ft Super Tourer      | …         | 1999
…                       | …                      | …         | …
Skis                    | Fabrikam Back Trackers | …         | 2009
…                       | …                      | …         | …
Tents                   | Super Palace           | …         | 2008

PARTITIONKEY (CATEGORY) | ROWKEY (TITLE)         | TIMESTAMP | MODELYEAR
Bikes                   | Super Duper Cycle      | …         | 2009
Bikes                   | Quick Cycle 200 Deluxe | …         | 2007
…                       | …                      | …         | …
Canoes                  | Whitewater             | …         | 2009
Canoes                  | Flatwater              | …         | 2006
Rafts                   | 14ft Super Tourer      | …         | 1999
…                       | …                      | …         | …
Skis                    | Fabrikam Back Trackers | …         | 2009
…                       | …                      | …         | …
Tents                   | Super Palace           | …         | 2008
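The tables show one key range being divided into two. A minimal sketch of that idea: group entities by PartitionKey, then place contiguous ranges of partition keys on different servers. The server names, the split point, and the helper functions are assumptions for illustration; the actual service decides placement on its own.

```python
from collections import defaultdict

# Rows from the slide: (PartitionKey, RowKey, ModelYear).
ENTITIES = [
    ("Bikes", "Super Duper Cycle", 2009),
    ("Bikes", "Quick Cycle 200 Deluxe", 2007),
    ("Canoes", "Whitewater", 2009),
    ("Canoes", "Flatwater", 2006),
    ("Rafts", "14ft Super Tourer", 1999),
    ("Skis", "Fabrikam Back Trackers", 2009),
    ("Tents", "Super Palace", 2008),
]


def group_by_partition(entities):
    """Collect entities that share a PartitionKey into one partition."""
    partitions = defaultdict(list)
    for partition_key, row_key, model_year in entities:
        partitions[partition_key].append((row_key, model_year))
    return dict(partitions)


def assign_servers(partition_keys, last_key_on_first_server):
    """Place sorted partition keys on two hypothetical servers by range."""
    placement = {"server_a": [], "server_b": []}
    for key in sorted(partition_keys):
        target = "server_a" if key <= last_key_on_first_server else "server_b"
        placement[target].append(key)
    return placement


partitions = group_by_partition(ENTITIES)
placement = assign_servers(partitions, "Canoes")
print(placement)
```

Entities with the same PartitionKey always land on the same server, which is what makes entity group transactions within a partition possible.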
MANAGE ANY DATA, ANY SIZE, ANYWHERE
SQL Server Database & Parallel Data Warehouse
Hadoop on Windows
Hadoop on Azure
StreamInsight
Hadoop Connectors & ETL
Global Physical Infrastructure: servers / network / datacenters
Locations: N Central US, S Central US, N Europe, W Europe, E Asia, SE Asia + 24 Edge CDN Locations
Layers: Frameworks, Services, Fabric, Infrastructure
Compute: virtual machines, web sites, cloud services
Storage: SQL database, NoSQL database, blob storage
Networking: connect, virtual network, traffic manager
Automated, Managed Resources, Elastic, Usage Based