Open Source on .NET: A real world use case

Where it all began
- Analysing huge datasets
- Apache Spark (HDInsight)
- Various formats (CSV, JSON, XML)
- Row-based formats are generally slow
- We needed a columnar format
- Apache Parquet

Why row-based formats can be difficult
[Diagram: Rows 1 to 3 stored one after another; reading Column 2 means reading all data]

Columnar formats
[Diagram: Columns 1 to 3 stored separately; reading Column 2 reads only the needed subset]
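The difference between the two layouts can be sketched in a few lines. This is a minimal illustration only: Python is used for brevity, the table, column names and helper functions are all made up, and nothing here is part of Parquet or Parquet.Net. It counts how many values have to be touched to read one column under each layout.

```python
# Hypothetical three-column table; names and values are made up for illustration.
rows = [
    {"id": 1, "city": "London", "amount": 10.5},
    {"id": 2, "city": "Prague", "amount": 7.0},
    {"id": 3, "city": "London", "amount": 3.25},
]

def to_columnar(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def read_column_row_based(rows, name):
    """Row layout: every row (and so every value) is visited to get one column."""
    values, touched = [], 0
    for row in rows:
        touched += len(row)  # the whole row has to be read
        values.append(row[name])
    return values, touched

def read_column_columnar(columns, name):
    """Columnar layout: only the requested column is visited."""
    values = list(columns[name])
    return values, len(values)

columns = to_columnar(rows)
row_values, row_touched = read_column_row_based(rows, "city")
col_values, col_touched = read_column_columnar(columns, "city")
```

Both reads return the same values, but the row-based read touches every value in the table while the columnar read touches only the requested column.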

Parquet Format
[Diagram: Row Group 1 and Row Group 2, each made up of column chunks]
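As a rough mental model of this structure, the sketch below splits columnar data into row groups, each holding one chunk per column. It is an in-memory illustration in Python, not the on-disk Parquet layout; the function name, the `rows_per_group` parameter and the data are assumptions made for the example.

```python
def split_into_row_groups(columns, rows_per_group):
    """Split columnar data into row groups; each group holds one chunk per column."""
    total_rows = len(next(iter(columns.values())))
    groups = []
    for start in range(0, total_rows, rows_per_group):
        end = start + rows_per_group
        # One column chunk per column, covering the same range of rows.
        groups.append({name: values[start:end] for name, values in columns.items()})
    return groups

# Made-up data: five rows, two columns.
columns = {
    "id": [1, 2, 3, 4, 5],
    "city": ["London", "Prague", "London", "Oslo", "Oslo"],
}
row_groups = split_into_row_groups(columns, rows_per_group=2)
```

Each row group can then be read, or skipped, independently of the others.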

Column Chunk
- Fixed data type (int, string, etc.)
- Logical compression: Run-Length Encoding, dictionary compression, bit packing, etc.
- Block compression (None, GZIP, Snappy)
- Statistics! Min value, max value, number of unique values, number of nulls
- Statistics make it possible to skip unwanted data
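The ideas on this slide can be sketched as follows. This is a simplified Python illustration, not the actual Parquet encodings (which are binary and combine run-length encoding with bit packing); bit packing and block compression are left out, and all function names and data are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer index into a list of unique values."""
    dictionary, indexes, positions = [], [], {}
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(positions[value])
    return dictionary, indexes

def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def chunk_statistics(values):
    """Compute per-chunk statistics; None stands in for a null."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "distinct": len(set(present)),
        "nulls": len(values) - len(present),
    }

def may_contain(stats, wanted):
    """Return False only when the statistics prove the chunk cannot hold `wanted`."""
    if stats["min"] is None:
        return False
    return stats["min"] <= wanted <= stats["max"]

# Dictionary encoding followed by run-length encoding of the indexes.
cities = ["London", "London", "London", "Prague", "Prague", "London"]
dictionary, indexes = dictionary_encode(cities)
runs = run_length_encode(indexes)

# Statistics let a reader skip chunks that cannot match a filter (here: value == 11).
chunks = [[1, 2, 3, None], [10, 11, 12, 12]]
stats = [chunk_statistics(chunk) for chunk in chunks]
chunks_to_read = [i for i, s in enumerate(stats) if may_contain(s, 11)]
```

Because every value in a chunk has the same type and repeated values sit next to each other, these encodings tend to work well on columnar data, and the min/max check rules out the first chunk without reading its values.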

How we used to do it
- Expensive
- Slow
- Unsuitable
- Too much development effort
- Requires understanding Parquet internals
- Slow
- Deployment effort (even with Miniconda)
- + fastparquet

The Dream Came True
- Wouldn’t it be nice to run it on .NET?
- Well-developed, expressive language
- Great tooling
- Works everywhere!
- No heavy third-party dependencies (Apache Thrift.Core)
- No native dependencies (Google Snappy)

It’s on GitHub!
- Took 3 months and 3 people (evenings and weekends)
- More than 10 contributors now and growing
- Used by our big name clients
- Used by other companies
- Iterations take from hours to 1-2 days
- Completely open!
- In dialogue to include it in the main Apache repo

Use Cases

Demo: Parquet.Net Core; Spark + Scala

Azure Data Lake Analytics
- Custom Outputter
- Custom Extractor
- Parquet Files

Demo: Create Parquet File with ADLA

Parquet Viewer for Windows 10
- Using Parquet.Net for .NET Standard 1.4
- UWP is extremely fast compared to “modern” UI frameworks
- UWP perfectly fits CPU-heavy workloads
- Easy distribution model via the Store
- Works on any Windows device
- Showcase

Demo: Parquet Viewer

Works on Xbox One

Future Plans
- DataFrames: open data science library built on top of Parquet.Net, with pandas-like structures and distributed computing.
- Data Science Studio: open platform for data preparation, data analysis, etc. Runs on Desktop (UWP), Azure Service Fabric, Kubernetes.

Why OSS is Important
- Quality: a handful of devs vs thousands of devs
- Customisability: businesses can tweak it to their needs
- Freedom: no vendor (creator) lock-in
- Flexibility: you have a say in how resource-intensive the app should be
- Interoperability: OSS is much better at adhering to open standards than proprietary software is
- Support options: generally free, excellent documentation, forums, etc.
- Cost: get it for a fraction of the price
- Try before you buy: nothing to pay, see if you can adjust it

Why there is not much OSS in .NET
- .NET was traditionally closed source
- .NET was Windows-only
- Visual Studio was the only true IDE
- Other tech was more attractive to the academic community
- Licensing was a blocker to use in data centers

Config.Net: The easiest configuration framework for .NET developers

Storage.Net: Storage abstractions with implementations for .NET/.NET Standard

Thank you