Open Source on .NET: A real-world use case


1 Open Source on .NET: A real-world use case

2 Where it all began
Analysing huge datasets
Apache Spark (HDInsight)
Various formats (CSV, JSON, XML)
Row-based formats are generally slow
We needed a columnar format
Apache Parquet

3 Why row-based formats can be difficult
[Diagram: rows Row 1, Row 2, Row 3; to get Column 2, read all data]

4 Columnar formats
[Diagram: columns Column 1, Column 2, Column 3; to get Column 2, read only the needed subset]
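The contrast between the two layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows and the function names (`read_column_row_based`, `to_columnar`, `read_column_columnar`) are hypothetical and not taken from the talk or from any Parquet library. It counts how many values must be touched to extract a single column in each layout.

```python
# Three rows with three columns each (hypothetical sample data).
rows = [
    (1, "alice", 3.5),
    (2, "bob", 4.0),
    (3, "carol", 2.5),
]


def read_column_row_based(rows, index):
    """Row-based layout: every value of every row is scanned to get one column."""
    values, touched = [], 0
    for row in rows:
        for i, value in enumerate(row):
            touched += 1
            if i == index:
                values.append(value)
    return values, touched


def to_columnar(rows):
    """Transpose the rows so that each column is stored contiguously."""
    return [list(column) for column in zip(*rows)]


def read_column_columnar(columns, index):
    """Columnar layout: only the requested column is touched."""
    column = columns[index]
    return list(column), len(column)


row_values, row_touched = read_column_row_based(rows, 1)
col_values, col_touched = read_column_columnar(to_columnar(rows), 1)

print(row_values, row_touched)  # ['alice', 'bob', 'carol'] 9
print(col_values, col_touched)  # ['alice', 'bob', 'carol'] 3
```

Both reads return the same values, but the row-based read touches all nine values while the columnar read touches only three.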

5 Parquet Format
[Diagram: Row Group 1 and Row Group 2, each made up of column chunks]
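A rough way to picture this layout: the rows of a table are cut into row groups, and inside each row group the values are stored one column chunk per column. The sketch below only models that idea in memory; the function name `to_row_groups` and its parameters are hypothetical, and the real on-disk format also carries metadata, encodings, and pages that are not shown here.

```python
def to_row_groups(rows, column_names, rows_per_group):
    """Split rows into row groups; each group holds one chunk (list) per column."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        batch = rows[start:start + rows_per_group]
        # One column chunk per column: the values of that column for this batch.
        chunks = {
            name: [row[i] for row in batch]
            for i, name in enumerate(column_names)
        }
        groups.append(chunks)
    return groups


# Hypothetical sample data: three rows, two columns, two rows per group.
groups = to_row_groups([(1, "a"), (2, "b"), (3, "c")], ["id", "name"], 2)
print(groups)
# [{'id': [1, 2], 'name': ['a', 'b']}, {'id': [3], 'name': ['c']}]
```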

6 Column Chunk
Fixed data type (int, string, etc.)
Logical compression
  Run-length encoding
  Dictionary compression
  Bit packing
  etc.
Bold compression (None, GZIP, Snappy)
Statistics!
  Min value
  Max value
  Number of unique values
  Number of nulls
  Used to skip unwanted data
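Two of the encodings named on this slide, and the way statistics allow data to be skipped, can be sketched as follows. This is a simplified illustration, not how Parquet or Parquet.Net implements them: all function names and the statistics dictionary keys are hypothetical, and real Parquet combines run-length encoding with bit packing in a hybrid binary form.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def dictionary_encode(values):
    """Replace each value with its index in a dictionary of distinct values."""
    dictionary, index_of, indices = [], {}, []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def chunk_statistics(values):
    """Compute the per-chunk statistics listed on the slide."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "distinct_count": len(set(present)),
        "null_count": len(values) - len(present),
    }


def may_contain(stats, target):
    """A chunk can be skipped when the target lies outside [min, max]."""
    if stats["min"] is None:
        return False
    return stats["min"] <= target <= stats["max"]


print(run_length_encode([7, 7, 7, 2, 2, 9]))       # [(7, 3), (2, 2), (9, 1)]
print(dictionary_encode(["uk", "us", "uk", "uk"]))  # (['uk', 'us'], [0, 1, 0, 0])

stats = chunk_statistics([5, None, 9, 5])
print(may_contain(stats, 7))   # True: the chunk must be read
print(may_contain(stats, 10))  # False: the chunk can be skipped
```

Note that `may_contain` can only rule a chunk out; a `True` result means the chunk still has to be read and checked.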

7 How we used to do it
Expensive, slow, unsuitable
Too much development effort
Requires understanding Parquet internals
Slow
Deployment effort (even with Miniconda)
+ fastparquet

8 The Dream Came True
Wouldn't it be nice to run it on .NET?
Developed, expressive language
Great tooling
Works everywhere!
No heavy third-party dependencies (Apache Thrift.Core)
No native dependencies (Google Snappy)

9 It’s on GitHub!
Took 3 months and 3 people (evenings and weekends)
More than 10 contributors now and growing
Used by our big-name clients
Used by other companies
Iterations take from hours to 1-2 days
Completely open!
In talks to include it in the main Apache repo

10 Use Cases

11 Demo Parquet.Net Core Spark + Scala

12 Azure Data Lake Analytics
[Diagram: Custom Outputter, Custom Extractor, Parquet Files]

13 Demo Create Parquet File with ADLA

14 Parquet Viewer for Windows 10
Using Parquet.Net for .NET Standard 1.4
UWP is extremely fast compared to “modern” UI frameworks
UWP perfectly fits CPU-heavy workloads
Easy distribution model via the Store
Works on any Windows device
Showcase

15 Demo Parquet Viewer

16 Works on Xbox One

17 Future Plans
DataFrames
  Open data science library built on top of Parquet.Net, with pandas-like structures and distributed computing.
Data Science Studio
  Open platform for data preparation, data analysis, etc.
  Runs on Desktop (UWP), Azure Service Fabric, Kubernetes.

18 Why OSS is Important
Quality: a handful of devs vs thousands of devs
Customisability: businesses can tweak it to their needs
Freedom: no vendor (creator) lock-in
Flexibility: you have a say in how resource-intensive the app should be
Interoperability: OSS is much better at adhering to open standards than proprietary software is
Support options: generally free, excellent documentation, forums, etc.
Cost: get it for a fraction of the price
Try before you buy: nothing to pay, see if you can adjust it

19 Why there is not much OSS in .NET
.NET was traditionally closed source
.NET was Windows-only
Visual Studio was the only true IDE
Other tech was more attractive to the academic community
Licensing was a blocker to use in data centers

20 Config.Net The easiest configuration framework for .NET developers

21 Storage.Net Storage abstractions with implementations for .NET/.NET Standard

22 Thank you

