1
Open Source on .NET: A real-world use case
2
Where it all began
Analysing huge datasets
Apache Spark (HDInsight)
Various formats (CSV, JSON, XML)
Row-based formats are generally slow
We needed a columnar format
Apache Parquet
3
Why row-based formats can be difficult
[Diagram: Row 1, Row 2 and Row 3 stored one after another; getting Column 2 means reading all data]
4
Columnar formats
[Diagram: Column 1, Column 2 and Column 3 stored separately; getting Column 2 means reading only the needed subset]
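The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the idea on the slides: the dataset, the column names and the helper functions are all hypothetical, and in-memory dictionaries and lists stand in for what would be bytes on disk.

```python
# Hypothetical three-row dataset, used only for illustration.
rows = [
    {"id": 1, "city": "London", "price": 10.5},
    {"id": 2, "city": "Paris", "price": 7.0},
    {"id": 3, "city": "London", "price": 3.25},
]


def sum_price_row_based(rows):
    """Row layout: each whole row is visited to reach a single field."""
    total = 0.0
    values_touched = 0
    for row in rows:
        values_touched += len(row)  # every field of the row is read
        total += row["price"]
    return total, values_touched


def to_columnar(rows):
    """Pivot the rows into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_price_columnar(columns):
    """Columnar layout: only the 'price' column is visited."""
    prices = columns["price"]
    return sum(prices), len(prices)


columns = to_columnar(rows)
print(sum_price_row_based(rows))    # (20.75, 9)
print(sum_price_columnar(columns))  # (20.75, 3)
```

Both calls return the same total, but the row-based version touches all nine values while the columnar version touches three.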
5
Parquet Format
[Diagram: the file is split into row groups (Row Group 1, Row Group 2, ...); each row group holds one column chunk per column]
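As a rough sketch of that layout, the Python below splits a table into row groups and stores each group as one chunk per column. It models the structure only, not the real Parquet file format or the Parquet.Net API; `write_row_groups`, `read_column` and `rows_per_group` are hypothetical names chosen for the example.

```python
def write_row_groups(rows, column_names, rows_per_group):
    """Split rows into groups; inside each group keep one chunk per column."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        batch = rows[start:start + rows_per_group]
        chunks = {
            name: [row[i] for row in batch]
            for i, name in enumerate(column_names)
        }
        groups.append(chunks)
    return groups


def read_column(groups, name):
    """Read one column by visiting only its chunk in every row group."""
    values = []
    for chunks in groups:
        values.extend(chunks[name])
    return values


# Hypothetical data: five rows, two columns, two rows per group.
table = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
groups = write_row_groups(table, ["id", "letter"], rows_per_group=2)

print(len(groups))                    # 3
print(groups[0])                      # {'id': [1, 2], 'letter': ['a', 'b']}
print(read_column(groups, "letter"))  # ['a', 'b', 'c', 'd', 'e']
```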
6
Column Chunk
Fixed data type (int, string, etc.)
Logical compression
  Run-Length Encoding
  Dictionary compression
  Bit packing
  etc.
Bold compression (None, GZIP, Snappy)
Statistics!
  Min value
  Max value
  Number of unique values
  Number of nulls
  Skip unwanted data
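The ideas listed for a column chunk can be illustrated with a small Python sketch: dictionary encoding, run-length encoding of the resulting indices, and min/max statistics that let a reader skip a chunk. This is a simplified model under stated assumptions, not Parquet's actual encodings or metadata layout, and every function name here is hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary, index_of, indices = [], {}, []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def chunk_statistics(values):
    """Compute the per-chunk statistics named on the slide."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "distinct_count": len(set(present)),
        "null_count": len(values) - len(present),
    }


def may_contain(stats, target):
    """False means the chunk can be skipped without reading its values."""
    if stats["min"] is None:
        return False
    return stats["min"] <= target <= stats["max"]


cities = ["London", "London", "London", "Paris", "Paris", "London"]
dictionary, indices = dictionary_encode(cities)
runs = run_length_encode(indices)
print(dictionary)  # ['London', 'Paris']
print(runs)        # [(0, 3), (1, 2), (0, 1)]

stats = chunk_statistics([5, None, 9, 7])
print(may_contain(stats, 12))  # False
print(may_contain(stats, 7))   # True
```

With only two dictionary entries, each index fits in a single bit, which is the kind of saving bit packing exploits.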
7
How we used to do it
Expensive
Slow
Unsuitable
Too much development effort
Requires understanding Parquet internals
Slow
Deployment effort (even with Miniconda) + fastparquet
8
The Dream Came True
Wouldn’t it be nice to run it on .NET?
Developed, expressive language
Great tooling
Works everywhere!
No heavy third-party dependencies (Apache Thrift.Core)
No native dependencies (Google Snappy)
9
It’s on GitHub!
Took 3 months and 3 people (evenings and weekends)
More than 10 contributors now and growing
Used by our big name clients
Used by other companies
Iterations take from hours to 1-2 days
Completely open!
In dialogue about inclusion in the main Apache repo
10
Use Cases
11
Demo Parquet.Net Core Spark + Scala
12
Azure Data Lake Analytics
Custom Outputter Custom Extractor Parquet Files
13
Demo Create Parquet File with ADLA
14
Parquet Viewer for Windows 10
Using Parquet.Net for .NET Standard 1.4
UWP is extremely fast compared to “modern” UI frameworks
UWP perfectly fits CPU-heavy workloads
Easy distribution model via Store
Works on any Windows device
Showcase
15
Demo Parquet Viewer
16
Works on Xbox One
17
Future Plans
DataFrames
  Open data science library built on top of Parquet.Net with pandas-like structures and distributed computing.
Data Science Studio
  Open platform for
    Data preparation
    Data analysis
    Etc.
  Runs on Desktop (UWP), Azure Service Fabric, Kubernetes.
18
Why OSS is Important
Quality: a handful of devs vs thousands of devs
Customisability: businesses can tweak to their needs
Freedom: no vendor (creator) lock-in
Flexibility: you have a say in how resource intensive the app should be
Interoperability: OSS is much better at adhering to open standards than proprietary software is
Support options: generally free, excellent documentation, forums, etc.
Cost: get it for a fraction of the price
Try before you buy: nothing to pay, see if you can adjust it
19
Why there is not much OSS in .NET
.NET was traditionally closed source
.NET was Windows only
Visual Studio was the only true IDE
Other tech was more attractive to the academic community
Licensing blocker to use in data centers
20
Config.Net The easiest configuration framework for .NET developers
21
Storage.Net Storage abstractions with implementations for .NET/.NET Standard
22
Thank you