Pipe Engineering
IdoFriedman.yml
Name: Ido Friedman
Past: [SQL Server consultant, Instructor, Team Leader]
Present: [Data engineer, Architect]
Technologies: [Elasticsearch, CouchBase, MongoDB, Python, SQL …]
WorkPlace: Perion
WhenNotWorking: @Sea
Lambda Architecture: Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.
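A minimal sketch of the idea, not a production design: a batch layer recomputes a view from all stored events, a speed layer keeps an incremental view of recent events, and a query merges the two. The event data, `build_batch_view`, `SpeedLayer`, and `query` are all hypothetical names made up for illustration.

```python
from collections import Counter

# Hypothetical events: (event_id, page). Illustrative data only.
historical_events = [(1, "home"), (2, "home"), (3, "pricing")]
recent_events = [(4, "home"), (5, "docs")]


def build_batch_view(events):
    """Batch layer: recompute the full view from all stored events."""
    return Counter(page for _, page in events)


class SpeedLayer:
    """Speed layer: update an incremental view one event at a time."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        _, page = event
        self.view[page] += 1


def query(batch_view, realtime_view, page):
    """Serving step: merge the batch and real-time views at query time."""
    return batch_view[page] + realtime_view[page]


batch_view = build_batch_view(historical_events)
speed = SpeedLayer()
for event in recent_events:
    speed.on_event(event)

home_views = query(batch_view, speed.view, "home")
print(home_views)
```

In a real deployment the batch view would be rebuilt periodically and the speed layer's state dropped once the batch layer has caught up; that hand-off is left out here.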
Lambda Example
Data processing: Batch; Micro-batch; Streaming
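The three processing styles can be contrasted on the same toy input. This is only a sketch: the "events" are plain integers, the aggregation is a sum, and the function names and the batch size of 4 are assumptions chosen for illustration.

```python
from itertools import islice

events = list(range(1, 11))  # hypothetical payloads: the integers 1..10


def process_batch(all_events):
    """Batch: one pass over the complete, bounded data set."""
    return [sum(all_events)]


def process_micro_batch(stream, batch_size):
    """Micro-batch: cut the stream into small fixed-size chunks."""
    it = iter(stream)
    results = []
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        results.append(sum(chunk))
    return results


def process_streaming(stream):
    """Streaming: emit an updated result after every single record."""
    total = 0
    for event in stream:
        total += event
        yield total


batch_result = process_batch(events)
micro_result = process_micro_batch(events, batch_size=4)
stream_result = list(process_streaming(events))
```

Batch yields one result at the end, micro-batch yields one result per chunk, and streaming yields one result per record; latency drops in that order while per-record overhead rises.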
Tools of the trade: Processing; Source/Targets: HDFS, SQL Server, Amazon S3, MongoDB, Azure EventHub, ELK Stack, Azure blob storage, AWS Kinesis
All very nice BUT… Lots of systems = Lots of issues + Lots of data movement; Lots of knowledge required
ETL vs Streaming
ETL: Batch processing; Known data window; Known data structure; Known and expected behavior patterns; Small number of data platforms in one process
Streaming: Row by row (not always); Shifting data window; Supports many data structures in many cases
StreamSets (http://www.streamsets.com): Performance management for data flows; Not an ETL tool; Open source; Many connectors and integrations; Integration with Kafka / Hadoop in cluster mode; No coding is required, but it is fully supported; VERY simple deployment
Piping Challenges: Data enrichment; Windowing; Performance; Expected results; Data visibility; Flexibility; Scaling; Monitoring
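To make the windowing challenge concrete, here is a small sketch of a tumbling (fixed, non-overlapping) window count keyed on event time. The 60-second window, the event tuples, and the helper names are assumptions for illustration, not anything taken from StreamSets.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size

# Hypothetical (timestamp_seconds, user) events; note they arrive out of order.
events = [(5, "a"), (42, "b"), (61, "a"), (118, "c"), (59, "a"), (125, "b")]


def window_start(ts, size=WINDOW_SECONDS):
    """Map an event timestamp to the start of its tumbling window."""
    return ts - (ts % size)


def count_per_window(events, size=WINDOW_SECONDS):
    """Count events per window, assigning by event time, not arrival order."""
    counts = defaultdict(int)
    for ts, _user in events:
        counts[window_start(ts, size)] += 1
    return dict(sorted(counts.items()))


window_counts = count_per_window(events)
print(window_counts)
```

The event stamped 59 arrives after the one stamped 118 yet still lands in the first window. A real pipeline also has to decide how long to wait for such late records before closing a window, which this sketch ignores.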
StreamSets SDC
SDC Cluster and Streaming mode: SDC runs as an application within Spark Streaming; SDC runs as an application on top of MapReduce
What are we doing with SDC: Error and application log analysis (ELK = SDC+K); Clickstream; Ad hoc needs
What are we connecting: Amazon S3, JDBC Consumer, ElasticSearch, RabbitMQ, AWS Kinesis, SQL Server, File Tail, Redis
Some numbers: 100M+ events per day, including record-level operations, on a 2-CPU, 8 GB machine (CentOS 7)
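As a rough back-of-the-envelope check, the daily figure can be turned into an average rate. The sketch below uses the quoted lower bound of 100M events and assumes the load is spread evenly over the day, which real traffic rarely is, so peaks will be higher.

```python
EVENTS_PER_DAY = 100_000_000      # lower bound quoted above ("100M+")
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
CPUS = 2

# Average rate under the (unrealistic) assumption of a perfectly even load.
avg_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
avg_events_per_second_per_cpu = avg_events_per_second / CPUS

print(round(avg_events_per_second))          # 1157
print(round(avg_events_per_second_per_cpu))  # 579
```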
Recommended reading
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102