PROTECT | OPTIMIZE | TRANSFORM

Presentation transcript:

CLOUDERA DEVELOPER TRAINING FOR SPARK & HADOOP 1

COURSE INFORMATION

Cloudera’s four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark (including Spark Streaming and Spark SQL), Flume, Kafka, and Sqoop, this training course is the best preparation for the real-world challenges faced by Hadoop developers. Attendees learn to identify which tool is the right one to use in a given situation and gain hands-on experience developing with those tools.

Through lectures and interactive, hands-on exercises conducted by SoftSource Solutions’ Cloudera-certified trainers, attendees will learn Apache Spark and how it integrates with the entire Hadoop ecosystem, covering topics such as:

- How data is distributed, stored, and processed in a Hadoop cluster
- How to write, configure, and deploy Apache Spark applications on a Hadoop cluster
- How to use the Spark shell for interactive data analysis
- How to process and query structured data using Spark SQL
- How to use Spark Streaming to process a live data stream
- How to use Flume and Kafka to ingest data for Spark Streaming

Prerequisites
Programming experience in Scala and/or Python is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful. Prior knowledge of Hadoop is not required.

Best for
Developers, engineers, and programmers.

Follow-On Training
After successfully completing this course, we recommend that participants attend Cloudera’s Developer Training for Spark and Hadoop II: Advanced Techniques course, which builds on the foundations taught here.

SoftSource Solutions Pte Ltd | http://www.softsource.com.sg | 6746 5355 | enquiry@softsource.com.sg
©2013 SoftSource Solutions Pte Ltd. All rights reserved. SoftSource and the SoftSource logo are trademarks or registered trademarks of SoftSource Solutions Pte Ltd. All other trademarks are the property of their respective companies. Information is subject to change without notice.

CLOUDERA DEVELOPER TRAINING FOR SPARK & HADOOP 1

COURSE OUTLINE

Introduction

Introduction to Hadoop and the Hadoop Ecosystem
  - Apache Hadoop Overview
  - Data Storage and Ingest
  - Data Processing
  - Data Analysis and Exploration
  - Other Ecosystem Tools
  - Introduction to the Hands-On Exercises

Apache Hadoop File Storage
  - Problems with Traditional Large-Scale Systems
  - HDFS Architecture
  - Using HDFS
  - Apache Hadoop File Formats

Data Processing on an Apache Hadoop Cluster
  - YARN Architecture
  - Working with YARN

Importing Relational Data with Apache Sqoop
  - Apache Sqoop Overview
  - Importing Data
  - Importing File Options
  - Exporting Data

Spark Basics
  - What is Apache Spark?
  - Using the Spark Shell
  - RDDs (Resilient Distributed Datasets)
  - Functional Programming in Spark

Working with RDDs
  - Creating RDDs
  - Other General RDD Operations

Aggregating Data with Pair RDDs
  - Key-Value Pair RDDs
  - Map-Reduce
  - Other Pair RDD Operations

Writing and Running Apache Spark Applications
  - Spark Applications vs. Spark Shell
  - Creating the SparkContext
  - Building a Spark Application (Scala and Java)
  - Running a Spark Application
  - The Spark Application Web UI

Configuring Apache Spark Applications
  - Configuring Spark Properties
  - Logging

Parallel Processing in Apache Spark
  - Review: Apache Spark on a Cluster
  - RDD Partitions
  - Partitioning of File-Based RDDs
  - HDFS and Data Locality
  - Executing Parallel Operations
  - Stages and Tasks

RDD Persistence
  - RDD Lineage
  - RDD Persistence Overview
  - Distributed Persistence

Common Patterns in Spark Data Processing
  - Common Apache Spark Use Cases
  - Iterative Algorithms in Apache Spark
  - Machine Learning
  - Example: k-means

DataFrames and Spark SQL
  - Apache Spark SQL and the SQL Context
  - Creating DataFrames
  - Transforming and Querying DataFrames
  - DataFrames and RDDs
  - Comparing Apache Spark SQL, Impala, and Hive-on-Spark
  - Apache Spark SQL in Spark 2.x

Message Processing with Apache Kafka
  - What is Apache Kafka?
  - Apache Kafka Overview
  - Scaling Apache Kafka
  - Apache Kafka Cluster Architecture
  - Apache Kafka Command Line Tools

Capturing Data with Apache Flume
  - What is Apache Flume?
  - Basic Flume Architecture
  - Flume Sources
  - Flume Sinks
  - Flume Channels
  - Flume Configuration

Integrating Apache Flume and Apache Kafka
  - Overview
  - Use Cases
  - Configuration

Apache Spark Streaming: Introduction to DStreams
  - Apache Spark Streaming Overview
  - Example: Streaming Request Count
  - DStreams
  - Developing Streaming Applications

Processing Multiple Batches
  - Multi-Batch Operations
  - Time Slicing
  - State Operations
  - Sliding Window Operations

Data Sources
  - Streaming Data Source Overview
  - Apache Flume and Apache Kafka Data Sources
  - Example: Using a Kafka Direct Data Source

Conclusion