Apache Spark™ is a unified analytics engine for large-scale data processing. Join us on this two-day training course to understand and apply the fundamentals of Spark.
Why Spark? Because a new paradigm in computing is spreading fast: in addition to programming computers, we now also train them.
Spark is rated as one of the most sought-after and highest-paid technology skills in the world. Kick off your Spark-powered Data Engineering and Data Science career today!
Course Overview
Big Data
What is Big Data?
History of Big Data
Big Data challenges
Batch vs. Stream processing
Machine Learning
History
How to make a brain
Unsupervised learning
Supervised learning
Reinforcement learning
Introduction to Spark
History
Architecture
Spark development environments
Component overview
Deployment modes
Spark SQL and DataFrames
Basics of Spark SQL vs. DataFrames / Datasets
Reading data
Writing data
Joins
Functions
Hive integration
Spark performance
Spark UI
Jobs, stages, tasks
DAGs and lazy execution
Caching
Partitions and Shuffling
Catalyst optimiser
Caching guidelines
Spark Streaming
Introduction
Basic operation – Sources and Sinks
Windows and Aggregations
Checkpoints and Watermarks
Kafka and Spark
Machine Learning with Spark
Feature engineering
Pipelines
Bucketing, Normalisation, StringIndexer, Imputation
Clustering
Regression
Classification
Prerequisites
Knowledge of SQL
Basic programming experience in an object-oriented or functional language is highly recommended but not required.
The class is taught using Python / Scala, but no specific knowledge of either is required.
Lab requirements
Please bring your own laptop / desktop with the Chrome or Firefox web browser installed.
Internet Explorer and Safari are not supported.
Duration
2 days
Location
Johannesburg / Pretoria
Contact us for Cape Town or other location requirements
Cost
R12 000 excl. VAT
Next Public Course
19 & 20 July 2018
Midrand
We also offer customer-specific on-premises training for five or more candidates.
Bookings
Contact us today to secure your booking.