What is Spark?

Spark is a fast, scalable, general-purpose engine for large-scale data processing.

Spark comes in multiple flavours:

Why Spark?

Spark Context

Spark RDD (Resilient Distributed Dataset)

RDDs are the fundamental unit of data in Spark, and most processing in Spark is done on them. RDDs are immutable, which allows consistency, concurrency, and easy, deterministic recreation.

Spark MLlib

Spark Streaming

Spark GraphX

For further details along with code snippets (PySpark), follow the topics listed below: