Content
‣ What is Apache Spark?
‣ History
‣ Why Apache Spark
‣ Apache Spark Structure and Architecture
‣ Spark APIs
‣ Demo

What is Apache Spark?
Apache Spark is a general-purpose platform for fast processing of large-scale data, written in the Scala programming language.
‣ A framework for distributed computing
‣ In-memory, fault tolerant data
INTRODUCTION
‣ Apache Kafka® is a distributed streaming platform. A streaming platform has three key capabilities:
1. Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
2. Store streams of records in a fault-tolerant durable way.
3.
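
The first two capabilities listed above can be loosely illustrated with a toy, in-memory log. This is only a sketch under stated assumptions: `ToyLog`, `publish`, and `poll` are hypothetical names invented for illustration and are not the Kafka client API, and nothing here is actually durable or fault tolerant, since the records live in one process's memory. It shows just the idea that records are appended to a named stream, retained after being read, and consumed independently by each group of readers.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for streams of records (not the Kafka API)."""

    def __init__(self):
        # topic -> append-only list of records (records are kept after being read)
        self._records = defaultdict(list)
        # (group, topic) -> index of the next record that group will read
        self._offsets = defaultdict(int)

    def publish(self, topic, record):
        """Append a record to a topic and return its position in the log."""
        self._records[topic].append(record)
        return len(self._records[topic]) - 1

    def poll(self, group, topic, max_records=10):
        """Return the next unread records for this group and advance its position."""
        start = self._offsets[(group, topic)]
        batch = self._records[topic][start:start + max_records]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = ToyLog()
    log.publish("clicks", "page=/home")
    log.publish("clicks", "page=/cart")

    # Two reader groups consume the same stream independently.
    print(log.poll("analytics", "clicks"))
    print(log.poll("billing", "clicks", max_records=1))
```

Unlike a traditional message queue, reading a record in this sketch does not remove it: each group tracks its own position, so a second group can still read the stream from the beginning.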