Streaming Analytics with Kafka, Spark, and Cassandra

The Kafka-Spark-Cassandra pipeline for processing a firehose of incoming events.

Tools

Kafka is a distributed, partitioned, replicated commit log service. It…
Distributed, fault-tolerant, high-throughput pub-sub messaging system
Spark is a fast and general processing engine compatible with Hadoop d…
Fast and general engine for large-scale data processing
Partitioning means that Cassandra can distribute your data across mult…
Highly scalable partitioned row store
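
The descriptions above call both Kafka and Cassandra "partitioned". As a rough illustration of what key-based partitioning means, here is a minimal Python sketch of a toy in-memory, append-only log split into partitions. It does not use the Kafka or Cassandra client libraries, and it is not how either system is implemented. The names PartitionedLog, partition_for, and NUM_PARTITIONS are hypothetical, and MD5 is only a stand-in for whatever hash function a real system uses.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical partition count, chosen only for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index using a stable hash.

    MD5 is an assumption made for illustration; it is not the hash
    used by Kafka or Cassandra.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedLog:
    """Toy append-only log: one Python list per partition."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.num_partitions = num_partitions
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str):
        """Append a record and return its (partition, offset)."""
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int = 0):
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


# Records with the same key land in the same partition, in append order.
log = PartitionedLog()
p1, o1 = log.append("user-42", "click")
p2, o2 = log.append("user-42", "purchase")
assert p1 == p2 and o2 == o1 + 1
```

Because the partition is derived only from the key, each partition could in principle live on a different machine while per-key ordering is preserved, which is the general idea behind spreading data across multiple nodes.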