What Are Kafka Streams and How Are They Implemented

Apache Kafka is a widely used platform for implementing stream processing. It is developed by the Apache Software Foundation as open-source software and is written in Java and Scala. Apache Kafka aims to offer a high-throughput, low-latency, unified platform for managing real-time data feeds. It can be integrated with external systems through Kafka Connect, while Kafka Streams provides stream processing for Java applications.

Kafka uses a binary TCP-based protocol optimized for efficiency and relies on a “message set” abstraction to group records together and reduce the overhead of network round trips. Kafka allows users to publish data to any number of subscribing systems or real-time applications; use cases range from matching passengers with drivers to real-time analytics and predictive maintenance.

Kafka supports two types of topics: regular and compacted. Regular topics can be configured with a retention time or a space bound; once the configured limit is reached, Kafka deletes old data to free up storage space. Compacted topics retain at least the latest record for each key, so records needed for future reference are not deleted by retention settings. Following are the five major APIs of Kafka.
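As a sketch, retention and compaction are controlled by topic-level configuration. The values below are illustrative, not recommendations.

```properties
# Regular topic: delete old log segments once either limit is exceeded
cleanup.policy=delete
# retention time: 7 days in milliseconds
retention.ms=604800000
# space bound: 1 GB per partition
retention.bytes=1073741824
```

```properties
# Compacted topic: retain at least the latest record for every key
cleanup.policy=compact
```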

  • Producer API – allows an application to publish streams of records to Kafka topics.

  • Consumer API – allows an application to subscribe to topics and process the streams of records.

  • Connector API – provides reusable producers and consumers that connect Kafka topics to existing applications or data systems.

  • Streams API – transforms input streams into output streams to produce results.

  • Admin API – used to manage and inspect Kafka topics, brokers, and other objects.
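As an illustration of the Producer API, the sketch below builds the minimal client configuration. The broker address is a placeholder, and the serializer class names are the standard ones shipped with kafka-clients.

```java
import java.util.Properties;

public class ProducerConfigSketch {

    // Minimal Producer API configuration (broker address is a placeholder).
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Wait for the full in-sync replica set to acknowledge each write
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps());
    }
}
```

With these properties an application would create a producer with new KafkaProducer<>(props) and publish records with producer.send(new ProducerRecord<>(topic, key, value)).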

Significant Features of Kafka Streams

Kafka Streams is an easy way to write mission-critical, real-time applications and microservices where the input and output data are stored in Kafka clusters. It combines the simplicity of developing and deploying standard Java and Scala applications on the client side with the advantages of Kafka’s server-side cluster technology. Following are the specialized features of Kafka Streams.

  • Highly scalable, elastic, and fault-tolerant.

  • Deploys to containers, virtual machines, bare metal, and the cloud.

  • Equally viable for small, medium, and large application development processes.

  • Fully integrated with Kafka security.

  • Easy to write standard Java and Scala applications.

  • Supports exactly-once processing semantics.

  • Requires no separate processing cluster.

  • Can be developed on all major operating systems, including macOS, Linux, and Windows.
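For example, the exactly-once guarantee listed above is enabled with a single Streams configuration property (a sketch; the exactly_once_v2 value requires brokers on version 2.5 or newer, and the default is at_least_once):

```properties
processing.guarantee=exactly_once_v2
```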

Use Cases of Apache Kafka

Top multinational firms use Apache Kafka for various purposes; the following are popular companies that use it.

  • The New York Times uses Apache Kafka and Kafka Streams to store and distribute published content in real time to the various applications that serve its readers.

  • Pinterest uses Kafka to power the real-time, predictive budgeting system of its advertising infrastructure.

  • Zalando, a leading online fashion retailer in Europe, uses Kafka while transitioning from monolithic services to microservices, enabling its technical team to deliver real-time business intelligence.

  • Rabobank, a leading bank in the Netherlands, uses Apache Kafka in financial processes and services to alert its customers about financial events.

  • LINE uses Apache Kafka as a central datahub through which its services communicate with one another, supporting business logic, threat detection, and search indexing.

  • Trivago, a global hotel search platform, uses Kafka to serve a broad base of customers and travelers through its various websites and apps.

Key Concepts of Kafka Streaming

Kafka Streams processing can be stateless, responding to each event without regard to previous events or state, or stateful, where results depend on accumulated state. The important key concepts of Kafka Streams include topology, keys, time, windows, KStreams, KTables, domain-specific language (DSL) operations, and SerDes.

Topology – A processor topology is a graph of stream processors (nodes) connected by streams (edges) that defines the stream processing logic.

Time – Time is a critical concept in Kafka Streams because windowing is based on time boundaries. Kafka distinguishes event time, processing time, and ingestion time when recording data.

DSL (Domain-Specific Language) – Kafka Streams offers functional operations through the built-in Streams DSL, lower-level procedural operations through the Processor API, and the KSQL query language.

KTables – A KTable is used to keep and use state. Each key retains its latest value, and the table is backed by a changelog stream that delivers new inserts, updates, and deletes.
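The latest-value-per-key semantics of a KTable can be sketched in plain Java as a map fed by a changelog (an illustration only; the class and method names here are hypothetical, not Kafka APIs):

```java
import java.util.HashMap;
import java.util.Map;

public class KTableSketch {

    // Replays a changelog of (key, value) pairs; each key keeps only its
    // latest value, and a null value acts as a delete (a "tombstone").
    static Map<String, Long> applyChangelog(String[][] changelog) {
        Map<String, Long> table = new HashMap<>();
        for (String[] record : changelog) {
            if (record[1] == null) {
                table.remove(record[0]);                         // delete
            } else {
                table.put(record[0], Long.parseLong(record[1])); // insert or update
            }
        }
        return table;
    }

    public static void main(String[] args) {
        String[][] changelog = {
            {"alice", "1"}, {"bob", "1"}, {"alice", "2"}, {"bob", null}
        };
        // Only alice survives, with her latest value 2
        System.out.println(applyChangelog(changelog));
    }
}
```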

DSL Operations – Operations such as map, filter, join, and aggregate are used to build streaming applications with complicated topologies by declaring how input streams map to output streams and tables.

SerDes – Serializers and deserializers (SerDes) allow record keys and values to be materialized to and from a Kafka topic or state store. Kafka provides ready-made SerDes instances and API methods for configuring them.

The Architecture of Kafka Streams

The architecture of Kafka Streams covers stream partitioning, stream threading, local state stores, fault tolerance, scaling out, and interactive queries. Implementing a Kafka Streams application is simple, and the procedure is as follows.

Implementation of Kafka Streams

Here is sample code for implementing the Kafka Streams API in a Java application.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Arrays;
import java.util.Properties;

public class WordCountApplication {

    public static void main(final String[] args) throws Exception {
        // Application ID, broker address, and default SerDes for keys and values
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-application");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build the topology: read lines, split them into words, group by word, and count
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> textLines = builder.stream("TextLinesTopic");
        KTable<String, Long> wordCounts = textLines
            .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));

        // Write the running counts back to an output topic
        wordCounts.toStream().to("WordsWithCountsTopic", Produced.with(Serdes.String(), Serdes.Long()));

        // Start the streams client and close it cleanly on shutdown
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
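The per-record transformation in the topology above (flatMapValues, groupBy, count) can be illustrated with plain Java streams on a fixed input. This is a sketch of the word-count logic only; the class name is hypothetical and nothing here touches Kafka.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Splits each line into lowercase words and counts occurrences,
    // mirroring flatMapValues -> groupBy -> count in the Streams topology.
    static Map<String, Long> countWords(String... lines) {
        return Arrays.stream(lines)
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello Kafka", "hello streams"));
    }
}
```

The difference in the real application is that the input is unbounded, so the counts form a continuously updated KTable rather than a final result.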


Steps to implement the sample code

  • Download the code

  • Start the Kafka Server

  • Prepare input topic and start Kafka producer

  • Start the Wordcount Application

  • Process sample data

  • Tear down the application
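The steps above can be sketched as a console session. This assumes a local single-broker setup, the scripts that ship in a Kafka distribution's bin directory, and the topic names from the sample code; adjust paths and addresses for your environment.

```sh
# Start the Kafka server
bin/kafka-server-start.sh config/server.properties

# Prepare the input topic and start a console producer to feed it lines
bin/kafka-topics.sh --create --topic TextLinesTopic --bootstrap-server localhost:9092
bin/kafka-console-producer.sh --topic TextLinesTopic --bootstrap-server localhost:9092

# After starting the WordCount application, read the counts back out
bin/kafka-console-consumer.sh --topic WordsWithCountsTopic --from-beginning \
  --bootstrap-server localhost:9092 \
  --property print.key=true \
  --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
```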

Conclusion

The configuration and implementation of Kafka Streams given here is a sample. If you want to gain an in-depth understanding of Apache Kafka Streams, enroll in our Java Training Institute in Chennai. We have top industry experts to train you on application development for real-time data analytics, with hands-on exposure to industry projects. We offer a course completion certificate that adds value to your profile, along with job support at Softlogic.

