Confluent's Field CTO Shares Insights on the Future of Streaming Analytics and Data Management

"Data freshness isn't just a luxury; it's a necessity. Confluent's platform accelerates data ingestion, ensuring businesses stay ahead in the race for real-time insights." - Kai

Aanchal Ghatak

10 May 2024 17:42 IST

We recently sat down with Kai Waehner, Confluent's Field CTO, at Kafka Summit Bangalore, where a series of innovations were unveiled. We discussed the workings of the company's platform and its impact on streaming analytics, covering reliable data flow, cross-team collaboration, and security, along with Confluent's vision for the future of data management.

Here are excerpts from the interview:

How does Confluent ensure that schema mapping and data flow between Kafka and Iceberg tables are reliable, especially at large scale?

Kai Waehner: Yes, indeed. So, Confluent provides its own schema registry, which is a fundamental component of our platform within the data governance suite. Here, you define different structures. The crucial aspect lies in merging the Kafka topic, containing the data, with the schema. Essentially, through a click of a button or via specific business logic implementation, this combination translates into the Iceberg table format. This is an implementation detail that end users need not concern themselves with; it's out-of-the-box functionality. Confluent's architecture is inherently built for scale, catering to varying needs, from small basic clusters to enterprise-grade setups. Our cloud offerings provide scalability guarantees through SLAs. For instance, our enterprise clusters can handle hundreds of megabytes per second, sufficient for most customers' data needs.
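To make the idea of pairing a topic's schema with a table format more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Confluent's implementation: the function name, the sample Order schema, and the layout of the column dictionaries are hypothetical, and only flat records with primitive Avro types are handled.

```python
# Hypothetical illustration: derive Iceberg-style column definitions
# from an Avro-style record schema. Not Confluent's actual code.

# Avro primitive type -> Iceberg primitive type (Avro "bytes" becomes "binary").
AVRO_TO_ICEBERG = {
    "boolean": "boolean",
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "string": "string",
    "bytes": "binary",
}


def avro_record_to_iceberg_columns(avro_schema: dict) -> list[dict]:
    """Return one column definition per field of a flat Avro record schema."""
    columns = []
    for field_id, field in enumerate(avro_schema["fields"], start=1):
        avro_type = field["type"]
        required = True
        # Avro models an optional field as a union with "null",
        # for example ["null", "string"].
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            if len(non_null) != 1:
                raise ValueError(f"unsupported union for field {field['name']!r}")
            required = "null" not in avro_type
            avro_type = non_null[0]
        if avro_type not in AVRO_TO_ICEBERG:
            raise ValueError(f"unsupported type {avro_type!r} for field {field['name']!r}")
        columns.append(
            {
                "id": field_id,
                "name": field["name"],
                "type": AVRO_TO_ICEBERG[avro_type],
                "required": required,
            }
        )
    return columns


# Sample schema (assumed for illustration only).
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "customer", "type": "string"},
        {"name": "coupon", "type": ["null", "string"]},
    ],
}

if __name__ == "__main__":
    for column in avro_record_to_iceberg_columns(ORDER_SCHEMA):
        print(column)
```

The sketch rejects any type it does not recognise rather than guessing, which reflects the general point that a reliable mapping depends on the schema being explicit and validated up front.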

How does integrating streaming data with analytics tables impact data freshness? Can queries be run on the latest data within a certain SLA?

Kai Waehner: This is a core value proposition we offer. Traditionally, data pipelines to warehouses or data lakes incurred time delays, even if minimal. However, in today's fast-paced landscape, even a few minutes can render data outdated. Confluent's aim is to accelerate data ingestion into data lakes, leveraging Kafka's low-latency capabilities. We can process data end-to-end within milliseconds, meeting critical workload SLAs. While data lakes are evolving towards near real-time ingestion, our platform ensures compatibility with their capabilities. Hence, data freshness is significantly improved, a key reason why more customers are adopting Confluent Cloud for data ingestion. Whether through connectors like those to Snowflake or via Iceberg integration, the process is streamlined, providing both low latency and fresh data.
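For readers who want a concrete picture of what a freshness SLA means, the following Python sketch measures how far the newest record in a table lags behind the current time and compares that lag with a target. The function names and the five-second target are assumptions made for illustration; they are not figures or APIs quoted by Confluent.

```python
from datetime import datetime, timedelta, timezone


def freshness_lag(latest_event_time: datetime, now: datetime) -> timedelta:
    """Time elapsed between the newest record in the table and 'now'."""
    return now - latest_event_time


def meets_freshness_sla(latest_event_time: datetime, now: datetime, sla: timedelta) -> bool:
    """True when the newest queryable record is no older than the SLA allows."""
    return freshness_lag(latest_event_time, now) <= sla


if __name__ == "__main__":
    now = datetime(2024, 5, 10, 12, 0, 0, tzinfo=timezone.utc)
    sla = timedelta(seconds=5)  # hypothetical freshness target

    streamed = now - timedelta(seconds=2)   # record landed two seconds ago
    batched = now - timedelta(minutes=10)   # record landed via a ten-minute batch

    print("streamed within SLA:", meets_freshness_sla(streamed, now, sla))
    print("batched within SLA:", meets_freshness_sla(batched, now, sla))
```

In practice the timestamp of the newest record would come from the table itself, for example from a query for the maximum event time, and the check would run on a schedule.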

How mature is the Tableflow functionality currently?

Kai Waehner: Tableflow is currently in its early access phase, having been announced at Kafka Summit London a few weeks ago. We're actively working with initial customers to gather feedback and address any issues. Typically, after the early access phase, it takes a few months before a feature becomes generally available and production-ready.

Does Tableflow create any data silos that impact cross-team collaboration, and how does it promote data sharing?

Kai Waehner: Quite the opposite, actually. Tableflow aims to eliminate data silos, facilitating seamless data sharing across teams. The idea is to make data easily accessible from various sources, be it logs, mobile apps, or databases, by ingesting it into Kafka topics or table formats. This accessibility fosters collaboration and enables teams to leverage their preferred technologies for consumption, be it real-time or batch processing. The ultimate goal is to empower users to choose the tools that best suit their needs while ensuring data consistency and availability across the platform.

Any plans for future enhancements to Tableflow with new integrations with other data platforms?

Kai Waehner: Tableflow, being built on Apache Iceberg, aligns with the emerging standard for table formats. Our roadmap includes bidirectional capabilities, enabling data movement between platforms like Snowflake and Kafka. The focus is on storing data once in an object store, independent of its source or destination. This interoperability ensures seamless integration with other frameworks and solutions that support Apache Iceberg, fostering a unified data ecosystem.

Do you hold any security certifications or compliance credentials?

Kai Waehner: Security is paramount for us, and we collaborate closely with major cloud providers to ensure compliance with industry standards. Our cloud platform integrates seamlessly with the security features offered by providers like Amazon, Google, and Microsoft. We adhere to best practices such as end-to-end encryption, private networking, and key management, ensuring data remains secure throughout its lifecycle. Additionally, our presence in cloud marketplaces simplifies procurement and aligns with customers' existing security and payment models.

What industries and use cases are garnering the most interest for Tableflow, and who are the early adopters?

Kai Waehner: Tableflow addresses a wide range of use cases, with a primary focus on unifying operational and analytical workloads. This convergence is particularly appealing to industries that require both transactional and analytical capabilities within a single platform. While we can't disclose specific early adopters publicly, our solution resonates with both large enterprises and digital natives. From financial services to retail, the versatility of Tableflow appeals to organizations seeking to streamline data processing and analysis.

How do you plan to educate more companies about the convergence of streaming and analytics, and do you partner with any vendors to build awareness?

Kai Waehner: Education remains a significant focus for us, given the paradigm shift associated with data streaming. We invest in developer programs, workshops, and webinars to impart knowledge about our platform and best practices. Additionally, we collaborate with software vendors and system integrators to reach a broader audience. Whether through joint events, industry partnerships, or solution integrations, our goal is to raise awareness and empower organizations to leverage streaming analytics effectively.

What type of benchmark testing has Confluent conducted to validate Tableflow, and what throughput and latency can customers expect?

Kai Waehner: While we don't have public benchmarks available, our platform's performance is rooted in established technologies like Confluent Cloud and object stores such as Amazon S3. Latency concerns are minimal, and we've received no reports of significant issues from customers or prospects. However, we'll explore internal testing data to provide more insights on throughput and latency benchmarks.

Is there anything else you'd like to add?

Kai Waehner: I'd like to emphasize our commitment to the Indian market, which we see as a significant growth opportunity. Whether through partnerships, investments in employee resources, or tailored solutions for specific industries, we're dedicated to supporting India's evolving data ecosystem. Our hybrid approach, catering to both cloud-native and on-premises deployments, reflects our strategy to meet diverse customer needs. We're excited about the prospects in this region and look forward to continued collaboration and innovation.
