Streaming ETL

Column Concatenation

Organizations often want to take data from an operational system and drive analytics on it, typically through a data warehouse or data lake. You can use Apache Kafka® and KSQL to turn this from a high-latency batch process into a low-latency streaming one.

By performing ETL on data as it arrives, the business gains more timely insights into its data, and the enriched and cleansed data becomes available for other event-driven applications.

Building a single view of your data records often means bringing customer interactions together in a single topic: for example, combining transactional data from the data lake with purchase-history data from the data warehouse, or combining the products shipped to a customer with their browsing history. With a KSQL-derived topic, these questions are easily answered, and the answers update in real time as new events arrive.

This example shows how you can derive a new column from an inbound stream by using KSQL to concatenate two existing columns.

Directions

The source data has two fields: FIRST_NAME and LAST_NAME.

ksql> SELECT ID, FIRST_NAME, LAST_NAME FROM CUSTOMERS;
2 | Auberon | Sulland

Using KSQL, you can concatenate two or more fields using the + operator:

ksql> SELECT ID, FIRST_NAME, LAST_NAME, FIRST_NAME + ' ' + LAST_NAME AS FULL_NAME FROM CUSTOMERS;
2 | Auberon | Sulland | Auberon Sulland
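
To make the concatenated column available to downstream consumers, you can persist the query as a new stream backed by its own Kafka topic. This sketch assumes CUSTOMERS is registered as a stream; the name CUSTOMERS_ENRICHED is illustrative:

ksql> CREATE STREAM CUSTOMERS_ENRICHED AS
        SELECT ID, FIRST_NAME, LAST_NAME,
               FIRST_NAME + ' ' + LAST_NAME AS FULL_NAME
        FROM CUSTOMERS;

The resulting topic is continuously populated as new customer events arrive, so any consuming application sees the derived FULL_NAME field without having to compute it itself.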
