Steam schema
Let’s see how Blink uses stream schema to specify the data structure.
Introduction
Stream Schema is a way to specify the data structure of the stream. It is used to define the schema of the data that will be processed by the Blink service. Stream schema is defined in the configuration file of the Blink service.
It’s important to undestand that defining the correct schema is crucial. Blink uses schema to efficiently parse the data and pass it to the processors.
Schema stages
-
Schema definition: The schema is defined in the configuration file of the Blink service. The schema is defined in the
source
section of the configuration file. -
Schema evolution: The schema defined for the source may evolve as you add more processors that can remove some columns, add new columns, or change the data type of the columns. Blink supports schema evolution and will automatically adjust the schema as needed.
-
Schema persistence: The schema is evolved and the output data format is known. the final version of the schema will be used to encode your data on the last stage. Blink may use the schema to generate create table statements for the destination if needed.
Schema definition
The schema is defined in the source
section of the configuration file. Here is an example of how to define the schema in the configuration file (this is partial config file. It doesn’t contain all the properties):
Schema properties
Let’s take a look at the properties of the schema. Schema contains an array of streams. Each stream represents a table in the source database or a topic in the source message broker.
For example for AirTable
source each stream represents a table in the AirTable database.
Name of the stream. You can refference this stream in the processors.
Column object represents a property of the stream. It may be mapped to database column or a field in the message.
Was this page helpful?