Introduction

Stream Schema is a way to specify the data structure of the stream. It is used to define the schema of the data that will be processed by the Blink service. Stream schema is defined in the configuration file of the Blink service.

It’s important to undestand that defining the correct schema is crucial. Blink uses schema to efficiently parse the data and pass it to the processors.

Schema stages

  1. Schema definition: The schema is defined in the configuration file of the Blink service. The schema is defined in the source section of the configuration file.

  2. Schema evolution: The schema defined for the source may evolve as you add more processors that can remove some columns, add new columns, or change the data type of the columns. Blink supports schema evolution and will automatically adjust the schema as needed.

  3. Schema persistence: The schema is evolved and the output data format is known. the final version of the schema will be used to encode your data on the last stage. Blink may use the schema to generate create table statements for the destination if needed.

Schema definition

The schema is defined in the source section of the configuration file. Here is an example of how to define the schema in the configuration file (this is partial config file. It doesn’t contain all the properties):

source:
  driver: playground
  config:
    data_type: market
    publish_interval: 1
    historical_batch: false
  stream_schema:
    - stream: market
      columns:
        - name: company
          nativeConnectorType: String
          databrewType: String
          nullable: false
          pk: false
        - name: currency
          nativeConnectorType: String
          databrewType: String
          nullable: false
          pk: false

Schema properties

Let’s take a look at the properties of the schema. Schema contains an array of streams. Each stream represents a table in the source database or a topic in the source message broker.

For example for AirTable source each stream represents a table in the AirTable database.

source.stream_schema.stream
string

Name of the stream. You can refference this stream in the processors.

source.stream_schema.columns[0]
Column object

Column object represents a property of the stream. It may be mapped to database column or a field in the message.