Steam schema

Introduction

Stream Schema is a way to specify the data structure of the stream. It is used to define the schema of the data that will be processed by the Blink service. Stream schema is defined in the configuration file of the Blink service.

It’s important to undestand that defining the correct schema is crucial. Blink uses schema to efficiently parse the data and pass it to the processors.

Schema stages

Schema definition: The schema is defined in the configuration file of the Blink service. The schema is defined in the source section of the configuration file.
Schema evolution: The schema defined for the source may evolve as you add more processors that can remove some columns, add new columns, or change the data type of the columns. Blink supports schema evolution and will automatically adjust the schema as needed.
Schema persistence: The schema is evolved and the output data format is known. the final version of the schema will be used to encode your data on the last stage. Blink may use the schema to generate create table statements for the destination if needed.

Schema definition

The schema is defined in the source section of the configuration file. Here is an example of how to define the schema in the configuration file (this is partial config file. It doesn’t contain all the properties):

source:
  driver: playground
  config:
    data_type: market
    publish_interval: 1
    historical_batch: false
  stream_schema:
    - stream: market
      columns:
        - name: company
          nativeConnectorType: String
          databrewType: String
          nullable: false
          pk: false
        - name: currency
          nativeConnectorType: String
          databrewType: String
          nullable: false
          pk: false

Schema properties

Let’s take a look at the properties of the schema. Schema contains an array of streams. Each stream represents a table in the source database or a topic in the source message broker.

For example for AirTable source each stream represents a table in the AirTable database.

source.stream_schema.stream

string

Name of the stream. You can refference this stream in the processors.

source.stream_schema.columns[0]

Column object

Column object represents a property of the stream. It may be mapped to database column or a field in the message.

Show properties

name

string

Name of the column

boolean

If the column is a primary key. This property is required to understand the uniqueness of the row. Primary key is also required for database sink types. As Blink must know about the primary key to perform delete/update operations. Without the primary key, Blink will only perform insert operations.

nullable

boolean

If the column can be null. This property is used to understand if the column can be null or not.

nativeConnectorType

string

The native data type of the column in the source database or third-party service. Deprecated and will be removed soon

databrewType

string

DataBrew type of the column. This property specifies how Blink will treat the column. More about available types in the DataBrew Types) section.

Get started

Configuration

Stream schema

Introduction

Schema stages

Schema definition

Schema properties

Get started

Configuration

Stream schema

​Introduction

​Schema stages

​Schema definition

​Schema properties

Introduction

Schema stages

Schema definition

Schema properties