Introduction

Running Blink is as easy as running a single command. In this article, we will walk you through the steps to run your first Blink instance.

To keep things simple, we are not going to use any real database or data source. Instead, we will use the Playground connector provided by Blink. The Playground connector is a simple in-memory connector intended for testing. It generates random data, so you can check how Blink works without setting up a real data source.

Prerequisites

Make sure you have a Blink instance installed or the Docker image pulled. If you haven’t done that yet, please refer to the Prerequisites section.

Blink configs are written in YAML. The configuration file defines the data sources, transformations, destinations, and other pipeline settings. It is passed to the Blink instance at startup.

Here is an example of a simple Blink configuration file we are going to use:

service:
  pipeline_id: 1
source:
  driver: playground
  config:
    data_type: market
    publish_interval: 1
    historical_batch: false
  stream_schema:
    - stream: market
      columns:
        - name: company
          nativeConnectorType: String
          databrewType: String
          nullable: false
          pk: false
        - name: currency
          nativeConnectorType: String
          databrewType: String
          nullable: false
          pk: false
processors:
  - driver: sql
    config:
      query: "select * from streams.market where currency = 'USD'"
sink:
  driver: stdout
  config: {}

This config file defines a simple pipeline with a single source, a single processor, and a single sink.

  • The source is the Playground connector that generates random market data.
  • The processor is a simple SQL processor that keeps only the records whose currency is USD, so the data is filtered on the fly.
  • The sink is the stdout connector that prints the data to the console.

Configuration tweaks

We can change the configuration file to suit our needs. For example, we can change the query in the processor to filter the data based on a different currency. We can also add more processors to the pipeline to perform more complex transformations.
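
As a minimal sketch of the first tweak, the processors section below filters on a different currency. The value EUR is an assumption for illustration only; this article does not list which currencies the Playground connector generates, so substitute one you actually see in the output.

```yaml
processors:
  - driver: sql
    config:
      # 'EUR' is a hypothetical value; use any currency the Playground source emits
      query: "select * from streams.market where currency = 'EUR'"
```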

For now, we will keep the configuration file simple and change only one parameter: source.config.publish_interval.

source.config.publish_interval
number

Publish interval in seconds. Defines how often the source will publish random data. Default is 1 second.
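
For instance, assuming we want the source to publish every 5 seconds instead of every second, the config part of the source section might look like the sketch below. The value 5 is only an illustration; the stream_schema part of the source section stays exactly as in the example file.

```yaml
source:
  driver: playground
  config:
    data_type: market
    publish_interval: 5   # illustrative value, in seconds; the default is 1
    historical_batch: false
```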

  1. Save the configuration file to a file named blink-config.yaml.
  2. Run the following command to start the Blink instance:
blink start -c blink-config.yaml

You will see a few logs in the console and the Blink instance will start.

2024-05-13 11:49:22 WARN blink: No offset storage URI provided. Offset will not be stored
2024-05-13 11:49:22 WARN Metrics: No influx config has been provided. Fallback to local prometheus metrics
2024-05-13 11:49:22 INFO Metrics: Component has been loaded
2024-05-13 11:49:22 INFO Source: Loaded driver=playground
2024-05-13 11:49:22 INFO Processors: Loaded driver=sql
2024-05-13 11:49:22 INFO Sinks: Loaded driver=stdout

The first two lines are configuration warnings saying that no offset storage URI and no InfluxDB configuration were provided. Both settings are optional, so the warnings can be ignored for now.

A few seconds later, you will see logs showing the data that passed the filter in the processor. It may take some time, because we are displaying only records with the USD currency. If you don’t want to wait, simply leave the processors section empty:

processors: []

With the USD filter in place, the output looks similar to this:

2024-05-13 11:55:07 INFO Stream: Messages stat messages_received=40 messages_sent=3 messages_dropped_or_filtered=37
2024-05-13 11:55:10 INFO [sink]: stdout: {
  "company": "Doyle, Doyle and Doyle",
  "currency": "USD"
}
2024-05-13 11:55:17 INFO Stream: Messages stat messages_received=50 messages_sent=4 messages_dropped_or_filtered=46
2024-05-13 11:55:27 INFO Stream: Messages stat messages_received=60 messages_sent=4 messages_dropped_or_filtered=56

Run with Docker

If you are using Docker, you can run the Blink instance with the following command:

docker run -v ./blink-config.yaml:/app/blink.yaml usedatabrew/blink start

The rest will be the same as running with the installed Blink instance.