PostgreSQL to NATS Streaming
Describes how to stream changes (CDC) from Postgres to NATS using DataBrew Cloud and Open Source Blink
Introduction
As we all know - Postgres is eating the database world. It stands out as the Swiss army knife of databases, so more and more developers adopt PostgreSQL to store their projects' data. But, as always happens when projects grow, the need arises to stream changes from the database to other services. This is where DataBrew Cloud and Open Source Blink come in.
Why would you do that?
Streaming data is not a silver bullet, but it still has a lot of use cases. Here are some of them:
- Building event-driven architecture
- Real-time analytics
- Sharing data with external systems
What are the benefits?
Data streaming from Postgres, also called CDC (Change Data Capture), is the process of reading changes directly from the WAL (write-ahead log) file instead of querying your data, which can put significant load on the database.
It also means a consumer can be offline for a while and still receive all the changes when it comes back online.
Requirements
Postgres setup
First, let’s ensure you have your database ready for CDC. Let’s check your WAL_LEVEL:
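Assuming you can reach the database with psql (the host and user below are placeholders for your own connection details), the current setting can be read with:

```shell
# Inspect the current WAL level; the default on a fresh install is "replica"
psql -h localhost -U postgres -c "SHOW wal_level;"
```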
If the result is not logical - you should change it to logical.
The WAL_LEVEL parameter controls how much information your database writes to the WAL. We want it set to logical, as that makes Postgres write changes to the WAL in a form that can be decoded and read later.
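To change it, one option is ALTER SYSTEM; note that wal_level is only read at server start, so Postgres must be restarted afterwards (connection parameters below are placeholders):

```shell
# Persist the new WAL level in postgresql.auto.conf
psql -h localhost -U postgres -c "ALTER SYSTEM SET wal_level = logical;"
# wal_level only takes effect after a restart, e.g.:
#   docker restart <postgres-container>    (Docker)
#   pg_ctl restart -D <data-dir>           (local install)
```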
NATS setup
Make sure you have a NATS (nats.io) server running. You can use the official Docker image:
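For a local test, the official image can be started like this (port 4222 is the client port, 8222 is the monitoring endpoint):

```shell
# Run a NATS server locally; add -js at the end if you need JetStream
docker run --rm -p 4222:4222 -p 8222:8222 nats:latest
```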
You should see logs like this:
If you are going to use DataBrew Cloud - you must ensure your Postgres and NATS are accessible from the internet. You can use services like ngrok to expose your local services to the internet. Or you can deploy them in the cloud.
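For example, if Postgres and NATS run locally, ngrok can expose each TCP port; run each command in its own terminal, and use the forwarded host:port it prints in the connector settings:

```shell
# Expose local Postgres to the internet
ngrok tcp 5432
# In a second terminal, expose local NATS
ngrok tcp 4222
```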
Start with DataBrew Cloud
First, you need to create a new account in DataBrew Cloud or log into an existing one.
Then you need to create a new pipeline. You can do this by clicking on the “New Pipeline” button in the top right corner.
Add Postgres source
First, we must configure our PostgreSQL database as a source for the pipeline. Click on the “Add Connector” button and select “Postgres-CDC” from the list.
Create new Postgres-CDC Connector
Then you need to fill in the connection details for your Postgres database. You need to provide the following information:
Postgres-CDC Connector settings
Once you have filled in all the details, press “Check Connection” to ensure the connection is working.
You will later be asked to provide the table you want to stream the changes from. Simply select the one needed to proceed.
Add NATS sink
To create a full pipeline, you need to add a sink for the data. In our case, it will be NATS.
Click on the “Add Connector” button and select “NATS” from the list. The flow is much the same as with the Postgres-CDC connector: provide the connection details for your NATS server.
Create NATS Connector destination
Provide the connection details and press “Check Connection” to ensure the connection is working.
NATS Connector settings
Creating the pipeline
Once you have both connectors configured, you can press the “Create Pipeline” button to create the pipeline.
Select the previously created Postgres-CDC connection as the source and the NATS connector as the destination. In our case the connection name is “Taxi rides”, as we are going to stream the changes from the “taxi_rides” table.
Select Postgres as pipeline source
Select NATS as pipeline destination
Now is the time to save and deploy our pipeline. Press the “Save pipeline” button. We are not going to add any processors to our data flow just yet.
Save new pipeline
After you save the pipeline and press the “Deploy” button, you will see the logs of the pipeline execution.
Please keep in mind that the first pipeline deployment may take a few seconds.
Once the pipeline is running, you will see the logs from its execution. If everything is correct, you will see logs like this:
Start with Open Source Blink
Blink is an Open-Source project from DataBrew that allows you to stream data from various sources to various destinations.
In this section, we will cover how to start with Blink and stream data from Postgres to NATS.
Assuming you already have Postgres and NATS set up, let’s start with Blink.
Download and install Blink
You can read more about the installation here - Installing Blink.
Create a new pipeline
Unlike DataBrew Cloud, Blink is a CLI tool. You create a new pipeline by defining the pipeline configuration in a YAML file.
Here is an example of the pipeline configuration for our particular use case. Save the file with the name blink.yaml.
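The exact schema is defined by Blink itself, so the sketch below is illustrative only: every key name (driver names, config fields, the subject) is an assumption and should be checked against the Blink documentation before use.

```yaml
# Illustrative sketch only - key names are assumptions; verify against the Blink docs
source:
  driver: postgres_cdc
  config:
    host: localhost
    port: 5432
    user: postgres
    password: postgres
    database: mydb
    table: taxi_rides
sink:
  driver: nats
  config:
    url: nats://localhost:4222
    subject: taxi_rides
```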
Start the pipeline
If you have Blink installed locally, you can start the pipeline by running the following command:
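Assuming the binary is on your PATH and blink.yaml sits in the current directory, the invocation looks roughly like this; the exact subcommand and flags may differ between Blink versions, so check blink --help:

```shell
# Start the pipeline defined in blink.yaml (exact syntax may vary by version)
blink start blink.yaml
```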
You should see the following output:
The logs above show the data being streamed from the snapshot of existing rows in the Postgres table.
Your logs may differ slightly, as your table will contain different data.
Check the data in NATS
The last step is to check the data in NATS. You can use the NATS CLI tool to inspect the messages on the subject.
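Assuming the pipeline publishes to a subject named taxi_rides (adjust to whatever subject you configured in the NATS connector), you can subscribe with the NATS CLI:

```shell
# Print every message published to the subject as it arrives
nats sub taxi_rides -s nats://localhost:4222
```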
If you did everything correctly, you should see the following logs: