Introduction

This document provides a detailed guide on setting up the PostgreSQL target connector for the DataBrew project. The PostgreSQL connector allows for efficient and reliable storage of processed data, ensuring that your data pipeline’s output is securely persisted in your PostgreSQL database.

Connector types

When using DataBrew, you may encounter two similar connectors: Postgres-CDC and Postgres.

  • Postgres-CDC is used when you stream your data by allowing DataBrew to read the PostgreSQL write-ahead log (WAL). This approach is generally more reliable, since DataBrew relies on PostgreSQL internals to stream your data as soon as possible.
  • Postgres is used when you cannot enable logical replication on your database and therefore cannot use the Postgres-CDC connector. This works, but keep in mind that your data may be delayed: DataBrew syncs batches every 5 seconds.
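
The choice between the two connectors comes down to whether logical replication is enabled, which PostgreSQL reports through its `wal_level` setting (`SHOW wal_level;`). The sketch below is an illustration only and not part of DataBrew: the helper name `choose_connector` is hypothetical, and it simply maps the reported value to the connector names listed above.

```python
def choose_connector(wal_level: str) -> str:
    """Pick a connector name from the value returned by `SHOW wal_level;`.

    Hypothetical helper for illustration; not part of DataBrew.
    """
    # Logical decoding of the WAL requires wal_level = 'logical'.
    if wal_level.strip().lower() == "logical":
        return "Postgres-CDC"
    # 'replica' and 'minimal' do not expose logical changes, so fall back
    # to the batch-based connector.
    return "Postgres"


if __name__ == "__main__":
    for level in ("logical", "replica", "minimal"):
        print(level, "->", choose_connector(level))
```

Changing `wal_level` requires a PostgreSQL server restart, which is one common reason the batch-based Postgres connector ends up being the practical option.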

Requirements

Before setting up the PostgreSQL target connector, ensure you meet the following requirements:

  • Access to your PostgreSQL database.
  • Necessary permissions to write data into the PostgreSQL tables.
  • Understanding of the data schema and any constraints within your PostgreSQL database.

Preparing Your PostgreSQL Database

To ensure smooth integration:

  • Permissions: Verify that the DataBrew project’s user account has write access to the desired tables.
  • Schema Design: DataBrew automatically creates the tables it needs in the target database. Make sure that the user given to DataBrew has permission to create tables in the target schema.
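
The permissions above could be granted with standard PostgreSQL `GRANT` statements. The following is a minimal sketch that only generates the SQL text: the role name `databrew_user`, the schema name, and the exact privilege set are assumptions, since the precise privileges DataBrew needs are not listed here. Review the output before running it against your database.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(role: str, schema: str = "public") -> List[str]:
    """Build GRANT statements for a role that must create and write tables.

    The privilege set is an assumption for illustration, not a DataBrew spec.
    """
    role_q, schema_q = quote_ident(role), quote_ident(schema)
    return [
        # CREATE lets the role create tables in the schema; USAGE lets it
        # access objects inside the schema.
        f"GRANT USAGE, CREATE ON SCHEMA {schema_q} TO {role_q};",
        # Write access to tables that already exist in the schema.
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema_q} TO {role_q};",
    ]


if __name__ == "__main__":
    # "databrew_user" and "analytics" are placeholder names.
    for statement in grant_statements("databrew_user", "analytics"):
        print(statement)
```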

Cloud Setup

This section guides you through setting up the PostgreSQL target connector in the cloud for the DataBrew project.

Setting up in Cloud

  1. Access DataBrew Cloud Platform: Navigate to DataBrew Cloud App.
  2. Create a New Target Connector Instance: Follow these steps…
    • Step 1: Choose ‘PostgreSQL’ from the list of available target connectors.
    • Step 2: Provide the necessary connection details, including your database host, database name, user, and password.
    • Step 3: Configure the connector by specifying the target tables and any necessary column mappings or transformations.
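
Before entering the connection details from Step 2 in the cloud form, it can help to collect them in one place and test them with your usual PostgreSQL client. The sketch below assembles them into a standard `postgresql://` connection URI. The class name `PostgresTarget`, its field names, and the default port 5432 are assumptions for illustration and do not reflect DataBrew's own configuration format.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class PostgresTarget:
    """Connection details for a target database (hypothetical structure)."""

    host: str
    database: str
    user: str
    password: str
    port: int = 5432  # PostgreSQL's default port; adjust if yours differs

    def uri(self) -> str:
        """Return a postgresql:// URI with credentials percent-encoded."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        database = quote(self.database, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{database}"


if __name__ == "__main__":
    # Placeholder values only.
    target = PostgresTarget("db.example.com", "analytics", "databrew_user", "p@ss/word")
    print(target.uri())
```

Percent-encoding the password matters when it contains characters such as `@` or `/`, which would otherwise be parsed as URI delimiters.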

Incremental sync limitations

If you cannot use WAL-based sync, note that DataBrew does not support incremental sync for tables without a primary key. For incremental sync to work, the table must have an incrementing primary key, which DataBrew uses to sync data and to keep the ordering consistent.
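
To illustrate why an incrementing primary key is needed, here is a minimal sketch of key-based incremental sync. It is not DataBrew's implementation: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, and the table `events`, its columns, and the batch size are hypothetical. In PostgreSQL, the equivalent key would typically be a `BIGSERIAL` or identity column.

```python
import sqlite3


def fetch_batch(conn, last_id, batch_size):
    """Return up to batch_size rows whose id is greater than last_id."""
    return conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()


def sync_all(conn, batch_size=2):
    """Read the table in key order, batch by batch, until no rows remain."""
    synced, last_id = [], 0
    while True:
        batch = fetch_batch(conn, last_id, batch_size)
        if not batch:
            return synced
        synced.extend(batch)
        # Advance the cursor to the highest id seen so far; without an
        # incrementing key there is no reliable position to resume from.
        last_id = batch[-1][0]


def demo():
    """Create an in-memory table with five rows and sync it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [("a",), ("b",), ("c",), ("d",), ("e",)],
    )
    return sync_all(conn)


if __name__ == "__main__":
    print(demo())
```

Because each batch resumes from the last key seen, rows are read exactly once and in a stable order, which is the property a table without a primary key cannot provide.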