Pipelines concept

In DataBrew, pipelines are the infrastructure that connects a data source to a destination for streaming. Their functionality extends beyond simple linkage, however: pipelines can incorporate processors, intermediate components that operate on each data frame as it passes through. These processors are critical for transforming, filtering, or otherwise manipulating data as it flows from the source to the destination, ensuring that the streamed data aligns with the desired specifications and use cases.
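As a rough mental model of the source → processors → destination flow described above, the sketch below models a pipeline locally in Python. This is not DataBrew's actual API; every name here (`Pipeline`, `Frame`, the example processors) is hypothetical and exists only to illustrate how processors can transform or filter each frame in turn.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional

# A "frame" is one unit of streamed data; a plain dict stands in for it here.
Frame = dict

@dataclass
class Pipeline:
    """Toy model: frames flow from a source, through processors, to a destination."""
    source: Iterable[Frame]
    # Each processor takes a frame and returns a (possibly modified) frame,
    # or None to filter the frame out of the stream entirely.
    processors: list = field(default_factory=list)

    def run(self) -> list:
        delivered = []
        for frame in self.source:
            for process in self.processors:
                frame = process(frame)
                if frame is None:  # processor dropped this frame
                    break
            else:
                delivered.append(frame)  # frame survived every processor
        return delivered

# Example processors: drop frames without an id, then mask email addresses.
keep_ids = lambda f: f if f.get("id") is not None else None
mask_email = lambda f: {**f, "email": "***"} if "email" in f else f

out = Pipeline(
    source=[{"id": 1, "email": "a@b.c"}, {"email": "x@y.z"}],
    processors=[keep_ids, mask_email],
).run()
# out == [{"id": 1, "email": "***"}]
```

The second frame never reaches the destination because a processor filtered it out, which mirrors how a real processor (for example, one calling OpenAI) shapes the stream before delivery.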

Pipeline states

Each pipeline within DataBrew can be assigned one of the following statuses, which provide insight into its current operational state:

  • Created: This status indicates that the pipeline has been successfully created within the system but has not yet been initiated. It remains in this state until it is launched for the first time.

  • Starting: At this stage, the pipeline is in the process of being deployed across DataBrew’s compute network and is scheduled to commence operations as swiftly as possible following deployment. Note that the pipeline will continue to display a ‘Starting’ status for a brief period post-deployment; this additional time ensures that all pipeline components are correctly loaded and operational, paving the way for a seamless data streaming initiation.

  • Started: When a pipeline attains a ‘Started’ status, it is functioning as intended and actively streaming data: your data is currently being transmitted.

  • Failed Stop: This status occurs when the pipeline encounters an internal error or is unable to establish a connection with one of the specified connectors. Additionally, a pipeline may enter this state if there is an issue with one of the processors – for example, if an OpenAI key has expired, preventing DataBrew from processing the data as specified. It is crucial to consult the pipeline logs upon encountering a ‘Failed Stop’ status to diagnose and understand the nature of the issue.

  • Stopped: This indicates that the pipeline has been either manually or automatically halted, and as a result, data replication has ceased.

It’s essential for developers to monitor these statuses closely to manage their pipelines effectively, troubleshoot any issues that arise, and ensure the uninterrupted flow of data streaming processes.
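The statuses above form a small lifecycle. The sketch below encodes them as an enum with a plausible transition map; the transitions themselves are an assumption for illustration, not documented DataBrew guarantees.

```python
from enum import Enum

class PipelineState(Enum):
    CREATED = "Created"
    STARTING = "Starting"
    STARTED = "Started"
    FAILED_STOP = "Failed Stop"
    STOPPED = "Stopped"

# Assumed lifecycle transitions, inferred from the status descriptions above.
TRANSITIONS = {
    PipelineState.CREATED: {PipelineState.STARTING},
    PipelineState.STARTING: {PipelineState.STARTED, PipelineState.FAILED_STOP},
    PipelineState.STARTED: {PipelineState.STOPPED, PipelineState.FAILED_STOP},
    PipelineState.FAILED_STOP: {PipelineState.STARTING},  # relaunch after fixing the issue
    PipelineState.STOPPED: {PipelineState.STARTING},      # manual or post-resolution restart
}

def can_transition(frm: PipelineState, to: PipelineState) -> bool:
    """True if `to` is a plausible next state after `frm`."""
    return to in TRANSITIONS[frm]
```

A monitoring script could use a map like this to flag unexpected jumps, for example a pipeline reporting ‘Started’ without ever passing through ‘Starting’.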

Auto-stopped pipelines

In most scenarios, DataBrew does not automatically terminate user pipelines. We recognize the critical nature of ensuring your data is delivered promptly and uninterrupted. However, under certain circumstances, we may be compelled to halt pipeline operations. These exceptional situations are typically driven by factors beyond our control, such as security concerns, compliance requirements, or significant operational issues that impact the integrity of the DataBrew platform and the services we provide. Our commitment is to maintain transparent communication with our users during such events, providing timely updates and guidance on how to address and resolve any issues that necessitate stopping a pipeline.

If you exceed your tier’s bandwidth or if your subscription expires due to non-payment, DataBrew will automatically stop your pipeline. To prevent service disruption, monitor your usage and subscription status closely. Resolving these issues by upgrading your plan or renewing your subscription will allow for pipeline restart. DataBrew offers management tools and alerts to help avoid interruptions.
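One way to stay ahead of a bandwidth-driven auto-stop is to alert well before the tier limit is reached. The helper below is a hypothetical sketch (the threshold and function name are assumptions, not part of DataBrew's tooling) showing the kind of check an alerting script could run against usage figures.

```python
def bandwidth_status(used_gb: float, limit_gb: float, warn_ratio: float = 0.8) -> str:
    """Classify bandwidth usage against a tier limit.

    Returns 'exceeded' when over the cap (the point at which DataBrew would
    auto-stop the pipeline), 'warning' at or above warn_ratio of the cap,
    and 'ok' otherwise. The 0.8 warning threshold is an arbitrary choice.
    """
    if used_gb > limit_gb:
        return "exceeded"
    if used_gb >= warn_ratio * limit_gb:
        return "warning"
    return "ok"
```

Wiring a check like this into a scheduled job gives you time to upgrade the plan before streaming is interrupted, rather than reacting after a ‘Stopped’ status appears.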