DR. ATABAK KH

Cloud Platform Modernization Architect specializing in transforming legacy systems into reliable, observable, and cost-efficient cloud platforms.

Certified: Google Professional Cloud Architect, AWS Solutions Architect, MapR Cluster Administrator

How to design a multi-service sync pipeline from cloud to on-premises (or between systems) using hexagonal architecture, Cloud Workflows, and event-driven orchestration.

Context: Keeping cloud data in sync with on-premises or legacy systems is a common integration problem. This post focuses on solution architecture and design: two entry points, one event shape, hexagonal structure per service, and tag-based deploys. The pattern applies to any sync use case.


Why this kind of pipeline?

In many organizations, data lives in the cloud (e.g. a data warehouse or lakehouse) while operational or legacy systems run elsewhere, on-premises or in another cloud. Keeping them in sync demands reliability, clear ownership of each step, and the ability to trigger syncs both on a schedule and on demand (e.g. from a query or API).

We set out to build a pipeline that:

  • Syncs data from a cloud source to a target system (e.g. on-premises or another service).
  • Supports two entry points: a scheduled batch (e.g. every N minutes) and an on-demand path, e.g. one invokable via a remote function or HTTP API so upstream processes can trigger a sync with a single call.
  • Keeps the same downstream behaviour regardless of how the sync was triggered.
  • Makes each step testable and replaceable without rewriting the whole flow.

This post focuses on the architecture and design we used, and lessons that apply to any sync pipeline.


High-level shape: two triggers, one pipeline

Conceptually, the system looks like this:

  1. Scheduler path
    A scheduler calls an HTTP service periodically. The service reads “pending” items from a queue table in the source (e.g. BigQuery), publishes one event per item to Pub/Sub, and updates queue status. The trigger does not call the target system directly; it only enqueues work.

  2. On-demand path
    An external caller (e.g. remote function, API, or another job) calls the same HTTP service with payload data. The service publishes the same kind of event to Pub/Sub and optionally writes the payload into a canonical table so both paths converge in one place.

  3. After publish
    From here on, both paths are identical: Pub/Sub -> Eventarc -> Cloud Workflows -> a fixed sequence of Cloud Run Jobs (e.g. data sync stage 1, stage 2, optional delay, reconciliation). The workflow reports status back to the trigger (e.g. SyncStarted, SyncEnded, SyncError) so you can track runs.

So we have two ways in, but one event shape and one orchestration pipeline. That keeps the design simple and makes it easy to add the on-demand path later without duplicating logic.


Hexagonal architecture per service

Each service (trigger and each sync/reconciliation job) follows hexagonal (ports and adapters) structure:

  • Domain: Core models (e.g. sync event types, entity identifiers) and ports (interfaces for “read from queue”, “publish event”, “write to target system”).
  • Application: Use cases that depend only on ports, e.g. “for each pending queue row, build event, publish, update status.”
  • Adapters: Concrete implementations, such as a source client (e.g. BigQuery), a Pub/Sub publisher, a target system client (e.g. a DB or API), and an HTTP server (e.g. FastAPI for the trigger).
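As a minimal sketch of this layering, the trigger's core use case can be written against two ports and nothing else. All names here are illustrative, not the real service's identifiers:

```python
from typing import Protocol


class QueuePort(Protocol):
    """Port: read pending rows and mark them handled (e.g. a BigQuery adapter)."""
    def pending(self) -> list[dict]: ...
    def mark_published(self, row_id: str) -> None: ...


class PublisherPort(Protocol):
    """Port: publish one sync event (e.g. a Pub/Sub adapter)."""
    def publish(self, event: dict) -> None: ...


def publish_pending(queue: QueuePort, publisher: PublisherPort) -> int:
    """Use case: for each pending queue row, build an event, publish it,
    and update the row's status. Depends only on ports, never on clients."""
    published = 0
    for row in queue.pending():
        event = {"event_type": row["event_type"], "payload": row["payload"]}
        publisher.publish(event)
        queue.mark_published(row["id"])
        published += 1
    return published
```

Because `publish_pending` sees only the two protocols, unit tests can pass in-memory fakes and never touch BigQuery or Pub/Sub.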

Benefits:

  • Testing: Unit-test use cases with mocks for the source, Pub/Sub, and target. No need to hit real systems for business logic.
  • Swapping implementations: Changing queue implementation or message broker means implementing the same port in a new adapter, not rewriting the core flow.
  • Clarity: “Business rules” live in application + domain; “I/O” lives in adapters.

The shared core (event schemas and common types) lives in a small library used by the trigger and the jobs, so payloads stay consistent and producer and consumers don’t drift.
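The shared event type might be as small as one dataclass with encode/decode helpers, so the trigger and every job agree on the wire format. This is a sketch under assumed field names, not the real library:

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class SyncEvent:
    """Shared event shape used by the trigger and all jobs (illustrative fields)."""
    event_type: str   # e.g. "created", "updated"
    payload: dict     # entity id plus whatever fields the sync stages need

    def encode(self) -> str:
        # Pub/Sub message data is base64-encoded JSON.
        return base64.b64encode(json.dumps(asdict(self)).encode()).decode()

    @classmethod
    def decode(cls, blob: str) -> "SyncEvent":
        return cls(**json.loads(base64.b64decode(blob)))
```

Keeping this in one small library means a producer-side schema change fails consumer tests immediately instead of drifting silently.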


Event-driven orchestration with Cloud Workflows

We avoided a single monolith that does “read queue -> sync stage 1 -> sync stage 2 -> reconcile.” That would be hard to scale, hard to retry per step, and would mix concerns. Instead:

  • Pub/Sub carries a single event type: e.g. base64-encoded JSON with an entity id and payload.
  • Eventarc fires when a message is published; it starts one Workflows execution per message.
  • The workflow is a linear sequence: decode message -> post SyncStarted -> run Job 1 -> run Job 2 -> optional sleep (if target needs propagation time) -> run Job 3 (e.g. reconciliation) -> post SyncEnded (or SyncError on failure).
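The linear sequence above might look roughly like the following Cloud Workflows definition. Project, region, job names, and the status endpoint are placeholders, and the Cloud Run jobs connector fields should be checked against the connector documentation; error handling (try/except posting SyncError) is omitted for brevity:

```yaml
main:
  params: [event]
  steps:
    - decode:
        assign:
          # Eventarc delivers the Pub/Sub message; data is base64-encoded JSON.
          - raw: ${event.data.message.data}
          - msg: ${json.decode(base64.decode(raw))}
    - sync_started:
        call: http.post
        args:
          url: https://sync-trigger.example.com/status   # illustrative endpoint
          body:
            status: SyncStarted
    - stage_1:
        call: googleapis.run.v2.projects.locations.jobs.run
        args:
          name: projects/PROJECT_ID/locations/REGION/jobs/sync-stage-1
          body:
            overrides:
              containerOverrides:
                - args: ["${raw}"]
    # stage_2 is identical, pointing at jobs/sync-stage-2
    - wait_for_propagation:
        call: sys.sleep
        args:
          seconds: 60
    - reconcile:
        call: googleapis.run.v2.projects.locations.jobs.run
        args:
          name: projects/PROJECT_ID/locations/REGION/jobs/reconciliation
          body:
            overrides:
              containerOverrides:
                - args: ["${raw}"]
    - sync_ended:
        call: http.post
        args:
          url: https://sync-trigger.example.com/status
          body:
            status: SyncEnded
```

Note that the workflow passes the still-encoded message straight through to each job, which keeps the workflow free of any knowledge of the payload's contents.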

Each Cloud Run Job is a separate deployable: it receives the same encoded event as a CLI argument, decodes it, and does one thing (e.g. “sync data from source to target for this entity”). The workflow is the only place that knows the order and any fixed delays; the jobs are stateless and single-purpose.

What this gives you:

  • Visibility: Each run is one workflow execution; you can see exactly which step failed.
  • Retries and failure handling: The workflow can post SyncError with details and support retries or alerts without baking that into each job.
  • Scaling: Jobs scale with workload; the trigger only does lightweight “read queue / build event / publish” and stays simple.

One event, many consumers

A key design choice: one event schema for both trigger paths. The payload might look like:

  • event_type: e.g. "created", "updated"
  • payload: entity id and whatever fields the sync stages need (ids, names, types, etc.).

The trigger (scheduler or on-demand) is responsible for building this payload. All jobs receive the same encoded blob. Each job decodes it and uses only what it needs. So you don’t have “scheduler events” vs “on-demand events”; you have “sync events,” and the pipeline doesn’t care who produced them.
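On the consuming side, each job's entry point can stay tiny: take the encoded blob as its single CLI argument, decode it, and use only the fields it needs. A hypothetical sketch:

```python
import base64
import json
import sys


def main(argv: list[str]) -> int:
    """CLI entry point for one Cloud Run Job (illustrative names).

    In the container this would be invoked as:
        raise SystemExit(main(sys.argv))
    """
    if len(argv) != 2:
        print("usage: job <base64-encoded-event>", file=sys.stderr)
        return 2
    event = json.loads(base64.b64decode(argv[1]))
    # This particular job only needs the entity id; other fields are ignored.
    entity_id = event["payload"]["entity_id"]
    print(f"syncing entity {entity_id} ({event['event_type']})")
    return 0
```

Because every job decodes the same blob, swapping a job in or out never requires touching the event producers.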


CI/CD: tag-based prod and “deploy what changed”

Production deployment is tag-driven: pushing a release tag (e.g. v1.0.0) triggers the prod pipeline. The pipeline:

  1. Fetches tags and checks out the tag that was pushed.
  2. Compares that tag to the previous tag (by creation date) and computes which top-level components (e.g. sync-trigger, sync-stage-1, sync-stage-2, reconciliation, infrastructure) have file changes.
  3. Deploys only those components, in a fixed order (e.g. infrastructure first), using the same tag for image versions.

So you avoid “deploy everything on every run” and avoid deploying an older build by mistake: the tag defines the release, and the diff defines the scope. Pipeline and shared code are excluded from the deploy list so that changes only in docs or shared libraries don’t trigger full redeploys.
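The "deploy what changed" computation reduces to a pure function once you have the diff. A sketch, with hypothetical component directory names, fed by something like `git diff --name-only PREV_TAG..TAG`:

```python
# Illustrative top-level components, listed in fixed deploy order
# (infrastructure first).
DEPLOY_ORDER = [
    "infrastructure",
    "sync-trigger",
    "sync-stage-1",
    "sync-stage-2",
    "reconciliation",
]


def components_to_deploy(changed_paths: list[str]) -> list[str]:
    """Map changed file paths to the components needing a deploy, in order.

    Directories absent from DEPLOY_ORDER (e.g. pipeline config, shared
    libraries, docs) never trigger a deploy on their own.
    """
    touched = {path.split("/", 1)[0] for path in changed_paths}
    return [c for c in DEPLOY_ORDER if c in touched]
```

Keeping this logic pure makes the pipeline's scoping decision trivially unit-testable, independent of git itself.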


Reconciliation as a separate step

One of the jobs is reconciliation: it compares source and target (e.g. row counts or checksums for the same entity) and fails if they differ, so the workflow records SyncError. We kept this as a dedicated job rather than embedding it into a sync job because:

  • Single responsibility: One job “syncs,” the other “verifies.”
  • Clear failure semantics: A reconciliation failure means “source and target are out of sync,” which is easy to monitor and alert on.
  • Same event, same pattern: It receives the same event and uses the same ports/adapters style, so it fits the rest of the architecture.
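In the same ports-and-adapters style, the reconciliation job can compare one number per side and fail loudly on mismatch. Names are illustrative; the real comparison might be row counts or checksums:

```python
from typing import Protocol


class CountPort(Protocol):
    """Port: count (or checksum) the rows for one entity on one side."""
    def count(self, entity_id: str) -> int: ...


class ReconciliationError(RuntimeError):
    """Raised when source and target disagree, so the workflow records SyncError."""


def reconcile(source: CountPort, target: CountPort, entity_id: str) -> None:
    src, tgt = source.count(entity_id), target.count(entity_id)
    if src != tgt:
        raise ReconciliationError(f"entity {entity_id}: source={src} target={tgt}")
```

A raised exception becomes a non-zero job exit, which the workflow surfaces as SyncError, exactly the clear failure semantics described above.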

Things we’d do again (and one we’d refine)

  • Hexagonal structure per service: It paid off for testing and for keeping “how we talk to source/target/Pub/Sub” out of the core logic.
  • Single event schema and one pipeline after Pub/Sub: Adding the on-demand path was mostly “add an HTTP adapter that builds the same event and publishes”; the workflow and jobs didn’t need to change.
  • Workflows as the only orchestrator: Having one place that defines the sequence and error handling made reasoning about the system much easier than spreading orchestration across services.
  • Tag-based prod and diff-based deploy: Reduced risk and made releases predictable.

One thing we’d refine: a fixed sleep in the workflow (e.g. between two stages) is there when the target system needs time to propagate state before the next step. It’s explicit and safe, but it’s a fixed delay. If we were to revisit it, we’d consider a “poll until ready” step with a timeout instead of a blind sleep, so that fast runs don’t wait unnecessarily while still keeping the design simple.
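The poll-until-ready alternative is a small helper; here is a sketch in plain Python, assuming some cheap `is_ready` check against the target system (in Cloud Workflows the same idea would be a loop around `sys.sleep`):

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 300.0,
                     poll_s: float = 5.0) -> bool:
    """Poll a cheap readiness check instead of sleeping a fixed interval.

    Returns True as soon as `is_ready()` succeeds, False once the timeout
    expires. Fast runs proceed immediately; slow ones still have a bound.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Returning False (rather than raising) lets the caller decide whether a timeout is a hard failure or a reason to fall back to the old fixed delay.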


Summary

To build a multi-service, event-driven sync pipeline (cloud to on-premises or between systems), a simple solution looks like this:

  • Two entry points (scheduler and on-demand) that produce the same event type and feed the same workflow.
  • Hexagonal architecture in each service for clear boundaries and testability.
  • Cloud Workflows as the single place that defines order, status reporting, and error handling.
  • Tag-based production releases and diff-based deploy lists so only changed components are deployed.

If you’re building a sync pipeline that must support both scheduled and ad-hoc triggers, this pattern (single event schema, single orchestration, ports and adapters per service) is a solid base to build on.


This is a personal blog. The views, thoughts, and opinions expressed here are my own and do not represent, reflect, or constitute the views, policies, or positions of any employer, university, client, or organization I am associated with or have been associated with.

© Copyright 2017-2026