DR. ATABAK KH
Cloud Platform Modernization Architect specializing in transforming legacy systems into reliable, observable, and cost-efficient Cloud platforms.
Certified: Google Professional Cloud Architect, AWS Solutions Architect, MapR Cluster Administrator
How to design a multi-service sync pipeline from cloud to on-premises (or between systems) using hexagonal architecture, Cloud Workflows, and event-driven orchestration.
Context: Keeping cloud data in sync with on-premises or legacy systems is a common integration problem. This post focuses on solution architecture and design: two entry points, one event shape, hexagonal structure per service, and tag-based deploys, so the pattern applies to any sync use case.
In many organizations, data lives in the cloud (e.g. a data warehouse/lakehouse) while operational or legacy systems run elsewhere, on-premises or in another cloud. Keeping them in sync demands reliability, clear ownership of each step, and the ability to trigger syncs both on a schedule and on demand (e.g. from a query or API).
We set out to build a pipeline that meets those demands: reliable, with clear ownership of each step, and triggerable both on a schedule and on demand. This post focuses on the architecture and design we used, and lessons that apply to any sync pipeline.
Conceptually, the system looks like this:
Scheduler path
A scheduler calls an HTTP service periodically. The service reads “pending” items from a queue table in the source (e.g. BigQuery), publishes one event per item to Pub/Sub, and updates queue status. The trigger does not call the target system directly; it only enqueues work.
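A minimal sketch of that trigger logic, written hexagonally so the core never touches BigQuery or Pub/Sub directly (all names here, such as `QueuePort` and `run_scheduled_trigger`, are illustrative, not the actual implementation):

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class SyncEvent:
    event_type: str   # e.g. "created", "updated"
    payload: dict     # entity id plus fields the sync stages need

class QueuePort(Protocol):
    """Port over the queue table in the source (e.g. BigQuery)."""
    def pending_items(self) -> list[dict]: ...
    def mark_enqueued(self, item_id: str) -> None: ...

class PublisherPort(Protocol):
    """Port over the event bus (e.g. Pub/Sub)."""
    def publish(self, event: SyncEvent) -> None: ...

def run_scheduled_trigger(queue: QueuePort, publisher: PublisherPort) -> int:
    """Read pending items, publish one event per item, update queue status.
    Never calls the target system directly: it only enqueues work."""
    count = 0
    for item in queue.pending_items():
        publisher.publish(SyncEvent(item["event_type"], item["payload"]))
        queue.mark_enqueued(item["id"])
        count += 1
    return count

# In-memory adapters standing in for the real BigQuery/Pub/Sub adapters:
class InMemoryQueue:
    def __init__(self, items):
        self.items, self.enqueued = items, []
    def pending_items(self):
        return list(self.items)
    def mark_enqueued(self, item_id):
        self.enqueued.append(item_id)

class InMemoryPublisher:
    def __init__(self):
        self.events = []
    def publish(self, event):
        self.events.append(event)
```

Because the core depends only on ports, the same `run_scheduled_trigger` runs unchanged in tests with the in-memory adapters above.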
On-demand path
An external caller (e.g. remote function, API, or another job) calls the same HTTP service with payload data. The service publishes the same kind of event to Pub/Sub and optionally writes the payload into a canonical table so both paths converge in one place.
After publish
From here on, both paths are identical: Pub/Sub -> Eventarc -> Cloud Workflows -> a fixed sequence of Cloud Run Jobs (e.g. data sync stage 1, stage 2, optional delay, reconciliation). The workflow reports status back to the trigger (e.g. SyncStarted, SyncEnded, SyncError) so you can track runs.
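The fixed sequence plus status reporting can be sketched in a few lines; this is a simplified stand-in for the Cloud Workflows definition, not its actual syntax:

```python
from typing import Callable

def run_pipeline(encoded_event: str,
                 jobs: list[Callable[[str], None]],
                 report: Callable[[str], None]) -> None:
    """Run the fixed job sequence in order, reporting status back to the
    trigger so each run can be tracked (SyncStarted/SyncEnded/SyncError)."""
    report("SyncStarted")
    try:
        for job in jobs:
            job(encoded_event)   # every job receives the same encoded event
        report("SyncEnded")
    except Exception:
        report("SyncError")
        raise
```

The orchestrator is the only component that knows the order; the jobs themselves stay stateless and single-purpose.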
So we have two ways in, but one event shape and one orchestration pipeline. That keeps the design simple and makes it easy to add the on-demand path later without duplicating logic.
Each service (trigger and each sync/reconciliation job) follows a hexagonal (ports and adapters) structure: core logic depends only on ports (interfaces), and adapters implement those ports against the real systems.
Benefits: the core can be tested with in-memory fakes, adapters for sources and targets can be swapped without touching the logic, and each service's dependencies are explicit at its boundary.
The shared core (event schemas and common types) lives in a small library used by the trigger and the jobs, so payloads stay consistent and producer and consumers don’t drift.
We avoided a single monolith that does “read queue -> sync stage 1 -> sync stage 2 -> reconcile.” That would be hard to scale, hard to retry per step, and would mix concerns. Instead:
Each Cloud Run Job is a separate deployable: it receives the same encoded event as a CLI argument, decodes it, and does one thing (e.g. “sync data from source to target for this entity”). The workflow is the only place that knows the order and any fixed delays; the jobs are stateless and single-purpose.
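The entry point of such a job is small; a sketch, assuming the event is passed as a base64-encoded JSON blob (the encoding choice here is illustrative):

```python
import base64
import json
import sys

def decode_cli_event(argv: list[str]) -> dict:
    """Decode the single encoded-event CLI argument into a dict.
    Each job then uses only the fields it needs from the payload."""
    encoded = argv[1]
    return json.loads(base64.b64decode(encoded))

if __name__ == "__main__":
    event = decode_cli_event(sys.argv)
    # ... do the one thing this job exists for, e.g. sync one entity ...
```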
What this gives you: retries per step instead of per pipeline, independent scaling and deployment of each job, and clean separation of concerns.
A key design choice: one event schema for both trigger paths. The payload might look like:
- event_type: e.g. "created", "updated"
- payload: entity id and whatever fields the sync stages need (ids, names, types, etc.)

The trigger (scheduler or on-demand) is responsible for building this payload. All jobs receive the same encoded blob. Each job decodes it and uses only what it needs. So you don’t have “scheduler events” vs “on-demand events”; you have “sync events,” and the pipeline doesn’t care who produced them.
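Keeping the schema in one shared module is what stops producer and consumers from drifting. A minimal version of that shared piece, with encode/decode as a round trip (the base64+JSON encoding is an assumption for illustration):

```python
import base64
import json
from dataclasses import asdict, dataclass

@dataclass
class SyncEvent:
    """The one event shape both trigger paths produce."""
    event_type: str   # e.g. "created", "updated"
    payload: dict     # entity id and fields the sync stages need

def encode_event(event: SyncEvent) -> str:
    """Producer side: serialize to the blob passed through Pub/Sub and CLI args."""
    return base64.b64encode(json.dumps(asdict(event)).encode()).decode()

def decode_event(blob: str) -> SyncEvent:
    """Consumer side: every job decodes with the same shared function."""
    return SyncEvent(**json.loads(base64.b64decode(blob)))
```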
Production deployment is tag-driven: pushing a release tag (e.g. v1.0.0) triggers the prod pipeline. The pipeline diffs the release and deploys only the components (sync-trigger, sync-stage-1, sync-stage-2, reconciliation, infrastructure) that have file changes. So you avoid “deploy everything on every run” and avoid deploying an older build by mistake: the tag defines the release, and the diff defines the scope. Pipeline and shared code are excluded from the deploy list, so changes only in docs or shared libraries don’t trigger full redeploys.
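The diff-to-scope mapping can be sketched as follows, assuming one top-level directory per component (the layout and names here are illustrative):

```python
from pathlib import PurePosixPath

# Top-level directories that map to deployable components; anything else
# (docs, shared libraries, the pipeline itself) maps to no component.
COMPONENTS = {"sync-trigger", "sync-stage-1", "sync-stage-2",
              "reconciliation", "infrastructure"}

def components_to_deploy(changed_files: list[str]) -> set[str]:
    """Given the file diff for a release tag, return only the components
    whose directories contain changes."""
    deploy = set()
    for path in changed_files:
        top = PurePosixPath(path).parts[0]
        if top in COMPONENTS:
            deploy.add(top)
    return deploy
```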
One of the jobs is reconciliation: it compares source and target (e.g. row counts or checksums for the same entity) and fails if they differ, so the workflow records SyncError. We kept this as a dedicated job rather than embedding it into a sync job because it stays single-purpose, can be retried or re-run independently, and a mismatch fails the run explicitly instead of being hidden inside a sync step.
One thing we’d refine: a fixed sleep in the workflow (e.g. between two stages) is there when the target system needs time to propagate state before the next step. It’s explicit and safe, but it’s a fixed delay. If we were to revisit it, we’d consider a “poll until ready” step with a timeout instead of a blind sleep, so that fast runs don’t wait unnecessarily while still keeping the design simple.
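Such a poll-until-ready step could look like this (a sketch; the readiness check itself depends on what the target system exposes):

```python
import time
from typing import Callable

def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = 120.0,
                     interval_s: float = 5.0) -> bool:
    """Poll until the target reports ready, or give up after timeout_s.
    Fast runs return as soon as the check passes instead of always
    sleeping the worst-case duration."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval_s)
    return False
```

The timeout keeps the safety property of the fixed sleep (the workflow never hangs forever), while the early return removes the unnecessary wait.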
To build a multi-service, event-driven sync pipeline (cloud to on-premises or between systems), a simple solution is: two entry points feeding one event shape, one orchestration (Pub/Sub -> Eventarc -> Cloud Workflows -> Cloud Run Jobs), hexagonal structure per service, and tag-based, diff-scoped deploys.
If you’re building a sync pipeline that must support both scheduled and ad-hoc triggers, this pattern (single event schema, single orchestration, ports and adapters per service) is a solid base to build on.
This is a personal blog. The views, thoughts, and opinions expressed here are my own and do not represent, reflect, or constitute the views, policies, or positions of any employer, university, client, or organization I am associated with or have been associated with.