C4 Modelling for Complex Distributed Systems
Applying C4 model diagrams to a real distributed platform — from system context down to code-level detail, with practical examples and living documentation strategies.
Architecture diagrams in most organisations fall into two categories: either someone drew boxes on a whiteboard three years ago and took a photo that now lives in a wiki nobody reads, or every team maintains their own contradictory Visio diagram that’s perpetually six months out of date.
The C4 model, created by Simon Brown, solves this by providing four levels of abstraction — Context, Containers, Components, and Code — each targeting a different audience and answering different questions. After applying C4 to Mesh-Sync, a distributed 3D model processing platform with a NestJS backend, TypeScript orchestration engine, Python worker pool, and half a dozen infrastructure services, I’m convinced it’s the most practical architecture documentation approach for complex systems.
This article walks through all four C4 levels using a real production system, shows how to keep diagrams in sync with code, and demonstrates integration with Architecture Decision Records (ADRs).
The Documentation Problem
Before C4, our architecture documentation looked like this:
- A one-page “system overview” diagram in Confluence that mixed infrastructure (Redis, PostgreSQL) with application components (AuthModule, ModelService) at the same level of abstraction
- Inline ASCII diagrams in README files that nobody updated
- Deployment diagrams that showed AWS services but not application boundaries
- Zero documentation of the internal structure of our most complex service — the pipeline orchestration engine
The problem isn’t that people don’t want to document. It’s that there’s no shared vocabulary for what a “component” means or what level of detail belongs where. C4 provides that vocabulary.
Level 1: System Context — Who Uses What
The System Context diagram is the most zoomed-out view. It shows your system as a single box, surrounded by the users and external systems it interacts with. Non-technical stakeholders should be able to read this.
```mermaid
graph TB
subgraph ext[External Systems]
S3[AWS S3<br/><i>File Storage</i>]
Stripe[Stripe<br/><i>Payments</i>]
Email[SendGrid<br/><i>Email Delivery</i>]
OAuth[Google/GitHub OAuth<br/><i>Authentication</i>]
end
User([3D Artist / Designer])
Admin([Platform Admin])
API([API Consumer<br/><i>Third-party integrations</i>])
User -->|Uploads models,<br/>browses marketplace| MS
Admin -->|Manages users,<br/>monitors pipelines| MS
API -->|REST API calls| MS
MS[Mesh-Sync Platform<br/><i>3D Model Processing<br/>& Marketplace</i>]
MS -->|Stores/retrieves files| S3
MS -->|Processes payments| Stripe
MS -->|Sends notifications| Email
MS -->|Authenticates users| OAuth
style MS fill:#1e1e24,stroke:#5eead4,color:#e4e4e7
style User fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style Admin fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style API fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style S3 fill:#1e1e24,stroke:#fbbf24,color:#e4e4e7
style Stripe fill:#1e1e24,stroke:#fbbf24,color:#e4e4e7
style Email fill:#1e1e24,stroke:#fbbf24,color:#e4e4e7
style OAuth fill:#1e1e24,stroke:#fbbf24,color:#e4e4e7
```
Key decisions visible at this level:
- The platform has three distinct user personas (artists, admins, API consumers) with different interaction patterns
- External system boundaries are explicit — if Stripe goes down, payments are affected but model processing continues
- Authentication is delegated to OAuth providers, not built in-house
This diagram doesn’t show Redis, PostgreSQL, or BullMQ. Those are implementation details. A VP of Engineering or a new team member can look at this and understand what the system does and who it serves in 30 seconds.
Level 2: Container — What Runs Where
The Container diagram zooms into the “Mesh-Sync Platform” box and shows the separately deployable units — applications, databases, message brokers, and file stores. Each container is a process or a data store that communicates over the network.
```mermaid
graph TB
subgraph meshsync[Mesh-Sync Platform]
BE[NestJS Backend<br/><i>REST API, Auth,<br/>Business Logic</i>]
WB[Worker Backend<br/><i>Pipeline Orchestration,<br/>Job Dispatch</i>]
subgraph workers[Python Worker Pool]
W1[Thumbnail Generator<br/><i>Blender + Python</i>]
W2[Semantic Analyzer<br/><i>LLM-powered classification</i>]
W3[Metadata Extractor<br/><i>Format parsing</i>]
W4[Model Discovery<br/><i>Search indexing</i>]
end
PG[(PostgreSQL<br/><i>Primary data store</i>)]
Redis[(Redis<br/><i>Queues, Cache,<br/>Pipeline State</i>)]
MinIO[(MinIO<br/><i>Object Storage<br/>Pipeline Cache</i>)]
ELK[(Elasticsearch<br/><i>Observability,<br/>Search Index</i>)]
end
User([Users]) -->|HTTPS| BE
BE -->|REST Webhooks| WB
BE -->|SQL| PG
WB -->|BullMQ Jobs| Redis
WB -->|Cache R/W| MinIO
WB -->|Events| ELK
Redis -->|Job Dispatch| W1
Redis -->|Job Dispatch| W2
Redis -->|Job Dispatch| W3
Redis -->|Job Dispatch| W4
W1 & W2 & W3 & W4 -->|HMAC Webhooks| WB
style BE fill:#1e1e24,stroke:#5eead4,color:#e4e4e7
style WB fill:#1e1e24,stroke:#5eead4,color:#e4e4e7
style W1 fill:#1e1e24,stroke:#34d399,color:#e4e4e7
style W2 fill:#1e1e24,stroke:#34d399,color:#e4e4e7
style W3 fill:#1e1e24,stroke:#34d399,color:#e4e4e7
style W4 fill:#1e1e24,stroke:#34d399,color:#e4e4e7
style PG fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style Redis fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style MinIO fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style ELK fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
```
Key decisions visible at this level:
- Two backend services — the NestJS Backend handles the user-facing API and business logic, while the Worker Backend handles pipeline orchestration. This separation keeps orchestration complexity from leaking into the API layer.
- CQRS via webhooks — workers don’t write to PostgreSQL directly. They report results via HMAC-signed webhooks to the Worker Backend, which decides what to persist. This is ADR-010 in our decision log.
- Redis serves triple duty — message queue (BullMQ), pipeline state cache, and distributed lock store. A pragmatic trade-off: fewer infrastructure components to operate.
- Python workers are stateless — they pull jobs from Redis, process them, send webhooks, and terminate. No local state, no direct DB access. This makes scaling trivial: add more worker replicas.
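The HMAC-signed webhook contract is simple enough to sketch. Here’s a minimal version using Node’s built-in `crypto` module — function names and the secret-handling are illustrative, not our actual implementation:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical placeholder; in practice the secret comes from the environment.
const WEBHOOK_SECRET = "change-me";

/** Sign a serialized webhook body with HMAC-SHA256 (worker side). */
export function signPayload(body: string, secret: string = WEBHOOK_SECRET): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

/** Verify a received signature in constant time (Worker Backend side). */
export function verifySignature(
  body: string,
  signature: string,
  secret: string = WEBHOOK_SECRET,
): boolean {
  const expected = Buffer.from(signPayload(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so reject unequal lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The worker signs the serialized payload and sends the digest in a header; the Worker Backend recomputes the digest and compares it in constant time, so a tampered payload is rejected before any state changes happen.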
Level 3: Component — What’s Inside
The Component diagram zooms into a single container to show its major structural building blocks. Let’s look inside the Worker Backend — the orchestration engine — since it’s the most architecturally complex container:
```mermaid
graph TB
subgraph wb[Worker Backend — Pipeline Orchestration Engine]
PO[Pipeline Orchestrator<br/><i>Facade: Pipeline lifecycle</i>]
DR[Dependency Resolver<br/><i>DAG graph builder</i>]
SE[Stage Executor<br/><i>Routes to handler by type</i>]
PV[Pipeline Validator<br/><i>Schema + Semantic checks</i>]
IV[Interpolation Validator<br/><i>Variable resolution safety</i>]
subgraph events[Domain Event System]
DED[Domain Event Dispatcher<br/><i>Mediator pattern</i>]
MSH[Model Status Handler]
TMH[Technical Metadata Handler]
FCH[Folder Completion Handler]
end
subgraph actions[Action Registry]
AR[Action Registry<br/><i>Command pattern</i>]
MA[Model Actions<br/><i>Status updates</i>]
CA[Context Actions<br/><i>State mutations</i>]
end
subgraph infra[Infrastructure Services]
ELK[ELK Event Publisher<br/><i>Observability</i>]
MC[MinIO Cache Manager<br/><i>Result caching</i>]
TM[Timeout Monitor<br/><i>Deadline enforcement</i>]
end
end
API([REST API Endpoints]) --> PO
PO --> DR
PO --> PV
PO --> SE
SE --> AR
SE --> DED
SE --> IV
DED --> MSH & TMH & FCH
MSH -->|Webhook| ExtBE([NestJS Backend])
SE -->|Enqueue| Redis[(Redis / BullMQ)]
ELK -->|Batch publish| ES[(Elasticsearch)]
MC -->|Cache R/W| MinIOStore[(MinIO)]
TM -->|Scan running stages| Redis
style PO fill:#1e1e24,stroke:#5eead4,color:#e4e4e7
style DR fill:#1e1e24,stroke:#5eead4,color:#e4e4e7
style SE fill:#1e1e24,stroke:#5eead4,color:#e4e4e7
style DED fill:#1e1e24,stroke:#818cf8,color:#e4e4e7
style AR fill:#1e1e24,stroke:#34d399,color:#e4e4e7
style ELK fill:#1e1e24,stroke:#fbbf24,color:#e4e4e7
```
Component responsibilities:
| Component | Responsibility | Pattern |
|---|---|---|
| PipelineOrchestrator | Entry point facade — receives pipeline start/stop requests, coordinates lifecycle | Facade |
| DependencyResolver | Builds DAG from stage definitions, checks if dependencies are satisfied | Graph analysis |
| StageExecutor | Routes stage execution to the correct handler based on type (worker/internal/parallel/decision) | Strategy |
| PipelineValidator | Three-layer validation: JSON Schema → semantic → interpolation | Chain of Responsibility |
| DomainEventDispatcher | Routes domain events to registered handlers without coupling emitters to consumers | Mediator |
| ActionRegistry | Maps action names to handler implementations for internal stages | Command + Registry |
| ELKEventPublisher | Batched event streaming to Elasticsearch for pipeline observability | Observer + Buffer |
| MinIOCacheManager | Content-addressable caching of stage results to skip redundant computation | Cache-Aside |
| TimeoutMonitor | Background scanner that detects and escalates timed-out stages | Polling Monitor |
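To make the DependencyResolver row concrete, its core job — deciding which stages are runnable and validating that the stage graph is acyclic — can be sketched in a few lines. Types and names here are illustrative; the real resolver also handles decision and parallel stages:

```typescript
interface StageDefinition {
  name: string;
  dependsOn: string[]; // names of stages that must complete first
}

/** A stage is ready when it hasn't run yet and all of its dependencies have completed. */
function readyStages(stages: StageDefinition[], completed: Set<string>): string[] {
  return stages
    .filter((s) => !completed.has(s.name))
    .filter((s) => s.dependsOn.every((dep) => completed.has(dep)))
    .map((s) => s.name);
}

/** Kahn's algorithm: topologically order stages, throwing if the graph has a cycle. */
function topologicalOrder(stages: StageDefinition[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of stages) {
    indegree.set(s.name, s.dependsOn.length);
    for (const dep of s.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), s.name]);
    }
  }
  const queue = stages.filter((s) => s.dependsOn.length === 0).map((s) => s.name);
  const order: string[] = [];
  while (queue.length > 0) {
    const name = queue.shift()!;
    order.push(name);
    for (const next of dependents.get(name) ?? []) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // If some stage never reached indegree zero, a dependency cycle exists.
  if (order.length !== stages.length) throw new Error("Pipeline has a dependency cycle");
  return order;
}
```

The cycle check runs at validation time, so a malformed pipeline definition fails fast instead of deadlocking mid-run.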
This level of detail is useful for developers working on the orchestration engine. It shows which component to modify for a given change, identifies the design patterns in use, and maps data flow through the system.
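The Cache-Aside row is similarly easy to illustrate. Content-addressable caching means the cache key is derived from the stage's inputs, so identical inputs always map to the same stored result. A sketch, assuming flat input objects — nested inputs would need deeper canonicalization, and the real MinIOCacheManager surely differs:

```typescript
import { createHash } from "node:crypto";

/**
 * Derive a deterministic cache key from a stage name and its inputs.
 * Keys are sorted so property order never changes the hash.
 */
function stageCacheKey(stageName: string, inputs: Record<string, unknown>): string {
  const canonical = JSON.stringify(inputs, Object.keys(inputs).sort());
  return createHash("sha256").update(`${stageName}:${canonical}`).digest("hex");
}
```

Before dispatching a stage, the orchestrator computes the key and checks MinIO; on a hit, the cached result is returned and the stage is skipped entirely.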
Level 4: Code — The Implementation Detail
The Code level zooms into a single component to show classes, interfaces, and their relationships. This is the most ephemeral level — it changes with every refactor — so we only create Code diagrams for critical abstractions that need to be well-understood.
Here’s the Domain Event System in detail:
```mermaid
classDiagram
class DomainEvent {
<<interface>>
+type: string
+correlationId: string
+timestamp: Date
+payload: any
}
class DomainEventHandler {
<<interface>>
+handle(event: DomainEvent) Promise~void~
}
class DomainEventDispatcher {
-handlers: Map~string, DomainEventHandler~
+register(eventType: string, handler: DomainEventHandler) void
+dispatch(event: DomainEvent) Promise~any~
}
class ModelStatusUpdateEvent {
+type: "model.status.update_requested"
+modelId: string
+newStatus: string
}
class TechnicalMetadataSaveEvent {
+type: "model.technical_metadata.save_requested"
+modelId: string
+metadata: object
}
class ModelStatusUpdateHandler {
-modelWebhookClient: ModelWebhookClient
+handle(event: ModelStatusUpdateEvent) Promise~void~
}
class TechnicalMetadataSaveHandler {
-modelWebhookClient: ModelWebhookClient
+handle(event: TechnicalMetadataSaveEvent) Promise~void~
}
class FolderCompletionCheckHandler {
-folderService: FolderService
+handle(event: DomainEvent) Promise~void~
}
DomainEvent <|.. ModelStatusUpdateEvent
DomainEvent <|.. TechnicalMetadataSaveEvent
DomainEventHandler <|.. ModelStatusUpdateHandler
DomainEventHandler <|.. TechnicalMetadataSaveHandler
DomainEventHandler <|.. FolderCompletionCheckHandler
DomainEventDispatcher --> DomainEventHandler : routes to
DomainEventDispatcher --> DomainEvent : dispatches
```
Why this component gets a Code diagram:
The Domain Event System is a critical integration point — it’s how the orchestration engine communicates state changes to the NestJS backend without direct coupling. New developers need to understand the registration pattern, the one-handler-per-event-type constraint, and the fact that handlers are the only place where webhooks are sent. This diagram makes that structure explicit.
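The class diagram translates almost directly into TypeScript. A condensed sketch of the dispatcher — signatures simplified (the diagram's `dispatch` returns `Promise~any~`), and the `register` guard is how the one-handler-per-event-type constraint might be enforced:

```typescript
interface DomainEvent {
  type: string;
  correlationId: string;
  payload: unknown;
}

interface DomainEventHandler {
  handle(event: DomainEvent): Promise<void>;
}

/**
 * Mediator-style dispatcher: exactly one handler per event type,
 * so every state change has a single, findable owner.
 */
class DomainEventDispatcher {
  private handlers = new Map<string, DomainEventHandler>();

  register(eventType: string, handler: DomainEventHandler): void {
    if (this.handlers.has(eventType)) {
      throw new Error(`Handler already registered for ${eventType}`);
    }
    this.handlers.set(eventType, handler);
  }

  async dispatch(event: DomainEvent): Promise<void> {
    const handler = this.handlers.get(event.type);
    if (!handler) throw new Error(`No handler for event type ${event.type}`);
    await handler.handle(event);
  }
}
```

Rejecting duplicate registrations at startup is what makes the "single handler per event type" rule (ADR-012) a hard guarantee rather than a convention.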
Integrating C4 with Architecture Decision Records
C4 diagrams answer “what does the system look like?” ADRs answer “why does it look that way?” Linking them creates architecture documentation that’s both visual and rationale-rich.
We embed ADR references directly in our C4 descriptions:
| C4 Element | ADR | Decision |
|---|---|---|
| Workers → Webhooks → Worker Backend | ADR-010 | CQRS via webhooks — workers never write to the database directly |
| Pipeline definitions in YAML | ADR-007 | Event-driven worker architecture with declarative pipeline models |
| Domain Event Dispatcher | ADR-012 | Mediator pattern for domain events — single handler per event type, no fan-out |
| BullMQ over custom queue | ADR-003 | Use BullMQ for job queuing — mature, Redis-backed, supports priorities and rate limiting |
When someone reads the Container diagram and wonders “why do workers send webhooks instead of writing to PostgreSQL directly?”, they follow the ADR-010 link and find the context, options considered, and rationale. The diagram shows what; the ADR explains why.
ADR Format We Use
```markdown
# ADR-010: CQRS Webhook Architecture

## Status

Accepted — 2025-08-15

## Context

Workers process jobs asynchronously. They need to report results
back to the system. Options considered:

1. Direct database writes from workers (shared database)
2. Message queue events consumed by backend
3. HTTP webhook callbacks to backend API

## Decision

Option 3 — Workers send HMAC-signed HTTP webhooks to the
Worker Backend, which handles persistence and side effects.

## Consequences

- Workers have zero knowledge of the database schema
- Backend controls all write operations (single writer principle)
- Workers can be implemented in any language
- Added latency from HTTP round-trip (acceptable: <50ms)
- Requires webhook signature verification (HMAC-SHA256)
```
Keeping C4 Diagrams Alive
The biggest risk with any architecture documentation is drift. Here’s our strategy for keeping C4 diagrams in sync with reality:
1. Diagrams Live in Code
All C4 diagrams are Mermaid blocks inside Markdown files in the repository — alongside the code they describe. Not in Confluence, not in a shared drive, not in a Structurizr cloud instance. When you change the code, you see the diagram in the same PR.
2. ADR-Triggered Updates
Every new ADR that affects system structure triggers a C4 update as part of the same PR. The PR template includes a checkbox: “Does this change affect the C4 model? If yes, update the relevant diagram.”
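A checkbox only helps if someone notices stale references, so a lightweight lint can back it up. This is a hypothetical sketch, not something we ship as-is: it finds ADR references in a diagram's Markdown that point at decisions missing from the log:

```typescript
/**
 * Return ADR identifiers referenced in a Markdown document that do not
 * exist in the set of known ADR ids (e.g. derived from docs/adr/ filenames).
 */
function missingAdrReferences(markdown: string, knownAdrIds: Set<string>): string[] {
  // We use three-digit ids like ADR-010 throughout the docs.
  const referenced = new Set(markdown.match(/ADR-\d{3}/g) ?? []);
  return [...referenced].filter((id) => !knownAdrIds.has(id)).sort();
}
```

Run in CI against the diagram files, a non-empty result fails the build, forcing the diagram and the decision log to move together.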
3. Quarterly Architecture Review
Every quarter, lead engineers walk through the C4 diagrams in a 1-hour session. We project the Context and Container diagrams and ask: “Does this still match reality?” The Component and Code diagrams are reviewed by the team that owns the container.
4. Level-Appropriate Detail
We deliberately keep Level 1 (Context) and Level 2 (Container) very stable — these change only when we add/remove external integrations or deploy new services. Level 3 (Component) changes with significant refactors. Level 4 (Code) is generated on demand and never persisted — it’s too volatile to maintain.
Common Mistakes
Mixing Abstraction Levels
The most common C4 mistake is putting databases on the same diagram as code classes, or showing network protocols alongside business concepts. Each C4 level has a vocabulary:
- Context: Systems, users, external services
- Container: Applications, databases, file systems, message brokers
- Component: Modules, services, controllers, repositories (within a container)
- Code: Classes, interfaces, enums, functions (within a component)
If your diagram has a “PostgreSQL” box next to a “UserService” class, you’re mixing levels.
Over-Detailing Level 1
The System Context diagram should be understandable by someone who’s never written code. If it has more than 10 boxes, you’re showing too much. Merge external systems into categories if needed: “Cloud Infrastructure (AWS)” instead of listing every Lambda, S3 bucket, and SQS queue.
Skipping Level 3
Many teams draw Context and Container diagrams but never produce Component diagrams. This leaves a gap: developers can see the containers but don’t know how their internals are structured. Level 3 is where the real architectural value lives — it shows the design patterns, responsibilities, and data flows that determine how easy the system is to change.
Not Linking to ADRs
A C4 diagram without ADR references is a picture of the current state with no explanation of how you got there. Six months later, a new team member looks at the webhook arrows and asks “why don’t workers just write to the database?” Without ADR-010, nobody remembers.
Conclusion
C4 modelling works because it gives teams a shared abstraction hierarchy that scales from executive summaries (Level 1) to implementation details (Level 4). The format is lightweight — Mermaid diagrams in Markdown cost nothing to produce and live naturally alongside code.
For distributed systems with multiple services, worker pools, and infrastructure dependencies, C4 is the difference between “I think the data flows through…” and “here’s exactly what talks to what, and here’s why we designed it that way.”
Start with Level 1. Draw it on a whiteboard. If your entire team agrees it’s accurate, write it down. Then zoom in.