From Discovery to Sprint Zero: Architecting a Cloud SaaS for Industrial IoT at Scale
When a Swiss industrial company managing roughly 1,500 IoT weighing scales across more than a hundred customers asked adesso to replace its legacy on-premise inventory system with a cloud SaaS, the challenge wasn't just technical. It was methodological: how do you go from a vague brief to a production-ready architecture, with full team alignment, in under four weeks?
This is the story of how we did it, and what I learned along the way.
The Problem
The existing system was a Windows desktop application doing one job well: it received weight measurements from physical scales via FTP, computed stock levels, and triggered replenishment orders. But it was on-premise, single-tenant, and reaching end of life.
The replacement had to handle roughly 1,500 IoT scales across 108 customers, each measuring inventory at regular intervals, each feeding into alarms, purchase orders, and ERP webhook exports — end-to-end in under 60 seconds. It also had to support a live data migration from the legacy system while keeping all existing customers operational.
That's not a CRUD app.
The Method: nWave
We ran the project through nWave, an AI-assisted software delivery methodology built around six structured waves: DISCOVER → DISCUSS → DESIGN → DEVOPS → DISTILL → DELIVER. Each wave produces typed artifacts — problem validation, user stories, architecture decisions, infrastructure runbooks, acceptance tests, and implementation roadmaps — with AI agents doing the heavy lifting and human architects reviewing and steering.
The key discipline: no wave starts until the prior wave's gate passes. No architecture before validated requirements. No sprint planning before a signed-off walking skeleton.
DISCOVER: Validating the Problem Before Touching Code
Before writing a single line, we ran a structured discovery phase. We mapped 12 core assumptions across 4 risk domains (business viability, technical feasibility, regulatory compliance, migration scope) and systematically stress-tested each one against customer interviews and document evidence.
Key findings that changed our approach:
- EDIFACT orders were a false assumption: the customer used plain HTTP webhooks, not EDI. This eliminated an entire layer of integration complexity.
- Migration scope was larger than expected: all existing customers had to be migrated, so the platform could not launch for new customers only. This became a separate phased workstream.
- IP transfer rights were a hard contractual requirement — not a nice-to-have. Architecture had to support eventual on-premise deployment.
The go/no-go gate passed as CONDITIONAL GO, later elevated to GO after contract signature. We documented every assumption, its evidence, and its risk score in a structured decision log. Future-me (or any new team member) can read exactly why each call was made.
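To make the idea of a structured decision log concrete, here is a minimal sketch of what one entry could look like. The field names, the 1 to 5 risk scale, the individual scores, and the `open_risks` helper are illustrative assumptions, not the actual nWave artifact format; only the four risk domains and the example findings come from the discovery described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Domain(Enum):
    BUSINESS_VIABILITY = "business viability"
    TECHNICAL_FEASIBILITY = "technical feasibility"
    REGULATORY_COMPLIANCE = "regulatory compliance"
    MIGRATION_SCOPE = "migration scope"


@dataclass(frozen=True)
class Assumption:
    statement: str
    domain: Domain
    risk_score: int              # hypothetical scale: 1 (low) to 5 (high)
    validated: Optional[bool]    # True = confirmed, False = refuted, None = still open
    evidence: tuple = ()         # interview notes, document references, ...


def open_risks(log, threshold=4):
    """Return assumptions that are still unresolved at or above the threshold."""
    return [a for a in log if a.validated is None and a.risk_score >= threshold]


# Example entries; the scores are made up for illustration.
log = [
    Assumption("Orders are exchanged as EDIFACT", Domain.TECHNICAL_FEASIBILITY, 4,
               False, ("customer interview: plain HTTP webhooks are used",)),
    Assumption("Migration can be limited to new customers", Domain.MIGRATION_SCOPE, 5,
               False, ("customer interview: all existing customers must migrate",)),
    Assumption("Legacy data access will be granted", Domain.MIGRATION_SCOPE, 4, None),
]
```

A gate review could then start from `open_risks(log)`: anything it returns is a high-risk call that still lacks evidence.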
DESIGN: Eight Architecture Decisions, Each Justified
The architecture landed on a modular monolith with ports-and-adapters, nine DDD bounded contexts, and a deliberately boring tech stack:
| Layer | Choice | Why |
|---|---|---|
| API | Python 3.12 + FastAPI | Team expertise, async-native |
| Time-series | TimescaleDB on PostgreSQL | Single engine, ~2,000× headroom over expected load |
| Multi-tenancy | Row-Level Security (PostgreSQL) | DB-enforced isolation, no schema sprawl |
| Event delivery | Outbox pattern | No dual-write, transactional guarantee |
| Cloud | Azure Container Apps | adesso expertise, GDPR |
| CI/CD | Bitbucket Pipelines | Existing client toolchain |
The back-of-envelope math was explicit and documented. At 50 measurements/second, the load is about 0.05% of TimescaleDB's rated capacity, which is where the 2,000× headroom figure comes from. No message queue is needed at MVP: skipping Kafka isn't laziness, it's a calculated bet with a documented trigger for revisiting.
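The arithmetic behind those figures fits in a few lines. This is a sketch that uses only the numbers quoted in this article; the implied capacity and the per-scale interval are derived values, not measurements, and the even-spread assumption in the last step is mine.

```python
SCALES = 1_500          # scales in scope, from the brief
INGEST_RATE = 50        # measurements per second (design figure)
HEADROOM = 2_000        # headroom factor from the stack table

# Capacity implied by "2,000x headroom at 50 measurements/second".
implied_capacity = INGEST_RATE * HEADROOM          # rows per second

# The same statement expressed as utilisation in percent.
utilisation_pct = 100 / HEADROOM

# Rows written per day at the design rate.
daily_rows = INGEST_RATE * 86_400

# ASSUMPTION: if all scales reported evenly, 50/s would mean one reading
# per scale every 30 seconds. The real interval is not stated here, so
# 50/s may well be a peak figure rather than the steady state.
seconds_between_readings = SCALES / INGEST_RATE

print(implied_capacity, utilisation_pct, daily_rows, seconds_between_readings)
```

At roughly 4.3 million rows per day, the design rate stays far below the implied capacity of 100,000 rows per second, which is the basis for deferring a message queue.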
The one gate that turned into a fixture: probe_rls() — a pytest-based tenant isolation check that runs in CI on every push. If it fails, the build fails. Tenant data leakage is not something you debug in production.
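A simplified sketch of what such a probe can check is shown below. It is not the project's actual `probe_rls()`: the table name `measurements`, the column `tenant_id`, the setting `app.tenant_id`, and the `rows_visible_to` callable are all assumptions made for illustration. In a real run, `rows_visible_to(tenant)` would connect as the application role, set the tenant context, and return the `tenant_id` of every row that role can read.

```python
from typing import Callable, Iterable, List

# ASSUMED policy DDL (PostgreSQL). Names are hypothetical.
# FORCE matters because table owners bypass RLS by default;
# superusers always bypass it, so the app must not connect as one.
RLS_POLICY_SQL = """
ALTER TABLE measurements ENABLE ROW LEVEL SECURITY;
ALTER TABLE measurements FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON measurements
    USING (tenant_id::text = current_setting('app.tenant_id'));
"""


def probe_rls(rows_visible_to: Callable[[str], Iterable[str]],
              tenants: Iterable[str]) -> List[str]:
    """Return a list of isolation problems; an empty list means the probe passed."""
    problems = []
    for tenant in tenants:
        owners = set(rows_visible_to(tenant))
        if tenant not in owners:
            # Without seeded rows of its own, "sees nothing foreign" proves nothing.
            problems.append(f"{tenant} sees none of its own rows")
        foreign = sorted(owners - {tenant})
        if foreign:
            problems.append(f"{tenant} can read rows of {foreign}")
    return problems


def test_tenant_isolation():
    # Stand-in for a real database session, used here so the sketch runs anywhere.
    def isolated(tenant):
        return [tenant, tenant]

    assert probe_rls(isolated, ["acme", "globex"]) == []
```

Run under pytest in CI, any non-empty result fails the build, which matches the "if it fails, the build fails" rule.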
DISCUSS: Eight User Stories with Elevator Pitches
Every user story in this project has an elevator pitch — three lines that force precision:
Before: operations manager cannot tell if a scale is transmitting correctly
After: GET /devices/{id}/status → {"status": "healthy", "last_seen": "2026-06-08T14:00Z"}
Decision enabled: operations manager decides whether to dispatch a technician today
If you can't write the "After" line with a real endpoint and a real observable output, you don't have a user story — you have infrastructure masquerading as value. We rejected several draft stories on this criterion alone.
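The "After" line above is concrete enough to sketch the logic behind it. The following is an illustration under stated assumptions, not the project's handler: the 15-minute silence threshold, the `"silent"` status value, and the function name are hypothetical, and the function is kept framework-free so it could sit behind a FastAPI route or any other adapter.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: a scale counts as healthy if it reported within this window.
MAX_SILENCE = timedelta(minutes=15)


def device_status(last_seen: datetime, now: datetime,
                  max_silence: timedelta = MAX_SILENCE) -> dict:
    """Build the response body for GET /devices/{id}/status."""
    status = "healthy" if now - last_seen <= max_silence else "silent"
    stamp = last_seen.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    return {"status": status, "last_seen": stamp}


last_seen = datetime(2026, 6, 8, 14, 0, tzinfo=timezone.utc)
print(device_status(last_seen, now=last_seen + timedelta(minutes=5)))
# {'status': 'healthy', 'last_seen': '2026-06-08T14:00Z'}
```

The point of the exercise is the observable output: the operations manager reads one field and decides whether to dispatch a technician.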
The walking skeleton — our first end-to-end shippable slice — proves exactly one thing: a scale measurement travels from SFTP ingestion through stock computation, alarm evaluation, order generation, and ERP webhook in under 60 seconds. Everything else is a layer on top.
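To show the shape of that slice, here is a toy version of the pipeline with a transactional outbox. It is a sketch only: SQLite stands in for PostgreSQL, SFTP ingestion is reduced to a function call, alarm evaluation is collapsed into a single reorder threshold, and every table, column, and function name is hypothetical.

```python
import json
import sqlite3


def init_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE stock  (scale_id TEXT PRIMARY KEY, units INTEGER);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, scale_id TEXT, quantity INTEGER);
        CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0);
    """)
    return db


def ingest(db, scale_id, weight_g, unit_weight_g, reorder_point, reorder_qty):
    """Measurement -> stock level -> threshold check -> order + outbox row."""
    units = int(weight_g // unit_weight_g)
    with db:  # one transaction: commits on success, rolls back on any error
        db.execute("INSERT OR REPLACE INTO stock VALUES (?, ?)", (scale_id, units))
        if units < reorder_point:
            # Simplified: a real system would avoid re-ordering on every low reading.
            cur = db.execute(
                "INSERT INTO orders (scale_id, quantity) VALUES (?, ?)",
                (scale_id, reorder_qty))
            payload = json.dumps({"order_id": cur.lastrowid,
                                  "scale_id": scale_id,
                                  "quantity": reorder_qty})
            # The order and its outgoing event are written together: no dual write.
            db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    return units


def relay(db, send) -> int:
    """Deliver unsent outbox rows; `send` stands in for the ERP webhook call."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    delivered = 0
    for row_id, payload in rows:
        send(json.loads(payload))  # if this raises, the row stays unsent for retry
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

A usage example: `ingest(db, "scale-1", 4_000, 500, 10, 40)` records 8 units, creates an order, and queues one event; `relay(db, post_to_erp)` then delivers it. Delivery is at-least-once, so the receiving side should tolerate duplicates.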
Sprint Planning: Elephant Carpaccio in Practice
We sliced the backlog using Elephant Carpaccio — the discipline of cutting features into thin vertical slices, each shippable in one day, each carrying a named learning hypothesis.
Sprint 1 (June 16–29): Walking Skeleton — prove the full pipeline works end-to-end with a single tenant and a single scale. No multi-tenancy, no device management UI, no migration. Just the happy path, fully observable, deployed to staging.
Sprint 2 (June 30–July 13): Device Management — an operations manager can onboard a new scale without engineering involvement.
Each sprint has a KPI gate. Sprint 1 doesn't close until probe_rls() passes, the 60-second SLA is measured over 100 simulator cycles, and zero measurements are lost. Numbers, not feelings.
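A gate like this can be expressed as a small, automatable check. The sketch below assumes the strictest reading of the criteria, namely that every one of the 100 simulator cycles must finish in under 60 seconds; whether the real gate uses a worst case or a percentile is not stated here. The `CycleResult` shape and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

SLA_SECONDS = 60.0
REQUIRED_CYCLES = 100


@dataclass(frozen=True)
class CycleResult:
    sent: int          # measurements emitted by the simulator
    received: int      # measurements that reached the end of the pipeline
    latency_s: float   # slowest end-to-end latency observed in the cycle


def sprint1_gate(cycles: Sequence[CycleResult], rls_problems: Sequence[str]) -> List[str]:
    """Return the reasons the gate is still closed; an empty list means it passes."""
    failures = []
    if len(cycles) < REQUIRED_CYCLES:
        failures.append(f"only {len(cycles)} of {REQUIRED_CYCLES} cycles measured")
    lost = sum(c.sent - c.received for c in cycles)
    if lost:
        failures.append(f"{lost} measurements lost")
    slow = sum(1 for c in cycles if c.latency_s >= SLA_SECONDS)
    if slow:
        failures.append(f"{slow} cycles missed the {SLA_SECONDS:.0f}s SLA")
    if rls_problems:
        failures.append("probe_rls() reported tenant isolation problems")
    return failures
```

Returning reasons rather than a boolean keeps the gate report readable in a sprint review.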
AI-Assisted Development: What Actually Changed
The team is two Python engineers (implementation) and two domain architects (design + review). The AI layer — Claude Code with nWave agents — handles the heavy documentation drafting, architecture analysis, assumption scoring, and roadmap generation. Human architects steer, challenge, and sign off.
What surprised me: the biggest productivity gain wasn't in code generation. It was in decision traceability. Every architecture decision has a rationale, every assumption has evidence, every story has a job-to-be-done trace. In a traditional engagement, half that context lives in someone's head. Here it's in the repo, versioned, searchable, and ready for the next engineer who joins.
The pre-sprint setup used a 5-stage AI-assisted refinement pipeline: (1) the agent generates a refinement file on any sprint doc edit, (2) the team annotates it in Markdown comments, (3) the agent revises based on those comments, (4) stories stay out of READY until all checklist items pass, and (5) a human flips the story to READY. The AI never sets READY itself; human approval is always the gate.
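The "AI never sets READY" rule is easy to enforce in code rather than by convention. This is a minimal sketch of such a guard, assuming a story carries a checklist of named items; the `Story` class, the status strings, and the `mark_ready` function are hypothetical and not part of nWave itself.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    checklist: dict                  # item name -> True once it passes
    status: str = "IN_REFINEMENT"


def mark_ready(story: Story, actor_is_human: bool) -> None:
    """Flip a story to READY only for a human and only with a complete checklist."""
    if not actor_is_human:
        raise PermissionError("AI agents may revise stories but never set READY")
    open_items = sorted(name for name, done in story.checklist.items() if not done)
    if open_items:
        raise ValueError(f"checklist items still open: {open_items}")
    story.status = "READY"


story = Story("Onboard a new scale", {"elevator pitch": True, "acceptance test": False})
story.checklist["acceptance test"] = True   # team closes the last item
mark_ready(story, actor_is_human=True)
print(story.status)  # READY
```

Both failure modes raise instead of returning silently, so an agent that tries to promote a story shows up as an error in the pipeline log.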
What's Next
Kickoff is June 11. Sprint 1 starts June 16. The architecture is locked, the backlog is refined, the CI/CD reference is written, and two remaining blockers — ALM tool confirmation and API credentials — close at kickoff.
Go-live is 2027. Phase 1 migrates the highest-priority customers. Phase 2, contingent on legacy data access being granted, migrates the remaining customers.
The bet we're making: a boring, well-documented, rigorously tested modular monolith, built by a small team with strong AI tooling, will outdeliver a complex microservices architecture built by a large team without it. We'll know by Christmas.