Diagnosing and Rebuilding Delivery Systems Under Pressure
By Carlo Kruger – October 2025
Context
In mid-2025, I took the initiative to assess the health of a mission-critical platform for a large South African insurer — a cloud-native system that manages and allocates sales leads across business units.
What I found was not a failed system, but a capable team operating under unrelenting pressure without the stabilising influence of engineering management. The architecture was sound, the developers were competent, and the tools existed — yet delivery quality had eroded to crisis levels.
This case study documents how I approached the diagnosis, what I uncovered, and how I would lead such a team toward long-term technical and cultural recovery.
Diagnosis
Evidence-Based Assessment
Using a combination of commit-history analysis, code-review data, and automated coverage reports, I mapped the decline in engineering discipline over time.
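The raw inputs for this kind of assessment can be pulled straight from version control. A minimal sketch of the approach, assuming a local clone; the keyword heuristic below is illustrative, not the exact rule set used in the assessment:

```python
import subprocess

# Illustrative markers of rushed, firefighting-style commits.
RUSHED_KEYWORDS = ("wip", "temp", "fixup", "logging", "hack")

def commit_subjects(since: str, until: str, repo: str = ".") -> list[str]:
    """Return commit subject lines for a date range via `git log`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
         "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]

def rushed_ratio(subjects: list[str]) -> float:
    """Fraction of commits whose subject contains a rushed-work keyword."""
    if not subjects:
        return 0.0
    hits = sum(any(k in s.lower() for k in RUSHED_KEYWORDS) for s in subjects)
    return hits / len(subjects)
```

Run per month, this yields exactly the throughput and "rushed commit" percentages tabulated below.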
Code-Review Collapse
| Period | Total Commits | Review Feedback | Review Rate | Quality Indicator |
|---|---|---|---|---|
| May–Aug 2025 | 1 408 (≈352/mo) | 17 | 1.2% | 2.4% rushed commits |
| September | 810 | 18 | 2.2% | Maintained standards |
| October | 424 | 0 | 0% | 50.5% rushed commits |
Interpretation:
September shows over-extension (2.3× normal throughput); October shows collapse — zero code reviews and a flood of “temp logging” or “WIP” commits. No corrective management action occurred as standards deteriorated.
“October Crisis” Commit Patterns
“temp logging” · “more logging” · “Remove logging” · “Remoe console logging”
Developers were firefighting emergent incidents directly in code, bypassing observability tools. This pattern signals burnout, loss of process discipline, and an urgent need for stabilising leadership.
Test-Coverage Blind Spots
| Component | Files | Coverage | Business Risk |
|---|---|---|---|
| Data Masking (POPIA) | 15 | 0% | R 1 M – R 10 M regulatory fines |
| Authorization Matrix | 8 | 0% | Security-breach exposure |
| SLA Management | 12 | 0% | Reporting failures |
| Domain Layer | 55 | 0% | Operational defects |
| Lambda Functions | 47 | 11% avg | Silent production failures |
The test harness existed — what was missing was enforcement and accountability.
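Enforcement can start in CI. A sketch of a coverage gate, assuming a Cobertura-style `coverage.xml` such as coverage.py or most JS tooling emits; the 80 % threshold mirrors the target set later in this plan:

```python
import sys
import xml.etree.ElementTree as ET

def coverage_rate(xml_path: str) -> float:
    """Read the overall line-rate (0.0-1.0) from a Cobertura-style report."""
    root = ET.parse(xml_path).getroot()
    return float(root.attrib["line-rate"])

def gate(xml_path: str, threshold: float = 0.80) -> int:
    """Return a non-zero exit code (CI failure) below the threshold."""
    rate = coverage_rate(xml_path)
    print(f"coverage {rate:.1%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(gate(sys.argv[1]))
```

Wired into the pipeline as a required check, this converts coverage from a report nobody reads into a gate nobody can skip.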
Root Causes
Absence of Engineering Management
- No ownership of code-quality metrics
- Incomplete or unenforced “Definition of Done”
- Unrealistic delivery timelines accepted without escalation
- Team capacity continually eroded by urgent work
Cultural Fatigue
- Developers operating in survival mode
- Lack of psychological safety to raise quality concerns
- Burnout and disengagement beginning to surface
Invisible Technical Debt
- Debt not tracked or surfaced to leadership
- Business unaware of the cost until incidents occur
These are not technical failures; they are management design failures.
Intervention Plan
Phase 1 – Stabilise (First 30 Days)
Week 1 – Listen & Map
- 1-on-1s with every engineer and QA
- Map the end-to-end SDLC and quality gates
- Produce a State of Engineering report for leadership
Weeks 2–3 – Quick Wins
- Reinstate mandatory code review via branch protection
- Introduce PR checklist (tests, docs, security)
- Remove console debugging; restore CloudWatch dashboards
- Configure basic SNS alerts (≈ R 75 K; estimated to surface ~80 % of currently undetected incidents)
- Establish “Stop-the-Line” policy for production issues
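The SNS alerting quick win can be sketched as follows. The function name, topic ARN, and alarm parameters are illustrative, not the insurer's actual configuration:

```python
def lambda_error_alarm(function_name: str, topic_arn: str) -> dict:
    """Build a CloudWatch alarm spec that notifies an SNS topic whenever a
    Lambda function reports any errors in a one-minute window."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }

def register_alarm(function_name: str, topic_arn: str) -> None:
    """Create or update the alarm; needs AWS credentials at runtime."""
    import boto3  # imported lazily so the spec builder runs without the SDK
    boto3.client("cloudwatch").put_metric_alarm(
        **lambda_error_alarm(function_name, topic_arn))
```

One alarm per Lambda function replaces the "temp logging" commits: the signal arrives through observability tooling instead of through code changes.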
Week 4 – Define Standards
- Engineering Standards Document
- Updated Definition of Done
- Quality Metrics Dashboard (test coverage, review rate, MTTR, tech-debt trend)
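Two of the dashboard metrics are trivial to compute once the underlying events are captured. A sketch, assuming incident records carry detected/resolved timestamps:

```python
from datetime import datetime, timedelta

def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery: average (resolved - detected) per incident."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)

def review_rate(total_commits: int, reviewed_commits: int) -> float:
    """Share of commits that received review before merge."""
    return reviewed_commits / total_commits if total_commits else 0.0
```

The point of the dashboard is not the arithmetic; it is that these numbers are published weekly instead of discovered forensically, as they were in this engagement.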
Phase 2 – Build Momentum (First 90 Days)
Parallel Tracks: 60 % Quality / 40 % Feature
Quality Track Priorities
| Focus | Duration | Cost | Outcome |
|---|---|---|---|
| POPIA Data Masking Tests | Weeks 5–6 | R 150–250 K | Audit-ready compliance |
| Authorization Matrix Tests | Weeks 7–9 | R 300–450 K | Security assurance |
| SLA Management Tests | Weeks 10–12 | R 250–400 K | Reliable reporting |
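To illustrate what the POPIA track produces, here is a sketch of a masking function and the kind of test it gets. `mask_id_number` and its keep-last-three rule are hypothetical; the real rules come from the insurer's data-classification policy:

```python
import re

def mask_id_number(raw: str) -> str:
    """Mask all but the last three digits of a 13-digit SA ID number.
    Hypothetical policy: a real one is set by the compliance team."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 13:
        raise ValueError("expected a 13-digit SA ID number")
    return "*" * 10 + digits[-3:]
```

Fifteen files at 0 % coverage means rules like this currently have no executable specification at all; the Weeks 5–6 work is writing exactly these tests against the existing masking code.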
Developer Experience
- Faster CI/CD feedback
- Local setup simplified
- Pair programming & monthly tech talks
Technical Debt Management
- 20 % sprint capacity reserved for debt
- Debt tracked with business-impact tags
- Monthly review with Product
Incident Response
- On-call rotation + runbooks
- Blameless post-mortems
Growth & Mentorship
- Career plans for each developer
- Training budget and mentorship pairings
Phase 3 – Transform Culture (First 180 Days)
Vision: A team that takes pride in craft, ships with confidence, and owns its platform.
Levers of Change
- Lead by Example – participate in reviews, docs, on-call, and visible learning.
- Celebrate Quality – highlight exemplary testing and refactoring.
- Make Quality Visible – dashboards reviewed weekly.
- Empower Autonomy – engineers can say, “This needs more time to do right.”
Result: a self-correcting culture where speed and quality reinforce each other.
Quantifying the Impact
Business Risk vs Investment
| Item | Annual Exposure | Mitigation Value |
|---|---|---|
| Regulatory violations | R 1 M – 10 M | High |
| Operational defects | R 0.5 M – 1.5 M | Medium |
| Revenue leakage | R 0.25 M – 0.75 M | Medium |
Engineering Manager Cost: R 1.4 M – 1.8 M per year
| Benefit | Conservative | Optimistic |
|---|---|---|
| Reduced defects | R 400 K / yr | R 800 K / yr |
| Prevented regulatory incidents | R 1 M / yr | R 5 M / yr |
| Predictable delivery value | R 500 K / yr | R 1.5 M / yr |
| Lower turnover costs | R 200 K / yr | R 500 K / yr |
| Total Annual Benefit | R 2.1 M | R 7.8 M |
Break-Even: 3–6 months · 3-Year ROI: 250%–1 200%
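The ROI arithmetic can be sanity-checked. A sketch, assuming (as the stated 250 %–1 200 % figures imply) that three years of benefit are set against a single year's cost of R 1.8 M:

```python
def roi_3yr(annual_benefit_m: float, cost_m: float) -> float:
    """3-year ROI in percent: three years of benefit against one year's
    engineering-manager cost (the convention the stated figures imply)."""
    return (3 * annual_benefit_m - cost_m) / cost_m * 100

def break_even_months(annual_benefit_m: float, cost_m: float) -> float:
    """Months until cumulative benefit covers the cost."""
    return cost_m / annual_benefit_m * 12
```

With the table's totals (R 2.1 M conservative, R 7.8 M optimistic), this reproduces the 250 % and 1 200 % figures, and a break-even of roughly 3–10 months depending on the scenario.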
Long-Term Transformation Indicators (12 Months)
| Dimension | Target Metric |
|---|---|
| Quality | ≥ 80 % coverage on new logic; 100 % code reviews maintained |
| Reliability | Zero P0 incidents from untested code |
| Debt Reduction | 40 % tech-debt backlog burn-down |
| Team Health | Dev satisfaction > 8 / 10; zero unplanned turnover |
| Velocity | +25 % due to less firefighting |
| Business Outcomes | > 99 % lead-allocation accuracy; documented POPIA compliance; MTTD < 5 min; MTTR reduced by 60 % |
Leadership Philosophy & Why This Approach Works
My approach to engineering management rests on a simple principle: build systems that enable people to do their best work consistently. The framework that guides me, refined through years at Unboxed Consulting and applied again in this case, is built on four cornerstones.
1. Technical Credibility
Leadership starts with competence.
In this engagement, I could analyse the system firsthand: codebases in multiple languages and frameworks, AWS Lambda architectures, PostgreSQL performance, and React/TypeScript patterns. That hands-on fluency earns developer trust, and trust is the currency of technical leadership.
I may not write production code, but I need to understand the trade-offs behind every pull request. My role is to create an environment where sound engineering judgment is respected and enforced.
2. Data-Driven Decision Making
The story of this turnaround begins with measurement: commit patterns, test coverage, risk exposure.
I quantify before I intervene. Every corrective action — from reinstating code reviews to budgeting for test coverage — was grounded in evidence and ROI.
Good engineering management is about balancing speed and quality. Those decisions can’t be made by instinct alone; they require visibility and data.
3. Business Acumen
Engineering doesn’t exist in a vacuum. POPIA compliance isn’t just a technical checkbox — it’s a multi-million-rand regulatory exposure. Missing SLA logic means business reporting failures, not just failing tests.
My approach bridges that gap: translate technical debt and quality metrics into business-risk language that executives can act on. This ensures that technical excellence aligns directly with commercial outcomes.
4. Empathy for Developers
When I saw 810 commits in September, followed by 50 % “rushed” in October, I didn’t see lazy engineers; I saw a team drowning without support.
Empathy doesn’t mean lowering standards — it means protecting capacity, saying “no” when the system is overloaded, and helping people rediscover pride in their craft.
The best engineering managers are force multipliers: they clear blockers, defend focus, and rebuild confidence.
The Broader Pattern
This case illustrates a repeatable pattern I’ve seen across organisations:
1. Pressure erodes quality when there’s no empowered engineering leadership.
2. Quality data exposes the true nature of the problem.
3. Restoring discipline through process, metrics, and empathy reverses the trend.
It’s the same approach I’d apply anywhere a team is under strain: diagnose objectively, intervene decisively, and rebuild sustainably.
About Carlo Kruger
optimistic apocalyptarian. agilist. cook. cat-lover. coffee snob. aka grumpycat. ai enthusiast

