Case Study: Engineering Management in Action

carlo kruger

Diagnosing and Rebuilding Delivery Systems Under Pressure

By Carlo Kruger – October 2025

Context

In mid-2025, I took the initiative to assess the health of a mission-critical platform for a large South African insurer — a cloud-native system that manages and allocates sales leads across business units.

What I found was not a failed system, but a capable team operating under unrelenting pressure without the stabilising influence of engineering management. The architecture was sound, the developers were competent, and the tools existed — yet delivery quality had eroded to crisis levels.

This case study documents how I approached the diagnosis, what I uncovered, and how I would lead such a team toward long-term technical and cultural recovery.

Diagnosis

Evidence-Based Assessment

Using a combination of commit-history analysis, code-review data, and automated coverage reports, I mapped the decline in engineering discipline over time.

Code-Review Collapse


| Period | Total Commits | Review Feedback | Review Rate | Quality Score |
|---|---|---|---|---|
| May–Aug 2025 | 1 408 (≈352/mo) | 17 | 1.2% | 2.4 % rushed commits |
| September | 810 | 18 | 2.2% | Maintained standards |
| October | 424 | 0 | 0% | 50.5 % rushed commits |

Interpretation:
September shows over-extension (2.3× normal throughput); October shows collapse — zero code reviews and a flood of “temp logging” or “WIP” commits. No corrective management action occurred as standards deteriorated.
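The review rates and the throughput multiple quoted above follow directly from the commit and review counts. A minimal sketch of that arithmetic, assuming the May–Aug average of 352 commits per month is the "normal" baseline (the dictionary keys and function names are illustrative, not taken from any tooling used in the assessment):

```python
# Figures copied from the table above; the field names are illustrative.
PERIODS = [
    {"period": "May-Aug 2025", "months": 4, "commits": 1408, "reviews": 17},
    {"period": "September", "months": 1, "commits": 810, "reviews": 18},
    {"period": "October", "months": 1, "commits": 424, "reviews": 0},
]


def review_rate(commits: int, reviews: int) -> float:
    """Share of commits that received review feedback, as a percentage."""
    return 100.0 * reviews / commits if commits else 0.0


def throughput_ratio(commits_per_month: float, baseline_per_month: float) -> float:
    """Monthly commit volume relative to a baseline month."""
    return commits_per_month / baseline_per_month


# Baseline: average monthly commits over May-Aug (1408 / 4 = 352).
BASELINE = PERIODS[0]["commits"] / PERIODS[0]["months"]

for p in PERIODS:
    per_month = p["commits"] / p["months"]
    print(
        f'{p["period"]:<13} '
        f'review rate {review_rate(p["commits"], p["reviews"]):.1f}%  '
        f'throughput {throughput_ratio(per_month, BASELINE):.1f}x baseline'
    )
```

On these inputs September comes out at roughly 2.3 times the baseline, which is the over-extension figure cited in the interpretation.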

“October Crisis” Commit Patterns

“temp logging” · “more logging” · “Remove logging” · “Remoe console logging”

Developers were firefighting emergent incidents directly in code, bypassing observability tools. This pattern signals burnout, loss of process discipline, and an urgent need for stabilising leadership.
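One way to put a number on this pattern is to flag commit messages by keyword. The sketch below is a hypothetical heuristic: the keyword list is an assumption inferred from the messages quoted above, and it is not necessarily how the "rushed commits" percentages in this case study were produced.

```python
import re

# Hypothetical heuristic: the article does not say how "rushed" commits were
# identified, so these keywords are an assumption based on the quoted messages.
RUSHED_PATTERN = re.compile(r"\b(wip|temp|tmp|logging)\b", re.IGNORECASE)


def is_rushed(message: str) -> bool:
    """Flag commit messages that look like firefighting or work in progress."""
    return bool(RUSHED_PATTERN.search(message))


def rushed_share(messages: list[str]) -> float:
    """Percentage of messages flagged as rushed (0.0 for an empty history)."""
    if not messages:
        return 0.0
    return 100.0 * sum(is_rushed(m) for m in messages) / len(messages)


# The first four messages are quoted in the article; the rest are made up.
sample = [
    "temp logging",
    "more logging",
    "Remove logging",
    "Remoe console logging",
    "WIP",
    "Add SLA escalation tests",
    "Fix lead allocation rounding",
]
print(f"{rushed_share(sample):.1f}% rushed")
```

A keyword match is a blunt instrument, so a real analysis would likely combine it with other signals such as commit size or time of day.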

| Component | Files | Coverage | Business Risk |
|---|---|---|---|
| Data Masking (POPIA) | 15 | 0% | R 1 M – R 10 M regulatory fines |
| Authorization Matrix | 8 | 0% | Security-breach exposure |
| SLA Management | 12 | 0% | Reporting failures |
| Domain Layer | 55 | 0% | Operational defects |
| Lambda Functions | 47 | 11% avg | Silent production failures |

The test harness existed — what was missing was enforcement and accountability.
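Enforcement can be as simple as a CI step that fails the build when a critical component falls below an agreed floor. The following is a minimal sketch only: the coverage figures are as read from the table above, while the 80 % floor and the function names are assumptions made for illustration.

```python
# Coverage percentages as read from the table above.
COVERAGE = {
    "Data Masking (POPIA)": 0.0,
    "Authorization Matrix": 0.0,
    "SLA Management": 0.0,
    "Domain Layer": 0.0,
    "Lambda Functions": 11.0,
}

# Hypothetical floor; a real team would agree its own per-component targets.
MINIMUM_COVERAGE = 80.0


def components_below(coverage: dict[str, float], minimum: float) -> list[str]:
    """Names of components whose coverage is under the minimum, sorted."""
    return sorted(name for name, pct in coverage.items() if pct < minimum)


def gate_exit_code(coverage: dict[str, float], minimum: float) -> int:
    """Return 0 when every component passes, 1 otherwise (CI convention)."""
    failing = components_below(coverage, minimum)
    for name in failing:
        print(f"FAIL {name}: {coverage[name]:.0f}% < {minimum:.0f}%")
    return 1 if failing else 0
```

In a pipeline, the step would end with something like `sys.exit(gate_exit_code(COVERAGE, MINIMUM_COVERAGE))`, with the dictionary populated from the coverage tool's report rather than hard-coded.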

Root Causes

Absence of Engineering Management

  • No ownership of code-quality metrics
  • Incomplete or unenforced “Definition of Done”
  • Unrealistic delivery timelines accepted without escalation
  • Team capacity continually eroded by urgent work

Cultural Fatigue

  • Developers operating in survival mode
  • Lack of psychological safety to raise quality concerns
  • Burnout and disengagement beginning to surface

Invisible Technical Debt

  • Debt not tracked or surfaced to leadership
  • Business unaware of the cost of that debt until incidents occur

These are not technical failures; they are management design failures.

Intervention Plan

Phase 1 – Stabilise (First 30 Days)

Week 1 – Listen & Map

  • 1-on-1s with every engineer and QA
  • Map the end-to-end SDLC and quality gates
  • Produce a State of Engineering report for leadership

Weeks 2–3 – Quick Wins

  • Reinstate mandatory code review via branch protection
  • Introduce PR checklist (tests, docs, security)
  • Remove console debugging; restore CloudWatch dashboards
  • Configure basic SNS alerts (≈ R 75 K; estimated to surface about 80 % of incidents that currently go undetected)
  • Establish “Stop-the-Line” policy for production issues

Week 4 – Define Standards

  • Engineering Standards Document
  • Updated Definition of Done
  • Quality Metrics Dashboard (test coverage, review rate, MTTR, tech-debt trend)
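Of the dashboard metrics listed above, MTTR is the one most often computed inconsistently, so it helps to pin the definition down early. A minimal sketch, assuming MTTR is measured from detection to resolution and that incidents are available as simple records (the record shape and the sample incidents are made up for illustration):

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_recover(incidents: list[dict]) -> timedelta:
    """Average of (resolved - detected) across incidents."""
    if not incidents:
        return timedelta(0)
    seconds = [(i["resolved"] - i["detected"]).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


# Made-up incident records purely for illustration.
incidents = [
    {"detected": datetime(2025, 10, 6, 9, 0), "resolved": datetime(2025, 10, 6, 10, 30)},
    {"detected": datetime(2025, 10, 14, 14, 0), "resolved": datetime(2025, 10, 14, 14, 30)},
]
print(mean_time_to_recover(incidents))
```

Tracking the same calculation month over month gives the trend line that the dashboard needs, whichever start and end events the team settles on.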

Phase 2 – Build Momentum (First 90 Days)

Parallel Tracks: 60 % Quality / 40 % Feature

Quality Track Priorities

| Focus | Duration | Cost | Outcome |
|---|---|---|---|
| POPIA Data Masking Tests | Weeks 5–6 | R 150–250 K | Audit-ready compliance |
| Authorization Matrix Tests | Weeks 7–9 | R 300–450 K | Security assurance |
| SLA Management Tests | Weeks 10–12 | R 250–400 K | Reliable reporting |

Developer Experience

  • Faster CI/CD feedback
  • Local setup simplified
  • Pair programming & monthly tech talks

Technical Debt Management

  • 20 % sprint capacity reserved for debt
  • Debt tracked with business-impact tags
  • Monthly review with Product

Incident Response

  • On-call rotation + runbooks
  • Blameless post-mortems

Growth & Mentorship

  • Career plans for each developer
  • Training budget and mentorship pairings

Phase 3 – Transform Culture (First 180 Days)

Vision: A team that takes pride in craft, ships with confidence, and owns its platform.

Levers of Change

  1. Lead by Example – participate in reviews, docs, on-call, and visible learning.
  2. Celebrate Quality – highlight exemplary testing and refactoring.
  3. Make Quality Visible – dashboards reviewed weekly.
  4. Empower Autonomy – engineers can say, “This needs more time to do right.”

Result: a self-correcting culture where speed and quality reinforce each other.

Quantifying the Impact

Business Risk vs Investment

| Item | Annual Exposure | Mitigation Value |
|---|---|---|
| Regulatory violations | R 1 M – 10 M | High |
| Operational defects | R 0.5 M – 1.5 M | Medium |
| Revenue leakage | R 0.25 M – 0.75 M | Medium |

Engineering Manager Cost: R 1.4 M – 1.8 M per year

| Benefit | Conservative | Optimistic |
|---|---|---|
| Reduced defects | R 400 K / yr | R 800 K / yr |
| Prevented regulatory incidents | R 1 M / yr | R 5 M / yr |
| Predictable delivery value | R 500 K / yr | R 1.5 M / yr |
| Lower turnover costs | R 200 K / yr | R 500 K / yr |
| Total Annual Benefit | R 2.1 M | R 7.8 M |

Break-Even: 3–6 months
3-Year ROI: 250 % – 1 200 %
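The benefit totals can be re-added from the individual rows, and a simple annual net can be set against the manager cost. This sketch assumes the worst case pairs the conservative benefit with the higher cost and the best case pairs the optimistic benefit with the lower cost. It does not attempt to reproduce the break-even or ROI figures, which depend on assumptions not spelled out here.

```python
# All amounts in thousands of rand (R K), copied from the figures above.
BENEFITS = {
    "Reduced defects": (400, 800),
    "Prevented regulatory incidents": (1000, 5000),
    "Predictable delivery value": (500, 1500),
    "Lower turnover costs": (200, 500),
}
MANAGER_COST = (1400, 1800)  # low and high annual cost


def totals(benefits: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the conservative and optimistic columns separately."""
    conservative = sum(low for low, _ in benefits.values())
    optimistic = sum(high for _, high in benefits.values())
    return conservative, optimistic


conservative, optimistic = totals(BENEFITS)

# Worst case: conservative benefit against the high cost, and vice versa.
net_worst = conservative - MANAGER_COST[1]
net_best = optimistic - MANAGER_COST[0]

print(f"Total benefit: R {conservative} K to R {optimistic} K per year")
print(f"Net of manager cost: R {net_worst} K to R {net_best} K per year")
```

Keeping the amounts as integers in thousands of rand avoids floating-point rounding when the rows are summed.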

Long-Term Transformation Indicators (12 Months)

| Dimension | Target Metric |
|---|---|
| Quality | ≥ 80 % coverage on new logic; 100 % code reviews maintained |
| Reliability | Zero P0 incidents from untested code |
| Debt Reduction | 40 % tech-debt backlog burn-down |
| Team Health | Dev satisfaction > 8 / 10; zero unplanned turnover |
| Velocity | +25 % due to less firefighting |
| Business Outcomes | > 99 % lead-allocation accuracy; documented POPIA compliance; MTTD < 5 min; MTTR reduced by 60 % |

Leadership Philosophy & Why This Approach Works

My approach to engineering management rests on a simple principle: build systems that enable people to do their best work consistently. The framework that guides me, refined through years at Unboxed Consulting and applied again in this case, has four cornerstones.

1. Technical Credibility

Leadership starts with competence.
In this engagement, I could analyse codebases in multiple languages and frameworks, AWS Lambda architectures, PostgreSQL performance, and React/TypeScript patterns firsthand. That hands-on fluency earns developer trust — and trust is the currency of technical leadership.

I may not write production code, but I need to understand the trade-offs behind every pull request. My role is to create an environment where sound engineering judgment is respected and enforced.

2. Data-Driven Decision Making

The story of this turnaround begins with measurement: commit patterns, test coverage, risk exposure.
I quantify before I intervene. Every corrective action — from reinstating code reviews to budgeting for test coverage — was grounded in evidence and ROI.

Good engineering management is about balancing speed and quality. Those decisions can’t be made by instinct alone; they require visibility and data.

3. Business Acumen

Engineering doesn’t exist in a vacuum. POPIA compliance isn’t just a technical checkbox — it’s a multi-million-rand regulatory exposure. Missing SLA logic means business reporting failures, not just failing tests.

My approach bridges that gap: translate technical debt and quality metrics into business-risk language that executives can act on. This ensures that technical excellence aligns directly with commercial outcomes.

4. Empathy for Developers

When I saw 810 commits in September, followed by 50 % “rushed” commits in October, I didn’t see lazy engineers; I saw a team drowning without support.
Empathy doesn’t mean lowering standards — it means protecting capacity, saying “no” when the system is overloaded, and helping people rediscover pride in their craft.

The best engineering managers are force multipliers: they clear blockers, defend focus, and rebuild confidence.

The Broader Pattern

This case illustrates a repeatable pattern I’ve seen across organisations:
1. Pressure erodes quality when there’s no empowered engineering leadership.
2. Quality data exposes the true nature of the problem.
3. Restoring discipline through process, metrics, and empathy reverses the trend.

It’s the same approach I’d apply anywhere a team is under strain: diagnose objectively, intervene decisively, and rebuild sustainably.

About carlo kruger

optimistic apocalyptarian. agilist. cook. cat-lover. coffee snob. aka grumpycat. ai enthusiast
