Diagnosing and Rebuilding Delivery Systems Under Pressure
By Carlo Kruger – October 2025
Context
In mid-2025, I took the initiative to assess the health of a mission-critical platform for a large South African insurer — a cloud-native system that manages and allocates sales leads across business units.
What I found was not a failed system, but a capable team operating under unrelenting pressure without the stabilising influence of engineering management. The architecture was sound, the developers were competent, and the tools existed — yet delivery quality had eroded to crisis levels.
This case study documents how I approached the diagnosis, what I uncovered, and how I would lead such a team toward long-term technical and cultural recovery.
Diagnosis
Evidence-Based Assessment
Using a combination of commit-history analysis, code-review data, and automated coverage reports, I mapped the decline in engineering discipline over time.
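The raw inputs for this kind of assessment can be pulled straight from version control. A minimal sketch of the approach, assuming a local clone; the keyword heuristic below is illustrative, not the exact rule set used in the assessment:

```python
import subprocess

# Illustrative markers of rushed, firefighting-style commits.
RUSHED_KEYWORDS = ("wip", "temp", "fixup", "logging", "hack")

def commit_subjects(since: str, until: str, repo: str = ".") -> list[str]:
    """Return commit subject lines for a date range via `git log`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", f"--until={until}",
         "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]

def rushed_ratio(subjects: list[str]) -> float:
    """Fraction of commits whose subject contains a rushed-work keyword."""
    if not subjects:
        return 0.0
    hits = sum(any(k in s.lower() for k in RUSHED_KEYWORDS) for s in subjects)
    return hits / len(subjects)
```

Run per month, this yields exactly the throughput and "rushed commit" percentages tabulated below.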
Code-Review Collapse
| Period | Total Commits | Review Feedback | Review Rate | Quality Indicator |
|---|---|---|---|---|
| May–Aug 2025 | 1 408 (≈352/mo) | 17 | 1.2% | 2.4% rushed commits |
| September | 810 | 18 | 2.2% | Maintained standards |
| October | 424 | 0 | 0% | 50.5% rushed commits |
Interpretation:
September shows over-extension (2.3× normal throughput); October shows collapse — zero code reviews and a flood of “temp logging” or “WIP” commits. No corrective management action occurred as standards deteriorated.
“October Crisis” Commit Patterns
“temp logging” · “more logging” · “Remove logging” · “Remoe console logging”
Developers were firefighting emergent incidents directly in code, bypassing observability tools. This pattern signals burnout, loss of process discipline, and an urgent need for stabilising leadership.
Test-Coverage Blind Spots
| Component | Files | Coverage | Business Risk |
|---|---|---|---|
| Data Masking (POPIA) | 15 | 0% | R 1 M – R 10 M regulatory fines |
| Authorization Matrix | 8 | 0% | Security-breach exposure |
| SLA Management | 12 | 0% | Reporting failures |
| Domain Layer | 55 | 0% | Operational defects |
| Lambda Functions | 47 | 11% avg | Silent production failures |
The test harness existed — what was missing was enforcement and accountability.
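Enforcement can start in CI. A sketch of a coverage gate, assuming a Cobertura-style `coverage.xml` such as coverage.py or most JS tooling emits; the 80 % threshold mirrors the target set later in this plan:

```python
import sys
import xml.etree.ElementTree as ET

def coverage_rate(xml_path: str) -> float:
    """Read the overall line-rate (0.0-1.0) from a Cobertura-style report."""
    root = ET.parse(xml_path).getroot()
    return float(root.attrib["line-rate"])

def gate(xml_path: str, threshold: float = 0.80) -> int:
    """Return a non-zero exit code (CI failure) below the threshold."""
    rate = coverage_rate(xml_path)
    print(f"coverage {rate:.1%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(gate(sys.argv[1]))
```

Wired into the pipeline as a required check, this converts coverage from a report nobody reads into a gate nobody can skip.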
Root Causes
Absence of Engineering Management
- No ownership of code-quality metrics
- Incomplete or unenforced “Definition of Done”
- Unrealistic delivery timelines accepted without escalation
- Team capacity continually eroded by urgent work
Cultural Fatigue
- Developers operating in survival mode
- Lack of psychological safety to raise quality concerns
- Burnout and disengagement beginning to surface
Invisible Technical Debt
- Debt not tracked or surfaced to leadership
- Business unaware of the cost until incidents occur
These are not technical failures; they are management design failures.
Intervention Plan
Phase 1 – Stabilise (First 30 Days)
Week 1 – Listen & Map
- 1-on-1s with every engineer and QA
- Map the end-to-end SDLC and quality gates
- Produce a State of Engineering report for leadership
Weeks 2–3 – Quick Wins
- Reinstate mandatory code review via branch protection
- Introduce PR checklist (tests, docs, security)
- Remove console debugging; restore CloudWatch dashboards
- Configure basic SNS alerts (≈ R 75 K; estimated to surface ~80 % of currently undetected incidents)
- Establish “Stop-the-Line” policy for production issues
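The SNS alerting quick win can be sketched as follows. The function name, topic ARN, and alarm parameters are illustrative, not the insurer's actual configuration:

```python
def lambda_error_alarm(function_name: str, topic_arn: str) -> dict:
    """Build a CloudWatch alarm spec that notifies an SNS topic whenever a
    Lambda function reports any errors in a one-minute window."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }

def register_alarm(function_name: str, topic_arn: str) -> None:
    """Create or update the alarm; needs AWS credentials at runtime."""
    import boto3  # imported lazily so the spec builder runs without the SDK
    boto3.client("cloudwatch").put_metric_alarm(
        **lambda_error_alarm(function_name, topic_arn))
```

One alarm per Lambda function replaces the "temp logging" commits: the signal arrives through observability tooling instead of through code changes.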
Week 4 – Define Standards
- Engineering Standards Document
- Updated Definition of Done
- Quality Metrics Dashboard (test coverage, review rate, MTTR, tech-debt trend)
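Two of the dashboard metrics are trivial to compute once the underlying events are captured. A sketch, assuming incident records carry detected/resolved timestamps:

```python
from datetime import datetime, timedelta

def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery: average (resolved - detected) per incident."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)

def review_rate(total_commits: int, reviewed_commits: int) -> float:
    """Share of commits that received review before merge."""
    return reviewed_commits / total_commits if total_commits else 0.0
```

The point of the dashboard is not the arithmetic; it is that these numbers are published weekly instead of discovered forensically, as they were in this engagement.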
Phase 2 – Build Momentum (First 90 Days)
Parallel Tracks: 60 % Quality / 40 % Feature
Quality Track Priorities
| Focus | Duration | Cost | Outcome |
|---|---|---|---|
| POPIA Data Masking Tests | Weeks 5–6 | R 150–250 K | Audit-ready compliance |
| Authorization Matrix Tests | Weeks 7–9 | R 300–450 K | Security assurance |
| SLA Management Tests | Weeks 10–12 | R 250–400 K | Reliable reporting |
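To illustrate what the POPIA track produces, here is a sketch of a masking function and the kind of test it gets. `mask_id_number` and its keep-last-three rule are hypothetical; the real rules come from the insurer's data-classification policy:

```python
import re

def mask_id_number(raw: str) -> str:
    """Mask all but the last three digits of a 13-digit SA ID number.
    Hypothetical policy: a real one is set by the compliance team."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 13:
        raise ValueError("expected a 13-digit SA ID number")
    return "*" * 10 + digits[-3:]
```

Fifteen files at 0 % coverage means rules like this currently have no executable specification at all; the Weeks 5–6 work is writing exactly these tests against the existing masking code.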
Developer Experience
- Faster CI/CD feedback
- Local setup simplified
- Pair programming & monthly tech talks
Technical Debt Management
- 20 % sprint capacity reserved for debt
- Debt tracked with business-impact tags
- Monthly review with Product
Incident Response
- On-call rotation + runbooks
- Blameless post-mortems
Growth & Mentorship
- Career plans for each developer
- Training budget and mentorship pairings
Phase 3 – Transform Culture (First 180 Days)
Vision: A team that takes pride in craft, ships with confidence, and owns its platform.
Levers of Change
- Lead by Example – participate in reviews, docs, on-call, and visible learning.
- Celebrate Quality – highlight exemplary testing and refactoring.
- Make Quality Visible – dashboards reviewed weekly.
- Empower Autonomy – engineers can say, “This needs more time to do right.”
Result: a self-correcting culture where speed and quality reinforce each other.
Quantifying the Impact
Business Risk vs Investment
| Item | Annual Exposure | Mitigation Value |
|---|---|---|
| Regulatory violations | R 1 M – 10 M | High |
| Operational defects | R 0.5 M – 1.5 M | Medium |
| Revenue leakage | R 0.25 M – 0.75 M | Medium |
Engineering Manager Cost: R 1.4 M – 1.8 M per year
| Benefit | Conservative | Optimistic |
|---|---|---|
| Reduced defects | R 400 K / yr | R 800 K / yr |
| Prevented regulatory incidents | R 1 M / yr | R 5 M / yr |
| Predictable delivery value | R 500 K / yr | R 1.5 M / yr |
| Lower turnover costs | R 200 K / yr | R 500 K / yr |
| Total Annual Benefit | R 2.1 M | R 7.8 M |
Break-Even: 3–6 months · 3-Year ROI: 250%–1 200%
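The ROI arithmetic can be sanity-checked. A sketch, assuming (as the stated 250 %–1 200 % figures imply) that three years of benefit are set against a single year's cost of R 1.8 M:

```python
def roi_3yr(annual_benefit_m: float, cost_m: float) -> float:
    """3-year ROI in percent: three years of benefit against one year's
    engineering-manager cost (the convention the stated figures imply)."""
    return (3 * annual_benefit_m - cost_m) / cost_m * 100

def break_even_months(annual_benefit_m: float, cost_m: float) -> float:
    """Months until cumulative benefit covers the cost."""
    return cost_m / annual_benefit_m * 12
```

With the table's totals (R 2.1 M conservative, R 7.8 M optimistic), this reproduces the 250 % and 1 200 % figures, and a break-even of roughly 3–10 months depending on the scenario.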
Long-Term Transformation Indicators (12 Months)
| Dimension | Target Metric |
|---|---|
| Quality | ≥ 80 % coverage on new logic; 100 % code reviews maintained |
| Reliability | Zero P0 incidents from untested code |
| Debt Reduction | 40 % tech-debt backlog burn-down |
| Team Health | Dev satisfaction > 8 / 10; zero unplanned turnover |
| Velocity | +25 % due to less firefighting |
| Business Outcomes | > 99 % lead-allocation accuracy; documented POPIA compliance; MTTD < 5 min; MTTR reduced by 60 % |
Leadership Philosophy & Why This Approach Works
My approach to engineering management rests on a simple principle: build systems that enable people to do their best work consistently. The framework that guides me, refined through years at Unboxed Consulting and applied again in this case, is built on four cornerstones.
1. Technical Credibility
Leadership starts with competence.
In this engagement, I could analyse the system firsthand: codebases in multiple languages and frameworks, AWS Lambda architectures, PostgreSQL performance, and React/TypeScript patterns. That hands-on fluency earns developer trust, and trust is the currency of technical leadership.
I may not write production code, but I need to understand the trade-offs behind every pull request. My role is to create an environment where sound engineering judgment is respected and enforced.
2. Data-Driven Decision Making
The story of this turnaround begins with measurement: commit patterns, test coverage, risk exposure.
I quantify before I intervene. Every corrective action — from reinstating code reviews to budgeting for test coverage — was grounded in evidence and ROI.
Good engineering management is about balancing speed and quality. Those decisions can’t be made by instinct alone; they require visibility and data.
3. Business Acumen
Engineering doesn’t exist in a vacuum. POPIA compliance isn’t just a technical checkbox — it’s a multi-million-rand regulatory exposure. Missing SLA logic means business reporting failures, not just failing tests.
My approach bridges that gap: translate technical debt and quality metrics into business-risk language that executives can act on. This ensures that technical excellence aligns directly with commercial outcomes.
4. Empathy for Developers
When I saw 810 commits in September, followed by 50 % “rushed” in October, I didn’t see lazy engineers; I saw a team drowning without support.
Empathy doesn’t mean lowering standards — it means protecting capacity, saying “no” when the system is overloaded, and helping people rediscover pride in their craft.
The best engineering managers are force multipliers: they clear blockers, defend focus, and rebuild confidence.
The Broader Pattern
This case illustrates a repeatable pattern I’ve seen across organisations:
1. Pressure erodes quality when there’s no empowered engineering leadership.
2. Quality data exposes the true nature of the problem.
3. Restoring discipline through process, metrics, and empathy reverses the trend.
It’s the same approach I’d apply anywhere a team is under strain: diagnose objectively, intervene decisively, and rebuild sustainably.
About Carlo Kruger
optimistic apocalyptarian. agilist. cook. cat-lover. coffee snob. aka grumpycat. ai enthusiast

