Enterprise Platform Engineering
Software Engineer · Aug 2024 – Present
Backend-focused platform engineering on large-scale distributed systems at Palo Alto Networks: Go services, CI feedback acceleration, and operational analytics.
Disclosure
This case study describes work at Palo Alto Networks using publicly available information from my resume and LinkedIn. Specific product UIs, customer details, internal metrics beyond what's on my resume, and proprietary architecture are omitted to respect confidentiality. The focus is on engineering challenges, approach, and measurable outcomes.
The Problem
Enterprise cybersecurity platforms serve thousands of organizations and must balance velocity with reliability at every layer. Pre-merge testing was slow, production incidents were hard to reproduce in CI, and developer feedback loops needed acceleration without sacrificing release confidence.
Role & Constraints
Software Engineer
- Large-scale distributed systems with strict reliability requirements
- Cross-team coordination across multiple engineering organizations
- Security and compliance considerations in every change
- Must improve CI signal without increasing feedback time
Approach
Built a production Go-based distributed agent simulator and an end-to-end test harness for microservices. Designed deterministic validation pipelines and golden test datasets so that production incidents could be reproduced in CI, and replayed production-shaped traffic with randomized timing to surface latency regressions early. Also designed an agentic internal platform (an MCP server) that orchestrates PagerDuty, Grafana, BigQuery, GitLab, Jira, and Confluence for automated incident triage.
Media
┌────────────┐ ┌─────────────┐ ┌──────────────────┐
│ Developer │──▶│ Git Push / │──▶│ Agent Simulator │
│ Commit │ │ CI Trigger │ │ (Go, gRPC) │
└────────────┘ └─────────────┘ └────────┬─────────┘
│
┌─────────────┐ ┌───────▼─────────┐
│ Release │◀──│ Validation │
│ Gate │ │ Pipeline │
└──────┬──────┘ └───────┬─────────┘
│ │
┌──────▼──────┐ ┌───────▼─────────┐
│ Production │ │ Golden Tests + │
│ Deploy │ │ Traffic Replay │
└─────────────┘ └─────────────────┘
┌──────────────────────────────────────────────────────┐
│ MCP Agentic Platform │
│ PagerDuty · Grafana · BigQuery · GitLab · Jira │
└──────────────────────────────────────────────────────┘

CI feedback loop with agent simulator and agentic triage platform
Outcomes
- Improved pre-merge issue detection by ~20% and reduced feedback time from 4 min to 3 min
- Reproduced ~85% of recent production incidents in CI, reducing flaky failures by ~30%
- Surfaced p95/p99 latency regressions ~2 weeks earlier, catching ~10 performance regressions before release
- Reduced manual QA by ~40% and prevented ~5 post-merge regressions per quarter
Reflection
Working at this scale taught me that the hardest part of enterprise engineering isn't writing code. It's understanding the blast radius of every change, communicating across teams, and maintaining velocity without introducing risk. Building the agentic MCP platform for incident triage showed me how internal tooling can have outsized impact when it reduces cognitive load during high-pressure moments.