Enterprise Platform Engineering
Software Engineer · Aug 2024 – Present
Backend-focused platform engineering on large-scale distributed systems at Palo Alto Networks: Go services, CI feedback acceleration, and operational analytics.
Disclosure
This case study describes work at Palo Alto Networks using publicly available information from my resume and LinkedIn. Specific product UIs, customer details, internal metrics beyond what's on my resume, and proprietary architecture are omitted to respect confidentiality. The focus is on engineering challenges, approach, and measurable outcomes.
The Problem
Enterprise cybersecurity platforms serve thousands of organizations and must balance velocity with reliability at every layer. Pre-merge testing was slow, production incidents were hard to reproduce in CI, and developer feedback loops needed acceleration without sacrificing release confidence.
Role & Constraints
Software Engineer
- Large-scale distributed systems with strict reliability requirements
- Cross-team coordination across multiple engineering organizations
- Security and compliance considerations in every change
- Must improve CI signal without increasing feedback time
Approach
Built a production Go-based distributed agent simulator and an end-to-end test harness for microservices. Designed deterministic validation pipelines and golden test datasets so that production incidents could be reproduced in CI, and replayed production-shaped traffic with randomized timing to surface latency regressions early. Also designed an agentic internal platform (an MCP server) that orchestrates PagerDuty, Grafana, BigQuery, GitLab, Jira, and Confluence for automated incident triage.
Media
┌────────────┐ ┌─────────────┐ ┌──────────────────┐
│ Developer │──▶│ Git Push / │──▶│ Agent Simulator │
│ Commit │ │ CI Trigger │ │ (Go, gRPC) │
└────────────┘ └─────────────┘ └────────┬─────────┘
│
┌─────────────┐ ┌───────▼─────────┐
│ Release │◀──│ Validation │
│ Gate │ │ Pipeline │
└──────┬──────┘ └───────┬─────────┘
│ │
┌──────▼──────┐ ┌───────▼─────────┐
│ Production │ │ Golden Tests + │
│ Deploy │ │ Traffic Replay │
└─────────────┘ └─────────────────┘
┌──────────────────────────────────────────────────────┐
│ MCP Agentic Platform │
│ PagerDuty · Grafana · BigQuery · GitLab · Jira │
└──────────────────────────────────────────────────────┘

CI feedback loop with agent simulator and agentic triage platform
Outcomes
- Improved pre-merge issue detection by ~20% and reduced feedback time from 4 min to 3 min
- Reproduced ~85% of recent production incidents in CI, reducing flaky failures by ~30%
- Surfaced p95/p99 latency regressions ~2 weeks earlier, catching ~10 performance regressions before release
- Reduced manual QA by ~40% and prevented ~5 post-merge regressions per quarter
Reflection
Working at this scale taught me that the hardest part of enterprise engineering isn't writing code. It's understanding the blast radius of every change, communicating across teams, and maintaining velocity without introducing risk. Building the agentic MCP platform for incident triage showed me how internal tooling can have outsized impact when it reduces cognitive load during high-pressure moments.