FP Complete · VP Platform Engineering, AI R&D

Multi-Model Agent Platform

Problem

Client AI initiatives were slow to ship: model deployment was manual, onboarding new engineers took weeks, and there was no secure, shared substrate for running multi-model LLM and agent workloads in production.

Architecture

A Kubernetes/AWS platform running multi-model LLM and generative-AI pipelines, fronted by SSO-secured agent infrastructure and an internal developer platform with self-service onboarding. CI/CD is standardized on GitHub Actions and backed by SLAs, incident-response processes, and full observability (Prometheus, Grafana, OpenTelemetry).
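To make the multi-model idea concrete, here is a minimal sketch of task-based routing with ordered fallback between model backends. It is illustrative only and not the platform's actual code: `ModelBackend`, `ModelRouter`, the task name `summarize`, and the stub model functions are all hypothetical names chosen for the example, and real backends would call model-serving endpoints rather than local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ModelBackend:
    """Hypothetical model endpoint: a name plus a callable that returns text."""
    name: str
    generate: Callable[[str], str]


class ModelRouter:
    """Maps a task type to an ordered list of backends and falls back on error."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[ModelBackend]] = {}

    def register(self, task: str, backend: ModelBackend) -> None:
        # Backends are tried in the order they were registered.
        self._routes.setdefault(task, []).append(backend)

    def run(self, task: str, prompt: str) -> Tuple[str, str]:
        """Return (backend name, output) from the first backend that succeeds."""
        errors: List[str] = []
        for backend in self._routes.get(task, []):
            try:
                return backend.name, backend.generate(prompt)
            except Exception as exc:  # record the failure and try the next backend
                errors.append(f"{backend.name}: {exc}")
        raise RuntimeError(f"no backend succeeded for task {task!r}: {errors}")


# Stub backends standing in for real model endpoints (assumptions for the demo).
def flaky_model(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable_model(prompt: str) -> str:
    return f"summary of: {prompt}"


router = ModelRouter()
router.register("summarize", ModelBackend("primary-llm", flaky_model))
router.register("summarize", ModelBackend("fallback-llm", stable_model))

used, output = router.run("summarize", "quarterly report")
print(used, "->", output)
```

In this sketch the primary backend fails, so the request is served by the fallback. Keeping the route table separate from the backends is one way to let teams add or swap models without touching calling code.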

What I built

Multi-model orchestration pipelines for LLMs and generative AI
Integrated Model Context Protocol (MCP) infrastructure
Self-service developer platform with automated onboarding
Standardized CI/CD and incident-response practices across teams

Outcomes

−40% · Model deployment time
−50% · Engineer onboarding time
−60% · CI/CD cycle time
+15% · System uptime
−40% · Mean time to recovery

Stack

Kubernetes (EKS) · AWS · Rust · Python · GitHub Actions · Prometheus/Grafana · OpenTelemetry