During my time at Splunk, transitioning a monolithic on-premises product into a scalable, multi-tenant cloud platform was a transformative engineering endeavor. Among the most complex and impactful areas of change was our approach to Continuous Integration and Continuous Delivery (CI/CD). The legacy model was manual, ticket-based, and weeks long; it was incompatible with the velocity required for cloud-native innovation.
The Problem
1. Manual Releases Bottleneck Innovation
Our release process was fragmented and painfully slow. A typical release cycle involved:
- Filing tickets and waiting up to a week for approvals
- Another week or so to provision infrastructure
- Manual test execution across fragmented environments
- Redundant cycles when issues inevitably arose
This lead time was a critical blocker for developer productivity and agility. As an architect on the platform team, I saw a clear need to reimagine our release strategy — not just to automate, but to align delivery velocity with the expectations of a cloud-native engineering organization.
2. Cross-Functional Complexity
Solving this problem wasn’t confined to a single domain. It required coordination across ingest, indexing, search, platform, build, infrastructure, and test teams. We needed an architecture that unified these disparate components into a resilient, automated pipeline — without compromising flexibility or quality.
The Solution: CI/CD Reimagined
I architected a solution that revolved around four key design principles:
1. Unified Release Strategy Across Deployment Targets
We standardized on a single development branch as the source of truth for both cloud and on-premises releases. Runtime flags controlled feature toggling and environment-specific configuration. This decision eliminated branching complexity and enabled:
- Faster, consistent releases
- Reduced overhead from managing divergent code paths
- Easier debugging and traceability
This was a deliberate trade-off favoring long-term maintainability and developer simplicity over optimizing for narrowly tailored pipelines.
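To make the single-branch model concrete, here is a minimal sketch of the kind of runtime feature gating involved. It is an illustration only: the flag source (environment variables) and naming convention are assumptions for the example, not our actual configuration surface.

```go
// featureflags: a minimal sketch of runtime feature gating from a single
// branch. Flag names and the environment-variable source are illustrative.
package featureflags

import (
	"os"
	"strconv"
)

// Enabled reports whether a named feature is turned on for the current
// deployment target. Flags are read at runtime, so cloud and on-premises
// builds ship the same code and differ only in configuration.
func Enabled(name string) bool {
	v, err := strconv.ParseBool(os.Getenv("FEATURE_" + name))
	if err != nil {
		return false // unset or malformed flags default to off
	}
	return v
}
```

A cloud-only code path would then check something like `featureflags.Enabled("MULTITENANT_MODE")` (a hypothetical flag) at runtime rather than living on a separate release branch.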
2. Dynamic, On-Demand Kubernetes Infrastructure
To replace the static provisioning model, we introduced dynamic environment creation using Kubernetes. Key elements included:
- Declarative service definitions for cloud builds
- Custom load balancers for precise routing and traffic shaping
- Just-in-time (JIT) tenant provisioning for ephemeral test environments
This setup allowed any branch or feature to be deployed into a fully isolated environment within minutes — a major leap from the previous multi-week provisioning cycle.
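As an illustration of the JIT provisioning step, the sketch below creates an isolated, labeled namespace for a branch build using client-go. The label keys and naming scheme are assumptions for the example, and the branch name is assumed to already be sanitized to a DNS-safe string.

```go
// provisioner: a sketch of just-in-time environment creation for a branch.
package provisioner

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// NewEphemeralEnv creates an isolated namespace for a branch build. The
// labels let a cleanup job or controller find and delete it after merge.
func NewEphemeralEnv(ctx context.Context, branch string) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	ns := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			// branch is assumed to be a DNS-1123-safe string.
			Name: fmt.Sprintf("ci-%s", branch),
			Labels: map[string]string{
				"ci.example.com/branch":    branch,
				"ci.example.com/ephemeral": "true",
			},
		},
	}
	_, err = clientset.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{})
	return err
}
```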
3. Lightweight, Isolated Container Execution
Many of our workloads required quick execution of single-container services (e.g., lightweight microservices or test runners). Rather than spinning up full virtual environments, we used:
- Kubernetes node taints and tolerations to allocate dedicated resources
- Isolation at the pod level for performance and security
- Horizontal scalability for parallelized builds
This achieved faster spin-up times and better resource efficiency without sacrificing container-level isolation.
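The sketch below shows how a single-container test-runner pod can be steered onto dedicated, tainted CI nodes with a toleration plus a node selector. The taint key, node-pool label, and image are hypothetical stand-ins.

```go
// scheduling: a sketch of pod-level isolation on dedicated CI capacity.
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// testRunnerPod builds a single-container pod that tolerates the taint on
// dedicated CI nodes (assumed here to be "ci-workload=true:NoSchedule"),
// so build and test pods land on isolated capacity without a full VM per job.
func testRunnerPod(name, image string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "test-runner",
				Image: image,
			}},
			Tolerations: []corev1.Toleration{{
				Key:      "ci-workload",
				Operator: corev1.TolerationOpEqual,
				Value:    "true",
				Effect:   corev1.TaintEffectNoSchedule,
			}},
			// Pin the pod to the tainted pool; the label is illustrative.
			NodeSelector: map[string]string{"node-pool": "ci"},
		},
	}
}
```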
4. Custom Kubernetes Controller for Declarative Orchestration
The backbone of this system was a custom Kubernetes controller that managed:
- Infrastructure provisioning
- Application deployment
- Automated teardown after successful merges
This controller introduced a declarative API for orchestrating complex release workflows. It abstracted environment lifecycle management, letting development teams consume CI/CD as a self-service capability rather than going through platform ops or ticket queues.
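The following is a heavily simplified sketch of the reconcile pattern such a controller follows. The real controller exposed its own custom resource (the declarative API), which is elided here; instead, this version watches the ephemeral namespaces from the earlier sketch and tears down any that CI has annotated as merged. All label and annotation keys are illustrative.

```go
// A simplified reconcile-loop sketch; the production controller managed its
// own CRD and far more of the environment lifecycle.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type envReconciler struct {
	client.Client
}

// Reconcile runs whenever a watched namespace changes. It deletes ephemeral
// CI namespaces whose branch has been marked merged, i.e. the automated
// post-merge teardown described above.
func (r *envReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var ns corev1.Namespace
	if err := r.Get(ctx, req.NamespacedName, &ns); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if ns.Labels["ci.example.com/ephemeral"] == "true" &&
		ns.Annotations["ci.example.com/merged"] == "true" {
		return ctrl.Result{}, client.IgnoreNotFound(r.Delete(ctx, &ns))
	}
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Namespace{}).
		Complete(&envReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

The value of the pattern is that desired state lives in the API server and the controller continuously converges the cluster toward it, which is what made ticket-free, self-service environment lifecycles possible.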
The Outcome: Fully Automated Developer-Centric CI/CD Pipeline
The architecture enabled several key capabilities:
- Commit-to-Release Deployability: Any branch could be tagged and released with no manual intervention
- Persistent Dynamic Sandboxes: Environments persisted across commits for consistent testing and reduced spin-up time
- Integrated Automated Testing: Testing teams contributed Dockerized test suites that were auto-invoked within the pipeline (see the sketch after this list)
- Automatic Teardown: Environments were cleaned up post-merge to optimize resource use and cost
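As a rough sketch of how contributed Dockerized suites might be auto-invoked, each suite can be treated as a container image that receives the sandbox endpoint and signals pass or fail through its exit code. The CLI invocation, environment variable, and image names below are assumptions, not our actual pipeline contract.

```go
// pipeline: a sketch of auto-invoking contributed Dockerized test suites.
package pipeline

import (
	"fmt"
	"os"
	"os/exec"
)

// runContributedSuites runs each suite image against the sandbox environment.
// A non-zero exit code from any suite fails the pipeline stage.
func runContributedSuites(sandboxURL string, suites []string) error {
	for _, image := range suites {
		cmd := exec.Command("docker", "run", "--rm",
			"-e", "TARGET_URL="+sandboxURL, // hypothetical contract: suites read the endpoint from TARGET_URL
			image)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("suite %s failed: %w", image, err)
		}
	}
	return nil
}
```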
Business Impact
- 100% Cloud Release Automation
- Infrastructure provisioning reduced from weeks to minutes
- Significant developer velocity gains, enabling faster iteration and delivery
- Hundreds of hours saved per release cycle, accelerating roadmap delivery
This initiative was foundational to Splunk’s cloud transformation. The engineering decisions, particularly the unified single-branch strategy and the investment in a custom controller, prioritized simplicity and scale. In doing so, we not only solved a technical bottleneck but also unlocked the velocity necessary for modern SaaS innovation.