Building Custom Ansible Execution Environments

Situation

As Ansible projects grow, managing dependencies (Python libraries, Ansible collections, system binaries) on every developer’s machine and every CI/CD runner becomes a nightmare. “It works on my machine” is a common phrase when a playbook fails because a specific version of the community.general collection is missing.

The solution is Ansible Execution Environments (EE): container images that bundle everything needed to run your playbooks.

Task 1 – The Anatomy of an EE

An Execution Environment is a standard Docker/Podman image that contains:

RHEL/UBI Base: A stable base OS.
Ansible Core: The engine itself.
Python Dependencies: Libraries like netaddr, requests, or pyvmomi.
Ansible Collections: Modules for specific platforms (e.g., community.vmware, ansible.posix).

Task 2 – The Build Process

We automate our EE builds using a simple build.sh script that wraps the ansible-builder tool.

1. Define your requirements

We maintain a requirements.txt for Python and a requirements.yml for Ansible collections.

# requirements.yml
collections:
  - name: community.vmware
  - name: ansible.posix
  - name: community.general

2. The Build Script

The build script ensures we are using the correct tags and handling the container registry login.

#!/bin/bash

IMAGE_NAME="ansible-ee-custom"
TAG=$(date +%Y-%m-%d)

echo "Building Ansible Execution Environment: ${IMAGE_NAME}:${TAG}"

# Run the build using ansible-builder or podman/docker directly
podman build -t "${IMAGE_NAME}:${TAG}" .

# Tag as latest for local development
podman tag "${IMAGE_NAME}:${TAG}" "${IMAGE_NAME}:latest"

Task 3 – Using the EE in StackGuardian

Once the image is built and pushed to our private registry, we configure our automation platforms (like StackGuardian) to use it.

In StackGuardian, you define the “Runner Image” for your workflow. Instead of using a generic image, you point it to: registry.example.internal/automation/ansible-ee-custom:latest

Now, whenever a workflow runs, it spins up a container that has the exact same environment as the one you tested locally.

Why This Matters

Consistency: Every run is identical, regardless of which physical node StackGuardian chooses to run the container on.
Portability: New team members don’t need to spend hours installing Python libraries. They just need Podman and the image.
Speed: By using caching layers in our container build, we can add a single new collection and rebuild the image in seconds.

Execution Environments represent the shift of Ansible from a “scripting tool” to a true Infrastructure as Code platform that follows modern software engineering principles.

Architecture Diagram

Building Custom Ansible Execution Environments execution diagram

This diagram supports Building Custom Ansible Execution Environments and highlights where controls, validation, and ownership boundaries sit in the workflow.

Post-Specific Engineering Lens

For this post, the primary objective is: Increase automation reliability and reduce human variance.

Implementation decisions for this case

Chose a staged approach centered on ansible to avoid high-blast-radius rollouts.
Used containers checkpoints to make regressions observable before full rollout.
Treated devops documentation as part of delivery, not a post-task artifact.

Practical command path

These are representative execution checkpoints relevant to this post:

ansible-playbook site.yml --limit target --check --diff
ansible-playbook site.yml --limit target
ansible all -m ping -o

Validation Matrix

Validation goal	What to baseline	What confirms success
Functional stability	service availability, package state, SELinux/firewall posture	`systemctl --failed` stays empty
Operational safety	rollback ownership + change window	`journalctl -p err -b` has no new regressions
Production readiness	monitoring visibility and handoff notes	critical endpoint checks pass from at least two network zones

Failure Modes and Mitigations

Failure mode	Why it appears in this type of work	Mitigation used in this post pattern
Inventory scope error	Wrong hosts receive a valid but unintended change	Use explicit host limits and pre-flight host list confirmation
Role variable drift	Different environments behave inconsistently	Pin defaults and validate required vars in CI
Undocumented manual step	Automation appears successful but remains incomplete	Move manual steps into pre/post tasks with assertions

Recruiter-Readable Impact Summary

Scope: deliver Linux platform changes with controlled blast radius.
Execution quality: guarded by staged checks and explicit rollback triggers.
Outcome signal: repeatable implementation that can be handed over without hidden steps.

Engineer Command Palette

Building Custom Ansible Execution Environments

Case Snapshot

Situation

Issue

Solution

Used In

Impact

Situation

Task 1 – The Anatomy of an EE

Task 2 – The Build Process

1. Define your requirements

2. The Build Script

Task 3 – Using the EE in StackGuardian

Why This Matters

Architecture Diagram

Post-Specific Engineering Lens

Implementation decisions for this case

Practical command path

Validation Matrix

Failure Modes and Mitigations

Recruiter-Readable Impact Summary