Automating Golden Images with Packer and StackGuardian

Situation

Maintaining a consistent “Golden Image” is the cornerstone of a stable infrastructure. A Golden Image is a pre-configured template that includes security hardening (like CIS benchmarks), standard monitoring agents, and corporate configurations.

Instead of building these manually, we use HashiCorp Packer for the build logic and StackGuardian for the orchestration and lifecycle management.

The Build Hierarchy: Base vs. Common

We split our image builds into two distinct phases to optimize build times and maintainability.

1. The Base Image

The Base Image is built from the raw ISO using the vsphere-iso builder from Packer's vSphere plugin. It handles:

The Kickstart automated installation.
Basic disk partitioning (LVM).
Initial package updates.

We rarely deploy this image directly; it serves as the parent for all others.

2. The Common Image

The Common Image is built by cloning the Base Image template with the vsphere-clone builder. This is the image used for 90% of our server fleet. It includes:

CIS Hardening: Applying security policies to the OS.
Agents: Installing Nessus, Checkmk, and UC4 agents.
Corporate Config: SSSD, realmd, and internal certificates.

Orchestration with StackGuardian

StackGuardian acts as our CI/CD runner for these images. It provides a UI for triggering builds and manages the sensitive environment variables (like vCenter credentials) in a secure vault.

The Workflow:

Code Check-in: Packer HCL code and Ansible hardening roles are stored in Azure DevOps.
StackGuardian Deployment: A workflow is defined in StackGuardian that points to the repository.
Provisioning: StackGuardian spins up a containerized runner, initializes Packer, and executes the build.
Verification: After the build, the runner can execute a test playbook to ensure the image meets all requirements before marking it as “Stable” in the vCenter gallery.
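
The provisioning step above can be sketched as a thin shell wrapper around the Packer CLI. This is a minimal sketch under stated assumptions, not the actual StackGuardian runner logic: the directory ./packer/common, the DRY_RUN switch, and the function names are hypothetical. It also assumes the vault exposes the credentials as PKR_VAR_vcenter_user and PKR_VAR_vcenter_pass, which is the environment variable naming Packer reads input variables from.

```shell
#!/usr/bin/env bash
set -eu

# Print the command in dry-run mode (the default in this sketch); run it otherwise.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical build sequence for the Common Image.
# Assumes PKR_VAR_vcenter_user and PKR_VAR_vcenter_pass were exported by the runner.
build_common_image() {
  template_dir="$1"
  run_step packer init "$template_dir"       # install the plugins the template requires
  run_step packer validate "$template_dir"   # catch configuration errors before building
  run_step packer build -only='vsphere-clone.rhel-common' "$template_dir"
}

build_common_image "./packer/common"
```

With DRY_RUN left at its default the script only prints the three commands, which makes the sequence easy to review; setting DRY_RUN=0 would execute them against vCenter.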

Key Packer Snippet (HCL)

Here is a simplified look at how we define the vsphere-clone source:

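source "vsphere-clone" "rhel-common" {
  # Parent template produced by the Base Image build (placeholder name)
  template             = "rhel9-base-template-YYYY-MM"
  # local.timestamp must be defined in a locals block elsewhere in the template
  vm_name              = "rhel9-common-${local.timestamp}"
  cluster              = "CLUSTER_EXAMPLE_A"
  datacenter           = "DC_EXAMPLE_A"
  folder               = "templates/"

  # Connect to vCenter using credentials from the StackGuardian Vault
  # (vcenter_server and the SSH communicator settings are omitted for brevity)
  username             = var.vcenter_user
  password             = var.vcenter_pass

  # Resource allocation (the vSphere builders spell these options CPUs and RAM)
  CPUs                 = 2
  RAM                  = 4096   # MiB
  disk_size            = 51200  # MiB
}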
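
The YYYY-MM suffix in the template name has to be resolved somewhere before the build runs. One possible way, shown here as a hedged sketch, is a small shell helper on the runner. The function name base_template_name is hypothetical, and the post does not say how the name is actually resolved.

```shell
#!/usr/bin/env bash
set -eu

# Build the Base Image template name for a given month (format: YYYY-MM).
# The "rhel9-base-template-" prefix mirrors the placeholder in the snippet above.
base_template_name() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-0[1-9]|[0-9][0-9][0-9][0-9]-1[0-2])
      echo "rhel9-base-template-$1" ;;
    *)
      echo "expected YYYY-MM, got: $1" >&2
      return 1 ;;
  esac
}

# Resolve the name for the current month (UTC).
base_template_name "$(date -u +%Y-%m)"
```

The result could then be handed to Packer with the -var flag, assuming the template declares a matching input variable instead of the hard-coded string.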

Benefits of this Approach

Reproducibility: If a server is compromised or misconfigured, we can simply redeploy from the latest verified Golden Image.
Security: Because CIS hardening is embedded in the template, every new server starts from a hardened baseline from its first boot.
Automation: Images are rebuilt automatically every month, after new Red Hat Security Advisories (RHSAs) are released, so a template is at most about 30 days behind on patches.
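
The 30-day freshness target can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, and how the build timestamp of a template is obtained (for example from its name or a vCenter annotation) is left out because the post does not specify it.

```shell
#!/usr/bin/env bash
set -eu

MAX_AGE_DAYS=30   # threshold taken from the monthly rebuild cadence

# Whole days elapsed between two Unix timestamps (build time, current time).
image_age_days() {
  echo $(( ($2 - $1) / 86400 ))
}

# Succeeds (exit 0) when the image is older than MAX_AGE_DAYS and should be rebuilt.
needs_rebuild() {
  [ "$(image_age_days "$1" "$2")" -gt "$MAX_AGE_DAYS" ]
}

# Example: an image built 45 days before "now".
now="$(date -u +%s)"
built=$(( now - 45 * 86400 ))
if needs_rebuild "$built" "$now"; then
  echo "template is $(image_age_days "$built" "$now") days old: trigger a rebuild"
fi
```

A check like this could run on a schedule and trigger the build workflow when a template ages past the threshold, as a safety net if the monthly job is skipped.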

Architecture Diagram

Figure: execution diagram for Automating Golden Images with Packer and StackGuardian.

The diagram shows the two-phase automated Golden Image pipeline, with StackGuardian orchestrating Packer. It illustrates the separation of concerns: the infrequent, heavy vsphere-iso base build is decoupled from the more frequent vsphere-clone build, which applies the evolving CIS controls and configuration management before the result is stored as a template in vCenter.

Post-Specific Engineering Lens

For this post, the primary objective is: Apply infrastructure practices with measurable validation and clear rollback ownership.

Implementation decisions for this case

Chose a staged approach centered on Packer to avoid high-blast-radius rollouts.
Used StackGuardian checkpoints to make regressions observable before full rollout.
Treated DevOps documentation as part of delivery, not a post-task artifact.

Practical command path

These are representative execution checkpoints relevant to this post:

echo "define baseline"
echo "apply change with controls"
echo "validate result and handoff"

Validation Matrix

Validation goal	What to baseline	What confirms success
Functional stability	service availability, package state, SELinux/firewall posture	`systemctl --failed` stays empty
Operational safety	rollback ownership + change window	`journalctl -p err -b` has no new regressions
Production readiness	monitoring visibility and handoff notes	critical endpoint checks pass from at least two network zones
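
The first two rows of the matrix can be turned into a small check script. This is a sketch under assumptions: the function names and the idea of passing in a baseline error count are hypothetical, and the endpoint checks from the third row are not covered.

```shell
#!/usr/bin/env bash
set -eu

# Succeeds when a listing of failed units (systemctl --failed output) is empty.
no_failed_units() {
  [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Succeeds when the current error count ($2) does not exceed the baseline ($1).
no_new_errors() {
  [ "$2" -le "$1" ]
}

# Hypothetical entry point, meant to run on a VM booted from the new image.
# $1 is the error count recorded from the previous known-good image.
verify_image() {
  baseline="$1"
  failed="$(systemctl --failed --no-legend --plain)"
  errors="$(journalctl -p err -b --no-pager -q | wc -l | tr -d ' ')"
  no_failed_units "$failed" || { echo "failed units present" >&2; return 1; }
  no_new_errors "$baseline" "$errors" || { echo "new journal errors" >&2; return 1; }
  echo "image checks passed"
}
```

The script only defines the checks; a test playbook or the runner would call verify_image with the recorded baseline and treat a non-zero exit as a failed build.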

Failure Modes and Mitigations

Failure mode	Consequence if unaddressed	Mitigation used in this pattern
Scope ambiguity	Teams execute different interpretations	Write explicit pre-check and success criteria
Weak rollback plan	Incident recovery slows down	Define rollback trigger + owner before rollout
Insufficient telemetry	Failures surface too late	Require post-change monitoring checkpoints

Recruiter-Readable Impact Summary

Scope: deliver Linux platform changes with controlled blast radius.
Execution quality: guarded by staged checks and explicit rollback triggers.
Outcome signal: repeatable implementation that can be handed over without hidden steps.
