Issue
Mixed branches make PRs unreviewable, increase blast radius, and risk dragging unrelated changes into production. When one branch contains role code, host variables, certificate files, and inventory updates together, reviewers cannot isolate what changed or why.
Solution
Split the oversized branch into multiple clean, topic-focused branches by checking out only the relevant paths from the mixed branch into new branches created fresh off main.
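As a rough illustration of the split described above, the sketch below only builds the list of git commands; it does not run them. The branch names, paths, and the `plan_branch_split` helper are hypothetical. It relies on `git switch -c <new> <start>` and `git checkout <branch> -- <paths>`, and it assumes files deleted on the mixed branch are handled separately, since a path checkout does not carry deletions over.

```python
from typing import Dict, List


def plan_branch_split(
    mixed_branch: str,
    topics: Dict[str, List[str]],
    base: str = "main",
) -> List[List[str]]:
    """Return the git commands that rebuild each topic branch from `base`.

    `topics` maps a new branch name to the paths it should take from
    `mixed_branch`. All names here are illustrative assumptions.
    """
    commands: List[List[str]] = []
    for topic_branch, paths in topics.items():
        # Start from a clean base so no unrelated history comes along.
        commands.append(["git", "switch", "-c", topic_branch, base])
        # Copy only the listed paths from the mixed branch (this also stages them).
        commands.append(["git", "checkout", mixed_branch, "--", *paths])
        commands.append(
            ["git", "commit", "-m", f"Split {topic_branch} out of {mixed_branch}"]
        )
    return commands


plan = plan_branch_split(
    "feature/mixed",
    {"topic/role-code": ["roles/webserver"], "topic/host-vars": ["host_vars"]},
)
for command in plan:
    print(" ".join(command))
```

Printing the plan first makes it easy to review each step before running anything against a real repository.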
A master reference merging daily Linux operations, Ansible Vault secrets, Python environments, Molecule testing, networking diagnostics, and Git recovery commands into a single, massive cheatsheet.
Issue
Scattered knowledge means slower response times during critical operations. Having Linux commands on one page and Ansible/Python commands on another breaks the operational flow.
Solution
Compiled every sanitized, production-tested command snippet from my daily workflow into a single, massive reference guide with a coordinated SVG poster set.
Issue
The knowledge existed, but it was fragmented across storage work, account management, package checks, Git recovery, and automation workflows. That fragmentation increases the chance of typos and slows down repeat work.
Solution
Consolidated the most reused Linux and admin commands into a snippets-first cheatsheet, grouped them by task, added flag guidance, and replaced every real identifier with placeholders.
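The placeholder step could be sketched roughly as follows. The patterns, the `example.internal` domain, and the placeholder names are assumptions for illustration; a real sanitizer would need patterns matched to the identifiers that actually appear in the snippets.

```python
import re

# Hypothetical patterns: IPv4-looking strings and hosts under an assumed
# internal domain. Extend this list for usernames, ticket IDs, and so on.
_PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP_ADDRESS>"),
    (
        re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.example\.internal\b", re.IGNORECASE),
        "<HOSTNAME>",
    ),
]


def sanitize(snippet: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in _PATTERNS:
        snippet = pattern.sub(placeholder, snippet)
    return snippet


print(sanitize("ssh 10.20.30.40"))
print(sanitize("ping db01.example.internal"))
```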
Issue
Direct IP replacement would cause service disruption. Applications had hardcoded references to old hostnames. Certificates were tied to specific DNS names. Testing needed to happen in parallel with production operation.
Solution
Implemented a two-phase DNS migration strategy using temporary test records, multi-SAN certificates, and coordinated DNS switchover during a planned maintenance window.
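One check that fits this kind of migration is confirming that a certificate's SAN list covers the old, test, and new names before the switchover. The sketch below works on plain lists of names; the hostnames and the `missing_names` helper are hypothetical, and the wildcard handling is simplified to the common rule that `*.` covers exactly one leftmost label.

```python
from typing import Iterable, List


def _matches(san: str, hostname: str) -> bool:
    """Compare one SAN entry against one hostname, case-insensitively."""
    san, hostname = san.lower(), hostname.lower()
    if san.startswith("*."):
        # A wildcard covers exactly one leftmost label.
        head, _, tail = hostname.partition(".")
        return bool(head) and tail == san[2:]
    return san == hostname


def missing_names(cert_sans: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required hostnames that no SAN entry covers."""
    sans = list(cert_sans)
    return [name for name in required if not any(_matches(s, name) for s in sans)]


gaps = missing_names(
    ["app-old.example.com", "app-test.example.com"],
    ["app-old.example.com", "app-test.example.com", "app-new.example.com"],
)
print(gaps)
```

An empty result would suggest the certificate is ready for both phases; any listed name needs to be added to the SAN list first.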
Issue
Storage operations were handled inconsistently across the team. Some admins would reboot servers for partition changes, others would attempt risky online operations without proper checkpoints, and migrations often resulted in extended downtime windows.
Solution
Documented a standardized LVM playbook covering the three core operations—expansion, shrinking, and migration—with clear pre-flight checks, execution steps, and rollback procedures.
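The pre-flight idea can be sketched as a pure function that returns a list of blocking problems. The specific checks, parameter names, and thresholds below are assumptions rather than the documented playbook; the one firm fact used is that XFS filesystems cannot be shrunk.

```python
from typing import List


def preflight(
    operation: str,          # "expand", "shrink", or "migrate"
    fs_type: str,            # e.g. "xfs" or "ext4"
    free_gib: float,         # free space in the VG (or on the target PV for migrate)
    requested_gib: float,    # size to add, remove, or move
    has_backup: bool,
) -> List[str]:
    """Return reasons the operation should not start; empty means go ahead."""
    problems: List[str] = []
    if operation not in {"expand", "shrink", "migrate"}:
        return [f"unknown operation: {operation}"]
    if operation in {"expand", "migrate"} and requested_gib > free_gib:
        problems.append(
            f"need {requested_gib} GiB but only {free_gib} GiB is free"
        )
    if operation == "shrink":
        if fs_type.lower() == "xfs":
            problems.append("XFS cannot be shrunk; plan a migration instead")
        if not has_backup:
            problems.append("shrinking without a verified backup is not allowed")
    return problems


print(preflight("shrink", "xfs", 0, 10, has_backup=False))
```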
A complete guide to integrating Linux with Active Directory: mapping AD groups to local permissions, deploying Kerberos SSO, and troubleshooting PAM issues.
Issue
AD integration was fragmented across multiple playbooks with no unified approach. Users couldn't 'su' to service accounts, SSO setup was manual and error-prone, and access control required manual sudoers edits on each server.
Solution
Implemented a unified AD integration strategy: AD group mapping for sudo access, automated Kerberos keytab deployment via Ansible, and standardized PAM configuration across all servers.
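To make the group-mapping part concrete, here is a minimal sketch that renders a sudoers entry for a group. The group names and the `sudoers_line` helper are hypothetical. It uses the standard `%group ALL=(runas) commands` form and escapes spaces with a backslash, which AD group names often need; any generated file should still be validated with `visudo -cf <file>` before deployment.

```python
def sudoers_line(ad_group: str, commands: str = "ALL", run_as: str = "ALL") -> str:
    """Render one sudoers rule granting `ad_group` the given commands."""
    if not ad_group or any(ch in ad_group for ch in "\n\r,=:"):
        raise ValueError(f"unsafe group name: {ad_group!r}")
    # Spaces inside a group name must be escaped in sudoers.
    escaped = ad_group.replace(" ", "\\ ")
    return f"%{escaped} ALL=({run_as}) {commands}"


print(sudoers_line("linux-admins"))
print(sudoers_line("linux admins", commands="/usr/bin/systemctl"))
```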
Issue
NFS configuration was inconsistent across servers. Some used hostnames, others used IPs. Network routing issues caused connections over slow backup networks instead of high-bandwidth production networks. Permission errors blocked user access.
Solution
Implemented automated NFS management using Ansible roles for export configuration, client mounting with proper network selection, and troubleshooting runbooks for common failure scenarios.
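The network-selection step might look something like the sketch below: given every address an NFS server answers on, pick the one inside the production subnet so the mount does not end up on the backup network. The addresses, the CIDR, and the `pick_server_address` helper are illustrative assumptions.

```python
import ipaddress
from typing import Iterable, Optional


def pick_server_address(candidates: Iterable[str], production_cidr: str) -> Optional[str]:
    """Return the first candidate inside `production_cidr`, or None if none fit."""
    network = ipaddress.ip_network(production_cidr)
    for address in candidates:
        if ipaddress.ip_address(address) in network:
            return address
    return None


# The first address stands in for a backup-network interface.
print(pick_server_address(["192.168.50.10", "10.10.1.10"], "10.10.0.0/16"))
```

Returning `None` rather than falling back to another address keeps a misrouted mount from going unnoticed.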
A practical guide to the Linux server provisioning workflow—from creating AD groups and technical users to Ansible role deployment and application-specific configurations.
Issue
Server provisioning was inconsistent across team members. Some skipped steps, documentation was scattered across wikis and emails, and handoffs to application teams were incomplete—missing access groups, wrong technical user configurations, or incomplete application dependencies.
Solution
Developed a standardized provisioning checklist and Ansible playbook structure that covers the complete lifecycle from VM deployment to application-ready state.
A concrete security module set for an edge AI backend: AES-256-GCM at rest, adaptive rate limiting, input validation, alerting, and automated scanning.
Issue
Without explicit controls, an AI API is vulnerable to abuse (burst traffic), unsafe inputs (command injection, path traversal), leaked secrets, and silent security regressions from dependencies.
Solution
Implemented five security modules: encryption at rest, enhanced rate limiting, advanced input validation, security monitoring with alerts, and vulnerability scanning with report generation.
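As an illustration of the rate-limiting module only, here is a plain token bucket. This is a swapped-in, simplified technique, not the adaptive limiter itself, and the class name, rate, and capacity values are assumptions. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return whether the request may pass."""
        now = self._clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=5.0, capacity=10.0)
print(bucket.allow())
```

An adaptive variant would adjust `rate` or `capacity` per client based on observed behaviour; that part is left out here.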
Issue
There was no clear understanding of stretched networks, leaf-spine trade-offs, or how application traffic patterns would be affected.
Solution
Documented the stretched network architecture, analyzed application traffic flows, and provided clear guidance on which applications were suitable for a stretched Layer 2 approach versus a Layer 3 approach.
Issue
Manual software installations were time-consuming, inconsistent across servers, and couldn't be reproduced reliably for disaster recovery.
Solution
Developed Ansible patterns for silent installations with templated response files, prerequisite validation, and idempotent deployment checks.
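The two ideas in this pattern, a templated response file and an idempotency check, can be sketched outside Ansible as follows. The response-file keys, the version strings, and both helper names are hypothetical; a real installer defines its own response-file format.

```python
from string import Template
from typing import Optional

# Hypothetical response-file layout; real installers define their own keys.
RESPONSE_TEMPLATE = Template(
    "INSTALL_DIR=$install_dir\n"
    "ACCEPT_LICENSE=$accept_license\n"
)


def render_response_file(install_dir: str, accept_license: bool = True) -> str:
    """Fill the template with per-host values."""
    return RESPONSE_TEMPLATE.substitute(
        install_dir=install_dir,
        accept_license="true" if accept_license else "false",
    )


def needs_install(installed_version: Optional[str], wanted_version: str) -> bool:
    """Idempotency check: install only when the wanted version is not present."""
    return installed_version is None or installed_version.strip() != wanted_version


print(render_response_file("/opt/app"))
print(needs_install(None, "2.4.1"))
```

Running the installer only when `needs_install` is true is what keeps repeated runs from changing an already correct host.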
Issue
Ansible playbooks that worked on the control node failed in execution environments that were missing dependencies, and reproducing issues was difficult without consistent environments.
Solution
Built custom Execution Environments using ansible-builder, packaging all Python dependencies, Ansible collections, and system packages into versioned container images.
Issue
There was no consistent reverse proxy pattern, SSL certificates were managed manually, and load balancer configurations differed across environments.
Solution
Developed an Ansible role for Apache reverse proxy with automated SSL deployment, health check endpoints, and standardized load balancer configurations.
Issue
There was no centralized management of local accounts, UID/GID inconsistencies were breaking applications, and sudo access was scattered across individual sudoers files.
Solution
Implemented Ansible-based user management with host_vars for server-specific accounts, standardized UID/GID ranges, and templated sudoers configurations.
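A small audit over per-host account data shows the kind of inconsistency that standardized UID ranges are meant to prevent. The host names, account names, UID range, and `audit_uids` helper below are made up for illustration.

```python
from typing import Dict, List, Tuple


def audit_uids(
    hosts: Dict[str, Dict[str, int]],
    uid_range: Tuple[int, int],
) -> List[str]:
    """Report UIDs outside the agreed range or differing between hosts."""
    low, high = uid_range
    findings: List[str] = []
    first_seen: Dict[str, Tuple[int, str]] = {}  # user -> (uid, host first seen on)
    for host in sorted(hosts):
        for user, uid in sorted(hosts[host].items()):
            if not low <= uid <= high:
                findings.append(f"{host}: {user} uid {uid} outside {low}-{high}")
            known_uid, known_host = first_seen.setdefault(user, (uid, host))
            if known_uid != uid:
                findings.append(
                    f"{host}: {user} uid {uid} differs from {known_uid} on {known_host}"
                )
    return findings


accounts = {
    "web01": {"appsvc": 5001},
    "web02": {"appsvc": 5002, "legacy": 800},
}
for finding in audit_uids(accounts, (5000, 5999)):
    print(finding)
```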
Issue
There was no automated way to gather user permission data; manual auditing was error-prone and time-consuming, and compliance reports were always delayed.
Solution
Developed a bash script that collects user accounts, sudo access, and group memberships, outputting a standardized report that could be consolidated across all servers.
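The original tool is a bash script; the Python sketch below only illustrates the same idea of turning account data into a uniform report. It parses text in `/etc/passwd` and `/etc/group` format. The `wheel` sudo group, the minimum UID of 1000, and the `build_report` helper are assumptions, and it ignores primary-group membership and rules defined directly in sudoers files.

```python
from typing import Dict, List, Sequence, Set


def build_report(
    passwd_text: str,
    group_text: str,
    sudo_groups: Sequence[str] = ("wheel",),
    min_uid: int = 1000,
) -> List[Dict[str, object]]:
    """Return one record per regular user with groups and a sudo flag."""
    members: Dict[str, Set[str]] = {}
    for line in group_text.splitlines():
        parts = line.strip().split(":")
        if len(parts) >= 4:
            # Field 4 of /etc/group is a comma-separated member list.
            members[parts[0]] = {user for user in parts[3].split(",") if user}

    report: List[Dict[str, object]] = []
    for line in passwd_text.splitlines():
        parts = line.strip().split(":")
        if len(parts) < 7 or not parts[2].isdigit() or int(parts[2]) < min_uid:
            continue  # skip malformed lines and system accounts
        user = parts[0]
        groups = sorted(g for g, users in members.items() if user in users)
        report.append(
            {
                "user": user,
                "uid": int(parts[2]),
                "groups": groups,
                "sudo": any(g in sudo_groups for g in groups),
            }
        )
    return report


passwd = "root:x:0:0:root:/root:/bin/bash\nalice:x:1001:1001::/home/alice:/bin/bash\n"
group = "wheel:x:10:alice\ndevs:x:2000:alice,bob\n"
print(build_report(passwd, group))
```

Because every server produces records with the same keys, the per-host outputs can be concatenated into one consolidated report.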
A practical cheatsheet covering the most essential commands for managing RHEL systems on a daily basis: systemd, storage, networking, and user management.
Issue
There was no single source of truth for common RHEL administration commands, which led to inconsistent practices and repeated onboarding questions.
Solution
Created a living cheatsheet covering systemd, LVM, networking, user management, and troubleshooting: the commands used daily in our environment.
Issue
There was no clear separation between dev and prod environments, the variable hierarchy was inconsistent, and accidental cross-environment changes were becoming common.
Solution
Implemented a standardized repository structure with separate inventory directories, clear group_vars/host_vars hierarchy, and environment-specific variable overrides.
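The override behaviour this structure depends on can be modelled in a few lines. The sketch below mimics the inventory precedence of `group_vars/all`, then a specific group, then `host_vars`, using Ansible's default behaviour of replacing a variable's whole value rather than deep-merging it. The variable names and values are invented for illustration.

```python
from typing import Any, Dict


def effective_vars(
    all_vars: Dict[str, Any],
    group_vars: Dict[str, Any],
    host_vars: Dict[str, Any],
) -> Dict[str, Any]:
    """Apply layers from lowest to highest precedence; later layers win."""
    merged: Dict[str, Any] = {}
    for layer in (all_vars, group_vars, host_vars):
        # Shallow update: a key in a higher layer replaces the lower value whole.
        merged.update(layer)
    return merged


resolved = effective_vars(
    {"ntp_server": "ntp.example.com", "log_level": "info"},
    {"log_level": "debug"},          # e.g. a dev environment group
    {"ntp_server": "10.0.0.1"},      # e.g. one host's override
)
print(resolved)
```

Keeping each environment's group variables in its own inventory directory means a dev override like `log_level` can never leak into a prod run.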