A deep technical breakdown of 1-bit quantization in LLMs using the Bonsai-8B model. Exploring binary neural networks, inline dequantization kernels, and achieving 14x compression with minimal quality loss.
Situation: Deploying LLMs on edge devices with severe memory constraints. Standard FP16 models require 16GB+ VRAM, pricing most users out of running capable models locally.
Used In: Edge LLM deployment, RK3588 inference, high-throughput serving
Tags: local-ai, quantization, bonsai, binary-neural-networks
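The post's Bonsai-8B kernels aren't reproduced here, but the core mechanics of 1-bit weights can be sketched: keep only the sign of each weight plus one floating-point scale per row (BitNet-style), pack the signs eight to a byte, and dequantize inline at compute time. A minimal NumPy sketch under those assumptions (function names are illustrative, not from the article):

```python
import numpy as np

def binarize(w: np.ndarray):
    """1-bit quantization: store the sign of each weight as one bit,
    plus a single per-row FP scale (mean absolute value, BitNet-style)."""
    scale = np.abs(w).mean(axis=1, keepdims=True)   # one float per row
    bits = (w >= 0).astype(np.uint8)                # 1 bit per weight
    packed = np.packbits(bits, axis=1)              # 8 weights per byte
    return packed, scale

def dequantize(packed: np.ndarray, scale: np.ndarray, cols: int):
    """Inline dequantization: unpack bits, map {0,1} -> {-1,+1}, rescale."""
    bits = np.unpackbits(packed, axis=1)[:, :cols].astype(np.float32)
    return scale * (2.0 * bits - 1.0)

w = np.random.randn(4, 64).astype(np.float32)
packed, scale = binarize(w)
w_hat = dequantize(packed, scale, w.shape[1])
# Storage per row: 64 bits + one FP32 scale, vs. 64 * 16 bits in FP16 —
# the per-bit overhead of the scale shrinks as rows get wider,
# which is where compression ratios in the ~14x range come from.
```

Real inline-dequant kernels fuse the unpack-and-rescale step into the matmul so the FP weights never materialize in memory; the sketch separates the steps for clarity.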
Publishing a practical local-AI control plane for llama.cpp: remote model loading, runtime tuning, streaming chat, and real REAP model serving on a Radxa ROCK 5B+.
Situation: I wanted a serious remote control surface for local GGUF inference on a Radxa ROCK 5B+ instead of one-off shell commands or generic UIs that hide the important llama.cpp knobs.
Used In: Local-first AI serving on a Radxa ROCK 5B+ / RK3588 using source-built llama.cpp and GGUF models, including GLM-4.7-Flash-REAP-23B-A3B.
Tags: llama.cpp, rk3588, radxa, rock-5b-plus
An advanced benchmark report for running Qwen3.5 locally on RK3588 with source-built llama.cpp: prefill speed, decode speed, stable context, tool-calling behavior, and the practical model choices that actually work on a Radxa ROCK 5B+.
Situation: I was tuning a local-first Discord engineering agent on a Radxa ROCK 5B+ (RK3588, 24 GB RAM) and needed hard data on which Qwen3.5 models were actually practical for CPU inference with llama.cpp.
Used In: Local-first Discord agent runtime on Radxa ROCK 5B+ / RK3588, built around raw llama.cpp rather than Ollama or LM Studio.
Tags: rk3588, radxa, rock-5b-plus, llama.cpp
An advanced guide to local GPU inference with llama.cpp: why bandwidth matters more than model fit, how hybrid GPU+CPU offload behaves on cards like the RTX 3060 and 5070, what quantization really means mathematically, and how to run it on Linux, Windows, and WSL.
Situation: Teams running local models on consumer GPUs often assume that if a model loads, it is production-ready. In practice, once model layers or KV cache spill from VRAM into system RAM, the system hits a bandwidth cliff and throughput collapses.
Used In: Local-first AI engineering runtimes and workstation inference setups using llama.cpp on consumer NVIDIA GPUs.
Tags: local-ai, llama.cpp, cuda, vram
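The quantization math the guide covers follows the same shape across llama.cpp's block formats: split weights into fixed-size blocks, store one scale per block, and round each weight to a small integer code. A minimal sketch of a symmetric 4-bit block quantizer, in the spirit of Q4-style formats (the real formats add offsets and bit-packing details omitted here):

```python
import numpy as np

def q4_block(w: np.ndarray):
    """Symmetric 4-bit quantization of one 32-weight block:
    one FP scale per block, integer codes in [-8, 7]."""
    scale = float(np.abs(w).max()) / 7.0
    scale = scale if scale > 0 else 1.0          # guard the all-zero block
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

block = np.random.randn(32).astype(np.float32)
q, scale = q4_block(block)
w_hat = q * scale                     # dequantize: code times block scale
err = float(np.abs(block - w_hat).max())
# Rounding to the nearest code bounds the per-weight error by scale / 2,
# which is why outlier weights (which inflate the scale) hurt quality.
```

This is also why memory bandwidth, not just capacity, governs throughput: a 4-bit block moves a quarter of the bytes of FP16 for the same matmul, so quantization directly buys decode speed on bandwidth-bound hardware.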
How I implemented hybrid per-layer KV cache quantization on RK3588 using insights from Google's TurboQuant research, achieving 17% better compression with zero quality loss.
Situation: Running a local-first Discord AI agent (Engram) on a $130 RK3588 single-board computer with 24GB RAM. The challenge: KV cache growth during long conversations would crash the bot or cause severe latency spikes.
Used In: Engram AI Discord bot, RADXA AI Suite
Tags: local-ai, edge-ai, turboquant, kv-cache
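The article's actual per-layer policy comes from TurboQuant-style error analysis and isn't reproduced here; the sketch below only illustrates the mechanics of hybrid KV cache quantization — an assumed per-layer bit map plus a simple asymmetric min/max quantizer (all names and the 8/4-bit split are hypothetical):

```python
import numpy as np

# Hypothetical policy: early layers keep finer KV codes, later layers
# tolerate coarser ones. The real per-layer choice would come from
# measured quantization error, not a hard-coded table like this.
LAYER_BITS = {0: 8, 1: 8, 2: 4, 3: 4}

def quantize_kv(k: np.ndarray, bits: int):
    """Asymmetric quantization of one layer's KV tensor to `bits` bits."""
    qmax = 2 ** bits - 1
    lo, hi = float(k.min()), float(k.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((k - lo) / scale).astype(np.uint8)   # codes in [0, qmax]
    return q, scale, lo

def dequantize_kv(q: np.ndarray, scale: float, lo: float):
    return q.astype(np.float32) * scale + lo

kv = np.random.randn(4, 16).astype(np.float32)   # toy (layers, dims) cache
for layer, row in enumerate(kv):
    q, scale, lo = quantize_kv(row, LAYER_BITS[layer])
    row_hat = dequantize_kv(q, scale, lo)        # error bounded by scale / 2
```

Mixing precisions per layer is what makes the scheme "hybrid": layers whose attention outputs are sensitive to KV error keep more bits, and the memory savings come from everywhere else.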
A deep dive into the cloud architectures, real-time data streaming capabilities, and Generative AI setups powering Formula 1 and Formula E in 2026.
Situation: Top-tier motorsport series (F1 and FE) introduced radical new technical regulations in 2026, causing an explosion in telemetry data (over 1.1 million data points per second) that legacy systems couldn't process in real time.
Used In: Researching modern high-throughput IoT edge-to-cloud architectures for autonomous vehicle frameworks.
Tags: system-architecture, data-engineering, aws, google-cloud