Situation
Building an AI assistant for personal finance requires absolute precision. If an LLM miscalculates a user’s savings rate or formats its response improperly, the application crashes or displays dangerous misinformation.
In this scenario, the AI needed to act as an “Expert Personal Financial Advisor,” categorizing bulk transactions and providing strategic insights without hallucinating numbers.
The Problem with LLM Math
Large Language Models generate text based on probabilities; they do not inherently “calculate” math. If you give an LLM a list of 50 transactions and ask it for the total spend, it will likely guess a number that looks plausible but is entirely wrong.
1. The Pre-Computed Hint Pattern
To solve the math hallucination issue, the architecture was flipped. The LLM is never asked to do math.
Instead, the client application (or a deterministic backend script) calculates the exact financial metrics—such as “Spending Velocity”, “Total Income”, and “Savings Rate”—before the AI is called.
These exact numbers are injected into the system prompt as computed_hints.
```javascript
// Example of injecting computed hints into the system prompt
// (field names on data.computed_hints are illustrative)
let computedHintsContext = "";
if (data.computed_hints) {
  const hints = data.computed_hints;
  computedHintsContext = `
FACTS YOU MUST USE:
- User has spent $${hints.total_spend} out of $${hints.budget} budget.
- Savings rate is exactly ${hints.savings_rate}%.
`;
}
```
The LLM is instructed to weave these exact figures into its narrative, eliminating arithmetic errors at the source.
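Upstream of that prompt injection, the deterministic side is plain arithmetic. A minimal sketch, assuming a transactions array with signed amounts (the function and field names here are illustrative, not from the original codebase):

```javascript
// Sketch of the deterministic pre-computation step (names illustrative).
// All arithmetic happens here, in plain code, never inside the LLM.
function computeHints(transactions, monthlyIncome, budget) {
  // Sum the absolute value of all outgoing (negative) transactions.
  const totalSpend = transactions
    .filter((t) => t.amount < 0)
    .reduce((sum, t) => sum + Math.abs(t.amount), 0);
  // Savings rate as a whole percentage of income, guarded against division by zero.
  const savingsRate =
    monthlyIncome > 0
      ? Math.round(((monthlyIncome - totalSpend) / monthlyIncome) * 100)
      : 0;
  return {
    total_spend: totalSpend.toFixed(2),
    budget: budget.toFixed(2),
    savings_rate: savingsRate, // exact figure the LLM must quote verbatim
  };
}
```

Because these values are computed before the model is called, the narrative can only ever restate them, never re-derive them.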
2. Enforcing Strict JSON Outputs
To ensure the client application can parse the AI’s response, the prompt engineering must be ruthless.
- JSON Structure Definition: The exact expected JSON schema is baked directly into the system prompt.
- API Enforcement: The API call sets `response_format: { type: "json_object" }` on OpenAI-compatible chat APIs that support it.
- Server-Side Scrubbing: Even with strict prompting, models sometimes wrap their output in markdown fences (e.g., `` ```json ``). A robust backend scrubber removes these fences and extracts the substring from the first `{` to the last `}` before returning it to the client.
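The scrubbing step can be sketched as follows (a minimal version; a production scrubber would add logging and a retry policy for outputs that still fail to parse):

```javascript
// Strip markdown fences, extract the first "{" ... last "}" span,
// then parse to confirm the result is valid JSON.
function scrubLLMJson(raw) {
  // Remove markdown code fences such as ```json ... ```
  const unfenced = raw.replace(/```(?:json)?/gi, "");
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No JSON object found in model output");
  }
  const candidate = unfenced.slice(start, end + 1);
  return JSON.parse(candidate); // throws if the model still produced invalid JSON
}
```

Parsing on the server, rather than trusting the raw string, means a malformed payload fails loudly in the backend instead of crashing the client.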
3. Zero-Shot Bulk Categorization
Categorizing transactions from merchants around the world (e.g., “Uber Eats”, “Tesco”, “Steam”) requires broad brand context.
Instead of training a custom classification model, a zero-shot prompt approach was used. The prompt dynamically injects the user’s available categories alongside explicit rules:
- International Awareness: Explicit instructions to recognize global brands.
- Context Inference: Rules for guessing (e.g., if it contains “Bistro”, default to “Restaurant”).
- Hard Overrides: Specific keywords (“Vanguard”, “BlackRock”) are strictly hardcoded to route to “Investments” regardless of other context.
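The rules above can be combined into a single routing step: hard overrides resolve deterministically in code, and only the remaining merchants fall through to a zero-shot prompt. A sketch (the keyword lists and prompt wording are illustrative, not the exact production rules):

```javascript
// Keywords that always route to a fixed category, regardless of other context.
const HARD_OVERRIDES = { vanguard: "Investments", blackrock: "Investments" };

function categorize(merchant, categories) {
  const lower = merchant.toLowerCase();
  // Hard overrides resolve deterministically, skipping the LLM entirely.
  for (const [keyword, category] of Object.entries(HARD_OVERRIDES)) {
    if (lower.includes(keyword)) return { category, viaLLM: false };
  }
  // Otherwise build a zero-shot prompt around the user's own categories.
  const prompt = `Categorize "${merchant}" into exactly one of: ${categories.join(", ")}.
Recognize international brands. Infer from context cues (e.g., "Bistro" implies "Restaurant").
Reply with the category name only.`;
  return { prompt, viaLLM: true };
}
```

Keeping the overrides in code rather than in the prompt guarantees they cannot be overridden by the model's probabilistic judgment.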
By combining pre-computed mathematics, aggressive JSON scrubbing, and highly structured zero-shot prompting, the backend transforms a probabilistic LLM into a highly deterministic financial engine.
Architecture Diagram
This diagram supports Engineering a Deterministic AI Financial Analyzer and highlights where controls, validation, and ownership boundaries sit in the workflow.
Post-Specific Engineering Lens
For this post, the primary objective is: Balance model quality with deterministic runtime constraints.
Implementation decisions for this case
- Chose a staged approach centered on the LLM to avoid high-blast-radius rollouts.
- Used fintech checkpoints to make regressions observable before full rollout.
- Treated prompt-engineering documentation as part of delivery, not a post-task artifact.
Practical command path
These are representative execution checkpoints relevant to this post:
```shell
./llama-server --ctx-size <n> --cache-type-k q4_0 --cache-type-v q4_0
curl -s http://localhost:8080/health
python benchmark.py --profile edge
```
Validation Matrix
| Validation goal | What to baseline | What confirms success |
|---|---|---|
| Functional stability | input quality, extraction accuracy, and processing latency | schema validation catches malformed payloads |
| Operational safety | rollback ownership + change window | confidence/fallback policy routes low-quality outputs safely |
| Production readiness | monitoring visibility and handoff notes | observability captures latency + quality per request class |
Failure Modes and Mitigations
| Failure mode | Why it appears in this type of work | Mitigation used in this post pattern |
|---|---|---|
| Over-allocated context | Memory pressure causes latency spikes or OOM | Tune ctx + cache quantization from measured baseline |
| Silent quality drift | Outputs degrade while latency appears fine | Track quality samples alongside perf metrics |
| Single-profile dependency | No graceful behavior under load | Define fallback profile and automatic failover rule |
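The confidence/fallback mitigation in the tables above can be expressed as a small routing policy. A sketch with illustrative thresholds and field names:

```javascript
// Illustrative confidence/fallback routing: malformed or low-quality
// outputs never reach the user-facing path.
function routeOutput(result) {
  if (result == null || result.parseError) {
    return "retry";     // malformed payload: re-prompt or re-scrub
  }
  if (result.confidence < 0.5) {
    return "fallback";  // low confidence: serve the deterministic summary only
  }
  return "serve";       // healthy output: pass through to the client
}
```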
Recruiter-Readable Impact Summary
- Scope: ship AI features with guardrails and measurable quality.
- Execution quality: guarded by staged checks and explicit rollback triggers.
- Outcome signal: repeatable implementation that can be handed over without hidden steps.