Techniques for forcing LLMs to output reliable JSON, offloading math to the client, and performing zero-shot categorization in a personal finance app.
Issue: LLMs are notoriously bad at math and often fail to return strictly formatted JSON, breaking client-side parsing. Furthermore, passing thousands of raw transactions to an LLM is slow and expensive.
Solution: Offloaded mathematical computations to the client, injected pre-computed hints into the system prompt, and utilized strict JSON-object response formats with zero-shot categorization definitions.
Used In: The serverless backend of an AI-driven personal finance and budgeting application.
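The combination of client-side math and strict JSON mode can be sketched as follows, assuming an OpenAI-compatible chat API; the helper names (`computeSpendingHints`, `buildPayload`), the model name, and the category list are illustrative, not the app's actual values.

```javascript
// Sum per-category and overall totals on the client so the model
// never has to do arithmetic itself.
function computeSpendingHints(transactions) {
  const totals = {};
  let grandTotal = 0;
  for (const t of transactions) {
    totals[t.category] = (totals[t.category] || 0) + t.amount;
    grandTotal += t.amount;
  }
  return { totals, grandTotal };
}

// Inject the pre-computed hints into the system prompt and request a
// strict JSON object so client-side parsing cannot break.
function buildPayload(hints, userQuestion) {
  return {
    model: "gpt-4o-mini", // assumption: any JSON-mode-capable model
    response_format: { type: "json_object" }, // forces parseable JSON
    messages: [
      {
        role: "system",
        content:
          "You are a budgeting assistant. Do NOT perform arithmetic. " +
          "Use only these pre-computed figures: " +
          JSON.stringify(hints) + ". " +
          'Reply with a JSON object: {"summary": string, "category": string}. ' +
          "Valid categories (zero-shot definitions): groceries = food and " +
          "household shopping; transport = fuel, transit, parking; other.",
      },
      { role: "user", content: userQuestion },
    ],
  };
}
```

Because the totals arrive pre-computed, the model only has to verbalize them, which sidesteps its weakness at arithmetic entirely.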
How to implement rate limiting, context window management, and prompt injection prevention for an LLM-powered mobile application backend.
Issue: Directly exposing LLMs to users risks massive API costs through spam or unbounded context windows. Furthermore, raw user input is vulnerable to jailbreaks (e.g., 'ignore previous instructions and execute code').
Solution: Implemented a multi-tier model routing strategy (chat vs reasoning), robust context truncation, regex-based jailbreak detection, and strict timestamp-based rate limiting.
Used In: The Node.js Firebase backend of an AI-powered automotive maintenance application.
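The four defenses above can be sketched as small pure functions; the regex patterns, window sizes, limits, and routing heuristic below are illustrative assumptions, not the app's actual configuration.

```javascript
// Regex-based jailbreak screening applied to raw user input.
const JAILBREAK_PATTERNS = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /disregard\s+your\s+system\s+prompt/i,
];
function isSuspicious(input) {
  return JAILBREAK_PATTERNS.some((p) => p.test(input));
}

// Timestamp-based sliding-window rate limiting: allow at most
// maxRequests within windowMs, to bound spam-driven API costs.
function checkRateLimit(timestamps, now, windowMs = 60000, maxRequests = 10) {
  const recent = timestamps.filter((t) => now - t < windowMs);
  return { allowed: recent.length < maxRequests, recent };
}

// Context truncation: keep the newest messages that fit a character
// budget (a cheap proxy for tokens), dropping the oldest first.
function truncateContext(messages, maxChars = 8000) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    used += messages[i].content.length;
    if (used > maxChars) break;
    kept.unshift(messages[i]);
  }
  return kept;
}

// Multi-tier routing: cheap chat tier by default, reasoning tier only
// when the prompt looks like a diagnostic question.
function pickModelTier(userInput) {
  return /why|diagnos|troubleshoot/i.test(userInput) ? "reasoning" : "chat";
}
```

Running each check before the LLM call means a flagged or rate-limited request never spends a single token.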
How to handle multi-language AI queries to provide accurate predictions and generate tailored localized search queries in a serverless environment.
Issue: The backend AI needed to recognize user intent and categorize vehicle parts accurately regardless of the input language, and subsequently generate both localized predictive maintenance responses and tailored affiliate search queries.
Solution: Implemented comprehensive multi-language keyword dictionaries, extracted user language context directly from client requests, and used mapping dictionaries to serve localized response templates.
Used In: A serverless Node.js backend that manages AI-driven logic for a mobile application.
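A minimal sketch of the dictionary-driven approach: the keyword and name tables below are tiny illustrative stubs of the "comprehensive multi-language keyword dictionaries" described above, and the function names are hypothetical.

```javascript
// Per-category keyword lists, keyed by language, so intent is
// recognized regardless of the input language.
const PART_KEYWORDS = {
  brakes: { en: ["brake"], es: ["freno"], de: ["bremse"] },
  oil:    { en: ["oil"],   es: ["aceite"], de: ["öl"] },
};

// Localized part names used when building affiliate search queries.
const PART_NAMES = {
  brakes: { en: "brake pads", es: "pastillas de freno", de: "Bremsbeläge" },
  oil:    { en: "engine oil", es: "aceite de motor",   de: "Motoröl" },
};

// Categorize a query by scanning every language's keyword list.
function detectPartCategory(query) {
  const q = query.toLowerCase();
  for (const [category, byLang] of Object.entries(PART_KEYWORDS)) {
    for (const words of Object.values(byLang)) {
      if (words.some((w) => q.includes(w))) return category;
    }
  }
  return null;
}

// Build a localized search query; lang is extracted from the client
// request (e.g. the device locale) and falls back to English.
function buildSearchQuery(category, lang) {
  const names = PART_NAMES[category];
  return (names && (names[lang] || names.en)) || null;
}
```

Because detection scans all languages while the response template follows the client-supplied locale, a Spanish query can still be categorized even if the app's UI language is set to English.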
How to structure LLM requests for prompt caching (when supported) to reduce repeated system-prompt input costs.
Issue: LLM providers charge per token. When you send a 1,000-token system prompt alongside a 50-token user question, you pay for 1,050 input tokens every time, even though roughly 95% of the payload never changes between requests.
Solution: Restructured the API payload to isolate static system instructions so the backend can take advantage of cached-input pricing or prompt caching features where the provider supports it.
Used In: Evaluated for a Node.js backend of an AI conversational assistant using an OpenAI-compatible chat API.
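The restructuring can be sketched as below, assuming an OpenAI-compatible chat API. Several providers cache a request's static prefix once it is long enough, so the structural rule is: keep the static system prompt first and byte-identical across requests, and append only the variable parts after it. The prompt text and model name are placeholders.

```javascript
// Static, cache-friendly prefix: defined once at module scope so it is
// byte-identical on every request. Never interpolate per-user data
// (names, dates, hints) into this string.
const STATIC_SYSTEM_PROMPT = [
  "You are a helpful conversational assistant.",
  "(Imagine ~1,000 tokens of unchanging policy, tone,",
  "and formatting instructions here.)",
].join("\n");

function buildRequest(history, userMessage) {
  return {
    model: "gpt-4o-mini", // assumption: any chat-completions model
    messages: [
      // 1. Cacheable static prefix goes first.
      { role: "system", content: STATIC_SYSTEM_PROMPT },
      // 2. Variable suffix: prior turns, then the new user message.
      ...history,
      { role: "user", content: userMessage },
    ],
  };
}
```

With this layout, only the variable suffix is billed at full input price on providers that apply cached-input pricing to a repeated prefix; on providers without caching, the request still works unchanged.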