byabhijeet

Systems. Software. Clear Thinking.

March 25, 2026 · 8 min read

Cheap → Expensive LLM Routing: How to Cut AI Costs by 70%

Learn how to implement a routing layer to dispatch LLM requests to the cheapest capable model, reducing costs by up to 70% without sacrificing quality.
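The routing idea above can be sketched in a few lines: try the cheap model first, escalate only when a quality check fails. This is a minimal illustration, not the article's implementation; the model names and the `call_model`/`confident` helpers are hypothetical placeholders.

```python
# Cheap-first router sketch. All names here are illustrative stubs.

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real LLM API call.
    return f"[{model}] answer to: {prompt}"

def confident(answer: str) -> bool:
    # Toy heuristic; production routers use classifiers, logprobs,
    # or task-specific validators to judge answer quality.
    return bool(answer) and "I don't know" not in answer

def route(prompt: str, cheap: str = "small-model", expensive: str = "large-model") -> str:
    # Try the inexpensive model first; escalate only on a failed check.
    answer = call_model(cheap, prompt)
    return answer if confident(answer) else call_model(expensive, prompt)
```

The savings come from how often the cheap path suffices; the quality check is the hard part, not the dispatch.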

April 2, 2025 · 17 min

Debugging AI Systems (Your First Real Struggle)

When something goes wrong in a multi-component AI system, where do you start? Tracing prompt-to-output, identifying failure source, structured logging, and the systematic method that beats guessing every time.

March 26, 2025 · 16 min

Intro to AI System Design

Pipelines vs agents, orchestration patterns, when orchestration is the wrong abstraction — the architectural decisions that determine whether your AI system is maintainable or a debugging nightmare.

March 19, 2025 · 16 min

From Toy to Production: What Breaks First

The systems that work perfectly in staging fail in production for reasons that are never about the model. Rate limits, inconsistent outputs at scale, state management, and graceful degradation — what actually breaks and how to engineer around it.

March 12, 2025 · 15 min

Cost & Latency: The First Production Pain

Token usage analysis, model selection, caching strategies, and the math that decides whether your AI feature is economically viable at scale.
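The viability math mentioned above is simple enough to sketch. The prices below are made-up placeholders, not real vendor pricing; the point is the shape of the calculation.

```python
# Back-of-envelope economics for an LLM feature (placeholder prices).

def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_1k: float, out_price_per_1k: float) -> float:
    # Most vendors bill input and output tokens at different rates.
    return (input_tokens / 1000) * in_price_per_1k + (output_tokens / 1000) * out_price_per_1k

per_request = request_cost(2000, 500, 0.50, 1.50)  # dollars per call
monthly = per_request * 100_000                    # at 100k calls/month
```

Running this arithmetic before launch, not after the first invoice, is the whole argument.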

March 5, 2025 · 16 min

Evaluating LLM Outputs (Your First Eval System)

Building the evaluation infrastructure that lets you know if your AI system is actually working — test datasets, scoring criteria, automation, and a continuous loop that catches regressions before users do.
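The core loop described above fits in a few lines: a fixed dataset, a scorer, and an aggregate metric you rerun on every change. This toy harness uses exact match and a fake system for illustration; real evals need richer scorers.

```python
# Toy eval harness: score a system-under-test against a fixed dataset.
# The dataset, scorer, and fake system here are illustrative only.

def exact_match(expected: str, actual: str) -> float:
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def run_eval(system, dataset) -> float:
    # Average score over the whole dataset; rerun on every change
    # so regressions surface before users see them.
    scores = [exact_match(expected, system(prompt)) for prompt, expected in dataset]
    return sum(scores) / len(scores)

dataset = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
accuracy = run_eval(lambda p: "4" if "2+2" in p else "Paris", dataset)
```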

February 26, 2025 · 17 min

Building Your First Real RAG System

Chunking strategies, top-k tuning, context window management, and the noise problem — how to move from a retrieval pipeline that sometimes works to one that works reliably.
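The chunking baseline that article-length treatments usually start from can be sketched directly. Fixed-size windows with overlap are the naive starting point; the sizes below are arbitrary defaults, not recommendations.

```python
# Naive fixed-size chunking with overlap, the usual RAG baseline
# before moving to semantic or structure-aware chunking.

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Overlap keeps sentences that straddle a boundary retrievable
    # from at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Tuning `size` and `overlap` against your own retrieval eval matters more than any universal default.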

February 19, 2025 · 16 min

Intro to RAG: Making AI Use Your Data

Embeddings, vector databases, the store-retrieve-inject pipeline, and the first real failure mode: irrelevant retrieval. What RAG is, why you need it, and how to build a version that actually works.

February 12, 2025 · 15 min

Why Your LLM App Hallucinates (and How to Reduce It)

Hallucination is not a bug in the model — it is an intrinsic property of probabilistic text generation. Here is what causes it, what your reliability layer cannot catch, and what you actually build to mitigate it.

February 5, 2025 · 16 min

Making LLM Output Reliable

JSON mode, schema enforcement, validation pipelines, and retry strategies — the complete reliability layer that sits between the model and your downstream systems.
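The parse-validate-retry loop described above can be sketched minimally. The `generate` stub stands in for a hypothetical model call; real code would enable JSON mode and likely re-prompt with the error on retry.

```python
# Reliability-layer sketch: parse, validate required fields, retry.
import json

def generate(prompt: str) -> str:
    # Stub for an LLM call with JSON mode enabled.
    return '{"summary": "ok", "score": 5}'

def reliable_call(prompt: str, required=("summary", "score"), max_retries: int = 3) -> dict:
    last_err = None
    for _ in range(max_retries):
        try:
            data = json.loads(generate(prompt))
            missing = [k for k in required if k not in data]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            return data  # validated; safe for downstream systems
        except (json.JSONDecodeError, ValueError) as err:
            last_err = err  # real code might repair or re-prompt here
    raise RuntimeError(f"output never validated: {last_err}")
```

Downstream code then consumes a checked dict, never raw model text.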

January 29, 2025 · 17 min

Prompt Engineering That Actually Matters

The gap between a prompt that works sometimes and one that works reliably. Structured prompt design, system vs user roles, output schemas, and using examples — with concrete before/after comparisons on the same task.

January 22, 2025 · 16 min

First LLM App: From API Call to Working Feature

Building a real text summarizer API from scratch — handling latency, malformed responses, retries, and the gap between 'it works locally' and a feature you can actually ship.

January 15, 2025 · 18 min

What AI Engineering Actually Means (and What It Is Not)

A complete mental model for reasoning about AI systems in production — covering architecture, reliability, evaluation, and the layers most engineers skip when they call it done after the first API response.

© 2026 Abhijeet. All rights reserved.