
LangChain vs LlamaIndex vs building it yourself

AI frameworks promise to simplify development, but they often add more complexity than they remove through abstraction layers and dependency bloat. LangChain offers flexibility at the cost of overhead, LlamaIndex excels at data connection, while direct API implementation provides clarity and control. Here is when each approach actually makes sense for your team.

Quick answers

Why does this matter? Frameworks add abstraction layers - LangChain and LlamaIndex introduce significant overhead that makes debugging harder and customization more painful than building directly with APIs

What should you do? Simple use cases favor direct implementation - For basic AI applications, direct API calls give you better performance, lower complexity, and clearer code paths than framework abstractions

What is the biggest risk? Maintenance burden grows over time - Breaking changes, dependency bloat, and framework evolution create ongoing costs that outweigh initial productivity gains for many teams

Where do most people go wrong? Treating frameworks as universal - LlamaIndex shines for data indexing workflows and LangChain works well for multi-step reasoning with durable state, but neither is a fit for every problem

The question every team building AI applications hits eventually: LangChain, LlamaIndex, or just call the API directly?

Sounds technical. It isn’t, really. It’s a question about what kind of problems you want to spend the next six months debugging.

Pick wrong and you’ll spend those months fighting abstraction layers instead of shipping features. This pattern plays out constantly. Teams start with a framework because it promises fast movement. Six months later, they’re reading LangChain source code at 11pm trying to understand why their agent keeps producing garbage output.

The stakes are real. LangChain now has 90M+ monthly downloads and runs in production at Uber, JP Morgan, and BlackRock. LlamaIndex has grown into document agents, smart spreadsheet processing, and enterprise document pipelines. These aren’t toys.

But popular isn’t the same as right for your situation.

The abstraction trap

Frameworks sell you on the first 20 minutes. LangChain’s documentation shows a working chatbot in five lines of code. LlamaIndex promises to connect LLMs to your data with minimal setup. Both deliver on that promise, for the simple case.

The crack appears around week three.

Your requirements hit something the framework didn’t anticipate. Now you’re not writing application code. You’re reverse-engineering framework internals to change behavior that should be simple. One analysis of LangChain’s complexity put it plainly: the framework becomes a source of friction rather than productivity once requirements get sophisticated. You end up understanding LangChain better than your own application.

Count the abstraction layers in LangChain: LLM calls, prompts, memory, chains, agents. That’s five layers between you and the model. LlamaIndex is narrower in scope, focused on data connection and retrieval. Still has layers. Still has quirks.

Developers who abandoned frameworks found something that genuinely surprised me: their simpler direct implementations outperformed the framework versions in both quality and reliability. Not marginally. Measurably.

The reason is almost embarrassingly simple. Every abstraction layer adds complexity. You debug the framework, not your application. You learn LangChain’s quirks instead of learning how LLMs actually work.

What these frameworks actually solve

I want to be fair here, because frameworks aren’t inherently bad. They solve real problems. Just not always the ones you think you have.

LlamaIndex does one thing genuinely well: connecting LLMs to your data. Building a system that searches documents, creates embeddings, and retrieves context for AI responses? LlamaIndex handles this solidly. The high-level API lets you prototype fast. The indexing and retrieval modules are well-built.
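To make that concrete, here is a minimal sketch of that indexing-and-retrieval workflow. It assumes the `llama-index` package and an OpenAI API key; `SimpleDirectoryReader` and `VectorStoreIndex` are LlamaIndex’s documented high-level entry points, while the `./docs` folder and the query are placeholder assumptions.

```python
# Sketch of LlamaIndex's high-level indexing API (assumes the `llama-index`
# package and an OPENAI_API_KEY; folder and query are placeholders).
import os

def build_query_engine(docs_dir: str):
    # Lazy import so the sketch can be read without the package installed.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    documents = SimpleDirectoryReader(docs_dir).load_data()  # read local files
    index = VectorStoreIndex.from_documents(documents)       # embed and index them
    return index.as_query_engine()

# Guarded so nothing runs without credentials and a real docs folder.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY") and os.path.isdir("./docs"):
    engine = build_query_engine("./docs")
    print(engine.query("What does the refund policy say?"))
```

That really is the whole happy path, which is why the prototyping experience feels so good.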

They’ve also expanded aggressively. LlamaParse v2 overhauled document parsing with up to 50% cost reduction and automatic handling of skewed or rotated scans. They’ve added LlamaAgents for one-click document agent deployment, LlamaSheets for messy spreadsheet processing, and enterprise document pipelines.

Where LlamaIndex struggles is anything beyond data-focused workflows. Complex multi-step reasoning with arbitrary logic? You’ll hit walls fast. Fine-grained control over agent behavior? You’ll fight opinionated abstractions the whole way.

LangChain goes the opposite direction. Maximum flexibility through modular components: agents, tools, memory, custom chains. The architecture has matured. LangGraph 1.0 now provides durable state persistence, production-tested at Uber, LinkedIn, and Klarna. Server restarts mid-workflow? It picks up exactly where it left off.

Does that mean LangChain is the automatic choice for complex work? Not quite. The flexibility still comes with real baggage. Dependency bloat is a persistent complaint: installing LangChain pulls in dozens of packages. Performance analysis comparing frameworks to direct API calls found measurably higher latency for simple requests. The overhead isn’t theoretical. For genuinely complex workflows, frameworks can actually perform better due to built-in optimizations, so the right call depends heavily on what you’re building.

When you should just build it yourself

Most AI applications don’t need a framework. They need three things: an API client, prompt management, and error handling.

That’s it.
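Those three pieces fit in one file. Here is a sketch using only the standard library against OpenAI’s documented chat completions endpoint; the model name, prompts, and retry counts are placeholder assumptions, not recommendations.

```python
# Minimal "no framework" sketch: API client, prompt management, error handling.
# Uses only the standard library; model name and prompts are placeholders.
import json
import os
import time
import urllib.error
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(system_prompt: str, user_message: str, model: str = "gpt-4o-mini") -> dict:
    """Prompt management: one place that assembles the request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

def chat(payload: dict, api_key: str, retries: int = 3) -> str:
    """API client plus error handling: retry with exponential backoff."""
    body = json.dumps(payload).encode()
    for attempt in range(retries):
        req = urllib.request.Request(
            API_URL,
            data=body,
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                data = json.load(resp)
            return data["choices"][0]["message"]["content"]
        except urllib.error.URLError:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s...

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(chat(build_payload("Be terse.", "Name one US state."),
               os.environ["OPENAI_API_KEY"]))
```

Every line is yours, and a stack trace points at your code, not a chain of framework internals.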

Building without frameworks means you can create functional AI agents in under 200 lines of clean code. No abstractions. No magic. Just direct API calls you fully control and understand.

The benefits compound. You know exactly what every line does. Debugging means reading your code, not framework source. Changes take minutes instead of hours. Your team learns how LLMs actually work instead of learning framework quirks that become irrelevant when you switch tools.

Direct implementation works best when requirements are clear and relatively contained. Need a chatbot with conversation context? Straightforward with the OpenAI API. Want document search? RAG implementations without frameworks use ChromaDB and direct API calls effectively.
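The retrieval core of such a no-framework RAG pipeline is small enough to show whole. This toy version uses a bag-of-words stand-in for embeddings and a list scan for the store; in production you would swap in a real embedding API and a vector store like ChromaDB, but the shape of the code stays the same.

```python
# Toy retrieval core of a no-framework RAG pipeline. The bag-of-words "embed"
# stands in for a real embedding call, and the list scan stands in for a
# vector store such as ChromaDB.
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # stand-in for a real embedding call

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

docs = chunk("Refunds are issued within 30 days. Shipping takes one week. "
             "Support is available by email. Refund requests need a receipt.", size=6)
context = top_k("how do refunds work", docs)
# `context` then gets pasted into the prompt for the model call.
```

Chunk, embed, score, select: that loop is the part frameworks wrap in the most layers, and it is also the part you most often need to tune.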

The effort difference is smaller than you’d expect. Developers switching from LangChain report their custom implementations took roughly the same development time as properly learning the framework. But ongoing maintenance was dramatically simpler.

Skip the framework if you’re building something straightforward. Use the API directly. Write clean functions. You’ll ship faster and understand more.

Hidden costs that show up after launch

Most teams don’t see the maintenance problem coming. That’s where frameworks really extract their price.

Breaking changes are brutal. LangChain shipped frequent breaking changes throughout its early development as the project evolved fast. Code that worked last month breaks after an update. You’re stuck: stay on old versions with security risks, or spend cycles adapting.

LangChain and LangGraph hit 1.0 in October 2025, coinciding with a massive Series B from Sequoia Capital. They now promise no breaking changes until 2.0. That stability took years to arrive. Early adopters paid for it in constant refactoring.

The reliability numbers should give you pause. Error rates compound exponentially: 95% reliability per step yields only 36% success over 20 steps. Production demands 99.9%+ reliability, yet even sophisticated agent implementations struggle to hit that bar. Every abstraction layer introduces more potential failure points. Microsoft’s analysis of agentic complexity put it clearly: frameworks need careful consideration for cognitive load, security concerns, latency, and ongoing maintenance.
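That compounding claim is one line of arithmetic, and worth checking yourself:

```python
# Per-step success compounds multiplicatively across a pipeline:
# 20 steps at 95% each succeed end-to-end only about 36% of the time.
def end_to_end_success(per_step: float, steps: int) -> float:
    return per_step ** steps

print(round(end_to_end_success(0.95, 20), 3))   # -> 0.358
print(round(end_to_end_success(0.999, 20), 3))  # -> 0.98, which is why "three nines" per step matters
```

Every extra layer a framework inserts is another factor in that product.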

The observability story does genuinely favor frameworks. 89% of teams have implemented observability for their agents. LangSmith provides tracing, evaluation, and cost tracking out of the box. Building from scratch means building or integrating this yourself. Doable with tools like Langfuse, but it’s not free work.

The cancellation rate for agentic projects is striking: industry estimates suggest more than 40% could be scrapped by 2027 due to unanticipated complexity and cost. Adding framework dependencies increases that risk. Direct API implementations integrate more cleanly into existing systems, which matters when you’re trying to unwind a decision that didn’t work out.

How to actually choose

Start with complexity assessment. Simple chatbot or single-purpose tool? Build directly. Data-heavy retrieval system? Consider LlamaIndex. Multi-step reasoning with durable state requirements? LangGraph is genuinely strong here: LinkedIn, Uber, and 400+ companies use it for complex stateful workflows. Quick prototype with role-based agents? CrewAI ships fast, though teams report hitting walls 6-12 months in when requirements outgrow its opinionated design. Anything requiring heavy customization? Build directly.
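That assessment can be written down as a decision function. The requirement flags and return values below are illustrative labels of mine, not any product’s API; the point is that the customization check comes first, because heavy customization overrides every framework recommendation.

```python
# The complexity assessment above as a hypothetical decision function.
# Flag names and recommendations are illustrative labels, not an API.
def recommend(requirements: set[str]) -> str:
    if "heavy_customization" in requirements:
        return "build directly"          # customization trumps everything else
    if "durable_stateful_workflows" in requirements:
        return "LangGraph"
    if "data_heavy_retrieval" in requirements:
        return "LlamaIndex"
    if "quick_roleplay_prototype" in requirements:
        return "CrewAI (expect to outgrow it)"
    return "build directly"              # simple chatbots, single-purpose tools

print(recommend({"data_heavy_retrieval"}))                          # -> LlamaIndex
print(recommend({"data_heavy_retrieval", "heavy_customization"}))   # -> build directly
```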

Team skills matter more than most people acknowledge. A team comfortable with abstractions can make frameworks work well. A team that prefers understanding fundamentals will fight them constantly. Small teams moving fast often find direct implementation is actually faster once you account for the learning curve on both sides.

The framework space has also consolidated. Beyond LangChain and LlamaIndex, OpenAI’s Agents SDK takes a minimalist approach with no graphs or state machines, supporting Python and TypeScript. Microsoft merged AutoGen and Semantic Kernel into a unified Agent Framework, targeting general availability soon with built-in governance and multi-cloud support.

More options, not fewer decisions.

I probably lean too hard toward direct implementation for teams that genuinely need what frameworks provide. But for most mid-size companies starting out: build your first version with direct API calls. You’ll learn what you actually need. If you hit complexity that genuinely requires a framework, you’ll recognize it. And you’ll understand LLMs well enough to use the framework effectively instead of being confused by it.

Frameworks promise to handle complexity for you. They introduce their own complexity in the process.

Build what you need. Not what a framework wants you to build.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience and as the founder of Tallyfy (raised $3.6m), he helps mid-size companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.