
Breaking the Limits of LLMs: RAG, Tool Use, and Agentic AI

RAG, Tool Use, Agentic AI. Three approaches built to overcome the four limitations we discussed in the previous article. The LLM was a “prediction machine”; these three approaches transform it into a system that “knows and takes action.” Their shared philosophy fits in a single sentence: don’t predict, know.

1. RAG (Retrieval-Augmented Generation): Taking an Open-Book Exam

In the previous article, when we explained hallucination, we emphasized that the model is not an encyclopedia but a probability calculator. The model doesn’t “remember” knowledge; it generates the statistically most likely token. RAG addresses exactly this problem.

A standard LLM is like a student taking a closed-book exam. It answers everything from memory, and if its memory is wrong, it confidently gives the wrong answer. RAG turns this into an open-book exam: the model finds and reads relevant sources before answering, then bases its response on those sources.

How does it work? Your question reaches the system, which retrieves from a database the document chunks semantically closest to the query. But this database doesn't perform keyword searches like a traditional SQL database — it's a vector database, capable of querying by semantic similarity: a search for "city" will also return results containing "town." The retrieved chunks are added to the model's context, and the model is instructed to ground its answer in this real data.
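The retrieval step above can be sketched in a few lines. This is a minimal, self-contained illustration only: the embed() function here is a toy based on character trigrams, standing in for the learned embedding model and dedicated vector database a production RAG system would use.

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding: hash character trigrams into a small fixed-size vector.
    # Real systems use a trained embedding model instead.
    vec = [0.0] * 16
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 16] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: the standard "semantic closeness" measure.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank all chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The city council approved the new budget in March.",
    "Photosynthesis converts sunlight into chemical energy.",
]
# The retrieved chunk is prepended to the model's context, and the model
# is instructed to answer from it rather than from memory.
context = retrieve("When did the city approve the budget?", chunks)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```

Note that only the retrieval happens here; the grounding comes from the instruction in the prompt, which tells the model to stay within the supplied context.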

The difference is dramatic. In a study published in JMIR Cancer, a chatbot without RAG hallucinated in about 40% of its answers, while a RAG-powered system grounded in reliable medical sources brought this rate down to 0-6%. And there's a second benefit hiding in the data: traditional chatbots answered every question (a 100% response rate), while the response rate of RAG-based chatbots ranged between 36% and 81%. In other words, when the system can't find the information in its sources, it says "I don't know" instead of making things up.

RAG also cleverly overcomes the context window limitation. Remember, in the previous article we mentioned the “needle in a haystack” test — the test that measures the model’s ability to find information hidden within a long context. RAG, instead of piling the entire haystack into the context window, finds just the needle and brings it to the model. It doesn’t enlarge the window; it uses it more intelligently.

Why it matters: RAG transforms the LLM from a system that “speaks from rote memory” to one that “speaks based on sources.” It reduces hallucination and uses the context window efficiently.

2. Tool Use: Don’t Predict, Calculate

When you ask an LLM "What is 47 × 83?", the model isn't actually performing multiplication. It predicts the most likely token based on similar patterns it saw in its training data. Sometimes the result comes out correct (3901), and sometimes it doesn't, because the model isn't a calculator — it's a statistical language model.

Tool Use eliminates this blind spot. The model can say, “this is a math problem, let me call the calculator instead of guessing.” The result: exact computation instead of prediction.

When current information is needed, it decides to perform a web search instead of relying on memory and calls the web search tool. When data analysis is needed, it writes and executes code, then uses the result. When asked about exchange rates or weather, it pulls real-time data from an API. In every case, the same principle applies: instead of guessing what it doesn’t know, it asks the tool that does.
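The principle can be sketched as follows. The tool_call structure below is a hypothetical illustration; real systems use structured function calling, where the model emits a tool name and arguments and the runtime executes them.

```python
import ast
import operator

# Map AST operator nodes to real arithmetic; this is a safe alternative
# to eval() for evaluating simple expressions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    """Exactly evaluate an arithmetic expression instead of predicting it."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

# The model's only job is deciding WHICH tool to call and with WHAT
# arguments; the exact answer comes from the tool, not from prediction.
tool_call = {"tool": "calculator", "args": {"expr": "47 * 83"}}
result = calculator(**tool_call["args"])  # 3901, computed, not guessed
```

The division of labor is the whole point: the model handles intent ("this is a math problem"), and the tool handles execution.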

The impact isn’t limited to accuracy alone. In Meta’s Toolformer research published in 2023, a small model with 6.7 billion parameters using tools reached the same level as the tool-less 175 billion parameter GPT-3. Tool use created an impact equivalent to scaling the model by 26x.

Did you notice the shared philosophy with RAG? Both apply the same principle from different angles: don't predict, know. RAG says "fetch from the source instead of fabricating knowledge"; Tool Use says "instead of pretending to calculate, call the calculator and actually compute."

We previously discussed the thinking machine architecture that Cahit Arf drew in 1959. Arf’s concept of auxiliary memory — where he said “one should ask such-and-such person, one should look in such-and-such book” — is exactly what we now call RAG and Tool Use.

Why it matters: Tool Use transforms the LLM from a system that “tries to do everything itself” to one that “delegates the right task to the right tool.” It reduces hallucination, expands capabilities, and enables smaller models to compete with giant ones.

3. Agentic AI: Orchestration

RAG finds the right information. Tool Use performs the right action. But who coordinates these — who decides which tool to call and when?

Agentic AI is the orchestration layer that brings these capabilities together with planning, decision-making, and feedback loops. We covered this concept in detail in earlier articles in the series. We discussed the difference between AI Agent and Agentic AI, systems that interact with their environment and solve problems iteratively through feedback loops. The “give the goal, let it find the way” paradigm we saw in the OpenClaw example is also made possible by Agentic AI.

Rather than repeating here, let’s see the whole picture: RAG gives the model knowledge, Tool Use gives the model capability, and Agentic AI gives the model agency.
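That division of labor reduces to a single loop. The scripted_llm below is a stand-in for illustration only — a real model would plan each step itself — and real agentic frameworks add planning, memory, and error handling around this same cycle.

```python
def agent(goal: str, tools: dict, llm, max_steps: int = 5) -> str:
    """Iterate: the model plans a step, a tool executes it, the result
    feeds back into the next decision."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = llm(history)           # model decides the next step
        if decision["action"] == "final":
            return decision["answer"]     # goal reached, stop iterating
        tool = tools[decision["action"]]  # delegate to the right tool
        observation = tool(**decision["args"])
        history.append(f"{decision['action']} -> {observation}")  # feedback loop
    return "step limit reached"

def scripted_llm(history):
    # Hypothetical stand-in for the model's planning step.
    if len(history) == 1:
        return {"action": "multiply", "args": {"a": 47, "b": 83}}
    return {"action": "final", "answer": history[-1].split("-> ")[1]}

tools = {"multiply": lambda a, b: a * b}  # toy tool registry
answer = agent("compute 47 * 83", tools, scripted_llm)
```

The loop itself is trivial; the agency lives in the model's decisions about which action to take at each turn, and in the feedback each observation provides for the next one.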

Why it matters: Agentic AI goes beyond a single LLM call, enabling iterative problem-solving in real-world multi-step complex tasks — such as detecting and fixing a bug or researching and writing a report.

From Prediction Machine to Knowledgeable System

Previously, we saw the limitations of LLMs, among them the context window and hallucination. Now we've seen how these limitations are overcome. RAG says "fetch the knowledge," Tool Use says "ask the right tool," Agentic AI says "plan and manage."

Anyone who understands these three approaches will begin to see artificial intelligence not as a chat box, but as an engineering system.

These three approaches share a common trait: they all support the model from the outside without changing it. RAG brings knowledge from the outside, Tool Use calls tools from the outside, Agentic AI orchestrates from the outside. Of course, these approaches mean slower response times and increased token costs; but where accuracy and reliability are critical, this cost is worth paying. So when do methods that update the model itself — fine-tuning, RLHF/DPO, and LoRA — come into play? We’ll look at that in the next article.

Which of these three approaches do you use most in your daily workflow, or which one are you most eager to try?
