Understanding LLM-Based Agents - Beyond Simple Prompts and Chats

In a previous article, I discussed the concept of AI agents in general terms. Now, let's explore how LLM-based agents differ from simple prompting, chats, or automated workflows. While there's a lot of hype around "agents" and "agentic software," I believe there's genuine substance behind these terms. Let's cut through the confusion and explore a framework that I hope you'll find useful.

The Evolution: From Text Completion to Agents

1. Text Completion

Large language models (LLMs) are trained to predict the most probable next word, then the next, and the next, creating what we call a "completion." It's important to note that a valid completion doesn't always mean answering a question. For instance, "What is the capital of England?" could be followed by "What is the capital of France?" if the model is generating a list of questions.
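The next-word loop can be sketched with a toy model. Here a small bigram table stands in for a real LLM's next-token distribution; everything below is illustrative, not how any actual model is implemented:

```python
# A toy sketch of greedy completion: repeatedly pick the most
# probable next word. The bigram table stands in for a real LLM.
BIGRAMS = {
    "the": {"capital": 0.6, "city": 0.4},
    "capital": {"of": 0.9, "is": 0.1},
    "of": {"england": 0.5, "france": 0.5},
}

def complete(prompt_words, max_words=5):
    words = list(prompt_words)
    for _ in range(max_words):
        dist = BIGRAMS.get(words[-1])
        if not dist:
            break  # the toy model has no continuation for this word
        words.append(max(dist, key=dist.get))  # greedy: take the argmax
    return words

print(complete(["the"]))
```

Real models sample from a distribution over tens of thousands of tokens rather than taking the argmax from a lookup table, but the shape of the loop is the same.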

2. Question Answering

LLMs can be further trained to provide completions that actually answer questions. At this stage, each query is treated independently. Consequently, it would be difficult to ask a clarifying question without restating everything from the beginning.

3. Chat

The next level involves training models (and supporting them with appropriate systems) to engage in conversations. These systems keep track of the current dialogue, allowing for context-aware responses. For example, if you ask, "What is the capital of France?" followed by "What is the main airport?", the system would understand you are still talking about France.
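The bookkeeping behind this can be sketched as a growing list of messages that is sent to the model on every turn. The `fake_llm` function below is a stub standing in for a real model call, but it shows why the follow-up question stays in context:

```python
# A minimal sketch of chat-style context: every turn is appended to a
# history list, and the full history is sent to the (stubbed) model
# each time, so follow-up questions inherit earlier context.
def fake_llm(messages):
    # Stand-in for a real model call: answers based on the whole history.
    text = " ".join(m["content"].lower() for m in messages)
    if "airport" in text and "france" in text:
        return "Charles de Gaulle"
    if "capital" in text and "france" in text:
        return "Paris"
    return "I'm not sure."

history = []

def ask(question):
    history.append({"role": "user", "content": question})
    answer = fake_llm(history)
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is the capital of France?"))
print(ask("What is the main airport?"))
```

The second question never mentions France, yet the answer is correct because "France" is still in the accumulated history.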

4. Tools

LLMs are typically limited by the scope and age of their training data. To address this, we can add "tools" to the system. A tool is essentially a set of special instructions that lets the LLM signal when it needs more information. For instance, if asked about the current weather, the LLM can request a function call to a weather tool, which then provides up-to-date information.
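The round trip can be sketched as follows. The model stub either answers directly or emits a JSON tool request; the surrounding code runs the tool and asks the model again with the result. Both `fake_llm` and `get_weather` are hypothetical stand-ins, not a real API:

```python
import json

# A sketch of tool use: the stubbed "model" either answers directly or
# emits a JSON tool request, which the harness executes before calling
# the model again with the tool's result.
def get_weather(city):
    # Hypothetical tool; a real one would call a weather service.
    return {"city": city, "temp_c": 18}

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt, tool_result=None):
    if tool_result is not None:
        return f"It is {tool_result['temp_c']} C in {tool_result['city']}."
    if "weather" in prompt.lower():
        return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})
    return "No tool needed."

def answer(prompt):
    reply = fake_llm(prompt)
    try:
        request = json.loads(reply)  # did the model ask for a tool?
    except json.JSONDecodeError:
        return reply                 # plain answer, no tool round trip
    result = TOOLS[request["tool"]](**request["args"])
    return fake_llm(prompt, tool_result=result)

print(answer("What is the weather in Paris?"))
```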

Useful tools might include web search, document search, or queries to proprietary systems like inventory or personnel records.

5. Retrieval Augmented Generation (RAG)

Another approach to dealing with the limits of the training data is to run a search on a more recent and relevant dataset and include the document fragments that are likely to be relevant to the question as part of the prompt submitted to the LLM.

This is different from tool use because the retrieval happens before the LLM sees the prompt. For example, if you wanted to answer questions from your employee manual, you could run a search to find relevant passages and include them as part of the prompt. The LLM would then use that information and restructure it to form a coherent answer.
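The employee-manual example can be sketched with a naive keyword search (a production system would use embeddings and a vector store, but the flow is the same). The passages and the scoring function here are illustrative assumptions:

```python
# A minimal RAG sketch: score each manual passage by keyword overlap
# with the question, then paste the best matches into the prompt that
# will be sent to the LLM.
MANUAL = [
    "Employees accrue 1.5 vacation days per month of service",
    "Expense reports must be filed within 30 days of the expense",
    "The office is closed on national holidays",
]

def retrieve(question, k=2):
    q_words = set(question.lower().split())
    scored = sorted(
        MANUAL,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]  # top-k most relevant passages

def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How many vacation days do employees get?"))
```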

6. Prompt Chains

We can create prompt chains by automatically feeding the output of one LLM into another. This approach is particularly useful for tasks like analysis or summarization, as breaking complex tasks into steps often improves the final output quality.

This is similar to a conversation, but it can be run by a program to automate common and valuable workflows. And of course, any step could include tool use or RAG techniques.
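A two-step chain can be sketched like this, with both steps stubbed out in place of real model calls (a real chain would send each step's output as the next step's prompt):

```python
# A sketch of a two-step prompt chain: step one extracts key points,
# step two condenses them into a summary. Each function stands in for
# a separate LLM call.
def extract_points(text):
    # Step 1: pretend the model pulls out one point per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]

def summarize(points):
    # Step 2: pretend the model condenses the points.
    return "Summary: " + "; ".join(points)

def chain(text):
    # The output of step 1 becomes the input of step 2.
    return summarize(extract_points(text))

print(chain("Sales grew 10%. Costs fell. Margins improved."))
```

Breaking the task into an extract step and a condense step mirrors why chains help: each prompt does one narrow job.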

7. Agents / Agentic Software

The most advanced stage involves systems that can make decisions as part of their workflow. For example, an LLM can evaluate its own output quality and decide whether to continue or try again. This introduces loops and branches into the process, creating a more dynamic system.

The simplest possible agent consists of a worker module and a feedback module: the feedback module has the responsibility of deciding when the output is good enough. Care must be taken to limit the number of times the system is allowed to loop, or you may end up making unbounded calls to the worker module.
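That worker/feedback loop, with its hard cap on iterations, can be sketched like this. The worker and the acceptance rule are hypothetical stand-ins for LLM calls:

```python
# A sketch of the worker/feedback loop: the worker produces a draft,
# the feedback module scores it, and the loop retries until the draft
# is good enough or a hard iteration cap is hit.
def worker(task, attempt):
    # Stand-in for an LLM call; quality improves with each attempt.
    return {"text": f"draft {attempt} for {task}", "quality": attempt}

def good_enough(draft):
    # Stand-in for the feedback module's acceptance rule.
    return draft["quality"] >= 3

def agent(task, max_loops=5):
    for attempt in range(1, max_loops + 1):  # cap prevents infinite loops
        draft = worker(task, attempt)
        if good_enough(draft):
            return draft
    return draft  # best effort after hitting the cap

print(agent("write a haiku")["text"])
```

The `max_loops` parameter is the key safeguard: without it, a feedback module that never approves would call the worker forever.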

A simple distinction: I'd call a system with hard-coded logic "agentic software," while one where LLMs make the decisions would be an "agent."

And of course, each module can contain its own set of prompts/LLMs and control logic, letting you build arbitrarily complex workflows. To explore this further, see ReAct: Synergizing Reasoning and Acting in Language Models, which introduces a framework that encourages the LLM/agent to reason about the input and then act on it in separate steps.

Conclusion

As we've seen, there's much more you can do with LLMs than just engage in simple chat interactions. By creating prompt chains, you can automate complex workflows. Taking it a step further, you can enhance these chains with branches and loops to create true agents or agentic software.

This progression from basic text completion to sophisticated agents showcases the versatility and potential of LLM technology. As we continue to explore and develop these systems, we're likely to uncover even more powerful applications and capabilities.
