Automating Tasks with AI Agents and LLMs

Foreword

In the rapidly evolving landscape of artificial intelligence, developers are increasingly focused on creating innovative and automated solutions powered by AI, with large language models (LLMs) at the core. However, working directly with LLMs can present challenges, from managing complexity to integrating external tools and services effectively. This is where AI agents come into play.

AI agents enhance the capabilities of LLMs by introducing tool-calling mechanisms, streamlining connectivity with APIs, and automating workflows. They simplify the development process, enabling applications to handle complex tasks, deliver intelligent responses, and adapt dynamically to user needs. By using AI agents as a bridge, developers can fully harness the potential of LLMs and build powerful, flexible, and context-aware applications that redefine user experiences.

Introduction

In this article, we’ll explore two ways to understand the concept of building apps with AI agents: we can either dive into detailed theory and examples, or we can learn by building a real-world application. Personally, I’ve always been a fan of learning by building. It has boosted my understanding of technology and given me a clearer picture of potential obstacles, along with strategies for tackling them.

So, let’s start by discussing some brief theoretical concepts behind AI agents and LLMs, and then we’ll dive into building a small app to apply our learning.

What is an AI Agent?

The term "AI agent" might have different meanings depending on who you ask, but here's how I understand it:

An AI agent is a system that can act autonomously, performing tasks and making decisions without needing explicit instructions for every action. This autonomy sets AI agents apart from traditional software programs, which generally require step-by-step instructions.

AI agents are dynamic and capable of adapting to new inputs, learning from interactions, and solving problems in real-time. They use reasoning, data processing, and integration with external tools (such as APIs) to perform actions that weren’t explicitly programmed. These tools could be anything from accessing information on Reddit to making API calls to services like dadJokes.com, which an agent might use to fetch a joke for you.
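For instance, here is a minimal sketch of what such a tool might look like. It uses the public icanhazdadjoke.com API as a stand-in for the joke service mentioned above; the function name and shape are purely illustrative.

// A hypothetical tool an agent could call to fetch a joke.
// icanhazdadjoke.com is used here as a stand-in joke service.
const fetchDadJoke = async () => {
  const response = await fetch('https://icanhazdadjoke.com/', {
    headers: { Accept: 'application/json' },
  })
  const data = await response.json()
  return data.joke
}

fetchDadJoke().then(console.log)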

What is an LLM?

In contrast to AI agents, large language models (LLMs) are passive by nature. An LLM is a machine learning model trained on vast amounts of text data to understand and generate human-like language. Models like GPT (Generative Pretrained Transformer) use deep learning techniques to predict and generate coherent, contextually relevant responses based on the input they receive. LLMs excel in tasks such as language translation, summarization, question answering, and content generation, as they process patterns, context, and nuances in text data.

Key Difference Between AI Agents and LLMs

While both AI agents and LLMs are integral to modern AI applications, they serve distinct purposes:

  • AI Agents: These systems are designed to act autonomously, making decisions and interacting with external systems to complete tasks. They can take actions like calling APIs, fetching data, and solving problems dynamically.

  • LLMs: Large language models, on the other hand, are primarily designed to process and generate text. They respond to prompts but lack the ability to take independent actions or interact with external systems unless integrated with additional programming.

Example Scenario: Retrieving Weather Information

LLM Example (Without AI Agent): Here’s an example where an LLM generates a response based on a user’s prompt, but it can’t fetch real-time data from an external source like a weather API.

const generateResponse = (prompt) => {
  // LLM responds to the prompt but doesn't interact with external systems.
  if (prompt === "What's the weather today?") {
    return "I cannot provide real-time weather data. Please check your weather app.";
  }
};

console.log(generateResponse("What's the weather today?"));

In this case, the LLM simply responds with a static answer, as it cannot fetch real-time weather data from an API.

AI Agent Example (With LLM): An AI agent, however, can make decisions, call an external API, and integrate with the LLM to provide a more dynamic, proactive solution.

const axios = require('axios'); // For API calls

const getWeather = async (city) => {
  // AI agent can fetch real-time data via an API and process the response.
  const response = await axios.get(`https://api.weatherapi.com/v1/current.json?key=YOUR_API_KEY&q=${city}`);
  return `The weather in ${city} is ${response.data.current.temp_c}°C with ${response.data.current.condition.text}.`;
};

const aiAgent = async (prompt) => {
  if (prompt.toLowerCase().includes("weather")) {
    const city = "London";  // Hardcoded city for simplicity, could be dynamic
    return await getWeather(city);
  } else {
    return "I'm not sure how to help with that request.";
  }
};

aiAgent("What's the weather today in London?").then(console.log);

In this example, the AI agent:

  1. Recognizes the user's request about the weather.

  2. Makes an API call to a weather service.

  3. Returns a dynamic response based on real-time data.

Another Example: Writing and Sending an Email

LLM Example (Without AI Agent): An LLM can help generate an email, but it cannot take action to send it to the recipient. Here’s how it would behave without an AI agent:

const generateEmail = () => {
  return "Here is your email content...";
};

console.log(generateEmail());

The LLM can generate the email content, but it can’t send it autonomously.

AI Agent Example (With LLM): Now, let’s introduce an AI agent. The agent will not only generate the email content but also send it to the recipient:

const sendEmail = (emailContent, recipient) => {
  // Logic to send the email (e.g., using an SMTP service or API)
  console.log(`Sending email to ${recipient}: ${emailContent}`);
};

const aiAgent = async () => {
  const emailContent = "Here is your email content...";
  const recipient = "example@example.com";

  sendEmail(emailContent, recipient);
};

aiAgent();

Now, the AI agent has the ability to take action beyond generating content—it can also send the email.

Key Takeaways

  • LLMs: These are powerful for generating text and responding to prompts. However, they don’t inherently have the capability to interact with external systems or take actions on their own.

  • AI Agents: These autonomous systems can make decisions, interact with external APIs, and take actions dynamically. They provide a bridge to integrate LLMs with external systems, enabling more sophisticated and proactive solutions.

  • AI Agents + LLMs: The combination of AI agents and LLMs allows you to create applications that not only generate text but also interact with the world in real time, fetch data, and take actions like sending emails.

Limitations and Challenges

  1. AI Hallucinations (Inaccurate or Fabricated Information):

    • Problem: AI agents sometimes generate plausible-sounding but inaccurate or fabricated responses.

    • Example: "Albert Einstein was the first person to walk on the moon."

    • Mitigation: Cross-check responses with reliable sources or databases to ensure accuracy.

  2. Memory Limitations (Lack of Long-Term Context):

    • Problem: LLMs don’t have long-term memory and forget past interactions once the session ends.

    • Example: "I don’t remember past conversations."

    • Mitigation: Store session data on the backend to maintain context across interactions (see the sketch after this list).
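
As a sketch of that second mitigation, here is one way a backend could persist conversation history between turns so the model sees prior context on every request. The in-memory map, the chatWithMemory name, and the session-ID scheme are illustrative assumptions; a real app would use a database or cache instead.

import OpenAI from 'openai'

// Illustrative only: an in-memory session store keyed by session ID.
// A real backend would persist this in a database or cache instead.
const openai = new OpenAI()
const sessions = new Map()

const getHistory = (sessionId) => {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, [])
  }
  return sessions.get(sessionId)
}

export const chatWithMemory = async (sessionId, userMessage) => {
  const history = getHistory(sessionId)
  history.push({ role: 'user', content: userMessage })

  // Send the full history so the model keeps context across turns
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: history,
  })

  const reply = response.choices[0].message
  history.push(reply)
  return reply.content
}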

Let’s Build a Small Real-World App Using an AI Agent and an LLM Like GPT

We can solidify these concepts with a practical coding example that uses a real LLM and an AI agent built to work with it.

Let’s create a file that serves as the entry point for our program. It will take the user’s input and pass it to the agent, which makes decisions and responds.

We are also going to pass a set of tools to the agent, which helps it make decisions more precisely. For example, Reddit is a useful tool to give the agent: if a user’s prompt asks who won the NBA title on a specific date, Reddit almost certainly has the answer.

The beauty of AI agents is that the user never has to say “fetch the answer from Reddit.” The agent makes that decision itself by interacting with the LLM, which tells the agent which tool is right for the purpose.

And of course, for this very reason, we have to provide the tools to the LLM as well.

Let’s first build an index.js file, which will work as the entry point for our user input:

// index.js — entry point: reads the user's message from the CLI and hands it to the agent
import 'dotenv/config'
import { runAgent } from './agent.js'
import { tools } from './tools.js'

const userMessage = process.argv[2]

if (!userMessage) {
  console.error('Please provide a message')
  process.exit(1)
}

const messages = await runAgent({
  userMessage,
  tools,
})

console.log(messages)

Now let’s also initialize the LLM that our agent will interact with:

import OpenAI from 'openai'

// The OpenAI client reads OPENAI_API_KEY from the environment (loaded via dotenv)
const openai = new OpenAI()

export const runLLM = async ({
  model = 'gpt-4o-mini',
  messages,
  temperature = 0.1,
  tools,
}) => {
  const response = await openai.chat.completions.create({
    model,
    messages,
    temperature,
    tools,
  })

  return response.choices[0].message
}

Let’s also create the agent, which will handle our decision-making:


import { runLLM } from './llm.js'

export const runAgent = async ({
  userMessage,
  tools = [],
}) => {
  // Keep a simple in-memory history of the conversation so far
  const history = [{ role: 'user', content: userMessage }]

  const response = await runLLM({
    messages: history,
    tools,
  })

  history.push(response)
  return history
}

Now, let’s stop here for a moment. What’s happening here? A user provides a message to the agent, and the agent first passes that message, along with the tools, to the LLM. But why pass this to the LLM? Because an AI agent is essentially a wrapper over an LLM. The main brain behind the scenes is the LLM; the AI agent is the executor and decision maker for the suggestions and responses the LLM gives back.

Let’s also introduce the tools we are using. I’ll make a toolRunner file for this:

import { generateImage } from './generateImage.js'
import { reddit } from './reddit.js'
import { dadJoke } from './dadJoke.js'

export const runTool = async (toolCall, userMessage) => {
  // Bundle the original user message with the arguments the LLM chose for this tool call
  const input = {
    userMessage,
    toolArgs: JSON.parse(toolCall.function.arguments),
  }

  switch (toolCall.function.name) {
    case 'generate_image':
      return generateImage(input)

    case 'dad_joke':
      return dadJoke(input)

    case 'reddit':
      return reddit(input)

    default:
      throw new Error(`Unknown tool: ${toolCall.function.name}`)
  }
}

As you can see above, our toolRunner file contains a set of tools. Based on the LLM’s prediction of which tool to call, the AI agent uses this tool runner to execute that call. A sketch of how this wiring might look inside the agent is shown below.
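
To make that connection concrete, here is a minimal sketch of how the runAgent function from earlier could be extended to act on the LLM’s tool-call suggestions. It assumes the runLLM and runTool functions defined above and the tool_calls shape returned by the OpenAI Chat Completions API; treat it as an illustration of the loop, not the final implementation.

import { runLLM } from './llm.js'
import { runTool } from './toolRunner.js'

export const runAgent = async ({ userMessage, tools = [] }) => {
  const history = [{ role: 'user', content: userMessage }]

  // Loop until the LLM produces a final answer instead of a tool call
  while (true) {
    const response = await runLLM({ messages: history, tools })
    history.push(response)

    // No tool calls means the LLM answered directly, so we're done
    if (!response.tool_calls) {
      return history
    }

    // Execute each tool the LLM asked for and feed the result back to it
    for (const toolCall of response.tool_calls) {
      const result = await runTool(toolCall, userMessage)
      history.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      })
    }
  }
}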

Here’s the tools file, which we pass from the main entry point to the agent; the agent then passes the message and the tools on to the LLM:

import { generateImageToolDefinition } from './generateImage.js'
import { redditToolDefinition } from './reddit.js'
import { dadJokeToolDefinition } from './dadJoke.js'

export const tools = [
  generateImageToolDefinition,
  redditToolDefinition,
  dadJokeToolDefinition,
]
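
The individual tool definition files aren’t shown in this article, but for reference, here is a sketch of what one might look like using the OpenAI function-calling schema. The dad_joke name matches the case handled in toolRunner; the description and parameters are illustrative assumptions, and its implementation could look like the joke-fetching sketch from earlier.

// dadJoke.js — an illustrative tool definition in the OpenAI function-calling format
export const dadJokeToolDefinition = {
  type: 'function',
  function: {
    name: 'dad_joke',
    description: 'Fetches a random dad joke for the user',
    parameters: {
      type: 'object',
      properties: {},
    },
  },
}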

Now that we have all our logic in place, let’s have some fun. Let’s give our AI agent a user prompt and see how it reacts.

Prompt: “Give me a random meme image from a dadJoke”

So, what steps do you think the agent will follow?

The agent first collects the user input and the set of available tools and passes them down to the LLM. The LLM then predicts, based on the user message, which tool call is most suitable and passes that back to the agent. Finally, the agent acts on the LLM’s feedback, making the tool call and the final decision.

And that was our simple app, which helps us understand the entire workflow and journey behind an AI agent.