
Day 1 - Module 1: Basic LLM Interaction & Tool Calling

Objective: Understand how to interact with Large Language Models (LLMs) programmatically, handle streaming responses, and enable models to use external tools.

Source Code: src/01-basics/


Introduction

At the heart of AI agents lies the ability to interact with powerful Large Language Models (LLMs). These models can understand and generate human-like text, answer questions, translate languages, and much more. In this first module, we will explore the fundamental ways to communicate with an LLM using the OpenAI API pattern (as used by the GitHub Models endpoint in this repository) and introduce the concept of "tool calling," which allows the LLM to interact with external systems or functions.

We will cover:

  1. Basic API Calls: Sending a prompt to the model and receiving a complete response.
  2. Streaming Responses: Receiving the model's response incrementally as it's generated.
  3. Tool Calling: Defining functions (tools) that the model can request to use to gather information or perform actions.

Core Concept: LLM Interaction

Understanding how to structure API calls, manage conversation history (messages), and interpret responses is foundational for building any LLM-powered application.

Setup Review

Prerequisites

Before running the examples, ensure you have:

  1. Cloned the agentic-playground repository.
  2. Installed the required Python packages (pip install -r requirements.txt).
  3. Created a .env file in the repository root with your GITHUB_TOKEN (a GitHub Personal Access Token; no specific permissions are needed for GitHub Models inference).
# .env file content
GITHUB_TOKEN="your_github_pat_here"

1. Hello World: Basic API Interaction

File: src/01-basics/hello-world.py

This script demonstrates the simplest form of interaction: sending a message to the LLM and getting a single, complete response back.

Code Breakdown:

  • Import necessary libraries: os for environment variables, OpenAI for the client, load_dotenv to load the .env file.
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
  • Initialize the OpenAI Client:
    • base_url: Points to the GitHub Models inference endpoint.
    • api_key: Reads the GITHUB_TOKEN from your environment variables (loaded from .env).
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)
  • Define the Conversation:
    • Messages are provided as a list of dictionaries, each with a role (system, user, or assistant) and content.
    • The system message sets the context or instructions for the model (e.g., "antworte alles in französisch" - answer everything in French).
    • The user message contains the user's query.
messages=[
    {
        "role": "system",
        "content": "antworte alles in französisch",
    },
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]
  • Call the Chat Completions API:
    • client.chat.completions.create() sends the request.
    • messages: The conversation history/prompt.
    • model: Specifies the model to use (e.g., gpt-4o-mini).
    • temperature, max_tokens, top_p: Control the creativity, length, and sampling strategy of the response.
response = client.chat.completions.create(
    messages=messages,
    model="gpt-4o-mini",
    temperature=1,
    max_tokens=4096,
    top_p=1
)
  • Print the Response:
    • The model's reply is found within the response object.
print(response.choices[0].message.content)
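
Because the assistant's reply is just text, you can append it to the messages list and ask a follow-up question so the model keeps the conversation context. A minimal sketch (the follow-up prompt is illustrative and not part of hello-world.py):

messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "And what language did you just answer in?"})

followup = client.chat.completions.create(
    messages=messages,
    model="gpt-4o-mini",
    temperature=1,
    max_tokens=4096,
    top_p=1
)
print(followup.choices[0].message.content)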

To Run:

cd /home/ubuntu/agentic-playground/src/01-basics
python hello-world.py

You should see the answer to "What is the capital of France?" printed in French.

2. Streaming Output

File: src/01-basics/streaming-output.py

Waiting for the entire response can take time, especially for longer answers. Streaming allows you to receive the response piece by piece, improving the perceived responsiveness of the interaction.

Code Breakdown:

  • Client Setup: Similar to hello-world.py.
  • API Call with Streaming:
    • The key difference is stream=True.
    • stream_options={'include_usage': True} optionally requests token usage information at the end.
response = client.chat.completions.create(
    messages=[
        # ... (system and user messages) ...
    ],
    model=model_name,
    stream=True,
    stream_options={'include_usage': True}
)
  • Processing the Stream:
    • The response object is now an iterator.
    • We loop through each update in the stream.
    • Each update can contain a small piece of the response text (delta.content). We print these pieces immediately.
    • If usage information is included, it appears in a final update.
usage = None
for update in response:
    if update.choices and update.choices[0].delta:
        print(update.choices[0].delta.content or "", end="") # Print chunk without newline
    if update.usage:
        usage = update.usage

if usage:
    print("\n") # Add newline after full response
    for k, v in usage.model_dump().items():
        print(f"{k} = {v}")

To Run:

cd /home/ubuntu/agentic-playground/src/01-basics
python streaming-output.py

You will see the model's answer (the example prompt asks for good reasons to exercise) appear on the console incrementally, followed by the token usage statistics.

3. Tool Calling: Extending LLM Capabilities

File: src/01-basics/tool-calling.py

LLMs are trained on vast datasets but lack real-time information and the ability to perform actions in the real world. Tool calling allows the LLM to request the execution of predefined functions (tools) to overcome these limitations.

Concept:

  1. Define Tools: You describe available functions (like getting the current time, searching the web, etc.) to the LLM, including their names, descriptions, and expected parameters.
  2. LLM Request: When the LLM determines it needs a tool to answer a user's query, it doesn't directly answer but instead outputs a special message indicating which tool to call and with what arguments.
  3. Execute Tool: Your code receives this request and executes the corresponding function with the provided arguments.
  4. Provide Result: You send the function's return value back to the LLM.
  5. Final Response: The LLM uses the tool's result to formulate the final answer to the user.
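
To make this flow concrete, the conversation during a tool call looks roughly like the following (a simplified sketch; the call id and values are made up, but the field names follow the OpenAI chat completions format used below):

messages = [
    {"role": "user", "content": "What time is it in Berlin?"},
    # Step 2: the model does not answer directly; it requests a tool call instead.
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_current_time", "arguments": '{"city_name": "Berlin"}'},
        }],
    },
    # Steps 3-4: your code runs get_current_time("Berlin") and sends back the result.
    {"role": "tool", "tool_call_id": "call_abc123", "name": "get_current_time", "content": "14:32"},
    # Step 5: the model is called again and answers using the tool result.
]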

Why Tool Calling is Powerful

Tool calling transforms LLMs from passive text generators into active agents capable of interacting with APIs, databases, or custom code, vastly expanding their potential applications.

Code Breakdown:

  • Import additional libraries: json for parsing arguments, pytz and datetime for the time function.
  • Define the Tool Function: A standard Python function (get_current_time) that takes a city name and returns the time. Note the docstring, which helps the LLM understand what the function does.
import pytz
from datetime import datetime

def get_current_time(city_name: str) -> str:
    """Returns the current time in a given city."""
    # Simplified sketch of the elided body (the actual script uses pytz similarly):
    zones = {"berlin": "Europe/Berlin", "new york": "America/New_York"}
    tz = pytz.timezone(zones.get(city_name.lower(), "UTC"))
    return datetime.now(tz).strftime("%H:%M:%S")
  • Define the Tool Schema: A dictionary describing the function to the LLM.
    • type: Always "function".
    • function: Contains details:
      • name: Must match the Python function name.
      • description: Crucial for the LLM to know when to use the tool.
      • parameters: Describes the arguments (name, type, description, required).
tool={
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": """Returns information about the current time...""",
        "parameters": {
            "type": "object",
            "properties": {
                "city_name": {
                    "type": "string",
                    "description": "The name of the city...",
                }
            },
            "required": ["city_name"],
        },
    },
}

Designing Good Tool Descriptions

The description field in the tool schema is critical. It should clearly and concisely explain what the tool does and when it should be used. Use natural language that the LLM can easily understand.
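
For example (illustrative only, not taken from the repository):

# Vague: the model cannot tell when this tool is relevant.
vague_description = "Gets the time."

# Clearer: says what the tool returns, what it expects, and when to use it.
better_description = (
    "Returns the current local time in a given city. "
    "Use this whenever the user asks what time it is somewhere. "
    "The city name should be given in English, e.g. 'Berlin' or 'New York'."
)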

  • Initial API Call with Tools:
    • The tools parameter is added to the create call, listing the available tools.
response = client.chat.completions.create(
    messages=messages,
    tools=[tool], # Pass the tool definition
    model=model_name,
)
  • Handling Tool Call Response:
    • Check if finish_reason is tool_calls.
    • Append the model's request message to the history.
    • Extract the tool_call information (ID, function name, arguments).
    • Parse the JSON arguments.
    • Crucially, call the actual Python function (locals()[tool_call.function.name](**function_args)).
    • Append the tool's result back to the message history, using the tool_call_id and role: "tool".
if response.choices[0].finish_reason == "tool_calls":
    messages.append(response.choices[0].message) # Append assistant's request
    tool_call = response.choices[0].message.tool_calls[0]
    if tool_call.type == "function":
        function_args = json.loads(tool_call.function.arguments)
        callable_func = locals()[tool_call.function.name]
        function_return = callable_func(**function_args)
        messages.append( # Append tool result
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": tool_call.function.name,
                "content": function_return,
            }
        )
  • Second API Call: Call the model again with the updated message history (including the tool result).
response = client.chat.completions.create(
    messages=messages,
    tools=[tool],
    model=model_name,
)
print(f"Model response = {response.choices[0].message.content}")

To Run:

cd /home/ubuntu/agentic-playground/src/01-basics
# Install pytz if you haven't: pip install pytz
python tool-calling.py

You will see output indicating the function call (Calling function 'get_current_time'...), the function's return value, and finally the model's response incorporating the time information.



This module covered the basics of interacting with LLMs and enabling them to use tools. In the next module, we'll explore how models can handle different types of input, specifically images and voice.