In Part 2, we used tool calls to equip our Study Buddy agent with real capabilities. Now we'll deploy it so it's accessible to the world through a REST API.
There are two main deployment approaches for ADK agents:
- Google ADK-Supported — Agent Engine, Cloud Run, GKE. These are tightly integrated with Google Cloud and handle scaling automatically.
- Custom FastAPI Deployment — Build your own REST API layer using FastAPI, giving you full control over routing, middleware, and infrastructure.
We'll go with the Custom FastAPI Deployment approach. It's more flexible, vendor-agnostic, and teaches you the fundamentals that apply regardless of where you host your agent.
API Architecture Overview
The architecture is straightforward: a FastAPI App receives HTTP requests, passes them to a Runner, which coordinates the StudyBuddy Agent with Session Management and Event Handling.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ FastAPI App │───▶│ Runner │───▶│ StudyBuddy │
│ │ │ │ │ Agent │
└────────┬────────┘ └────────┬────────┘ └─────────────────┘
│ Session Mgmt │ │ Event Handling │
│ Request/Response│ │ Message Routing │
└─────────────────┘ └─────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ InMemorySessionService │
│ ┌──────────────┬──────────────────┐ │
│ │ user_id │ Sessions │ │
│ │ "student" │ [session1, ...] │ │
│ └──────────────┴──────────────────┘ │
└─────────────────────────────────────────┘

The Runner is the orchestration layer between your API and your agent. Here's how to set it up:
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService
runner = Runner(
    app_name="School Agents API",
    agent=root_agent,
    session_service=InMemorySessionService()
)

The Runner handles four critical responsibilities:
- Message Routing — Directs incoming user messages to the correct agent.
- Event Handling — Processes the stream of events the agent produces (text responses, tool calls, errors).
- Session Coordination — Maintains conversation context so the agent remembers what was said before.
- Error Management — Catches and handles failures gracefully without crashing the API.
Session Management
Session management is crucial for maintaining conversation context. Without it, every request would be treated as a brand-new conversation — the agent would forget everything between messages.
Here's the request model that supports both new and continuing conversations:
class QueryRequest(BaseModel):
    query: str                        # The student's question
    session_id: Optional[str] = None  # For continuing conversations

The session logic is simple: if no session_id is provided, create a new session. If one is provided, retrieve the existing session to continue the conversation:
# If no session_id provided, create a new session
if not request.session_id:
    session = await session_service.create_session(
        app_name="School Agents API",
        user_id="student"
    )
    session_id = session.id
else:
    # Retrieve existing session to continue conversation
    session_id = request.session_id

Session Flow Visualization
First Request (no session_id):
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Student │───▶│ Create New │───▶│ Return │
│ "Hello!" │ │ Session │ │ session_id │
└──────────────┘ └──────────────────┘ └──────────────┘
Follow-up Request (with session_id):
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Student │───▶│ Find Existing │───▶│ Continue │
│"What's 2+2?" │ │ Session │ │ Conversation │
└──────────────┘ └──────────────────┘ └──────────────┘

The InMemorySessionService provides three key methods:
- create_session — Initializes a new conversation thread with a unique ID.
- get_session — Retrieves an existing conversation by its session ID.
- list_sessions — Lists all active sessions (useful for debugging).
Important: Since sessions are stored in memory, they are lost when the server restarts. For production, consider using a persistent session store backed by a database.
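To make that idea concrete, here's a minimal sketch of what a persistent session store could look like, using only sqlite3 from the standard library. This is not the ADK's API — the ADK provides its own session service implementations — it's just to show the shape of the three methods above backed by a database instead of process memory:

```python
import sqlite3
import uuid

# Hypothetical sketch of a persistent session store. It mirrors the
# create/get/list methods described above; it is NOT the real ADK interface.
class SqliteSessionStore:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, app_name TEXT, user_id TEXT)"
        )

    def create_session(self, app_name: str, user_id: str) -> str:
        """Initialize a new conversation thread with a unique ID."""
        session_id = str(uuid.uuid4())
        self.db.execute(
            "INSERT INTO sessions VALUES (?, ?, ?)",
            (session_id, app_name, user_id),
        )
        self.db.commit()
        return session_id

    def get_session(self, session_id: str):
        """Retrieve an existing conversation row, or None if unknown."""
        return self.db.execute(
            "SELECT id, app_name, user_id FROM sessions WHERE id = ?",
            (session_id,),
        ).fetchone()

    def list_sessions(self, user_id: str) -> list:
        """List session IDs for a user (handy for debugging)."""
        rows = self.db.execute(
            "SELECT id FROM sessions WHERE user_id = ?", (user_id,)
        ).fetchall()
        return [r[0] for r in rows]
```

Because the rows live in a database file rather than in process memory, sessions survive server restarts — exactly the property the in-memory service lacks.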
User ID vs Session ID
These two concepts are often confused, but they serve very different purposes:
- User ID — Identifies who is talking. It remains consistent across all interactions for a given user (e.g., user_id = "student"). Think of it as the person's identity.
- Session ID — Identifies a specific conversation thread. Each new conversation gets a unique, auto-generated UUID. Think of it as a chat window.
The relationship is one-to-many: a single User ID can have many Session IDs. One student might open multiple study sessions throughout the day — each is a separate conversation with its own context, but they all belong to the same user.
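The one-to-many relationship is easy to see in a few lines of plain Python (illustrative only — the ADK's session service manages this mapping for you):

```python
import uuid
from collections import defaultdict

# Illustrative only: one user_id fans out to many session_ids.
sessions_by_user: dict[str, list[str]] = defaultdict(list)

def open_session(user_id: str) -> str:
    """Start a new conversation thread for an existing user."""
    session_id = str(uuid.uuid4())  # unique per conversation
    sessions_by_user[user_id].append(session_id)
    return session_id

# One student, three separate study sessions in a day:
morning = open_session("student")
lunch = open_session("student")
evening = open_session("student")

assert len(sessions_by_user["student"]) == 3   # same user...
assert len({morning, lunch, evening}) == 3     # ...three distinct conversations
```

Each session ID carries its own conversation context, but all three belong to the single "student" identity.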
Complete API Implementation
Here's the full FastAPI setup with CORS middleware and rate limiting:
import uvicorn
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService
from google.genai import types
from src.agents.studdy_buddy.agent import root_agent

# Initialize services
session_service = InMemorySessionService()

runner = Runner(
    app_name="School Agents API",
    agent=root_agent,
    session_service=session_service
)

# Rate limiting
limiter = Limiter(key_func=get_remote_address)

# FastAPI app
app = FastAPI(title="School Agents API", version="1.0.0")
app.state.limiter = limiter
# Return a clean 429 response instead of an unhandled error when a limit is hit
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Now the query endpoint — this is where messages are processed:
class QueryRequest(BaseModel):
    query: str
    session_id: Optional[str] = None

@app.post("/query")
@limiter.limit("50/day")
@limiter.limit("10/minute")
@limiter.limit("100/hour")
@limiter.limit("5/second")
async def query_agent(request: Request, body: QueryRequest):
    # Session management
    if not body.session_id:
        session = await session_service.create_session(
            app_name="School Agents API",
            user_id="student"
        )
        session_id = session.id
    else:
        session_id = body.session_id

    # Create message content
    content = types.Content(
        role="user",
        parts=[types.Part.from_text(text=body.query)]
    )

    # Process through runner
    final_response = ""
    async for event in runner.run_async(
        user_id="student",
        session_id=session_id,
        new_message=content
    ):
        if event.is_final_response():
            for part in event.content.parts:
                if part.text:
                    final_response += part.text

    return {
        "response": final_response,
        "session_id": session_id
    }

The rate limiting configuration protects your API from abuse:
- 50/day — Maximum 50 requests per IP per day.
- 10/minute — Burst protection: no more than 10 requests per minute.
- 100/hour — Hourly cap as a safety net.
- 5/second — Prevents rapid-fire automated requests.
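All four limits apply at once: a request gets through only if it passes every tier. A pure-Python sliding-window sketch (hypothetical — not how slowapi is implemented internally) makes that "AND of all tiers" behavior concrete:

```python
import time
from collections import deque

class MultiTierLimiter:
    """Allow a request only if every (max_requests, window_seconds) tier passes."""

    def __init__(self, tiers):
        self.tiers = tiers
        self.history = deque()  # timestamps of previously allowed requests

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps older than the largest window we track
        longest = max(window for _, window in self.tiers)
        while self.history and now - self.history[0] > longest:
            self.history.popleft()
        for max_requests, window in self.tiers:
            recent = sum(1 for t in self.history if now - t <= window)
            if recent >= max_requests:
                return False  # one tier exceeded -> reject
        self.history.append(now)
        return True

# Same tiers as the decorators above: 5/second, 10/minute, 100/hour, 50/day
limiter = MultiTierLimiter([(5, 1), (10, 60), (100, 3600), (50, 86400)])
```

Note that with these numbers the 50/day cap is stricter than 100/hour, so the daily limit is the one a persistent client actually hits first; the hourly tier only matters if you later raise the daily cap.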
Running the API
There are several ways to start your API server:
# Option 1: Run directly
python src/api/api_server.py
# Option 2: Using uvicorn with reload (development)
uvicorn src.api.api_server:app --host 0.0.0.0 --port 8080 --reload
# Option 3: Module-style import
uvicorn src.api:app --host 0.0.0.0 --port 8080 --reload

Once running, test your API with curl:
# First request (creates new session)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "Explain photosynthesis"}'
# Follow-up with session_id (continues conversation)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "Can you go deeper into the light reactions?", "session_id": "your-session-id-here"}'

Building the Chat Interface
For the frontend, we're using vanilla HTML, CSS, and JavaScript instead of React or any framework. Here's why:
- Zero build process — No webpack, no bundler, no npm install. Just files.
- Faster load times — No framework overhead means the chat loads instantly.
- Direct deployment — The HTML can be served straight from FastAPI as a template.
- Easier debugging — What you see in the source is what runs in the browser.
Frontend Architecture
┌─────────────────────────┐
│ HTML/CSS/JS │ ← Modern chat interface
│ Chat Interface │
└────────────┬────────────┘
│
▼ JavaScript fetch()
┌─────────────────────────┐
│ FastAPI Server │ ← Serves HTML + handles API
│ │
├─────────────────────────┤
│ GET / │ ← Returns chat interface
│ POST /query │ ← Processes messages
└─────────────────────────┘

SSR vs CSR
We're using Server-Side Rendering (SSR) here — FastAPI sends a complete, ready-to-display HTML page to the browser. This is different from Client-Side Rendering (CSR), where the browser downloads JavaScript that then builds the page. Strictly speaking, since our template is static, this is closer to serving a pre-rendered page than rendering on each request, but the effect is the same: a faster initial load and better SEO with no JavaScript framework needed.
from fastapi.responses import HTMLResponse
from pathlib import Path

@app.get("/", response_class=HTMLResponse)
async def serve_chat_interface():
    html_path = Path(__file__).parent / "templates" / "chat.html"
    return HTMLResponse(content=html_path.read_text())

Key frontend features:
- Session management integration — Automatically stores and reuses session_id for multi-turn conversations.
- Progressive enhancement — Works without JavaScript for basic display, enhanced with JS for interactivity.
- Zero dependencies — No CDN links, no npm packages, just pure browser APIs.
- Simple state management — Session ID stored in a variable, messages appended to the DOM.
Complete API Endpoints
Here's the full set of endpoints your API exposes:
- GET / — Serves the chat interface (SSR HTML page).
- POST /query — Main interaction endpoint. Accepts a query and optional session_id, returns the agent's response.
- GET /docs — Auto-generated Swagger/OpenAPI documentation (built into FastAPI).
- GET /health — Health check endpoint for monitoring and load balancers.
- GET /info — Returns API metadata like version, agent name, and available endpoints.
Environment Setup
Create a .env file in your project root with the required configuration:
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-google-api-key-here

Setting GOOGLE_GENAI_USE_VERTEXAI=FALSE tells the ADK to use the standard Google AI API directly instead of going through Vertex AI. This simplifies local development — you only need a Google API key rather than a full GCP project setup.
Docker Containerization
Containerizing your agent ensures it runs the same way everywhere — your laptop, a CI server, or production cloud infrastructure.
FROM python:3.11-slim
WORKDIR /app
# Install uv for fast dependency management
RUN pip install uv
# Copy dependency files first (better caching)
COPY pyproject.toml uv.lock ./
# Install dependencies
RUN uv sync --frozen --no-dev
# Copy application code
COPY . .
# Create non-root user for security
RUN useradd --create-home appuser
USER appuser
EXPOSE 8080
CMD ["uv", "run", "uvicorn", "src.api.api_server:app", "--host", "0.0.0.0", "--port", "8080"]

Keep your image lean with a .dockerignore:
.git
.env
__pycache__
*.pyc
.venv
node_modules
.pytest_cache

For multi-container setups, use Docker Compose:
version: "3.8"
services:
  study-buddy:
    build: .
    ports:
      - "8080:8080"
    env_file:
      - .env
    restart: unless-stopped

Build and run:
# Build and run with Docker
docker build -t study-buddy-api .
docker run -p 8080:8080 --env-file .env study-buddy-api
# Or use Docker Compose
docker compose up --build

Deployment Options
Once containerized, you have many options for hosting your agent:
Cloud Platforms
- Google Cloud Run — Serverless containers. Deploy with a single command:
gcloud run deploy study-buddy \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

- AWS — Push to ECR, then deploy via ECS, EC2, or EKS depending on your scale and complexity needs.
- Azure — Container Apps or Azure Kubernetes Service for managed orchestration.
Self-Hosted
- Dokploy — An open-source alternative to Vercel/Netlify for self-hosted deployments.
- Docker Swarm — Native Docker clustering for small-to-medium deployments.
- Kubernetes — Full orchestration for complex, multi-service architectures.
- VPS (DigitalOcean, Hetzner, etc.) — Simple and cost-effective. Just docker compose up on your server.
Dokploy: Open Source Self-Hosted Deployment
Dokploy is a powerful, open-source alternative to Vercel/Netlify for self-hosted deployments. It's perfect for developers who want the convenience of a PaaS with the control of self-hosting.
- Official Website: dokploy.com
- GitHub Repository: github.com/Dokploy/dokploy
Complete Dokploy Tutorial
For a comprehensive step-by-step guide on setting up and deploying with Dokploy, watch this detailed tutorial:
This video covers everything from server setup to deployment — perfect for getting your StudyBuddy app live quickly and cost-effectively.
Real-World Example
Our StudyBuddy demo is deployed using Dokploy:
- Live Demo: study_buddy.chotuai.in
- Source Code: github.com/arjunagi-a-rehman/school-agents
What We Covered
Let's recap everything we built across this series so far:
- Custom Agent — A personalized Study Buddy agent with identity and behavior.
- Function Calling — Google Search and custom tools for real capabilities.
- REST API — FastAPI-based deployment with session management and event handling.
- Modern UI — Server-rendered chat interface with zero dependencies.
- Rate Limiting — Protection against abuse with multi-tier limits.
- Containerization — Docker and Docker Compose for portable deployments.
- Live Deployment — From local dev to production-ready infrastructure.