In Part 2, we used tool calls to equip our Study Buddy agent with real capabilities. Now we'll deploy it so it's accessible to the world through a REST API.
There are two main deployment approaches for ADK agents:
- Google ADK-Supported — Agent Engine, Cloud Run, GKE. These are tightly integrated with Google Cloud and handle scaling automatically.
- Custom FastAPI Deployment — Build your own REST API layer using FastAPI, giving you full control over routing, middleware, and infrastructure.
We'll go with the Custom FastAPI Deployment approach. It's more flexible, vendor-agnostic, and teaches you the fundamentals that apply regardless of where you host your agent.
API Architecture Overview
The architecture is straightforward: a FastAPI App receives HTTP requests, passes them to a Runner, which coordinates the StudyBuddy Agent with Session Management and Event Handling.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ FastAPI App │───▶│ Runner │───▶│ StudyBuddy │
│ │ │ │ │ Agent │
└────────┬────────┘ └────────┬────────┘ └─────────────────┘
│ Session Mgmt │ │ Event Handling │
│ Request/Response│ │ Message Routing │
└─────────────────┘ └─────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ InMemorySessionService │
│ ┌──────────────┬──────────────────┐ │
│ │ user_id │ Sessions │ │
│ │ "student" │ [session1, ...] │ │
│ └──────────────┴──────────────────┘ │
└─────────────────────────────────────────┘

The Runner is the orchestration layer between your API and your agent. Here's how to set it up:
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService
runner = Runner(
    app_name="School Agents API",
    agent=root_agent,
    session_service=InMemorySessionService()
)

The Runner handles four critical responsibilities:
- Message Routing — Directs incoming user messages to the correct agent.
- Event Handling — Processes the stream of events the agent produces (text responses, tool calls, errors).
- Session Coordination — Maintains conversation context so the agent remembers what was said before.
- Error Management — Catches and handles failures gracefully without crashing the API.
Session Management
Session management is crucial for maintaining conversation context. Without it, every request would be treated as a brand-new conversation — the agent would forget everything between messages.
Here's the request model that supports both new and continuing conversations:
class QueryRequest(BaseModel):
    query: str                        # The student's question
    session_id: Optional[str] = None  # For continuing conversations

The session logic is simple: if no session_id is provided, create a new session. If one is provided, retrieve the existing session to continue the conversation:
# If no session_id provided, create a new session
if not request.session_id:
    session = await session_service.create_session(
        app_name="School Agents API",
        user_id="student"
    )
    session_id = session.id
else:
    # Retrieve existing session to continue conversation
    session_id = request.session_id

Session Flow Visualization
First Request (no session_id):
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Student │───▶│ Create New │───▶│ Return │
│ "Hello!" │ │ Session │ │ session_id │
└──────────────┘ └──────────────────┘ └──────────────┘
Follow-up Request (with session_id):
┌──────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Student │───▶│ Find Existing │───▶│ Continue │
│"What's 2+2?" │ │ Session │ │ Conversation │
└──────────────┘ └──────────────────┘ └──────────────┘

The InMemorySessionService provides three key methods:
- create_session — Initializes a new conversation thread with a unique ID.
- get_session — Retrieves an existing conversation by its session ID.
- list_sessions — Lists all active sessions (useful for debugging).
Important: Since sessions are stored in memory, they are lost when the server restarts. For production, consider using a persistent session store backed by a database.
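To make that idea concrete, here's a minimal sketch of what a persistent session store could look like, using only sqlite3 from the standard library. This is not the ADK's API — the ADK provides its own session service implementations — it's just to show the shape of the three methods above backed by a database instead of process memory:

```python
import sqlite3
import uuid

# Hypothetical sketch of a persistent session store. It mirrors the
# create/get/list methods described above; it is NOT the real ADK interface.
class SqliteSessionStore:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, app_name TEXT, user_id TEXT)"
        )

    def create_session(self, app_name: str, user_id: str) -> str:
        """Initialize a new conversation thread with a unique ID."""
        session_id = str(uuid.uuid4())
        self.db.execute(
            "INSERT INTO sessions VALUES (?, ?, ?)",
            (session_id, app_name, user_id),
        )
        self.db.commit()
        return session_id

    def get_session(self, session_id: str):
        """Retrieve an existing conversation row, or None if unknown."""
        return self.db.execute(
            "SELECT id, app_name, user_id FROM sessions WHERE id = ?",
            (session_id,),
        ).fetchone()

    def list_sessions(self, user_id: str) -> list:
        """List session IDs for a user (handy for debugging)."""
        rows = self.db.execute(
            "SELECT id FROM sessions WHERE user_id = ?", (user_id,)
        ).fetchall()
        return [r[0] for r in rows]
```

Because the rows live in a database file rather than in process memory, sessions survive server restarts — exactly the property the in-memory service lacks.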
User ID vs Session ID
These two concepts are often confused, but they serve very different purposes:
- User ID — Identifies who is talking. It remains consistent across all interactions for a given user (e.g., user_id = "student"). Think of it as the person's identity.
- Session ID — Identifies a specific conversation thread. Each new conversation gets a unique, auto-generated UUID. Think of it as a chat window.
The relationship is one-to-many: a single User ID can have many Session IDs. One student might open multiple study sessions throughout the day — each is a separate conversation with its own context, but they all belong to the same user.
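The one-to-many relationship is easy to see in a few lines of plain Python (illustrative only — the ADK's session service manages this mapping for you):

```python
import uuid
from collections import defaultdict

# Illustrative only: one user_id fans out to many session_ids.
sessions_by_user: dict[str, list[str]] = defaultdict(list)

def open_session(user_id: str) -> str:
    """Start a new conversation thread for an existing user."""
    session_id = str(uuid.uuid4())  # unique per conversation
    sessions_by_user[user_id].append(session_id)
    return session_id

# One student, three separate study sessions in a day:
morning = open_session("student")
lunch = open_session("student")
evening = open_session("student")

assert len(sessions_by_user["student"]) == 3   # same user...
assert len({morning, lunch, evening}) == 3     # ...three distinct conversations
```

Each session ID carries its own conversation context, but all three belong to the single "student" identity.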
Complete API Implementation
Here's the full FastAPI setup with CORS middleware and rate limiting:
import uvicorn
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService
from google.genai import types
from src.agents.studdy_buddy.agent import root_agent

# Initialize services
session_service = InMemorySessionService()

runner = Runner(
    app_name="School Agents API",
    agent=root_agent,
    session_service=session_service
)

# Rate limiting
limiter = Limiter(key_func=get_remote_address)

# FastAPI app
app = FastAPI(title="School Agents API", version="1.0.0")
app.state.limiter = limiter
# Return a clean 429 response instead of an unhandled error when a limit is hit
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Now the query endpoint — this is where messages are processed:
class QueryRequest(BaseModel):
    query: str
    session_id: Optional[str] = None

@app.post("/query")
@limiter.limit("50/day")
@limiter.limit("10/minute")
@limiter.limit("100/hour")
@limiter.limit("5/second")
async def query_agent(request: Request, body: QueryRequest):
    # Session management
    if not body.session_id:
        session = await session_service.create_session(
            app_name="School Agents API",
            user_id="student"
        )
        session_id = session.id
    else:
        session_id = body.session_id

    # Create message content
    content = types.Content(
        role="user",
        parts=[types.Part.from_text(text=body.query)]
    )

    # Process through runner
    final_response = ""
    async for event in runner.run_async(
        user_id="student",
        session_id=session_id,
        new_message=content
    ):
        if event.is_final_response():
            for part in event.content.parts:
                if part.text:
                    final_response += part.text

    return {
        "response": final_response,
        "session_id": session_id
    }

The rate limiting configuration protects your API from abuse:
- 50/day — Maximum 50 requests per IP per day.
- 10/minute — Burst protection: no more than 10 requests per minute.
- 100/hour — Hourly cap as a safety net.
- 5/second — Prevents rapid-fire automated requests.
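All four limits apply at once: a request gets through only if it passes every tier. A pure-Python sliding-window sketch (hypothetical — not how slowapi is implemented internally) makes that "AND of all tiers" behavior concrete:

```python
import time
from collections import deque

class MultiTierLimiter:
    """Allow a request only if every (max_requests, window_seconds) tier passes."""

    def __init__(self, tiers):
        self.tiers = tiers
        self.history = deque()  # timestamps of previously allowed requests

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps older than the largest window we track
        longest = max(window for _, window in self.tiers)
        while self.history and now - self.history[0] > longest:
            self.history.popleft()
        for max_requests, window in self.tiers:
            recent = sum(1 for t in self.history if now - t <= window)
            if recent >= max_requests:
                return False  # one tier exceeded -> reject
        self.history.append(now)
        return True

# Same tiers as the decorators above: 5/second, 10/minute, 100/hour, 50/day
limiter = MultiTierLimiter([(5, 1), (10, 60), (100, 3600), (50, 86400)])
```

Note that with these numbers the 50/day cap is stricter than 100/hour, so the daily limit is the one a persistent client actually hits first; the hourly tier only matters if you later raise the daily cap.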
Running the API
There are several ways to start your API server:
# Option 1: Run directly
python src/api/api_server.py
# Option 2: Using uvicorn with reload (development)
uvicorn src.api.api_server:app --host 0.0.0.0 --port 8080 --reload
# Option 3: Module-style import
uvicorn src.api:app --host 0.0.0.0 --port 8080 --reload

Once running, test your API with curl:
# First request (creates new session)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "Explain photosynthesis"}'
# Follow-up with session_id (continues conversation)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "Can you go deeper into the light reactions?", "session_id": "your-session-id-here"}'

Building the Chat Interface
For the frontend, we're using vanilla HTML, CSS, and JavaScript instead of React or any framework. Here's why:
- Zero build process — No webpack, no bundler, no npm install. Just files.
- Faster load times — No framework overhead means the chat loads instantly.
- Direct deployment — The HTML can be served straight from FastAPI as a template.
- Easier debugging — What you see in the source is what runs in the browser.
Frontend Architecture
┌─────────────────────────┐
│ HTML/CSS/JS │ ← Modern chat interface
│ Chat Interface │
└────────────┬────────────┘
│
▼ JavaScript fetch()
┌─────────────────────────┐
│ FastAPI Server │ ← Serves HTML + handles API
│ │
├─────────────────────────┤
│ GET / │ ← Returns chat interface
│ POST /query │ ← Processes messages
└─────────────────────────┘

SSR vs CSR
We're using Server-Side Rendering (SSR) here — FastAPI sends a complete, ready-to-display HTML page to the browser. This is different from Client-Side Rendering (CSR), where the browser downloads JavaScript that then builds the page. Strictly speaking, since our template is static, this is closer to serving a pre-rendered page than rendering on each request, but the effect is the same: a faster initial load and better SEO with no JavaScript framework needed.
from fastapi.responses import HTMLResponse
from pathlib import Path

@app.get("/", response_class=HTMLResponse)
async def serve_chat_interface():
    html_path = Path(__file__).parent / "templates" / "chat.html"
    return HTMLResponse(content=html_path.read_text())

Key frontend features:
- Session management integration — Automatically stores and reuses session_id for multi-turn conversations.
- Progressive enhancement — Works without JavaScript for basic display, enhanced with JS for interactivity.
- Zero dependencies — No CDN links, no npm packages, just pure browser APIs.
- Simple state management — Session ID stored in a variable, messages appended to the DOM.
Complete API Endpoints
Here's the full set of endpoints your API exposes:
- GET / — Serves the chat interface (SSR HTML page).
- POST /query — Main interaction endpoint. Accepts a query and optional session_id, returns the agent's response.
- GET /docs — Auto-generated Swagger/OpenAPI documentation (built into FastAPI).
- GET /health — Health check endpoint for monitoring and load balancers.
- GET /info — Returns API metadata like version, agent name, and available endpoints.
Environment Setup
Create a .env file in your project root with the required configuration:
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-google-api-key-here

Setting GOOGLE_GENAI_USE_VERTEXAI=FALSE tells the ADK to use the standard Google AI API directly instead of going through Vertex AI. This simplifies local development — you only need a Google API key rather than a full GCP project setup.
Docker Containerization
Containerizing your agent ensures it runs the same way everywhere — your laptop, a CI server, or production cloud infrastructure.
FROM python:3.11-slim
WORKDIR /app
# Install uv for fast dependency management
RUN pip install uv
# Copy dependency files first (better caching)
COPY pyproject.toml uv.lock ./
# Install dependencies
RUN uv sync --frozen --no-dev
# Copy application code
COPY . .
# Create non-root user for security
RUN useradd --create-home appuser
USER appuser
EXPOSE 8080
CMD ["uv", "run", "uvicorn", "src.api.api_server:app", "--host", "0.0.0.0", "--port", "8080"]

Keep your image lean with a .dockerignore:
.git
.env
__pycache__
*.pyc
.venv
node_modules
.pytest_cache

For multi-container setups, use Docker Compose:
version: "3.8"
services:
  study-buddy:
    build: .
    ports:
      - "8080:8080"
    env_file:
      - .env
    restart: unless-stopped

Build and run:
# Build and run with Docker
docker build -t study-buddy-api .
docker run -p 8080:8080 --env-file .env study-buddy-api
# Or use Docker Compose
docker compose up --build

Deployment Options
Once containerized, you have many options for hosting your agent:
Cloud Platforms
- Google Cloud Run — Serverless containers. Deploy with a single command:
gcloud run deploy study-buddy \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

- AWS — Push to ECR, then deploy via ECS, EC2, or EKS depending on your scale and complexity needs.
- Azure — Container Apps or Azure Kubernetes Service for managed orchestration.
Self-Hosted
- Dokploy — An open-source alternative to Vercel/Netlify for self-hosted deployments.
- Docker Swarm — Native Docker clustering for small-to-medium deployments.
- Kubernetes — Full orchestration for complex, multi-service architectures.
- VPS (DigitalOcean, Hetzner, etc.) — Simple and cost-effective. Just docker compose up on your server.
Dokploy: Open Source Self-Hosted Deployment
Dokploy is a powerful, open-source alternative to Vercel/Netlify for self-hosted deployments. It's perfect for developers who want the convenience of a PaaS with the control of self-hosting.
- Official Website: dokploy.com
- GitHub Repository: github.com/Dokploy/dokploy
Complete Dokploy Tutorial
For a comprehensive step-by-step guide on setting up and deploying with Dokploy, watch this detailed tutorial:
This video covers everything from server setup to deployment — perfect for getting your StudyBuddy app live quickly and cost-effectively.
Real-World Example
Our StudyBuddy demo is deployed using Dokploy:
- Live Demo: study_buddy.chotuai.in
- Source Code: github.com/arjunagi-a-rehman/school-agents
What We Covered
Let's recap everything we built across this series so far:
- Custom Agent — A personalized Study Buddy agent with identity and behavior.
- Function Calling — Google Search and custom tools for real capabilities.
- REST API — FastAPI-based deployment with session management and event handling.
- Modern UI — Server-rendered chat interface with zero dependencies.
- Rate Limiting — Protection against abuse with multi-tier limits.
- Containerization — Docker and Docker Compose for portable deployments.
- Live Deployment — From local dev to production-ready infrastructure.