#! pip install fastapi
#! pip install "uvicorn[standard]"
Basics
- Build APIs based on standard Python type hints
- Automatically generate interactive documentation
- Fast to code, fewer bugs
- If the following contents are in main.py, run via `uvicorn main:app --reload`. Here `main` refers to main.py and `app` refers to the FastAPI object inside main.py; `--reload` restarts the server on code changes (use during development, not production).
- Documentation conforming to the OpenAPI standard is served at http://127.0.0.1:8000/docs, from which you can try out the endpoints! http://127.0.0.1:8000/redoc returns the documentation in an alternative format.
- Use `async def` to make endpoint functions non-blocking, enabling other tasks to run concurrently. Useful when a function performs I/O-bound operations, such as database queries, file I/O, or network requests, and when you need to handle a large number of concurrent requests efficiently.
- Type hints are validated with Pydantic, so passing a non-int to /items/{item_id} returns an error.
- Order matters: if read_user_current were placed after read_user, requests to /users/me would fail, since FastAPI matches routes top-down and would try to validate "me" as an integer.
- Use Enums when a path parameter must come from a fixed list of values. If an invalid value is passed, FastAPI lists the available values!
- To have file paths read correctly, use the :path path converter, which lets the parameter capture the entire path, including slashes.
- read_animal without additional parameters returns animals 0-10. With query parameters you can specify which ones you want, as in http://127.0.0.1:8000/animals/?skip=0&limit=2. Here ? denotes the start of the query string and & separates parameters. You can also pass an optional parameter, as in http://127.0.0.1:8000/animals/?skip=0&limit=2&optional_param=3; just declare it as typing.Optional.
- Optional parameters can also be passed and used as in read_user_item.
- The request body is data sent by the client to the API; the response body is data sent from the API to the client. Use Pydantic to specify the request body for a POST request.
- To send a POST request, test it out in /docs or with: curl -X POST "http://127.0.0.1:8000/books/" -H "Content-Type: application/json" -d '{"name": "The Great Gatsby", "author": "F. Scott Fitzgerald", "description": "A novel set in the 1920s", "price": 10.99}'
- Then can go to /books endpoint to see the books printed.
from fastapi import FastAPI
from enum import Enum
import typing as t
from pydantic import BaseModel

app = FastAPI()

@app.get("/")  # route/endpoint
def home_page():
    return {"message": "Hello World!"}

@app.get("/items/{item_id}")  # item_id is the path parameter
async def read_item(item_id: int):
    return {"item_id": item_id}

@app.get("/users/me")  # must be declared before /users/{user_id} to be valid
async def read_user_current():
    return {"user_id": "Current user"}

@app.get("/users/{user_id}")
async def read_user(user_id: int):
    return {"user_id": user_id}

class ModelName(str, Enum):
    ALEXNET = 'ALEXNET'
    RESNET = 'RESNET'
    LENET = 'LENET'

@app.get("/models/{model_name}")
async def get_model(model_name: ModelName):
    if model_name == ModelName.ALEXNET:
        return {'model_name': model_name}
    elif model_name.value == "LENET":
        return {'model_name': model_name}
    else:
        return {'model_name': f"You have selected {model_name.value}"}

@app.get("/files/{file_path:path}")
async def read_file(file_path: str):
    return {"file_path": file_path}

animal_db = [{"animal_name": 'cat'}, {"animal_name": 'llama'}, {"animal_name": 'alpaca'}]

@app.get("/animals/")
async def read_animal(skip: int = 0, limit: int = 10, optional_param: t.Optional[int] = None):
    return {"animals": animal_db[skip:skip + limit], "optional_parameter": optional_param}

@app.get("/users/{user_id}/items/{item_id}")
async def read_user_item(
    user_id: int, item_id: int, q: t.Optional[str] = None, short: bool = False
):
    item = {"item_id": item_id, "owner_id": user_id}
    if q:
        item.update({"q": q})
    if not short:
        item.update({'description': 'great item with long description'})
    return item

books_db = []

class Book(BaseModel):
    name: str
    author: str
    description: t.Optional[str]
    price: float

@app.post("/books/")
async def create_item(book: Book):
    books_db.append(book)
    return book

@app.get("/books/")
async def get_books():
    return books_db
Notes following “Building Data Science Applications with FastAPI” by François Voron Chapter 2: Python specificities -> asyncio
Q: What’s the difference between WSGI and ASGI gateways as it pertains to Django and FastAPI? WSGI (Web Server Gateway Interface) and ASGI (Asynchronous Server Gateway Interface) are two different specifications for Python web servers and applications. They serve as interfaces between web servers and web applications or frameworks. Here’s a detailed comparison of WSGI and ASGI, particularly in the context of Django and FastAPI:
- WSGI (Web Server Gateway Interface)
  - Synchronous: WSGI is designed for synchronous web applications. It handles one request at a time per worker, which can lead to inefficiencies when dealing with I/O-bound operations like database queries or external API calls.
  - Django: Django is traditionally a WSGI-based framework. It works well for most web applications but can struggle with real-time features like WebSockets or long-polling due to its synchronous nature. Common WSGI servers for Django include Gunicorn and uWSGI.
  - Concurrency: WSGI applications handle concurrency by using multiple worker processes or threads. Each worker handles one request at a time.
  - Deployment: WSGI applications are typically deployed using WSGI servers like Gunicorn, uWSGI, or mod_wsgi (for Apache).
- ASGI (Asynchronous Server Gateway Interface)
  - Asynchronous: ASGI is designed for asynchronous web applications. It supports both synchronous and asynchronous code, allowing for more efficient handling of I/O-bound operations and real-time features.
  - FastAPI: FastAPI is an ASGI-based framework. It is built from the ground up to support asynchronous programming, making it ideal for applications that require high concurrency, real-time communication, or WebSockets. Common ASGI servers for FastAPI include Uvicorn and Daphne.
  - Concurrency: ASGI applications can handle many requests concurrently using asynchronous I/O. This allows for more efficient use of resources, especially for I/O-bound tasks.
  - Deployment: ASGI applications are typically deployed using ASGI servers like Uvicorn, Daphne, or Hypercorn.
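To make the contrast concrete, here is a minimal sketch of the callable each specification expects (illustrative only; in practice Django or FastAPI builds this object for you):

# WSGI: a synchronous callable taking (environ, start_response)
def wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from WSGI"]

# ASGI: an async callable taking (scope, receive, send)
async def asgi_app(scope, receive, send):
    if scope["type"] == "http":
        await send({"type": "http.response.start", "status": 200,
                    "headers": [(b"content-type", b"text/plain")]})
        await send({"type": "http.response.body", "body": b"Hello from ASGI"})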
#!pip install nest_asyncio # run asyncio within Jupyter's already running even loop
Requirement already satisfied: nest_asyncio in /home/mainuser/anaconda3/envs/mintonano/lib/python3.11/site-packages (1.6.0)
# import asyncio
# async def printer(name: str, times: int) -> None:
#     for i in range(times):
#         print(name)
#         await asyncio.sleep(1)
# async def main():
#     await asyncio.gather(
#         printer("A", 3),
#         printer("B", 3)
#     )
# asyncio.run(main())

# adapting the code since Jupyter has its own event loop
import asyncio
import nest_asyncio

# Apply nest_asyncio to allow nested event loops
nest_asyncio.apply()

async def printer(name: str, times: int) -> None:
    for i in range(times):
        print(name)
        await asyncio.sleep(1)

async def main():
    await asyncio.gather(
        printer("A", 3),
        printer("B", 3)
    )

# Await the main coroutine directly
await main()
A
B
A
B
A
B
asyncio.sleep(1) was added because writing code in a coroutine doesn't necessarily mean it won't block: computations are blocking! I/O operations will not block; for CPU-bound work we could use multiprocessing (or offload it to a worker pool), as sketched below.
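A hedged sketch of the CPU-bound case: the standard library's asyncio.to_thread (Python 3.9+) offloads a blocking function to a worker thread so other coroutines keep running; a ProcessPoolExecutor would be the heavier option for truly CPU-heavy work.

import asyncio

def blocking_sum(n: int) -> int:
    # CPU-bound work that would block the event loop if run directly in a coroutine
    return sum(i * i for i in range(n))

async def main():
    # the blocking call runs in a worker thread while other coroutines keep running
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_sum, 10_000_000),
        asyncio.sleep(1),  # stands in for other concurrent work
    )
    print(result)

asyncio.run(main())  # inside Jupyter, await main() directly instead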
Path parameters and their validation
from fastapi import FastAPI, Path
app = FastAPI()

@app.get('/license-plates/{license}')
async def get_license_plate(license: str = Path(..., regex=r"^\w{2}-\d{3}-\w{2}$")):
    return {"license": license}
/tmp/ipykernel_6460/3090869963.py:5: DeprecationWarning: `regex` has been deprecated, please use `pattern` instead
  async def get_license_plate(license: str = Path(..., regex=r"^\w{2}-\d{3}-\w{2}$")):
- In FastAPI, the ... above indicates that we don't want a default value (the parameter is required). The regex validates French license plates like AB-123-CD. Per the deprecation warning, newer FastAPI versions expect pattern= instead of regex=.
Notes by Key Topic
Installation, virtual environment (conda), running, first app
%%bash
conda create --name fastapi-env python=3.11
conda activate fastapi-env
pip install fastapi[all]
- If the FastAPI app is called app in the main file, run as follows:
uvicorn main:app --reload
- Access interactive documentation at http://127.0.0.1:8000/docs (Swagger UI) or http://127.0.0.1:8000/redoc (ReDoc).
Defining routes with path and query parameters (for user input) and validating requests.
- Defining path parameters: user_id is a path parameter FastAPI will convert to an integer.
from fastapi import FastAPI

app = FastAPI()

@app.get("/users/{user_id}")
def read_user(user_id: int):
    return {"user_id": user_id}
- Defining query parameters via function parameters with default values:
@app.get("/users/")
def read_user(skip: int = 0, limit: int = 10):
    return {"skip": skip, "limit": limit}
If a user accesses /users/?skip=5&limit=15, FastAPI will return {"skip":5, "limit":15}
- Request validation with Pydantic below. Make the request as follows:
curl -X POST "http://127.0.0.1:8000/users/" -H "Content-Type: application/json" -d '{"id":1, "name":"John Smith", "email":"john@example.com"}' -
-X POST: use HTTP method to post data -
-H “Content-Type: application/json”`: add HTTP header to the request and specify that the data being sent is in JSON format- -d ‘{“id”:1, “name”:“John Smith”, “email”:“john@example.com”}’: send the specified data in the request body
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

@app.post("/users/")
def create_user(user: User):
    return {"id": user.id, "name": user.name, "email": user.email}
- Combining path and query parameters with Pydantic: user_id is a path parameter and details is a query parameter that modifies the response.
- Read simply as
curl "http://127.0.0.1:8000/users/1"
@app.get("/users/{user_id}")
def read_user(user_id: int, details: bool = False):
    if details:
        return {"user_id": user_id, "details": "Detailed info"}
    return {"user_id": user_id}
Request and response models
- Request models define the structure of the data that your API expects to receive in the request body. They are used to validate and parse the incoming data.
- Response models define the structure of the data that your API returns in response. They ensure that the response data is correctly formatted and validated.
curl -X POST "http://127.0.0.1:8000/users/" -H "Content-Type: application/json" -d '{"id":1, "name":"John Smith", "email":"john@example.com", "age":30}'
from fastapi import FastAPI
from pydantic import BaseModel
from typing import Optional
app = FastAPI()

class UserCreate(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None

class UserResponse(BaseModel):
    id: int
    name: str
    email: str
    age: Optional[int] = None
    is_active: bool

@app.post("/users/", response_model=UserResponse)
async def create_user(user: UserCreate):  # validate incoming data
    user_response = UserResponse(  # validate outgoing data
        id=user.id,
        name=user.name,
        email=user.email,
        age=user.age,
        is_active=True
    )
    return user_response
Dependency Injection
Inject dependencies (database connections, configuration settings, other shared resources) into your functions or classes.
Separate concerns between the logic of the endpoint and the more generic logic for the pagination parameters.
Ideal for utility logic to retrieve or validate data, make security checks, or call external logic that will be needed several times across the application.
Notes following “Building Data Science Applications with FastAPI” by François Voron: dependency injection in FastAPI
from fastapi import Depends, FastAPI
app = FastAPI()

async def pagination(skip: int = 0, limit: int = 10) -> tuple[int, int]:
    return (skip, limit)

@app.get("/items")
async def list_items(p: tuple[int, int] = Depends(pagination)):
    skip, limit = p
    return {"skip": skip, "limit": limit}

@app.get("/things")
async def list_things(p: tuple[int, int] = Depends(pagination)):
    skip, limit = p
    return {"skip": skip, "limit": limit}
- FastAPI limitation: Depends is not able to forward the return type of the dependency function, so we have to annotate it manually above.
- Raising a 404 error:
from fastapi import Depends, FastAPI, HTTPException, status
from pydantic import BaseModel
class Post(BaseModel):
    id: int
    title: str
    content: str

class PostUpdate(BaseModel):
    title: str | None
    content: str | None

class DummyDatabase:
    posts: dict[int, Post] = {}

db = DummyDatabase()
db.posts = {
    1: Post(id=1, title="Post 1", content="Content 1"),
    2: Post(id=2, title="Post 2", content="Content 2"),
    3: Post(id=3, title="Post 3", content="Content 3"),
}

app = FastAPI()

async def get_post_or_404(id: int) -> Post:
    try:
        return db.posts[id]
    except KeyError:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND)

@app.get("/posts/{id}")
async def get(post: Post = Depends(get_post_or_404)):
    return post

@app.patch("/posts/{id}")
async def update(post_update: PostUpdate, post: Post = Depends(get_post_or_404)):
    updated_post = post.copy(update=post_update.dict())
    db.posts[post.id] = updated_post
    return updated_post

@app.delete("/posts/{id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete(post: Post = Depends(get_post_or_404)):
    db.posts.pop(post.id)
Creating and using a parametrized dependency with a class
- Suppose we wanted to dynamically cap the limit value in the pagination example -> we would need to do this with a class!
from fastapi import Depends, FastAPI, Query
app = FastAPI()

class Pagination:
    def __init__(self, maximum_limit: int = 100):
        self.maximum_limit = maximum_limit

    async def __call__(
        self,
        skip: int = Query(0, ge=0),
        limit: int = Query(10, ge=0),
    ) -> tuple[int, int]:
        capped_limit = min(self.maximum_limit, limit)
        return (skip, capped_limit)

# hardcoded below, but could come from a config file or env variable
pagination = Pagination(maximum_limit=50)

@app.get("/items")
async def list_items(p: tuple[int, int] = Depends(pagination)):
    skip, limit = p
    return {"skip": skip, "limit": limit}

@app.get("/things")
async def list_things(p: tuple[int, int] = Depends(pagination)):
    skip, limit = p
    return {"skip": skip, "limit": limit}
Note: In FastAPI, Query is used to define and validate query parameters for your API endpoints. Query parameters are the key-value pairs that appear after the ? in a URL. They are typically used to filter, sort, or paginate data.
Depends simply expects a callable: it can be __call__ (as above) or another method, as below. Note that the pattern below could be used to apply different preprocessing steps, depending on the data, in the ML context (see the sketch after this code):
from fastapi import Depends, FastAPI, Query
app = FastAPI()

class Pagination:
    def __init__(self, maximum_limit: int = 100):
        self.maximum_limit = maximum_limit

    async def skip_limit(
        self,
        skip: int = Query(0, ge=0),
        limit: int = Query(10, ge=0),
    ) -> tuple[int, int]:
        capped_limit = min(self.maximum_limit, limit)
        return (skip, capped_limit)

    async def page_size(
        self,
        page: int = Query(1, ge=1),
        size: int = Query(10, ge=0),
    ) -> tuple[int, int]:
        capped_size = min(self.maximum_limit, size)
        return (page, capped_size)

pagination = Pagination(maximum_limit=50)

@app.get("/items")
async def list_items(p: tuple[int, int] = Depends(pagination.skip_limit)):
    skip, limit = p
    return {"skip": skip, "limit": limit}

@app.get("/things")
async def list_things(p: tuple[int, int] = Depends(pagination.page_size)):
    page, size = p
    return {"page": page, "size": size}
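Following that remark, a hedged sketch of the same pattern in an ML context: one class exposing different preprocessing callables, with each endpoint choosing the step it needs via Depends (TextPreprocessor and its methods are hypothetical, not from the original notes):

from fastapi import Depends, FastAPI

app = FastAPI()

class TextPreprocessor:
    def __init__(self, lowercase: bool = True):
        self.lowercase = lowercase

    async def for_classification(self, text: str) -> str:
        # aggressive cleanup suited to a classification model
        cleaned = text.strip()
        return cleaned.lower() if self.lowercase else cleaned

    async def for_generation(self, text: str) -> str:
        # keep casing for a generative model, just trim whitespace
        return text.strip()

preprocessor = TextPreprocessor()

@app.get("/classify")
async def classify(text: str = Depends(preprocessor.for_classification)):
    return {"preprocessed": text}

@app.get("/generate")
async def generate(text: str = Depends(preprocessor.for_generation)):
    return {"preprocessed": text}

Each endpoint receives the already-preprocessed text; swapping the preprocessing step is just a matter of pointing Depends at a different method.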
- Using dependency injection to manage db connection, ensuring that each request gets a fresh connection and that connections are properly closed after use:
from fastapi import FastAPI, Depends
from sqlalchemy.orm import Session
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
= "sqlite:///./test.db"
DATABASE_URL = create_engine(DATABASE_URL)
engine = sessionmaker(autocommit=False, autoflush=False, bind=engine)
SessionLocal = declarative_base()
Base
# Define a User model
class User(Base):
= "users"
__tablename__ id = Column(Integer, primary_key=True, index=True)
= Column(String, index=True)
name = Column(String, unique=True, index=True)
email
= FastAPI()
app
def get_db():
= SessionLocal()
db try:
yield db
finally:
db.close()# Endpoint to create a new user
@app.post("/users/")
async def create_user(name: str, email: str, db: Session = Depends(get_db)):
= User(name=name, email=email)
user
db.add(user)
db.commit()
db.refresh(user)return user
@app.get("/users/")
async def read_users(db: Session=Depends(get_db)):
= db.query(User).all()
users return users
# Run the application
if __name__ == "__main__":
import uvicorn
="127.0.0.1", port=8000) uvicorn.run(app, host
- An example in LLM context:
from fastapi import FastAPI, Depends
from pydantic import BaseModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
app = FastAPI()

class LLM:
    def __init__(self, model_name: str = "distilbert-base-uncased-finetuned-sst-2-english"):
        # Load the pre-trained model and tokenizer from Hugging Face
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.model.to(self.device)

    def predict(self, text: str) -> int:
        # Tokenize the input text
        inputs = self.tokenizer(text, return_tensors="pt")
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        # Perform prediction using the model
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Get the predicted class
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(predictions, dim=1).item()
        return predicted_class

# Create a global LLM instance
llm_instance = LLM()

# Dependency to get the global LLM instance
def get_llm():
    return llm_instance

# Request model for prediction
class PredictionRequest(BaseModel):
    text: str

# Response model for prediction
class PredictionResponse(BaseModel):
    prediction: int

# Use the LLM dependency in an endpoint
@app.post("/predict/", response_model=PredictionResponse)
async def predict(request: PredictionRequest, llm: LLM = Depends(get_llm)):
    prediction = llm.predict(request.text)
    return {"prediction": prediction}

# Run the application
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)
- Run as follows:
curl -X POST "http://127.0.0.1:8000/predict/" -H "Content-Type: application/json" -d '{"text": "I love FastAPI!"}'
- Will return something like the following:
{ "prediction": 1 }
async/await syntax for LLM calls that are I/O-bound
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
import httpx
import os
from dotenv import load_dotenv
load_dotenv()
app = FastAPI()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")

class OpenAIRequest(BaseModel):
    prompt: str

class HuggingFaceRequest(BaseModel):
    prompt: str

class OpenAIResponse(BaseModel):
    id: str
    object: str
    created: int
    model: str
    choices: list

class HuggingFaceResponse(BaseModel):
    generated_text: str

@app.post("/openai-generate/", response_model=OpenAIResponse)  # response should conform to the given Pydantic model
async def openai_generate(request: OpenAIRequest):
    url = "https://api.openai.com/v1/completions"
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    }
    data = {
        "model": "text-davinci-003",
        "prompt": request.prompt,
        "max_tokens": 100,
    }
    async with httpx.AsyncClient() as client:  # always use async with for async context managers
        try:
            response = await client.post(url, headers=headers, json=data)
            # Raises an error for 4xx/5xx responses
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as exc:
            raise HTTPException(status_code=exc.response.status_code, detail=exc.response.text)

@app.post("/huggingface-generate", response_model=HuggingFaceResponse)
async def huggingface_generate(request: HuggingFaceRequest):
    url = "https://api-inference.huggingface.co/models/gpt2"
    headers = {
        "Authorization": f"Bearer {HUGGINGFACE_API_KEY}",
    }
    data = {
        "inputs": request.prompt,
        "options": {"use_cache": False},
    }
    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(url, headers=headers, json=data)
            response.raise_for_status()
            return {"generated_text": response.json()[0]['generated_text']}
        except httpx.HTTPStatusError as exc:
            raise HTTPException(status_code=exc.response.status_code, detail=exc.response.text)
# To run the app: uvicorn your_file_name:app --reload
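Opening a new AsyncClient per request works, but a common refinement (not from the original notes; a sketch assuming a recent FastAPI with lifespan support) is to share one client across requests via the lifespan handler and inject it as a dependency, so connections to the external API are pooled:

from contextlib import asynccontextmanager

import httpx
from fastapi import Depends, FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # one client reused for the whole application lifetime (connection pooling)
    app.state.client = httpx.AsyncClient()
    yield
    await app.state.client.aclose()

app = FastAPI(lifespan=lifespan)

def get_client() -> httpx.AsyncClient:
    return app.state.client

@app.get("/ping")
async def ping(client: httpx.AsyncClient = Depends(get_client)):
    # example outbound call reusing the pooled client (placeholder URL)
    response = await client.get("https://api.llm.example/health")
    return {"status": response.status_code}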
Custom error handling
from fastapi import FastAPI, HTTPException, Depends, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel
import httpx
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
app = FastAPI()

LLM_API_KEY = os.getenv("LLM_API_KEY")

class LLMError(Exception):
    def __init__(self, message: str):
        self.message = message

@app.exception_handler(LLMError)
async def llm_exception_handler(request: Request, exc: LLMError):
    return JSONResponse(
        status_code=500,
        content={"message": f"LLM error occurred: {exc.message}"}
    )

class GenerateRequest(BaseModel):
    prompt: str

class GenerateResponse(BaseModel):
    generated_text: str

@app.post("/generate", response_model=GenerateResponse)
async def generate(request: GenerateRequest):
    if not request.prompt:
        raise HTTPException(status_code=400, detail="Prompt cannot be empty!")
    url = "https://api.llm.example/generate"
    headers = {
        "Authorization": f"Bearer {LLM_API_KEY}",
        "Content-Type": "application/json",
    }
    data = {"prompt": request.prompt}

    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(url, headers=headers, json=data)
            # Raise an error for 4xx/5xx responses
            response.raise_for_status()
            response_data = response.json()
            return {"generated_text": response_data.get("generated_text", "")}
        except httpx.HTTPStatusError as exc:
            raise LLMError(message=exc.response.text)
        except Exception as exc:
            raise LLMError(message=str(exc))
# To run the app: uvicorn your_file_name:app --reload
Serving static files (like HTML, CSS, JS) with FastAPI: Jinja2+JS
# Given the following file structure
# /my_fastapi_app
# ├── app.py # Your FastAPI application
# ├── static # Directory for static files
# │ ├── css
# │ │ └── styles.css
# │ ├── js
# │ │ └── script.js
# │ └── index.html
# └── requirements.txt
# index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ title }}</title>
<link rel="stylesheet" href="/static/css/styles.css">
</head>
<body>
<h1>{{ title }}</h1>
<p>{{ description }}</p>
<form id="generate-form">
<label for="prompt">Enter your prompt:</label>
<input type="text" id="prompt" name="prompt" required>
<button type="submit">Generate</button>
</form>
<div id="result"></div>
    <script src="/static/js/script.js"></script>
</body>
</html>
# script.js
document.getElementById('generate-form').addEventListener('submit', async (event) => {
    event.preventDefault();
    const prompt = document.getElementById('prompt').value;
    const response = await fetch('/generate', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({ prompt })
    });
    const data = await response.json();
    document.getElementById('result').innerText = data.generated_text;
});
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

app = FastAPI()

# Mount the static directory
app.mount("/static", StaticFiles(directory="static"), name='static')

# @app.get("/", response_class=HTMLResponse)
# async def read_index():
#     with open("static/index.html") as f:
#         return f.read()

# Or better, serve HTML templates more dynamically:
templates = Jinja2Templates(directory="templates")

@app.get("/", response_class=HTMLResponse)
async def read_root(request: Request):
    context = {
        "request": request,
        "title": "LLM Text Generator",
        "description": "Generate text using a large language model."
    }
    return templates.TemplateResponse("index.html", context)

# To run the app: uvicorn main:app --reload
- When you navigate to http://127.0.0.1:8000/, it will serve your index.html file.
- app.mount: specifies a URL path and a directory for static files, allowing FastAPI to handle requests to those files automatically.
Serving static files (like HTML, CSS, JS) with FastAPI: Jinja2+HTMX
from fastapi import FastAPI, Request, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles
import httpx
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
app = FastAPI()

# Mount the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

# Serve HTML templates
templates = Jinja2Templates(directory="templates")

@app.get('/', response_class=HTMLResponse)
async def read_root(request: Request):
    context = {
        "request": request,
        "title": "LLM Text Generator",
        "description": "Generate text using a language model."
    }
    return templates.TemplateResponse("index.html", context)

@app.post("/generate", response_class=HTMLResponse)
async def generate_text(request: Request, prompt: str = Form(...)):
    if not prompt:
        return HTMLResponse("<div id='result'><p>Error: Prompt cannot be empty!</p></div>")
    url = "https://api.llm.example/generate"
    headers = {
        "Authorization": f"Bearer {os.getenv('LLM_API_KEY')}",
        "Content-Type": "application/json",
    }
    data = {"prompt": prompt}

    async with httpx.AsyncClient() as client:
        try:
            response = await client.post(url, headers=headers, json=data)
            # Raise an error for 4xx/5xx responses
            response.raise_for_status()
            response_data = response.json()
            generated_text = response_data.get("generated_text", "")
            return HTMLResponse(f"<div id='result'><h2>Generated Text:</h2><p>{generated_text}</p></div>")  # updates the #result element
        except httpx.HTTPStatusError as exc:
            return HTMLResponse(f"<div id='result'><p>Error: {exc.response.text}</p></div>")
        except Exception as exc:
            return HTMLResponse(f"<div id='result'><p>Error: {str(exc)}</p></div>")
No more JavaScript in the static directory; HTMX takes care of the interaction as follows:
- HTML Template with HTMX:
  - The index.html template includes the HTMX library by adding a <script> tag that loads HTMX from a CDN.
  - The form uses HTMX attributes (hx-post, hx-target, hx-swap) to handle form submission and update the result dynamically:
    - hx-post="/generate": sends a POST request to the /generate endpoint when the form is submitted.
    - hx-target="#result": specifies the element (#result) to update with the server's response.
    - hx-swap="innerHTML": replaces the inner HTML of the target element with the server's response.
- FastAPI Endpoint:
  - The /generate endpoint processes the form submission, interacts with the LLM API, and returns an HTML response with the generated text or an error message.
  - The response is an HTMLResponse that updates the #result element in the HTML template.
# .
# ├── main.py
# ├── static
# │ ├── css
# │ │ └── styles.css
# └── templates
# └── index.html
# templates/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ title }}</title>
<link rel="stylesheet" href="/static/css/styles.css">
<script src="https://unpkg.com/htmx.org@1.6.1"></script>
</head>
<body>
<h1>{{ title }}</h1>
<p>{{ description }}</p>
<form hx-post="/generate" hx-target="#result" hx-swap="innerHTML">
<label for="prompt">Enter your prompt:</label>
<input type="text" id="prompt" name="prompt" required>
<button type="submit">Generate</button>
</form>
<div id="result"></div>
</body>
</html>
Packaging with Docker
.
├── Dockerfile
├── main.py
├── requirements.txt
├── static
│ ├── css
│ │ └── styles.css
└── templates
    └── index.html
- templates/index.html remains the same file compatible with HTMX
# requirements.txt
fastapi
uvicorn
torch==1.9.0+cu111  # Ensure this matches the CUDA version in the Docker image
transformers
python-dotenv
google-cloud-storage
- For GPU support…
Install NVIDIA Docker: Install the NVIDIA Docker runtime on your host machine.
Modify the Dockerfile: Use a base image that includes CUDA and cuDNN libraries.
Update the Docker Run Command: Use the --gpus flag to allocate GPU resources to the container.
Ensure PyTorch is Installed with CUDA Support: Make sure the PyTorch version installed in the container supports CUDA.
# Use the official NVIDIA CUDA runtime as a parent image
FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN apt-get update && apt-get install -y \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir -r requirements.txt

# Expose port 80 to the outside world
EXPOSE 80

# Run the FastAPI application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
from fastapi import FastAPI, Request, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
app = FastAPI()

# Mount the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

# Serve HTML templates
templates = Jinja2Templates(directory="templates")

# Load the model and tokenizer from a shared storage location or model server
model_name = os.getenv("MODEL_NAME", "gpt2")  # Replace with your model path or name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.to("cuda" if torch.cuda.is_available() else "cpu")

@app.get("/", response_class=HTMLResponse)
async def read_root(request: Request):
    context = {
        "request": request,
        "title": "LLM Text Generator",
        "description": "Generate text using a large language model."
    }
    return templates.TemplateResponse("index.html", context)

@app.post("/generate", response_class=HTMLResponse)
async def generate_text(request: Request, prompt: str = Form(...)):
    if not prompt:
        return HTMLResponse("<div id='result'><p>Error: Prompt cannot be empty!</p></div>")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda" if torch.cuda.is_available() else "cpu")
    outputs = model.generate(inputs["input_ids"], max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return HTMLResponse(f"<div id='result'><h2>Generated Text:</h2><p>{generated_text}</p></div>")
%%bash
docker build -t my-fastapi-app .
docker run --gpus all --env-file .env -d -p 80:80 --name my-fastapi-container my-fastapi-app
Deploying on GCP
Step 1: Set Up GCP Project: Create a GCP project and enable necessary APIs.
Step 2: Install and Configure Google Cloud SDK: Install the Google Cloud SDK and authenticate.
Step 3: Build and Push Docker Image to GCR: Build the Docker image and push it to Google Container Registry.
Step 4: Create a GKE Cluster: Create a Kubernetes cluster on GKE.
Step 5: Deploy the Application on GKE: Deploy the FastAPI application on the GKE cluster.
Step 6: Set Up Google Cloud Storage: Store the model in GCS and access it from the application.
Step 1: Set Up GCP Project: Create a GCP project and enable necessary APIs.
Create a GCP Project: Go to the Google Cloud Console.
Create a new project.
Enable the following APIs: Kubernetes Engine API, Container Registry API, Cloud Storage API
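These APIs can also be enabled from the command line (service names assumed to be the current ones):
gcloud services enable container.googleapis.com containerregistry.googleapis.com storage.googleapis.com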
Step 2: Install and Configure Google Cloud SDK
Install Google Cloud SDK: https://cloud.google.com/sdk/docs/install
Authenticate with GCP: gcloud init; gcloud auth login
Set the Project: gcloud config set project YOUR_PROJECT_ID
Step 3: Build and Push Docker Image to GCR
docker build -t gcr.io/YOUR_PROJECT_ID/my-fastapi-app .
docker push gcr.io/YOUR_PROJECT_ID/my-fastapi-app
Step 4: Create a GKE Cluster.
gcloud container clusters create my-cluster --num-nodes=3
gcloud container clusters get-credentials my-cluster
Step 5: Deploy the Application on GKE: Deploy the FastAPI application on the GKE cluster.
- Create a Kubernetes Deployment
deployment.yaml
## deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-fastapi-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-fastapi-app
  template:
    metadata:
      labels:
        app: my-fastapi-app
    spec:
      containers:
        - name: my-fastapi-app
          image: gcr.io/YOUR_PROJECT_ID/my-fastapi-app
          ports:
            - containerPort: 80
          env:
            - name: MODEL_NAME
              value: "gs://YOUR_BUCKET_NAME/model"
- Create a Kubernetes Service
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-fastapi-app
spec:
  type: LoadBalancer
  selector:
    app: my-fastapi-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
- Deploy to GKE
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
Step 6: Set up Google Cloud Storage, upload model to bucket, update main.py to load model from GCS
gsutil mb gs://YOUR_BUCKET_NAME
gsutil cp model/* gs://YOUR_BUCKET_NAME/model/
from fastapi import FastAPI, Request, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
from google.cloud import storage
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
app = FastAPI()

# Mount the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

# Serve HTML templates
templates = Jinja2Templates(directory="templates")

# Load the model and tokenizer from GCS
def download_blob(bucket_name, source_blob_name, destination_file_name):
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

model_path = "/app/model"
os.makedirs(model_path, exist_ok=True)
download_blob("YOUR_BUCKET_NAME", "model/pytorch_model.bin", f"{model_path}/pytorch_model.bin")
download_blob("YOUR_BUCKET_NAME", "model/config.json", f"{model_path}/config.json")
download_blob("YOUR_BUCKET_NAME", "model/tokenizer_config.json", f"{model_path}/tokenizer_config.json")

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.to("cuda" if torch.cuda.is_available() else "cpu")

@app.get("/", response_class=HTMLResponse)
async def read_root(request: Request):
    context = {
        "request": request,
        "title": "LLM Text Generator",
        "description": "Generate text using a large language model."
    }
    return templates.TemplateResponse("index.html", context)

@app.post("/generate", response_class=HTMLResponse)
async def generate_text(request: Request, prompt: str = Form(...)):
    if not prompt:
        return HTMLResponse("<div id='result'><p>Error: Prompt cannot be empty!</p></div>")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda" if torch.cuda.is_available() else "cpu")
    outputs = model.generate(inputs["input_ids"], max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return HTMLResponse(f"<div id='result'><h2>Generated Text:</h2><p>{generated_text}</p></div>")
# To run the app: uvicorn main:app --reload
Loading the model and making predictions with FastAPI app
from fastapi import FastAPI, Request, Form, Body
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

app = FastAPI()

# Mount the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

# Serve HTML templates
templates = Jinja2Templates(directory="templates")

# Load the model and tokenizer
model_name = os.getenv("MODEL_NAME", "gpt2")  # Replace with your model path or name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

@app.get("/", response_class=HTMLResponse)
async def read_root(request: Request):
    context = {
        "request": request,
        "title": "LLM Text Generator",
        "description": "Generate text using a large language model."
    }
    return templates.TemplateResponse("index.html", context)

@app.post("/generate", response_class=HTMLResponse)
async def generate_text(request: Request, prompt: str = Form(...)):
    if not prompt:
        return HTMLResponse("<div id='result'><p>Error: Prompt cannot be empty!</p></div>")
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return HTMLResponse(f"<div id='result'><h2>Generated Text:</h2><p>{generated_text}</p></div>")

@app.post("/api/generate", response_class=JSONResponse)
async def api_generate_text(prompt: str = Body(..., embed=True)):  # expects a JSON body like {"prompt": "..."}
    if not prompt:
        return JSONResponse(status_code=400, content={"error": "Prompt cannot be empty!"})
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return JSONResponse(content={"generated_text": generated_text})
# To run the app: uvicorn main:app --reload
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LLM Text Generator</title>
<link rel="stylesheet" href="/static/css/styles.css">
<script src="https://unpkg.com/htmx.org@1.6.1"></script>
</head>
<body>
<h1>LLM Text Generator</h1>
<p>Generate text using a large language model.</p>
<form hx-post="/generate" hx-target="#result" hx-swap="innerHTML">
<label for="prompt">Enter your prompt:</label>
<input type="text" id="prompt" name="prompt" required>
<button type="submit">Generate</button>
</form>
<div id="result"></div>
</body>
</html>
From the user’s perspective, the interaction with the FastAPI application will be straightforward and intuitive. Here’s how each endpoint will be experienced by the user:
User Experience
- Accessing the Application (/ endpoint):
  - When the user navigates to the root URL (http://127.0.0.1:8000/), they will see an HTML page with a form where they can input a prompt.
  - The form has a text input field for the prompt and a submit button.
- Submitting the Form (/generate endpoint):
  - When the user types a prompt into the form and clicks the submit button, the form data is sent to the /generate endpoint.
  - The /generate endpoint processes the form submission, queries the LLM, and returns the generated text as part of the HTML response.
  - The user sees the generated text displayed on the same page below the form.
- API Interaction (/api/generate endpoint):
  - The /api/generate endpoint is designed for programmatic access, such as from a frontend application or another service.
  - Users or developers can send a JSON request to this endpoint with the prompt, and it will return the generated text as a JSON response, for example as shown below.
  - This endpoint is useful for integrating the LLM functionality into other applications or services.
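A minimal example call (assuming the app is running locally on port 8000 and the JSON-body version of /api/generate shown above):
curl -X POST "http://127.0.0.1:8000/api/generate" -H "Content-Type: application/json" -d '{"prompt": "Hello, world!"}'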
Writing tests
Unit tests
import os
import pytest
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
@pytest.fixture
def model_and_tokenizer():
    model_name = os.getenv("MODEL_NAME", "gpt2")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    return model, tokenizer

def test_model_loading(model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    assert model is not None, "Model should be loaded"
    assert tokenizer is not None, "Tokenizer should be loaded"
    assert model.device.type in ["cuda", "cpu"], "Model should be on CUDA or CPU"
Integration tests
from fastapi.testclient import TestClient
from main import app
client = TestClient(app)

def test_read_root():
    response = client.get("/")
    assert response.status_code == 200
    assert "LLM Text Generator" in response.text

def test_generate_text():
    response = client.post("/generate", data={"prompt": "Hello, world!"})
    assert response.status_code == 200
    assert "Generated Text:" in response.text

def test_api_generate_text():
    response = client.post("/api/generate", json={"prompt": "Hello, world!"})
    assert response.status_code == 200
    json_response = response.json()
    assert "generated_text" in json_response
    assert isinstance(json_response["generated_text"], str)
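The notes don't show how the suite is launched; assuming the tests are saved as e.g. test_unit.py and test_integration.py next to main.py, they can be run with:
pytest -v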
Working with a database (PostgreSQL)
# requirements.txt
fastapi
uvicorn
torch
transformers
python-dotenv
passlib[bcrypt]
python-jose[cryptography]  # auth.py below imports from jose
sqlalchemy
databases
asyncpg
psycopg2-binary
# models.py
from sqlalchemy import Column, Integer, String, Boolean
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True, index=True)
    username = Column(String, unique=True, index=True)
    full_name = Column(String)
    email = Column(String, unique=True, index=True)
    hashed_password = Column(String)
    disabled = Column(Boolean, default=False)

class UserInDB(User):
    hashed_password: str
# database.py
from sqlalchemy import create_engine
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker
from .models import Base
from passlib.context import CryptContext
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")

DATABASE_URL = "postgresql+asyncpg://user:password@localhost/dbname"

engine = create_async_engine(DATABASE_URL, echo=True)  # echo to see SQL in the terminal
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine, class_=AsyncSession)

async def init_db():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

# Dependency to get the database session
async def get_db():
    async with SessionLocal() as session:
        yield session
# crud.py
from sqlalchemy.orm import Session
from .models import User
from .schemas import UserCreate, UserUpdate
from .auth import get_password_hash
def get_user(db: Session, user_id: int):
    return db.query(User).filter(User.id == user_id).first()

def get_user_by_username(db: Session, username: str):
    return db.query(User).filter(User.username == username).first()

def create_user(db: Session, user: UserCreate):
    hashed_password = get_password_hash(user.password)
    db_user = User(
        username=user.username,
        full_name=user.full_name,
        email=user.email,
        hashed_password=hashed_password,
        disabled=user.disabled,
    )
    db.add(db_user)
    db.commit()
    db.refresh(db_user)
    return db_user

def update_user(db: Session, user_id: int, user: UserUpdate):
    db_user = get_user(db, user_id)
    if db_user:
        db_user.username = user.username
        db_user.full_name = user.full_name
        db_user.email = user.email
        if user.password:
            db_user.hashed_password = get_password_hash(user.password)
        db_user.disabled = user.disabled
        db.commit()
        db.refresh(db_user)
    return db_user

def delete_user(db: Session, user_id: int):
    db_user = get_user(db, user_id)
    if db_user:
        db.delete(db_user)
        db.commit()
    return db_user
# auth.py
from datetime import datetime, timedelta
from typing import Optional
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
from .models import User
from .database import get_db, pwd_context
# Secret key to encode and decode JWT tokens
= "your_secret_key"
SECRET_KEY = "HS256"
ALGORITHM = 30
ACCESS_TOKEN_EXPIRE_MINUTES
= OAuth2PasswordBearer(tokenUrl="token")
oauth2_scheme
def verify_password(plain_password: str, hashed_password: str) -> bool:
return pwd_context.verify(plain_password, hashed_password)
def get_password_hash(password: str) -> str:
return pwd_context.hash(password)
async def authenticate_user(db: AsyncSession, username: str, password: str):
= await db.execute(select(User).filter(User.username == username))
result = result.scalars().first()
user if not user:
return False
if not verify_password(password, user.hashed_password):
return False
return user
def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
= data.copy()
to_encode if expires_delta:
= datetime.now() + expires_delta
expire else:
= datetime.now() + timedelta(minutes=15)
expire "exp": expire})
to_encode.update({= jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
encoded_jwt return encoded_jwt
async def get_current_user(token: str = Depends(oauth2_scheme), db: AsyncSession = Depends(get_db)):
= HTTPException(
credentials_exception =status.HTTP_401_UNAUTHORIZED,
status_code="Could not validate credentials",
detail={"WWW-Authenticate": "Bearer"},
headers
)try:
= jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
payload str = payload.get("sub")
username: if username is None:
raise credentials_exception
except JWTError:
raise credentials_exception
= await db.execute(select(User).filter(User.username == username))
result = result.scalars().first()
user if user is None:
raise credentials_exception
return user
async def get_current_active_user(current_user: User = Depends(get_current_user)):
if current_user.disabled:
raise HTTPException(status_code=400, detail="Inactive user")
return current_user
#schemas.py
from pydantic import BaseModel
from typing import Optional
class Token(BaseModel):
    access_token: str
    token_type: str

class TokenData(BaseModel):
    username: Optional[str] = None
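crud.py and main.py also import UserCreate and UserUpdate from schemas.py, which the original notes do not define; a minimal sketch of what they could look like, with field names assumed from the User model above:

# schemas.py (continued) -- assumed shapes, not from the original notes
from typing import Optional
from pydantic import BaseModel

class UserCreate(BaseModel):
    username: str
    full_name: Optional[str] = None
    email: str
    password: str
    disabled: bool = False

class UserUpdate(BaseModel):
    username: str
    full_name: Optional[str] = None
    email: str
    password: Optional[str] = None
    disabled: bool = False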
# main.py
from datetime import timedelta
from fastapi import FastAPI, Depends, HTTPException, Request, Form, Body
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.security import OAuth2PasswordRequestForm
from fastapi.templating import Jinja2Templates
from fastapi.staticfiles import StaticFiles
from sqlalchemy.ext.asyncio import AsyncSession
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
from dotenv import load_dotenv
from .auth import authenticate_user, create_access_token, get_current_active_user, ACCESS_TOKEN_EXPIRE_MINUTES
from .models import User
from .database import get_db, init_db
from .crud import get_user_by_username, create_user
from .schemas import UserCreate, Token
from contextlib import asynccontextmanager

# Load environment variables from .env file
load_dotenv()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup event
    await init_db()
    yield
    # Shutdown event
    # Perform any necessary cleanup here

app = FastAPI(lifespan=lifespan)

# Mount the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

# Serve HTML templates
templates = Jinja2Templates(directory="templates")

# Load the model and tokenizer
model_name = os.getenv("MODEL_NAME", "gpt2")  # Replace with your model path or name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

@app.post("/token", response_model=Token)
async def login_for_access_token(form_data: OAuth2PasswordRequestForm = Depends(), db: AsyncSession = Depends(get_db)):
    user = await authenticate_user(db, form_data.username, form_data.password)
    if not user:
        raise HTTPException(
            status_code=401,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Bearer"},
        )
    access_token_expires = timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    access_token = create_access_token(
        data={"sub": user.username}, expires_delta=access_token_expires
    )
    return {"access_token": access_token, "token_type": "bearer"}

@app.post("/users/")  # note: the SQLAlchemy User model cannot be used directly as response_model, so it is omitted here
async def create_new_user(user: UserCreate, db: AsyncSession = Depends(get_db)):
    db_user = await get_user_by_username(db, user.username)
    if db_user:
        raise HTTPException(status_code=400, detail="Username already registered")
    return await create_user(db, user)

@app.get("/", response_class=HTMLResponse)
async def read_root(request: Request):
    context = {
        "request": request,
        "title": "LLM Text Generator",
        "description": "Generate text using a large language model."
    }
    return templates.TemplateResponse("index.html", context)

@app.post("/generate", response_class=HTMLResponse)
async def generate_text(request: Request, prompt: str = Form(...), current_user: User = Depends(get_current_active_user)):
    if not prompt:
        return HTMLResponse("<div id='result'><p>Error: Prompt cannot be empty!</p></div>")
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return HTMLResponse(f"<div id='result'><h2>Generated Text:</h2><p>{generated_text}</p></div>")

@app.post("/api/generate", response_class=JSONResponse)
async def api_generate_text(prompt: str = Body(..., embed=True), current_user: User = Depends(get_current_active_user)):
    if not prompt:
        return JSONResponse(status_code=400, content={"error": "Prompt cannot be empty!"})
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs["input_ids"], max_length=100)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return JSONResponse(content={"generated_text": generated_text})
# To run the app: uvicorn main:app --reload
# index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LLM Text Generator</title>
<link rel="stylesheet" href="/static/css/styles.css">
<script src="https://unpkg.com/htmx.org@1.6.1"></script>
</head>
<body>
<h1>LLM Text Generator</h1>
<p>Generate text using a large language model.</p>
<!-- User Registration Form -->
<div id="registration">
<h2>Register</h2>
<form hx-post="/users/" hx-target="#registration-result" hx-swap="innerHTML">
<label for="username">Username:</label>
<input type="text" id="username" name="username" required>
<label for="full_name">Full Name:</label>
<input type="text" id="full_name" name="full_name">
<label for="email">Email:</label>
<input type="email" id="email" name="email" required>
<label for="password">Password:</label>
<input type="password" id="password" name="password" required>
<button type="submit">Register</button>
</form>
<div id="registration-result"></div>
</div>
<!-- User Login Form -->
<div id="login">
<h2>Login</h2>
<form hx-post="/token" hx-target="#login-result" hx-swap="innerHTML">
<label for="login-username">Username:</label>
<input type="text" id="login-username" name="username" required>
<label for="login-password">Password:</label>
<input type="password" id="login-password" name="password" required>
<button type="submit">Login</button>
</form>
<div id="login-result"></div>
</div>
<!-- Text Generation Form -->
<div id="text-generation">
<h2>Generate Text</h2>
<form hx-post="/generate" hx-target="#result" hx-swap="innerHTML">
<label for="prompt">Enter your prompt:</label>
<input type="text" id="prompt" name="prompt" required>
<button type="submit">Generate</button>
</form>
<div id="result"></div>
</div>
</body>
</html>