sprintstart-ai

The AI and RAG pipeline service for SprintStart, an AI-assisted onboarding and knowledge-retrieval platform for software development teams.

Prerequisites

Python 3.12+
uv
Ollama running locally with the required models pulled:

ollama pull llama3.2
ollama pull nomic-embed-text

Getting Started

Local

# 1. Install dependencies
uv sync

# 2. Configure environment
cp .env.example .env
# Edit .env and fill in the values

# 3. Run the service
uv run python -m src.main

The service runs on port 8000. Interactive docs are available at `/docs`.

Docker

# 1. Configure environment
cp .env.example .env
# Edit .env and fill in the values

# 2. Start the service
docker-compose up --build

The service runs on port 8000.

`OLLAMA_BASE_URL` is automatically overridden to `http://host.docker.internal:11434` inside the container, so no manual change is needed.

Environment Variables

| Variable | Description |
| --- | --- |
| `LLM_BACKEND` | LLM backend to use. Currently only `ollama` is supported. |
| `OLLAMA_BASE_URL` | Base URL of the Ollama instance. When the service runs in Docker with Ollama on the host, the value must be `http://host.docker.internal:11434`; the `docker-compose` setup applies this override automatically. |
| `OLLAMA_MODEL` | Chat model to use for generation. |
| `OLLAMA_EMBED_MODEL` | Embedding model to use for ingestion and retrieval. |
| `CHROMA_PATH` | Path for ChromaDB persistent storage. If unset, an in-memory store is used and data will not persist. |
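
To illustrate how these variables fit together, here is a minimal sketch of a settings loader. It is not the service's actual configuration code: the `Settings` class, the `load_settings` function, and all default values are assumptions made for this example (the model defaults mirror the models listed under Prerequisites, and `http://localhost:11434` is Ollama's usual local address).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables in the table above."""
    llm_backend: str
    ollama_base_url: str
    ollama_model: str
    ollama_embed_model: str
    chroma_path: Optional[str]  # None means "use the in-memory store"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (defaults are assumed)."""
    backend = env.get("LLM_BACKEND", "ollama")
    if backend != "ollama":
        # The README states that only `ollama` is currently supported.
        raise ValueError(f"Unsupported LLM_BACKEND: {backend!r}")
    return Settings(
        llm_backend=backend,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.2"),
        ollama_embed_model=env.get("OLLAMA_EMBED_MODEL", "nomic-embed-text"),
        # Treat an empty string the same as unset.
        chroma_path=env.get("CHROMA_PATH") or None,
    )
```

Passing the mapping explicitly, as in `load_settings({"CHROMA_PATH": "./chroma"})`, keeps the loader easy to exercise in tests without touching the real environment.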

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/api/v1/health` | Reports service health, including LLM backend status. Returns `503` if Ollama is unreachable. |
| `POST` | `/api/v1/ingest` | Parses, chunks, and embeds a document, then stores it in the vector store. Re-ingesting the same `artifact_id` replaces the existing chunks. |
| `POST` | `/api/v1/chat` | Retrieves relevant chunks and streams a generated answer as Server-Sent Events (SSE). |
| `POST` | `/api/v1/title` | Uses the LLM to generate a short descriptive title from a user prompt, respecting the given maximum character length. |
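
As a quick way to probe the health endpoint from a script, the sketch below uses only the Python standard library. The function name `is_healthy`, the default base URL, and the decision to treat any non-200 response as unhealthy are assumptions for this example; only the path and the `503` behaviour come from the table above.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True only if GET /api/v1/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Raised for 4xx/5xx responses, e.g. the 503 returned when
        # Ollama is unreachable.
        return False
    except OSError:
        # Covers connection refused, DNS failures, and timeouts
        # (urllib.error.URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```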

Chat SSE stream

The `/api/v1/chat` endpoint streams newline-delimited JSON events of the following types:

| Event type | Description |
| --- | --- |
| `token` | A single token fragment of the answer |
| `citation` | A source chunk used to generate the answer |
| `done` | Signals the end of the stream |
| `error` | Emitted on failure in place of the events above |
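
A client typically concatenates `token` events into the answer and collects `citation` events until `done` arrives. The sketch below shows one way to do that over an iterable of already-received lines. The JSON field names (`type`, `content`, `message`) are assumptions, since the exact payload schema is not specified here; check the interactive docs at `/docs` for the real shape. The parser also tolerates an SSE-style `data:` prefix in case events are framed that way.

```python
import json
from typing import Dict, Iterable, List, Tuple


def collect_chat_stream(lines: Iterable[str]) -> Tuple[str, List[Dict]]:
    """Assemble the answer text and citations from streamed event lines.

    Assumes each event is a JSON object with a "type" field; the
    "content" and "message" field names are hypothetical.
    """
    tokens: List[str] = []
    citations: List[Dict] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            # Strip SSE framing if present.
            line = line[len("data:"):].strip()
        if not line.startswith("{"):
            # Skip blank lines and anything that is not a JSON object.
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "token":
            tokens.append(event.get("content", ""))
        elif kind == "citation":
            citations.append(event)
        elif kind == "error":
            raise RuntimeError(event.get("message", "chat stream failed"))
        elif kind == "done":
            break
    return "".join(tokens), citations
```

For example, feeding it the lines `{"type": "token", "content": "Hi"}` and `{"type": "done"}` yields the answer `"Hi"` with an empty citation list.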

Running Tests

uv run pytest

With coverage:

uv run pytest --cov=src --cov-report=term-missing