sprintstart-ai
The AI and RAG pipeline service for SprintStart, an AI-assisted onboarding and knowledge-retrieval platform for software development teams.
Prerequisites
ollama pull llama3.2
ollama pull nomic-embed-text
Getting Started
Local
# 1. Install dependencies
uv sync
# 2. Configure environment
cp .env.example .env
# Edit .env and fill in the values
# 3. Run the service
uv run python -m src.main
The service runs on port 8000. Interactive docs are available at /docs.
Docker
# 1. Configure environment
cp .env.example .env
# Edit .env and fill in the values
# 2. Start the service
docker-compose up --build
The service runs on port 8000.
OLLAMA_BASE_URL is automatically overridden to http://host.docker.internal:11434 inside the container, so no manual change is needed.
Environment Variables
| Variable | Description |
|---|---|
| | LLM backend to use. Currently only |
| | Base URL of the Ollama instance. Use |
| | Chat model to use for generation. |
| | Embedding model to use for ingestion and retrieval. |
| | Path for ChromaDB persistent storage. If unset, an in-memory store is used and data will not persist. |
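As an illustration of the fallback described in the last row, here is a minimal sketch of reading these settings from the environment. It is not the service's actual configuration code. OLLAMA_BASE_URL is the name used in this README; CHROMA_PATH, the Settings class, load_settings, and the localhost default are assumptions made for the example, so check .env.example for the real variable names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_base_url: str
    chroma_path: Optional[str]  # None means in-memory storage (nothing persists)

    @property
    def persistent(self) -> bool:
        """True when a storage path is configured."""
        return self.chroma_path is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # OLLAMA_BASE_URL is named in this README. The default below is an
    # assumption: 11434 is Ollama's standard port on a local install.
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/")
    # CHROMA_PATH is a hypothetical name. An empty value is treated as unset.
    chroma_path = env.get("CHROMA_PATH") or None
    return Settings(ollama_base_url=base_url, chroma_path=chroma_path)
```

Passing a plain dict instead of os.environ, for example load_settings({"CHROMA_PATH": "./data"}), makes the fallback easy to exercise in tests.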
API Endpoints
| Method | Path | Description |
|---|---|---|
| | | Reports service health including LLM backend status. Returns |
| | | Parses, chunks, and embeds a document and stores it in the vector store. Re-ingesting the same |
| | | Retrieves relevant chunks and streams a generated answer as Server-Sent Events (SSE). |
| | | Generates a short descriptive title from a user prompt using an LLM and respecting the given max character length. |
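The ingestion endpoint above chunks a document before embedding it. The chunking strategy is not documented here, so the following is only a sketch of one common approach: fixed-size character windows with overlap. The function name chunk_text and the size and overlap defaults are assumptions for illustration, not the service's implementation.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at a
    boundary still appears whole in at least one neighbouring chunk.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored on its own. Overlap trades some storage for better recall when a relevant passage straddles a chunk boundary.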
Chat SSE stream
The /api/v1/chat endpoint streams newline-delimited JSON events:
| Event type | Description |
|---|---|
| | A single token fragment of the answer |
| | A source chunk used to generate the answer |
| | Signals the end of the stream |
| | Emitted on failure instead of the above |
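A client can consume the stream by decoding one JSON object per line and dispatching on its event type. The sketch below shows that pattern under stated assumptions: the field names type, content, and message and the type values token, source, done, and error are placeholders, since the actual event names are not shown here. Verify them against the service's /docs before relying on this.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per event line of the stream."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            line = line[len("data:"):].strip()  # tolerate SSE "data:" framing
        if not line.startswith("{"):
            continue  # skip blank separators, comments, and other SSE fields
        yield json.loads(line)


def collect_answer(lines: Iterable[str]) -> tuple[str, list[dict]]:
    """Assemble the answer text and the list of sources from the events."""
    tokens: list[str] = []
    sources: list[dict] = []
    for event in iter_events(lines):
        kind = event.get("type")  # hypothetical field name
        if kind == "token":
            tokens.append(event.get("content", ""))
        elif kind == "source":
            sources.append(event)
        elif kind == "error":
            raise RuntimeError(event.get("message", "stream failed"))
        elif kind == "done":
            break  # end of stream: ignore anything after this
    return "".join(tokens), sources
```

The functions accept any iterable of strings, so they work with the line iterator of whichever HTTP client is used and can be tested with a plain list.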
Running Tests
uv run pytest
With coverage:
uv run pytest --cov=src --cov-report=term-missing