mud_server.translation.renderer

Ollama HTTP renderer for the OOC→IC translation layer.

OllamaRenderer is a thin, synchronous wrapper around the Ollama /api/chat endpoint. It is the only place in the translation layer that makes a network call.

Sync vs async

The renderer uses the synchronous requests library (already a pinned dependency at requests==2.32.5). The GameEngine is fully synchronous, and FastAPI runs sync endpoint handlers inside a thread-pool executor, so a blocking HTTP call here does not stall the event loop.

When the engine is eventually asyncified the upgrade path is: 1. Replace requests.post with await httpx.AsyncClient().post. 2. Mark render as async def. 3. Mark OOCToICTranslationService.translate as async def. 4. Propagate await up through engine.chat/yell/whisper. httpx is already in the project dependencies (>=0.28.1) so no new dep is required at that point.

Request structure

The /api/chat payload includes a top-level keep_alive field (default "5m") that tells Ollama how long to keep the model loaded in memory after the request completes. Without this field Ollama uses its server default (typically 5 minutes), but after a cold-start the model may be unloaded before the next request arrives, causing a full reload on every call. Setting keep_alive explicitly avoids this.

Deterministic mode

When set_deterministic(seed_int) is called (by the service, after deriving a seed from the IPC hash), temperature is clamped to 0.0 and the seed is forwarded to Ollama’s options.seed field.

IPC hash sourcing (FUTURE — axis engine integration)

set_deterministic will be called from OOCToICTranslationService once the axis engine passes a concrete ipc_hash through service.translate(..., ipc_hash=ipc_hash). The service converts the first 16 hex characters of the hash to an integer:

seed_int = int(ipc_hash[:16], 16)

Until then set_deterministic is never called and the renderer uses the configured temperature from TranslationLayerConfig.

Attributes

logger

Classes

OllamaRenderer

Synchronous renderer that calls the Ollama /api/chat endpoint.

Module Contents

mud_server.translation.renderer.logger

class mud_server.translation.renderer.OllamaRenderer(*, api_endpoint, model, timeout_seconds, temperature=_DEFAULT_TEMPERATURE, keep_alive='5m')[source]

Synchronous renderer that calls the Ollama /api/chat endpoint.

One OllamaRenderer instance is created per OOCToICTranslationService and reused across all translation calls. The renderer is stateful in one way only: deterministic mode can be armed via set_deterministic, which persists for the lifetime of the object. This is by design — the axis engine arms it at the start of a deterministic turn and the service then calls render for each character in that turn.

_api_endpoint: Full /api/chat URL.

_model: Ollama model tag (e.g. "gemma2:2b").

_timeout: HTTP request timeout in seconds.

_keep_alive: Ollama keep_alive duration string (e.g. "5m"). Controls how long the model stays loaded in GPU/CPU memory after each request.

_temperature: Sampling temperature; clamped to 0.0 in deterministic mode.

_seed: Integer seed forwarded to Ollama when deterministic; None when non-deterministic.

Initialise the renderer.

Parameters:

api_endpoint (str) – Full Ollama /api/chat URL.
model (str) – Ollama model tag.
timeout_seconds (float) – HTTP request timeout.
temperature (float) – Default sampling temperature.
keep_alive (str) – Ollama keep_alive duration string. Controls how long the model stays loaded after each request. "5m" (default) keeps it warm for 5 minutes; "0" unloads immediately.

set_deterministic(seed_int)[source]

Arm deterministic mode for subsequent render calls.

Clamps temperature to 0.0 and stores the seed so that identical inputs produce identical outputs across runs. This is called by OOCToICTranslationService when a non-None ipc_hash is provided and config.deterministic is True.

The seed is derived from the IPC hash by the service, not here, to keep hashing logic out of the renderer.

Parameters:: seed_int (int) – Integer seed forwarded to Ollama’s options.seed.

render(system_prompt, user_message)[source]

Call Ollama and return the raw response content.

Builds the Ollama request payload, executes a synchronous POST, and returns the message.content string from the JSON response.

Returns None on any network-level failure (timeout, connection error, non-2xx status). Content-level validation (PASSTHROUGH sentinel, multi-line output, etc.) is handled by OutputValidator.

Parameters:

system_prompt (str) – The fully-rendered system prompt (with character profile injected).
user_message (str) – The original OOC message (used as the user turn so the model sees both context and input).

Returns:

Raw LLM output string on success, None on failure.

Return type:

str | None