mud_server.translation.renderer
===============================

.. py:module:: mud_server.translation.renderer

.. autoapi-nested-parse::

   Ollama HTTP renderer for the OOC→IC translation layer.

   ``OllamaRenderer`` is a thin, synchronous wrapper around the Ollama
   ``/api/chat`` endpoint.  It is the only place in the translation layer
   that makes a network call.

   Sync vs async
   -------------
   The renderer uses the synchronous ``requests`` library (already a pinned
   dependency at ``requests==2.32.5``).  The ``GameEngine`` is fully
   synchronous, and FastAPI runs sync endpoint handlers inside a thread-pool
   executor, so a blocking HTTP call here does not stall the event loop.

   When the engine is eventually asyncified the upgrade path is:
   1. Replace ``requests.post`` with ``await httpx.AsyncClient().post``.
   2. Mark ``render`` as ``async def``.
   3. Mark ``OOCToICTranslationService.translate`` as ``async def``.
   4. Propagate ``await`` up through ``engine.chat/yell/whisper``.
   ``httpx`` is already in the project dependencies (``>=0.28.1``) so no
   new dep is required at that point.

   Request structure
   -----------------
   The ``/api/chat`` payload includes a top-level ``keep_alive`` field
   (default ``"5m"``) that tells Ollama how long to keep the model loaded
   in memory after the request completes.  Without this field Ollama uses
   its server default (typically 5 minutes), but after a cold-start the
   model may be unloaded before the next request arrives, causing a full
   reload on every call.  Setting ``keep_alive`` explicitly avoids this.

   Deterministic mode
   ------------------
   When ``set_deterministic(seed_int)`` is called (by the service, after
   deriving a seed from the IPC hash), temperature is clamped to 0.0 and
   the seed is forwarded to Ollama's ``options.seed`` field.

   IPC hash sourcing (FUTURE — axis engine integration)
   ----------------------------------------------------
   ``set_deterministic`` will be called from ``OOCToICTranslationService``
   once the axis engine passes a concrete ``ipc_hash`` through
   ``service.translate(..., ipc_hash=ipc_hash)``.  The service converts the
   first 16 hex characters of the hash to an integer::

       seed_int = int(ipc_hash[:16], 16)

   Until then ``set_deterministic`` is never called and the renderer uses
   the configured temperature from ``TranslationLayerConfig``.


Attributes
----------

.. autoapisummary::

   mud_server.translation.renderer.logger


Classes
-------

.. autoapisummary::

   mud_server.translation.renderer.OllamaRenderer


Module Contents
---------------

.. py:data:: logger

.. py:class:: OllamaRenderer(*, api_endpoint, model, timeout_seconds, temperature = _DEFAULT_TEMPERATURE, keep_alive = '5m')

   Synchronous renderer that calls the Ollama ``/api/chat`` endpoint.

   One ``OllamaRenderer`` instance is created per ``OOCToICTranslationService``
   and reused across all translation calls.  The renderer is *stateful* in
   one way only: deterministic mode can be armed via ``set_deterministic``,
   which persists for the lifetime of the object.  This is by design — the
   axis engine arms it at the start of a deterministic turn and the service
   then calls ``render`` for each character in that turn.

   .. attribute:: _api_endpoint

      Full ``/api/chat`` URL.

   .. attribute:: _model

      Ollama model tag (e.g. ``"gemma2:2b"``).

   .. attribute:: _timeout

      HTTP request timeout in seconds.

   .. attribute:: _keep_alive

      Ollama ``keep_alive`` duration string (e.g.
      ``"5m"``).  Controls how long the model stays
      loaded in GPU/CPU memory after each request.

   .. attribute:: _temperature

      Sampling temperature; clamped to 0.0 in deterministic
      mode.

   .. attribute:: _seed

      Integer seed forwarded to Ollama when deterministic;
      ``None`` when non-deterministic.

   Initialise the renderer.

   :param api_endpoint: Full Ollama ``/api/chat`` URL.
   :param model: Ollama model tag.
   :param timeout_seconds: HTTP request timeout.
   :param temperature: Default sampling temperature.
   :param keep_alive: Ollama ``keep_alive`` duration string.
                      Controls how long the model stays loaded
                      after each request.  ``"5m"`` (default)
                      keeps it warm for 5 minutes; ``"0"``
                      unloads immediately.


   .. py:method:: set_deterministic(seed_int)

      Arm deterministic mode for subsequent ``render`` calls.

      Clamps temperature to 0.0 and stores the seed so that identical
      inputs produce identical outputs across runs.  This is called by
      ``OOCToICTranslationService`` when a non-``None`` ``ipc_hash`` is
      provided and ``config.deterministic`` is ``True``.

      The seed is derived from the IPC hash *by the service*, not here,
      to keep hashing logic out of the renderer.

      :param seed_int: Integer seed forwarded to Ollama's ``options.seed``.


   .. py:method:: render(system_prompt, user_message)

      Call Ollama and return the raw response content.

      Builds the Ollama request payload, executes a synchronous POST, and
      returns the ``message.content`` string from the JSON response.

      Returns ``None`` on any network-level failure (timeout, connection
      error, non-2xx status).  Content-level validation (PASSTHROUGH
      sentinel, multi-line output, etc.) is handled by ``OutputValidator``.

      :param system_prompt: The fully-rendered system prompt (with character
                            profile injected).
      :param user_message: The original OOC message (used as the ``user``
                           turn so the model sees both context and input).

      :returns: Raw LLM output string on success, ``None`` on failure.