william le roux edited this page 2026-04-01 16:51:37 +03:00

Runtime Architecture

This document describes the architecture that is actually present in the repository today.

Overview

Iris Translation v2 is a modular Django application with Celery-based background execution for the translation pipeline.

At a high level, the checked-in system is made of:

  • a Django web/API process
  • a Celery worker process
  • a Django-configurable relational database (PostgreSQL or SQLite)
  • a configurable artifact storage backend (local or S3-compatible)
  • a broker for async workflow execution when enabled
  • an optional OpenSearch node for glossary retrieval
  • Aspose.Words-backed DOCX extraction and reassembly
  • an OpenAI-compatible translation/runtime integration layer

Topology At A Glance

flowchart LR
	Client[HTTP client or operator] --> Web[Django web and API process]
	Web --> DB[(PostgreSQL or SQLite)]
	Web --> Storage[Artifact storage<br/>local or S3-compatible]
	Web --> Broker{Async workflow enabled?}
	Broker -->|yes| Queue[Configured broker]
	Queue --> Worker[Celery worker]
	Worker --> DB
	Worker --> Storage
	Worker --> Aspose[Aspose.Words]
	Worker --> Search[Optional OpenSearch glossary lookup]
	Worker --> Provider[Mock or OpenAI-compatible provider]

Current Deployment Shapes

Plain Local Development

Default local development is the simplest shape:

  • Django app started with manage.py runserver
  • Celery worker started on the host when using the async path
  • PostgreSQL on 127.0.0.1:5433, provided by the checked-in Docker dependency services, by default
  • Redis broker on 127.0.0.1:6380 by default
  • MinIO-backed S3 storage on 127.0.0.1:9000 by default
  • OpenSearch glossary retrieval on 127.0.0.1:9200 by default
  • SQLite plus local-file storage still available as an explicit fallback configuration
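
As an illustration, these local defaults could be centralized as environment-variable fallbacks in settings code. The variable names below are hypothetical, not the project's actual ones; only the host/port values come from the list above.

```python
import os

# Hypothetical environment-variable defaults mirroring the checked-in
# Docker dependency services; the real variable names may differ.
LOCAL_DEFAULTS = {
    "IRIS_DATABASE_URL": "postgres://iris:iris@127.0.0.1:5433/iris",
    "IRIS_BROKER_URL": "redis://127.0.0.1:6380/0",
    "IRIS_S3_ENDPOINT": "http://127.0.0.1:9000",
    "IRIS_OPENSEARCH_URL": "http://127.0.0.1:9200",
}

def env_setting(name: str) -> str:
    # Fall back to the local-development default when the variable is unset.
    return os.environ.get(name, LOCAL_DEFAULTS[name])
```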

Checked-In Docker Compose Topology

The repository's local Docker stack runs:

  • web
  • worker
  • postgres
  • redis
  • opensearch
  • opensearch-dashboards
  • minio
  • minio-init

Important current facts:

  • the stack has one generic worker service, not multiple worker services
  • the web and worker containers use the bundled postgres service
  • artifact storage is switched to S3-compatible mode through MinIO
  • async workflow is enabled in Compose
  • scheduled maintenance is available as a task/command, but Compose does not ship a dedicated scheduler

Application Structure

Installed Django Apps

settings.py currently installs:

  • apps.policy
  • apps.documents
  • apps.jobs
  • apps.review
  • apps.audit
  • apps.terminology
  • apps.memory
  • apps.qa
  • apps.api
  • apps.operator_console

Service Modules

The services/ package contains the main business logic for:

  • DOCX extraction and reassembly
  • language operations and enrichment
  • translation prompt selection and provider access
  • retrieval over glossary and memory data
  • storage and artifact handling
  • health/readiness checks and maintenance
  • reporting
  • workflow intake, processing, review, control, replay, and reassembly

Core Data Model

Policy And Configuration

  • DomainPack: expertise and policy boundary
  • Project: project inside a domain pack, including retention_days
  • ProviderProfile: named provider configuration
  • LanguagePairPolicy: source/target pair rules plus provider/profile binding

Documents

  • DocumentFamily: optional family grouping inside a project
  • Document: logical source document identified by external_reference
  • DocumentVersion: specific uploaded source version; source_filename is also the current file-history grouping key in the console
  • TranslationUnit: extracted segment with a stable anchor and unit order

Jobs And Outputs

  • Job: workflow state, stage, diagnostics, metadata, timestamps
  • Artifact: persisted source, intermediate, review, replay, and delivery artifacts
  • JobPolicySnapshot: frozen policy/config snapshot for a run
  • JobStatisticsSnapshot: rollup counts for verification and reuse metrics
  • TranslationResult: per-segment output plus result source and verification state
  • VerificationResult: per-segment QA record
  • JobBatch: multi-file intake wrapper for many related job submissions
  • JobBatchItem: per-file intake state and retry/log record inside a batch
  • ProjectReviewCoverageSnapshot: persisted daily project-level coverage snapshot

Knowledge And Review

  • GlossaryEntry: scoped terminology entry
  • CandidateMemoryEntry: review-derived memory candidate
  • ApprovedMemoryEntry: approved reusable memory entry
  • ReviewSession: TMX export/import tracking for human review
  • AuditEvent: structured audit log records

Runtime Responsibilities

Web/API Process

The web process currently handles:

  • session-authenticated HTML operator-console routes under /console/
  • session-authenticated and HTTP Basic-authenticated API access
  • operator/admin authorization on /api/v1/*
  • health endpoints
  • policy and glossary configuration endpoints
  • job creation and control endpoints
  • reporting and audit endpoints
  • artifact listing and local-download authorization
  • synchronous processing paths when operators call job actions directly

Unauthenticated exceptions to this access model are intentionally narrow:

  • /health/live
  • /health/ready
  • /api/v1/artifacts/{artifact_id}/download when the caller already has a valid signed token

API Access Model

The checked-in API now enforces two roles:

  • operator: any active authenticated user
  • admin: any active authenticated user who is staff, superuser, or a member of IRIS_API_ADMIN_GROUP

The supported API authentication mechanisms are:

  • Django session auth
  • HTTP Basic auth

Audit events now take actor identity from the authenticated user and source IP from X-Forwarded-For or REMOTE_ADDR. Legacy request fields such as submitted_by, changed_by, requested_by, and cancelled_by may still be accepted for compatibility, but they are no longer the audit source of truth.
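
The role rules above can be sketched as a small permission check. The group-name constant and user fields below are illustrative stand-ins for Django's user model and the IRIS_API_ADMIN_GROUP setting, not the project's actual code.

```python
from dataclasses import dataclass, field

IRIS_API_ADMIN_GROUP = "iris-admins"  # illustrative group name; configured via settings

@dataclass
class User:
    # Minimal stand-in for Django's user model.
    is_active: bool = True
    is_authenticated: bool = True
    is_staff: bool = False
    is_superuser: bool = False
    groups: set = field(default_factory=set)

def is_operator(user: User) -> bool:
    # operator: any active authenticated user
    return user.is_active and user.is_authenticated

def is_admin(user: User) -> bool:
    # admin: an operator who is staff, superuser, or in the admin group
    return is_operator(user) and (
        user.is_staff or user.is_superuser or IRIS_API_ADMIN_GROUP in user.groups
    )
```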

Operator Console

The repo now ships a Django-template operator console at /console/.

That console currently supports:

  • session-authenticated login/logout
  • single-job submission
  • multi-file batch submission and monitoring
  • job detail with status, stats, artifacts, review sessions, and recent audit events
  • filename-based file history under /console/files/
  • run-to-run comparison for the same filename timeline
  • TMX-style segment inspection backed by extracted units and translation results
  • process, cancel, rerun, TMX export, TMX import, and replay actions
  • project-level review-coverage views

It reuses the same workflow and reporting services as the API rather than maintaining a separate frontend-only backend path.

Current file history is application-managed rather than storage-native. The console groups runs by project + source_filename, which keeps reruns and repeated uploads together without depending on MinIO/S3 bucket versioning. The checked-in Artifact model stores object keys, checksums, and metadata, but it does not currently persist provider-native object version identifiers such as S3 VersionId.
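
A minimal sketch of that application-managed grouping, assuming run records expose project, source_filename, and a creation timestamp (field names here are illustrative):

```python
from collections import defaultdict

def group_runs(runs):
    # Group runs by (project, source_filename) so reruns and repeated
    # uploads of the same file share one history timeline, without
    # relying on S3/MinIO bucket versioning.
    history = defaultdict(list)
    for run in runs:
        history[(run["project"], run["source_filename"])].append(run)
    for timeline in history.values():
        timeline.sort(key=lambda r: r["created_at"])  # oldest first
    return history
```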

Worker Process

The checked-in worker command subscribes to these queues:

  • job_control
  • docx_extract
  • retrieve_context
  • translate_batch
  • qa_verify
  • review_io
  • docx_reassemble
  • maintenance

Important nuance: retrieval still happens inside translation processing, so there is no separate tasks.workflow.retrieve_context task. The maintenance queue is now backed by tasks.workflow.maintenance_tick.
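
For illustration, the queue list translates into a worker invocation along these lines. The helper below is hypothetical, not the project's actual startup code; only the queue names come from the list above.

```python
WORKER_QUEUES = [
    "job_control", "docx_extract", "retrieve_context", "translate_batch",
    "qa_verify", "review_io", "docx_reassemble", "maintenance",
]

def worker_argv(app_name: str = "iris") -> list[str]:
    # Build a celery worker command line subscribing to every queue;
    # retrieve_context is still consumed even though retrieval currently
    # runs inside translation processing.
    return ["celery", "-A", app_name, "worker", "-Q", ",".join(WORKER_QUEUES)]
```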

Operations And Maintenance

The checked-in repo now ships one maintenance service that can be run either:

  • through the Celery task tasks.workflow.maintenance_tick
  • through the management command python manage.py maintenance_tick

That maintenance flow currently performs:

  • expired artifact deletion based on retention_expires_at
  • cleanup of stale non-promoted candidate memory tied to terminal jobs
  • refresh of project review-coverage snapshots
  • integrity checks for missing artifact objects and completed jobs missing delivery artifacts

Separately from scheduled maintenance, Django startup also performs a small recovery pass for jobs left in in_progress with stale or missing heartbeat metadata and marks them failed.

Storage Layer

The storage layer is abstracted through Django's default storage. Workflow code combines the storage APIs with materialization helpers, so DOCX-processing stages that temporarily need local files can still run against non-filesystem storage backends.

Artifact bytes live in the configured storage backend, but file/version history is tracked in Django models and services rather than through storage-provider-native object versioning. Bucket versioning can be enabled at the infrastructure layer if desired, but it is not part of the current application contract.
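
A materialization helper of the kind described might look like this sketch, assuming only the Django storage `open(name, mode)` API; the helper name is illustrative.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def materialized(storage, name: str):
    # Copy an object from any Django storage backend into a local temp
    # file, yield its filesystem path for DOCX-processing stages that
    # need a real file, then remove the temp copy afterwards.
    fd, path = tempfile.mkstemp(suffix=os.path.splitext(name)[1])
    try:
        with os.fdopen(fd, "wb") as out, storage.open(name, "rb") as src:
            out.write(src.read())
        yield path
    finally:
        os.remove(path)
```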

Translation Runtime

The translation runtime is built from three cooperating pieces:

  • retrieval backend selection (database, or optional OpenSearch for glossary lookup)
  • prompt rendering from checked-in template files
  • provider execution through either the mock provider or an OpenAI-compatible endpoint
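
The provider dispatch can be pictured as a small factory; the class names and profile fields here are illustrative, not the repo's actual API.

```python
class MockProvider:
    # Deterministic stand-in used for development and tests.
    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"

class OpenAICompatibleProvider:
    # Thin wrapper around any OpenAI-compatible endpoint (fields only;
    # the actual request logic is omitted in this sketch).
    def __init__(self, base_url: str, model: str):
        self.base_url = base_url
        self.model = model

def provider_for(profile: dict):
    # Choose the runtime based on the provider profile.
    if profile.get("kind") == "mock":
        return MockProvider()
    return OpenAICompatibleProvider(profile["base_url"], profile["model"])
```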

Health Model

The readiness service currently checks:

  • database
  • storage
  • aspose
  • broker
  • opensearch

The aspose, broker, and opensearch checks are treated as optional and report disabled when the related feature flag is off.
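
That behavior amounts to an aggregator like this sketch, where optional checks short-circuit to disabled when their feature flag is off (function and result shapes are illustrative):

```python
OPTIONAL_CHECKS = {"aspose", "broker", "opensearch"}

def readiness(checks: dict, flags: dict) -> dict:
    # checks maps name -> zero-arg callable returning True when healthy;
    # flags maps optional check name -> feature-flag state.
    results = {}
    for name, probe in checks.items():
        if name in OPTIONAL_CHECKS and not flags.get(name, False):
            results[name] = "disabled"
        else:
            results[name] = "ok" if probe() else "fail"
    return results
```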

Architecture Boundaries That Are Not Shipped Yet

The repository does not currently ship:

  • a production WSGI/ASGI server definition
  • separate checked-in worker deployments per queue family
  • storage-provider-native object version browsing as part of the operator workflow

Those items belong in future delivery work, not in the description of the current repo baseline. For future gaps, see Implementation Plan: Verified Next Work and Open Questions.