No description

Python 96.1%
HTML 2.7%
CSS 1.1%
Dockerfile 0.1%

Find a file

william le roux bc03b84c39 Some checks failed CI / quality (push) Failing after 8s Details CI / tests (push) Has been skipped Details feat(seed_dev_data): add optional dev admin creation flag for streamlined local setup - Introduced `--with-dev-admin` option to the `seed_dev_data` command for creating a bootstrap admin user (`dev-admin`). - Updated documentation to reflect changes and simplify local development setup. - Enhanced glossary and language policy seeding with new terms and update handling.		2026-04-01 13:53:29 +03:00
.forgejo/workflows	ci(forgejo): update CI workflow to improve environment consistency and simplify steps	2026-04-01 12:51:20 +03:00
apps	feat(seed_dev_data): add optional dev admin creation flag for streamlined local setup	2026-04-01 13:53:29 +03:00
config/licenses	chore: bootstrap repository	2026-03-30 17:04:18 +03:00
dataset	feat(translation): add thinking control, short-segment classifier, per-task routing, and job safety	2026-03-31 14:58:35 +03:00
docker	ci(forgejo): add self-hosted pipeline	2026-04-01 11:54:50 +03:00
docs@47168c2d1f	feat(seed_dev_data): add optional dev admin creation flag for streamlined local setup	2026-04-01 13:53:29 +03:00
Iris_translation	feat(console): add operator workflow console	2026-04-01 11:08:35 +03:00
services	feat(console): add operator workflow console	2026-04-01 11:08:35 +03:00
tasks	feat(operations): add maintenance task and command	2026-04-01 10:08:17 +03:00
tests	feat(seed_dev_data): add optional dev admin creation flag for streamlined local setup	2026-04-01 13:53:29 +03:00
.dockerignore	chore: add local docker compose workflow	2026-03-30 17:08:29 +03:00
.env.example	feat(api): enforce auth and governance controls	2026-04-01 10:43:08 +03:00
.gitignore	chore: bootstrap repository	2026-03-30 17:04:18 +03:00
.gitmodules	chore(docs): add docs wiki as submodule	2026-04-01 11:18:47 +03:00
compose.yml	feat(docker): add OpenSearch Dashboards and MinIO to Docker Compose setup	2026-03-31 18:34:26 +03:00
Dockerfile	feat: add linting and async workflow improvements	2026-03-31 10:30:47 +03:00
manage.py	refactor: apply formatting improvements and introduce retention logic	2026-03-31 11:34:53 +03:00
pyproject.toml	refactor(tests): migrate all test fixtures from raw XML build_docx to Aspose make_docx	2026-03-31 16:41:57 +03:00
README.md	feat(seed_dev_data): add optional dev admin creation flag for streamlined local setup	2026-04-01 13:53:29 +03:00
uv.lock	refactor(providers): update enable_thinking default and remove unused dependencies	2026-03-31 18:06:34 +03:00

README.md

Iris Translation v2

Governed DOCX translation platform built as a modular Django application.

Local Development

For the full local development workflow, debugging notes, and the pre-PR checklist, see docs/local-development.md.

Prerequisites

Python 3.12
uv
Aspose license file at config/licenses/Aspose.Words.Python.NET.lic if you want full DOCX processing flows

Setup

cp .env.example .env
uv sync
uv run python manage.py migrate
uv run python manage.py seed_dev_data --lm-studio --with-dev-admin
uv run python manage.py runserver --noreload       # use --noreload when running long translation jobs

Note: The Django dev server reloads on any .py file change, which kills in-flight synchronous jobs. Use --noreload whenever you intend to trigger translation runs.

seed_dev_data --with-dev-admin creates a local-only bootstrap login for the operator console with:

username: dev-admin
password: dev-password-123

For a quick smoke test after sign-in, upload any .docx through /console/jobs/new/. The repository already includes sample documents under dataset/.

uv sync now includes the default dev dependency group, so the lint and format commands below are available without extra flags.

The JSON API now requires authentication on /api/v1/* except for /health/live, /health/ready, and signed artifact downloads under /api/v1/artifacts/{artifact_id}/download. Local development can use either Django session auth or HTTP Basic auth. Admin API actions require a staff or superuser account, or membership in IRIS_API_ADMIN_GROUP.

The repo also now ships a session-authenticated operator console at /console/ for job submission, job actions, TMX review exchange, artifact download, and project review-coverage views.

LM Studio

The repo now supports an OpenAI-compatible local translation runtime such as LM Studio.

set IRIS_TRANSLATION_PROVIDER=lm_studio in .env to use the local runtime when no explicit provider profile is attached
point IRIS_LM_STUDIO_ENDPOINT at your LM Studio API base, for example http://127.0.0.1:1234/v1
set IRIS_LM_STUDIO_MODEL to the loaded chat model, for example qwen3.5-27b@q4_k_m or qwen3.5-27b@q5_k_m
jobs with an openai_compatible provider profile will use that profile's endpoint and model directly
embeddings and rerankers are not wired into retrieval yet; the current LM Studio integration covers translation generation only

Prompt Profiles

Translation prompts are now stored as versioned template files under services/translation/prompt_templates/ and selected at runtime by language pair plus policy.

the default selector prefers a language-pair-specific profile such as technical_translation_en_bg and falls back to the generic technical_translation profile
the repo currently keeps only the active v1 translation prompt folders; prompt_version remains part of the policy contract so future prompt revisions can still be rolled out deliberately
LanguagePairPolicy.model_policy may override prompt selection with prompt_profile, prompt_version, and prompt_context
ProviderProfile.config can carry the same keys as provider-level defaults, with LanguagePairPolicy.model_policy taking precedence per job
prompt_context is intended for domain or document-specific fields such as domain_label, document_context, content_type, audience, register, and extra_instructions
prompt_budget can cap noisy sections such as glossary matches, memory examples, neighboring context, and document-context length so large-context models still get compact prompts
prompt metadata now records selected prompt version plus prompt-size and compaction metrics in each provider translation result

Example model_policy override:

{
  "provider": "lm-studio-local",
  "prompt_profile": "technical_translation_en_bg",
  "prompt_version": "v1",
  "prompt_context": {
    "domain_label": "nuclear licensing package",
    "document_context": "Safety-class valve schedule for the containment spray system.",
    "register": "formal institutional",
    "audience": "regulatory reviewers",
    "extra_instructions": [
      "Prefer established Bulgarian licensing terminology."
    ]
  },
  "prompt_budget": {
    "max_glossary_entries": 8,
    "max_exact_memory_entries": 2,
    "max_fuzzy_memory_entries": 2,
    "max_neighboring_context_entries": 3,
    "max_document_context_chars": 420,
    "max_list_item_chars": 180
  }
}

Language Prompts And Enrichment

Non-translation prompt families are now stored separately under services/language/prompt_templates/.

services/language/prompts.py catalogs entity extraction, abbreviation extraction, keyword extraction, text cleanup, text normalization, translation correction, TOC summarization, translation evaluation, and final-edit prompts
services/language/enrichment.py runs document-level enrichment during extraction and stores the result in job.metadata["language_enrichment"] and the extraction manifest JSON
the extraction-stage enrichment currently persists document summary plus entity buckets for people, names, abbreviations, and keywords
the translation prompt path now reuses the extracted document summary automatically when no stronger prompt_context.document_context override is configured
translation finalization now runs document-level QA for repeated-source drift, repeated glossary-term drift, abbreviation stability, and repeated unit rendering, and can escalate affected segments to needs_review
same-family approved memory is now surfaced separately as revision-lineage context so critique and QA can detect drift against prior approved family translations

Async orchestration is disabled by default in plain local dev. To exercise the worker path without Docker, set IRIS_ASYNC_WORKFLOW_ENABLED=true, point CELERY_BROKER_URL at a broker, and run a worker with uv run celery -A Iris_translation worker --queues=job_control,docx_extract,retrieve_context,translate_batch,qa_verify,review_io,docx_reassemble,maintenance. The current async pipeline dispatches extract_document, then uses translate_batches to fan out translate_batch tasks, then rolls QA up through verify_batches, and finally enqueues reassemble_job when the job clears automated review. Batch sizing is controlled by IRIS_TRANSLATION_BATCH_SIZE. The same codebase now also ships a real tasks.workflow.maintenance_tick task, and uv run python manage.py maintenance_tick runs the same retention cleanup, stale-candidate cleanup, review-coverage refresh, and integrity audit flow on demand. The committed .env.example stores SQLite under var/db.sqlite3 so the local Docker web and worker services share the same database file.

OpenSearch

Glossary retrieval can now be exercised through OpenSearch without changing the rest of the translation workflow.

keep IRIS_OPEN_SEARCH_ENABLED=false for plain local manage.py usage unless you have an OpenSearch node running
when enabled, set IRIS_OPEN_SEARCH_URL to the node base URL and optionally override IRIS_OPEN_SEARCH_INDEX
the retrieval layer currently uses OpenSearch for glossary lookup while approved and candidate memory retrieval stays database-backed
translation results now record prompt_context.retrieval_backend so you can verify whether a job used database or opensearch

Verification

uv run ruff check .
uv run ruff format --check .
uv run python manage.py check
uv run python manage.py makemigrations --check
uv run python manage.py maintenance_tick --dry-run
uv run python manage.py test tests

Apply formatting locally with:

uv run ruff format .

The current Ruff baseline targets the maintained Django application surface and intentionally excludes poc/ and Django migration files so CI can gate the production code without being blocked by legacy experiments or generated schema history.

Forgejo CI

The repo now ships a Forgejo Actions workflow at .forgejo/workflows/ci.yml.

It expects a runner label named iris-ci.
The intended iris-ci image build recipe lives at docker/ci-runner.Dockerfile.
The workflow runs uv sync --frozen, ruff format --check, ruff check, manage.py check, makemigrations --check, maintenance_tick --dry-run, and manage.py test tests.
Full test execution needs an Aspose license available either at config/licenses/Aspose.Words.Python.NET.lic in the checked-out workspace or as the ASPOSE_WORDS_LICENSE_BASE64 Forgejo secret.

Example runner image build:

docker build -f docker/ci-runner.Dockerfile -t registry.example.com/iris-translation/forgejo-runner:latest .
docker push registry.example.com/iris-translation/forgejo-runner:latest

Register the Forgejo runner with a label such as iris-ci:docker://registry.example.com/iris-translation/forgejo-runner:latest.

Docker

The repo includes a local Docker/Compose workflow for booting the app end to end with web, worker, redis, opensearch, opensearch-dashboards, and minio.

./docker/compose.sh up --build

The web app will be available at http://127.0.0.1:8000. OpenSearch Dashboards will be available at http://127.0.0.1:5601. The MinIO Console will be available at http://127.0.0.1:9001.

Useful checks:

./docker/compose.sh ps
curl http://127.0.0.1:8000/health/live
curl http://127.0.0.1:8000/health/ready
./docker/compose.sh exec web python manage.py test tests
./docker/compose.sh exec web env IRIS_TEST_LIVE_OPENSEARCH=1 IRIS_TEST_LIVE_OPENSEARCH_URL=http://opensearch:9200 IRIS_TEST_LIVE_OPENSEARCH_INDEX=iris-opensearch-live-test python manage.py test tests.test_opensearch_live_integration
uv run python manage.py evaluate_translation_quality
uv run python manage.py evaluate_translation_quality --format=json
./docker/compose.sh down

/health/ready now reports per-dependency status for database, storage, aspose, broker, and opensearch. Optional checks return disabled when those features are turned off by configuration.

Notes:

./docker/compose.sh disables Docker's automatic project-root .env loading so your ignored local .env does not interfere with Compose parsing.
Compose uses the committed .env.example defaults, but overrides async orchestration to use the bundled Redis broker and enables OpenSearch retrieval against the bundled opensearch service.
OpenSearch Dashboards is bundled for local index inspection and is configured to connect to the bundled opensearch service.
MinIO already exposes its built-in browser console on port 9001; sign in with the local development credentials from compose.yml.
the local Aspose license is intentionally excluded from git; place it at config/licenses/Aspose.Words.Python.NET.lic on your machine
the current Compose stack still uses SQLite for the database, but switches artifact storage to S3-compatible storage through the bundled MinIO service and runs Celery extraction, glossary retrieval, translation, verification, and reassembly locally
the Docker image is intentionally pinned to python:3.12-slim-bullseye because the bundled Aspose .NET Core 3.1 runtime requires the OpenSSL 1.1 ABI available there
the Compose stack was verified end to end through web -> Redis -> worker -> Aspose reassembly
GET /api/v1/jobs/{id}/artifacts now returns signed download_url values, download expirations, and artifact retention metadata instead of exposing raw storage keys
workflow stages that need local files now materialize artifacts through Django storage APIs, so non-path backends such as S3-compatible storage can be exercised without relying on storage.path(...)
uv run python manage.py evaluate_translation_quality runs the curated translation baseline suite and fails if exact outputs, verification states, terminology integrity, placeholder/number preservation, or repeated-segment consistency regress; the report now also includes prompt-size and compaction metrics for CI drift analysis

Git And Commits

This repo now uses conventional commits. Keep commits scoped and descriptive.

Recommended style:

feat: add memory promotion endpoint
fix: preserve glossary scope precedence during import
chore: add local docker compose workflow

Project Docs

Detailed product and implementation docs live in docs/.