Iris Translation v2
Governed DOCX translation platform built as a modular Django application.
Local Development
For the full local development workflow, debugging notes, and the pre-PR checklist, see docs/local-development.md.
Prerequisites
- Python 3.12
- uv
- Aspose license file at `config/licenses/Aspose.Words.Python.NET.lic` if you want full DOCX processing flows
Setup
cp .env.example .env
uv sync
uv run python manage.py migrate
uv run python manage.py seed_dev_data --lm-studio --with-dev-admin
uv run python manage.py runserver --noreload # use --noreload when running long translation jobs
Note: The Django dev server reloads on any `.py` file change, which kills in-flight synchronous jobs. Use `--noreload` whenever you intend to trigger translation runs.
`seed_dev_data --with-dev-admin` creates a local-only bootstrap login for the operator console with:
- username: `dev-admin`
- password: `dev-password-123`
For a quick smoke test after sign-in, upload any `.docx` through `/console/jobs/new/`. The repository already includes sample documents under `dataset/`.
`uv sync` now includes the default dev dependency group, so the lint and format commands below are available without extra flags.
The JSON API now requires authentication on `/api/v1/*` except for `/health/live`, `/health/ready`, and signed artifact downloads under `/api/v1/artifacts/{artifact_id}/download`. Local development can use either Django session auth or HTTP Basic auth. Admin API actions require a staff or superuser account, or membership in `IRIS_API_ADMIN_GROUP`.
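As a sketch of the Basic auth path, the header for the seeded `dev-admin` account can be built with the standard library. The job id in the URL below is a placeholder; the artifacts endpoint shape comes from this README, everything else is illustrative:

```python
import base64
from urllib.request import Request

def basic_auth_request(url: str, username: str, password: str) -> Request:
    # HTTP Basic auth: base64-encode "user:password" into an Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})

# Credentials come from `seed_dev_data --with-dev-admin`; job id 1 is a placeholder.
req = basic_auth_request(
    "http://127.0.0.1:8000/api/v1/jobs/1/artifacts",
    "dev-admin",
    "dev-password-123",
)
print(req.get_header("Authorization"))
```

Session auth through the `/console/` login works equally well for browser-driven local testing.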
The repo also now ships a session-authenticated operator console at `/console/` for job submission, job actions, TMX review exchange, artifact download, and project review-coverage views.
LM Studio
The repo now supports an OpenAI-compatible local translation runtime such as LM Studio.
- set `IRIS_TRANSLATION_PROVIDER=lm_studio` in `.env` to use the local runtime when no explicit provider profile is attached
- point `IRIS_LM_STUDIO_ENDPOINT` at your LM Studio API base, for example `http://127.0.0.1:1234/v1`
- set `IRIS_LM_STUDIO_MODEL` to the loaded chat model, for example `qwen3.5-27b@q4_k_m` or `qwen3.5-27b@q5_k_m`
- jobs with an `openai_compatible` provider profile will use that profile's endpoint and model directly
- embeddings and rerankers are not wired into retrieval yet; the current LM Studio integration covers translation generation only
Prompt Profiles
Translation prompts are now stored as versioned template files under `services/translation/prompt_templates/` and selected at runtime by language pair plus policy.
- the default selector prefers a language-pair-specific profile such as `technical_translation_en_bg` and falls back to the generic `technical_translation` profile
- the repo currently keeps only the active `v1` translation prompt folders; `prompt_version` remains part of the policy contract so future prompt revisions can still be rolled out deliberately
- `LanguagePairPolicy.model_policy` may override prompt selection with `prompt_profile`, `prompt_version`, and `prompt_context`
- `ProviderProfile.config` can carry the same keys as provider-level defaults, with `LanguagePairPolicy.model_policy` taking precedence per job
- `prompt_context` is intended for domain or document-specific fields such as `domain_label`, `document_context`, `content_type`, `audience`, `register`, and `extra_instructions`
- `prompt_budget` can cap noisy sections such as glossary matches, memory examples, neighboring context, and document-context length so large-context models still get compact prompts
- prompt metadata now records the selected prompt version plus prompt-size and compaction metrics in each provider translation result
Example model_policy override:
{
"provider": "lm-studio-local",
"prompt_profile": "technical_translation_en_bg",
"prompt_version": "v1",
"prompt_context": {
"domain_label": "nuclear licensing package",
"document_context": "Safety-class valve schedule for the containment spray system.",
"register": "formal institutional",
"audience": "regulatory reviewers",
"extra_instructions": [
"Prefer established Bulgarian licensing terminology."
]
},
"prompt_budget": {
"max_glossary_entries": 8,
"max_exact_memory_entries": 2,
"max_fuzzy_memory_entries": 2,
"max_neighboring_context_entries": 3,
"max_document_context_chars": 420,
"max_list_item_chars": 180
}
}
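To make the budget keys concrete, here is a hypothetical sketch of how caps such as `max_glossary_entries` and `max_document_context_chars` could be enforced. The function and context field names are illustrative, not the app's internals:

```python
def apply_prompt_budget(context: dict, budget: dict) -> dict:
    # Illustrative only: trim list-valued prompt sections and clamp the
    # document context so the assembled prompt stays compact.
    trimmed = dict(context)
    trimmed["glossary_matches"] = context.get("glossary_matches", [])[
        : budget.get("max_glossary_entries", 8)
    ]
    doc = context.get("document_context", "")
    trimmed["document_context"] = doc[: budget.get("max_document_context_chars", 420)]
    return trimmed

context = {
    "glossary_matches": [f"term-{i}" for i in range(20)],
    "document_context": "x" * 1000,
}
slim = apply_prompt_budget(
    context, {"max_glossary_entries": 8, "max_document_context_chars": 420}
)
print(len(slim["glossary_matches"]), len(slim["document_context"]))  # 8 420
```

The recorded prompt-size and compaction metrics let you verify after the fact how much trimming a given budget actually caused.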
Language Prompts And Enrichment
Non-translation prompt families are now stored separately under `services/language/prompt_templates/`.
- `services/language/prompts.py` catalogs entity extraction, abbreviation extraction, keyword extraction, text cleanup, text normalization, translation correction, TOC summarization, translation evaluation, and final-edit prompts
- `services/language/enrichment.py` runs document-level enrichment during extraction and stores the result in `job.metadata["language_enrichment"]` and the extraction manifest JSON
- the extraction-stage enrichment currently persists the document summary plus entity buckets for `people`, `names`, `abbreviations`, and `keywords`
- the translation prompt path now reuses the extracted document summary automatically when no stronger `prompt_context.document_context` override is configured
- translation finalization now runs document-level QA for repeated-source drift, repeated glossary-term drift, abbreviation stability, and repeated unit rendering, and can escalate affected segments to `needs_review`
- same-family approved memory is now surfaced separately as revision-lineage context so critique and QA can detect drift against prior approved family translations
Async orchestration is disabled by default in plain local dev. To exercise the worker path without Docker, set `IRIS_ASYNC_WORKFLOW_ENABLED=true`, point `CELERY_BROKER_URL` at a broker, and run a worker with `uv run celery -A Iris_translation worker --queues=job_control,docx_extract,retrieve_context,translate_batch,qa_verify,review_io,docx_reassemble,maintenance`. The current async pipeline dispatches `extract_document`, then uses `translate_batches` to fan out `translate_batch` tasks, then rolls QA up through `verify_batches`, and finally enqueues `reassemble_job` when the job clears automated review. Batch sizing is controlled by `IRIS_TRANSLATION_BATCH_SIZE`.
The same codebase now also ships a real `tasks.workflow.maintenance_tick` task, and `uv run python manage.py maintenance_tick` runs the same retention cleanup, stale-candidate cleanup, review-coverage refresh, and integrity audit flow on demand.
The committed `.env.example` stores SQLite under `var/db.sqlite3` so the local Docker web and worker services share the same database file.
OpenSearch
Glossary retrieval can now be exercised through OpenSearch without changing the rest of the translation workflow.
- keep `IRIS_OPEN_SEARCH_ENABLED=false` for plain local `manage.py` usage unless you have an OpenSearch node running
- when enabled, set `IRIS_OPEN_SEARCH_URL` to the node base URL and optionally override `IRIS_OPEN_SEARCH_INDEX`
- the retrieval layer currently uses OpenSearch for glossary lookup while approved and candidate memory retrieval stays database-backed
- translation results now record `prompt_context.retrieval_backend` so you can verify whether a job used `database` or `opensearch`
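A glossary lookup against that index might look like the following OpenSearch `_search` body. The `source_term` field and `iris-glossary` index name are assumptions for illustration; check the app's actual mapping and `IRIS_OPEN_SEARCH_INDEX` value:

```python
import json

index = "iris-glossary"  # assumed; the real name comes from IRIS_OPEN_SEARCH_INDEX
query = {
    "query": {"match": {"source_term": "containment spray"}},
    "size": 8,
}
# This body would be POSTed to {IRIS_OPEN_SEARCH_URL}/{index}/_search
print(f"/{index}/_search", json.dumps(query))
```

Inspecting the index through the bundled OpenSearch Dashboards (see the Docker section) is the quickest way to confirm the real field names.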
Verification
uv run ruff check .
uv run ruff format --check .
uv run python manage.py check
uv run python manage.py makemigrations --check
uv run python manage.py maintenance_tick --dry-run
uv run python manage.py test tests
Apply formatting locally with:
uv run ruff format .
The current Ruff baseline targets the maintained Django application surface and intentionally excludes `poc/` and Django migration files so CI can gate the production code without being blocked by legacy experiments or generated schema history.
Forgejo CI
The repo now ships a Forgejo Actions workflow at `.forgejo/workflows/ci.yml`.
- It expects a runner label named `iris-ci`.
- The intended `iris-ci` image build recipe lives at `docker/ci-runner.Dockerfile`.
- The workflow runs `uv sync --frozen`, `ruff format --check`, `ruff check`, `manage.py check`, `makemigrations --check`, `maintenance_tick --dry-run`, and `manage.py test tests`.
- Full test execution needs an Aspose license available either at `config/licenses/Aspose.Words.Python.NET.lic` in the checked-out workspace or as the `ASPOSE_WORDS_LICENSE_BASE64` Forgejo secret.
Example runner image build:
docker build -f docker/ci-runner.Dockerfile -t registry.example.com/iris-translation/forgejo-runner:latest .
docker push registry.example.com/iris-translation/forgejo-runner:latest
Register the Forgejo runner with a label such as `iris-ci:docker://registry.example.com/iris-translation/forgejo-runner:latest`.
Docker
The repo includes a local Docker/Compose workflow for booting the app end to end with web, worker, redis, opensearch, opensearch-dashboards, and minio.
./docker/compose.sh up --build
The web app will be available at http://127.0.0.1:8000.
OpenSearch Dashboards will be available at http://127.0.0.1:5601.
The MinIO Console will be available at http://127.0.0.1:9001.
Useful checks:
./docker/compose.sh ps
curl http://127.0.0.1:8000/health/live
curl http://127.0.0.1:8000/health/ready
./docker/compose.sh exec web python manage.py test tests
./docker/compose.sh exec web env IRIS_TEST_LIVE_OPENSEARCH=1 IRIS_TEST_LIVE_OPENSEARCH_URL=http://opensearch:9200 IRIS_TEST_LIVE_OPENSEARCH_INDEX=iris-opensearch-live-test python manage.py test tests.test_opensearch_live_integration
uv run python manage.py evaluate_translation_quality
uv run python manage.py evaluate_translation_quality --format=json
./docker/compose.sh down
`/health/ready` now reports per-dependency status for database, storage, aspose, broker, and opensearch. Optional checks return `disabled` when those features are turned off by configuration.
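A sketch of consuming that readiness payload, assuming a `{"checks": {name: status}}` response shape (the actual JSON format may differ; treat the keys below as illustrative):

```python
def failing_dependencies(ready_payload: dict) -> list[str]:
    # "disabled" means an optional feature is turned off by config, not a failure.
    checks = ready_payload.get("checks", {})
    return sorted(
        name for name, status in checks.items() if status not in ("ok", "disabled")
    )

payload = {
    "checks": {
        "database": "ok",
        "storage": "ok",
        "aspose": "ok",
        "broker": "error",
        "opensearch": "disabled",
    }
}
print(failing_dependencies(payload))  # ['broker']
```

Distinguishing `disabled` from a real failure matters here, since plain local dev legitimately runs with the broker and OpenSearch turned off.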
Notes:
- `./docker/compose.sh` disables Docker's automatic project-root `.env` loading so your ignored local `.env` does not interfere with Compose parsing.
- Compose uses the committed `.env.example` defaults, but overrides async orchestration to use the bundled Redis broker and enables OpenSearch retrieval against the bundled `opensearch` service.
- OpenSearch Dashboards is bundled for local index inspection and is configured to connect to the bundled `opensearch` service.
- MinIO already exposes its built-in browser console on port `9001`; sign in with the local development credentials from `compose.yml`.
- the local Aspose license is intentionally excluded from git; place it at `config/licenses/Aspose.Words.Python.NET.lic` on your machine
- the current Compose stack still uses SQLite for the database, but switches artifact storage to S3-compatible storage through the bundled MinIO service and runs Celery extraction, glossary retrieval, translation, verification, and reassembly locally
- the Docker image is intentionally pinned to `python:3.12-slim-bullseye` because the bundled Aspose .NET Core 3.1 runtime requires the OpenSSL 1.1 ABI available there
- the Compose stack was verified end to end through web -> Redis -> worker -> Aspose reassembly
- `GET /api/v1/jobs/{id}/artifacts` now returns signed `download_url` values, download expirations, and artifact retention metadata instead of exposing raw storage keys
- workflow stages that need local files now materialize artifacts through Django storage APIs, so non-path backends such as S3-compatible storage can be exercised without relying on `storage.path(...)`
- `uv run python manage.py evaluate_translation_quality` runs the curated translation baseline suite and fails if exact outputs, verification states, terminology integrity, placeholder/number preservation, or repeated-segment consistency regress; the report now also includes prompt-size and compaction metrics for CI drift analysis
Git And Commits
This repo now uses conventional commits. Keep commits scoped and descriptive.
Recommended style:
- `feat: add memory promotion endpoint`
- `fix: preserve glossary scope precedence during import`
- `chore: add local docker compose workflow`
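A minimal local check for that style can be scripted; the accepted types below are an illustrative subset of the Conventional Commits convention, not this repo's enforced list:

```python
import re

# Matches "type(optional-scope)!: description"; the type list is an assumption.
CONVENTIONAL = re.compile(r"^(feat|fix|chore|docs|refactor|test|ci)(\([\w-]+\))?(!)?: .+")

for subject in (
    "feat: add memory promotion endpoint",
    "fix: preserve glossary scope precedence during import",
    "update stuff",  # rejected: missing type prefix
):
    print(subject, "->", bool(CONVENTIONAL.match(subject)))
```

Wiring a check like this into a commit-msg hook keeps the history uniform without relying on reviewer vigilance.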
Project Docs
Detailed product and implementation docs live in `docs/`.
- docs/Home.md
- docs/local-development.md
- docs/product.md
- docs/setup-and-configuration.md
- docs/runtime-architecture.md
- docs/jobs-and-workflow.md
- docs/api-reference.md
- docs/translation-and-retrieval.md
- docs/quality-and-review.md
- docs/storage-and-artifacts.md
- docs/operations-and-scaling.md
- docs/implementation-plan.md
- docs/open-questions.md