william le roux edited this page 2026-04-01 16:51:37 +03:00

Local Development

This document describes the fastest supported ways to get the repository running locally, how to debug the main execution paths, and what to run before opening a pull request.

What You Need

Required for the standard local path:

  • Python 3.12
  • uv

Optional, depending on what you want to exercise:

  • Docker and Docker Compose for the full local stack
  • an Aspose license file at config/licenses/Aspose.Words.Python.NET.lic for full DOCX extraction and reassembly flows
  • a local LM Studio or other OpenAI-compatible runtime if you want live model calls instead of the checked-in mock provider

The checked-in defaults in .env.example now target the recommended host-app local stack:

  • PostgreSQL on 127.0.0.1:5433
  • Redis on 127.0.0.1:6380
  • MinIO-backed S3 storage on 127.0.0.1:9000
  • OpenSearch on 127.0.0.1:9200
  • async workflow enabled
  • mock translation provider unless you switch to LM Studio or another provider
  • SQLite still available as an explicit fallback if you set IRIS_DB_ENGINE=sqlite
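The IRIS_DB_ENGINE=sqlite fallback could be wired up roughly like this — a minimal sketch, not the project's actual settings code; the helper name and any environment variables other than IRIS_DB_ENGINE are illustrative assumptions:

```python
def database_config(env: dict[str, str]) -> dict:
    """Pick a Django-style DATABASES['default'] dict from environment flags.

    Illustrative sketch only: IRIS_DB_ENGINE comes from .env.example,
    the other variable names here are hypothetical.
    """
    if env.get("IRIS_DB_ENGINE", "postgres").lower() == "sqlite":
        # Explicit fallback: single-file database, no external service needed.
        return {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}
    # Default path: the Dockerized PostgreSQL on the non-standard local port.
    return {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": env.get("IRIS_DB_HOST", "127.0.0.1"),
        "PORT": int(env.get("IRIS_DB_PORT", "5433")),
    }
```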

Pick one of these based on what you are trying to work on.

1. Recommended: host app with Dockerized dependencies

Use this for normal local development. Django and Celery run on the host, while PostgreSQL, Redis, MinIO, and OpenSearch run in Docker.

cp .env.example .env
uv sync
./docker/compose.sh up -d postgres redis minio minio-init opensearch opensearch-dashboards
uv run python manage.py migrate
uv run python manage.py seed_dev_data --lm-studio --with-dev-admin
uv run python manage.py runserver --noreload 127.0.0.1:8000
uv run celery -A Iris_translation worker --loglevel=INFO --pool=solo --concurrency=1 --queues=job_control,docx_extract,retrieve_context,translate_batch,qa_verify,review_io,docx_reassemble,maintenance

Then open:

  • http://127.0.0.1:8000/console/
  • http://127.0.0.1:8000/console/batches/new/
  • http://127.0.0.1:8000/api/v1/...
  • http://127.0.0.1:8000/health/live
  • http://127.0.0.1:8000/health/ready

Why this is the easiest path:

  • .env.example already points the host app at the Dockerized dependencies
  • the operator console, worker, MinIO, and OpenSearch all reflect the same async path you use in practice
  • the default provider is still mock unless you intentionally switch it

Use --noreload for anything long-running. The Django auto-reloader restarts the process on Python file changes and will kill in-flight jobs in the web process.

Shared PyCharm configs under .run/ match this path directly:

  • 10 Infra Dependencies
  • 20 Django Migrate
  • 30 Seed Dev Data
  • 40 Django Backend
  • 50 Celery Worker
  • 90 Local Async Stack

2. Fastest fallback: web-only local development

Use this only when you explicitly want to avoid Dockerized dependencies.

  1. Start from .env.example.
  2. Change the database and dependency flags in .env:
IRIS_DB_ENGINE=sqlite
IRIS_STORAGE_BACKEND=local
IRIS_ASYNC_WORKFLOW_ENABLED=false
IRIS_OPEN_SEARCH_ENABLED=false
CELERY_BROKER_URL=memory://
CELERY_RESULT_BACKEND=cache+memory://
  3. Run the Django process:
uv run python manage.py runserver --noreload

In this fallback path, intake, translation orchestration, and review actions can all stay inside one Django process.
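Flags like IRIS_ASYNC_WORKFLOW_ENABLED and IRIS_OPEN_SEARCH_ENABLED are plain true/false strings in .env. A settings module typically normalizes them along these lines — a sketch of the common pattern, not necessarily the repo's exact parsing:

```python
def env_bool(env: dict[str, str], name: str, default: bool = False) -> bool:
    """Interpret a string environment flag as a boolean.

    Accepts the usual spellings in either case; anything unrecognized
    falls back to the default.
    """
    raw = env.get(name, "").strip().lower()
    if raw in {"1", "true", "yes", "on"}:
        return True
    if raw in {"0", "false", "no", "off"}:
        return False
    return default
```

With the fallback flags above, env_bool(env, "IRIS_ASYNC_WORKFLOW_ENABLED") comes back False, so the app keeps everything in one process.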

3. Full local stack with Docker Compose

Use this when you want the app processes inside Docker as well.

./docker/compose.sh up --build

Useful follow-up commands:

./docker/compose.sh ps
curl http://127.0.0.1:8000/health/live
curl http://127.0.0.1:8000/health/ready
./docker/compose.sh exec web python manage.py test tests
./docker/compose.sh down

The checked-in Compose stack enables:

  • PostgreSQL database
  • async workflow
  • Redis broker
  • MinIO-backed S3-compatible artifact storage
  • OpenSearch glossary retrieval

It still uses Django runserver and startup-time migrations, so treat it as a development stack, not a production deployment shape.

Common First-Time Setup Steps

If you are starting from a fresh clone and want a usable local environment quickly:

  1. Copy .env.example to .env.
  2. Run uv sync.
  3. Run ./docker/compose.sh up -d postgres redis minio minio-init opensearch opensearch-dashboards.
  4. Run uv run python manage.py migrate.
  5. Run uv run python manage.py seed_dev_data --lm-studio --with-dev-admin.
  6. Start the app with uv run python manage.py runserver --noreload 127.0.0.1:8000.
  7. Start a worker with uv run celery -A Iris_translation worker --loglevel=INFO --pool=solo --concurrency=1 --queues=job_control,docx_extract,retrieve_context,translate_batch,qa_verify,review_io,docx_reassemble,maintenance.
  8. Sign in at /console/.

Notes:

  • seed_dev_data --lm-studio --with-dev-admin seeds the base policy objects, a small glossary set, and a local-only bootstrap login.
  • the default bootstrap login is dev-admin / dev-password-123
  • the repository includes sample .docx files under dataset/ if you want a quick upload target after sign-in
  • /console/files/ shows filename-based history for uploaded files, including version timelines, reruns, TMX-style segment inspection, and run comparison
  • /console/batches/new/ lets you submit many DOCX files at once; /console/batches/ and the per-file drilldown pages show intake state, extracted/translated counts, verification-state counts, retry actions, and file-level logs
  • /console/ uses Django session authentication.
  • /api/v1/* requires authentication by default except for GET /health/live, GET /health/ready, and signed artifact downloads.

Debugging

Debugging the web-only path

This is the best path when you want a single process that is easy to inspect.

  • set IRIS_DB_ENGINE=sqlite
  • set IRIS_STORAGE_BACKEND=local
  • leave IRIS_ASYNC_WORKFLOW_ENABLED=false
  • set IRIS_OPEN_SEARCH_ENABLED=false
  • run uv run python manage.py runserver --noreload

This keeps intake, translation orchestration, and review actions inside the Django process.

Debugging the async path

If you want jobs to dispatch through Celery but still keep failures visible quickly:

  • set IRIS_ASYNC_WORKFLOW_ENABLED=true
  • keep PostgreSQL, Redis, MinIO, and OpenSearch running in Docker
  • run the web process and the worker in separate terminals

In this mode, multi-file console batches no longer perform per-file intake and preflight inside the request. The request stores uploads, queues submit_batch_item tasks on job_control, and the worker turns each file into a normal Job before the usual extraction and translation stages continue.
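Conceptually, that batch submission flow looks like the following pure-Python sketch of the pattern — not the project's Celery code; the names submit_batch_item and job_control come from the text, everything else is illustrative:

```python
from collections import deque

# Stands in for the Redis broker: one named queue of (task, payload) tuples.
queues = {"job_control": deque()}

def submit_batch(uploads):
    """Web request side: store uploads, then queue one task per file."""
    stored = [{"upload_id": i, "name": name} for i, name in enumerate(uploads)]
    for item in stored:
        queues["job_control"].append(("submit_batch_item", item))
    return stored  # the request returns without doing intake itself

def run_worker(jobs):
    """Worker side: turn each queued item into a normal Job record."""
    while queues["job_control"]:
        task, item = queues["job_control"].popleft()
        if task == "submit_batch_item":
            jobs.append({"source": item["name"], "state": "intake"})
```

The point of the shape is that the request only stores and enqueues; intake and preflight happen in the worker.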

If you want dispatched tasks to execute inline in the same process during debugging, the settings surface also supports:

CELERY_TASK_ALWAYS_EAGER=true
CELERY_TASK_EAGER_PROPAGATES=true

That is useful when you want Celery-dispatched code paths without leaving the local Django process.
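The effect of those two flags can be sketched with a tiny dispatch wrapper — illustrative only; Celery's real eager mode lives in the library, and the decorator and task body here are hypothetical:

```python
ALWAYS_EAGER = True       # stands in for CELERY_TASK_ALWAYS_EAGER
EAGER_PROPAGATES = True   # stands in for CELERY_TASK_EAGER_PROPAGATES

pending = []  # stands in for the broker queue

def task(fn):
    """Attach a .delay() that either runs inline or queues for a worker."""
    def delay(*args, **kwargs):
        if ALWAYS_EAGER:
            try:
                return fn(*args, **kwargs)  # run inline, same process
            except Exception:
                if EAGER_PROPAGATES:
                    raise  # surface the failure immediately in your debugger
                return None
        pending.append((fn, args, kwargs))  # otherwise a worker picks it up
        return None
    fn.delay = delay
    return fn

@task
def translate_batch(batch_id):
    return f"translated {batch_id}"
```

With both flags on, every .delay() call executes synchronously and exceptions reach your traceback instead of a worker log.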

Debugging DOCX flows

Full DOCX extraction and reassembly behavior depends on the Aspose runtime. To exercise those paths reliably, place the license at:

config/licenses/Aspose.Words.Python.NET.lic

Without that file, the checked-in DOCX-heavy tests and flows are not a trustworthy local baseline.

Debugging health and maintenance behavior

Useful commands:

curl http://127.0.0.1:8000/health/live
curl http://127.0.0.1:8000/health/ready
uv run python manage.py maintenance_tick --dry-run

/health/ready checks database, storage, Aspose, broker, and optional OpenSearch. maintenance_tick --dry-run is useful when you are working on retention, storage, or integrity behavior and want a non-destructive check first.
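A readiness endpoint of that shape typically runs a set of named probes and reports per-check status. A minimal sketch of the pattern, not the repo's implementation:

```python
def readiness(checks):
    """Run each named probe; the service is ready only if every probe passes."""
    results = {}
    for name, probe in checks.items():
        try:
            probe()  # a probe raises on failure, returns None on success
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    ready = all(status == "ok" for status in results.values())
    return {"ready": ready, "checks": results}
```

Per-check results are what make a failing /health/ready actionable: you can see whether it is the database, storage, Aspose, the broker, or OpenSearch that is down.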

Logs and state you can inspect locally

The checked-in settings write logs to:

var/logs/iris.log

Helpful state locations:

  • PostgreSQL: 127.0.0.1:5433
  • Redis broker: 127.0.0.1:6380
  • MinIO API: 127.0.0.1:9000
  • OpenSearch: 127.0.0.1:9200
  • local media storage: var/media
  • static files target: var/static

Useful Development Commands

Local runtime:

uv run python manage.py runserver --noreload
uv run celery -A Iris_translation worker --queues=job_control,docx_extract,retrieve_context,translate_batch,qa_verify,review_io,docx_reassemble,maintenance
uv run python manage.py shell

Verification and maintenance:

uv run python manage.py check
uv run python manage.py makemigrations --check
uv run python manage.py maintenance_tick --dry-run
uv run python manage.py test tests
uv run coverage run manage.py test tests && uv run coverage report

Translation-quality baseline:

uv run python manage.py evaluate_translation_quality
uv run python manage.py evaluate_translation_quality --format=json

Run the translation-quality command when your change affects translation outputs, prompts, retrieval, glossary application, QA classification, or review/memory promotion logic.

For a live API smoke against your running local server:

uv run python manage.py runserver --noreload --nothreading
uv run python integration/api_workflow_smoke.py

Before Opening A PR

Run these commands from the repository root:

uv run ruff format .
uv run ruff check .
uv run python manage.py check
uv run python manage.py makemigrations --check
uv run python manage.py maintenance_tick --dry-run
uv run python manage.py test tests

Add this as needed when your change touches translation behavior:

uv run python manage.py evaluate_translation_quality

Practical expectations before opening a PR:

  • formatting applied locally with uv run ruff format .
  • linter clean with uv run ruff check .
  • no unintended migration drift
  • full test suite green
  • commit messages use conventional commits

Conventional Commits

This repo uses conventional commits. Keep the message scoped and descriptive.

Good examples:

  • feat(api): add project coverage endpoint
  • fix(workflow): preserve delivery artifact on replay
  • docs(dev): add local development guide
  • ci(forgejo): add self-hosted pipeline

If your change also updates the docs/ submodule, remember that the docs repository has its own commit history. Commit the docs change in the docs repository, then commit the updated submodule pointer in the main repository.