william le roux edited this page 2026-04-01 16:23:05 +03:00

Jobs And Workflow

This document describes the job lifecycle implemented by the current repository.

Workflow At A Glance

flowchart TD
	Intake[Intake] --> Preflight[Preflight]
	Preflight -->|blocked| Blocked[Blocked]
	Preflight -->|accepted| Queued[Queued]
	Queued --> Extraction[Extraction]
	Extraction --> Translation[Translation]
	Translation --> QA[QA Finalization]
	QA -->|blocked| Blocked
	QA -->|needs review| Review[TMX Review Loop]
	QA -->|clean| ReassemblyPending[Reassembly Pending]
	Review --> Export[TMX Export]
	Export --> Import[TMX Import]
	Import --> ReassemblyPending
	ReassemblyPending --> Reassembly[Reassembly]
	Reassembly --> Completed[Completed]

API Access And Roles

The workflow endpoints now sit behind authenticated /api/v1/* access.

  • operator access covers job intake, job actions, artifact listing, review coverage, domain packs, glossary listing, and memory listing
  • admin access is required for configuration endpoints, glossary imports, memory promotion, audit-event search, and governance deletion endpoints
  • /health/live, /health/ready, and signed artifact downloads remain public
  • the session-authenticated /console/ UI reuses the same intake, control, review, replay, and reporting services for the main operator workflow
  • the session-authenticated /console/batches/* UI groups many normal jobs into one batch run so operators can watch per-file progress, logs, and retries
  • the session-authenticated /console/files/* UI groups runs by filename, exposes per-run final DOCX and TMX access, and wraps run-to-run comparison plus TMX-style inspection

Audit actor identity now comes from the authenticated user. Compatibility fields such as submitted_by, requested_by, cancelled_by, and imported_by no longer override the recorded actor or source IP.

Intake

Jobs are created through POST /api/v1/jobs.

Required form fields:

  • file (DOCX upload)
  • project_code
  • domain_pack_code
  • source_language
  • target_language

Optional intake fields include:

  • external_reference
  • document_title
  • version_label
  • family_key
  • revision_strategy
  • submitted_by
  • idempotency_key
  • metadata, a JSON object encoded as a string in the form field

Current intake behavior:

  • if idempotency_key matches an existing job, the existing job is returned instead of creating a new one
  • if external_reference is omitted, the service derives one from the uploaded filename stem
  • if version_label is omitted, the service uses a timestamp string
  • the persisted job.submitted_by value and job_created audit actor now come from the authenticated API user
  • intake creates a Document, DocumentVersion, source artifact, preflight artifact, job policy snapshot, and job statistics snapshot
  • if preflight blocks the document, the job stops in blocked status
  • if preflight passes, the job is set to queued

Batch Intake

The operator console also supports multi-file submission at /console/batches/new/.

Current batch behavior:

  • each uploaded DOCX is stored immediately and registered as a JobBatchItem
  • in async mode the request queues a submit_batch_item task per file on job_control
  • the worker creates the underlying Job, runs preflight, and then hands the file to the same extraction-first workflow as single-job intake
  • batch detail views aggregate file state, extracted counts, translated counts, review-required counts, and verification-state distributions
  • batch file detail views show both batch-item logs and job audit events
  • failed or cancelled underlying jobs can be retried from the batch file view
  • if intake itself fails before a Job exists, the batch item remains visible with its submission error and can be retried without re-uploading the file

File History

The operator console also exposes filename-based tracking under /console/files/.

Current file-history behavior:

  • grouping is by project + source_filename
  • the grouping spans newly uploaded versions and reruns of earlier versions
  • the file detail view shows each DocumentVersion, the runs attached to that version, and download links for the latest final_docx and available TMX artifact per run
  • the file detail view also shows a TMX-style segment table backed by extracted units and TranslationResult rows, including source text, translated output, verification_state, and result_source
  • operators can compare one run against another with the same filename through the existing job-comparison service
  • that comparison is segment-level over extracted units and translation results; final DOCX outputs are versioned and downloadable per run, but the repo does not currently do a structural DOCX diff
  • the segment table remains available even when no exported TMX artifact exists, because it is built from stored workflow rows
  • file history is application-managed and does not depend on MinIO/S3 bucket versioning
  • if the same filename maps to multiple Document.external_reference values inside one project, the UI shows a warning because those histories are intentionally collapsed for now
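The project + source_filename grouping rule can be sketched as follows; the input row shape and function name are illustrative, not the console's actual query code:

```python
from collections import defaultdict


def group_runs_by_file(runs: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group job runs into filename timelines keyed by
    (project_code, source_filename), matching the grouping rule above."""
    timelines: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for run in runs:
        timelines[(run["project_code"], run["source_filename"])].append(run)
    # newest run first inside each timeline
    for key in timelines:
        timelines[key].sort(key=lambda r: r["created_at"], reverse=True)
    return dict(timelines)
```

Note that this key deliberately collapses multiple Document.external_reference values under one filename, which is why the UI shows the warning mentioned above.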

Job Statuses

apps/jobs/models.py defines these job statuses:

Status              Meaning in the current code
------------------  -------------------------------------------------------
queued              Accepted and waiting for the next processing step
blocked             Stopped by preflight or QA blockers
in_progress         Active extraction or translation work is underway
needs_review        Automated checks completed but human review is required
reassembly_pending  Ready for DOCX reassembly
completed           Final DOCX reassembly finished
failed              Processing raised an exception after work started
cancelled           Operator cancelled the job
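The status table implies a split between terminal and active states, plus the cancellation rule documented under Control And Recovery Actions. A minimal sketch (constant and helper names are illustrative, not the names in apps/jobs/models.py):

```python
# Status strings mirror the table above.
TERMINAL_STATUSES = {"completed", "blocked", "failed", "cancelled"}
ACTIVE_STATUSES = {"queued", "in_progress", "needs_review", "reassembly_pending"}


def is_cancellable(status: str) -> bool:
    """Cancel is allowed unless the job is already completed, blocked,
    or failed (see Control And Recovery Actions below)."""
    return status not in {"completed", "blocked", "failed"}
```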

Job Stages

apps/jobs/models.py defines these stages:

Stage        Meaning
-----------  --------------------------------------------------
intake       Record creation
preflight    DOCX inspection before extraction
extraction   Source DOCX is being parsed into translation units
translation  Segment translation is underway
qa           Automated QA and verification phase
review       Human review is expected or in progress
reassembly   Final DOCX is being generated
delivery     Final artifact already exists

Preflight Statuses

Preflight status        Meaning
----------------------  ----------------------------------------------
accepted                Preflight passed
accepted_with_warnings  Preflight passed but diagnostics were recorded
blocked                 Preflight rejected the input

Synchronous And Asynchronous Paths

Default Local Path

With .env.example copied unchanged, async orchestration is disabled. In that mode:

  • job creation still performs intake and preflight
  • the job is not auto-dispatched to Celery
  • operators can call POST /api/v1/jobs/{job_id}/process to execute extraction, translation, QA, and reassembly synchronously

Async Path

When IRIS_ASYNC_WORKFLOW_ENABLED=true:

  • intake dispatches extraction after the surrounding database transaction commits
  • work is routed through Celery queues defined in settings.py
  • reassembly can also be dispatched after successful QA or review import
  • multi-file batch intake persists uploads first and then queues per-file submit_batch_item tasks so request time does not scale with preflight or job creation
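The "dispatch after the transaction commits" rule is what Django's django.db.transaction.on_commit provides. The sketch below simulates it with a plain callback queue so it runs standalone; the FakeTransaction class and the dispatched list are test scaffolding, not project code:

```python
class FakeTransaction:
    """Stand-in for django.db.transaction so the sketch runs standalone."""
    def __init__(self):
        self._callbacks = []

    def on_commit(self, fn):
        self._callbacks.append(fn)

    def commit(self):
        for fn in self._callbacks:
            fn()
        self._callbacks.clear()


dispatched = []
transaction = FakeTransaction()


def create_job_with_async_dispatch(job_id: str, async_enabled: bool = True):
    # ... intake + preflight happen inside the transaction ...
    if async_enabled:
        # queue extraction only after the surrounding transaction commits,
        # so the worker never sees a job row that was rolled back
        transaction.on_commit(
            lambda: dispatched.append(("extract_document", job_id)))
```

Nothing is dispatched until commit, which is the property the async path relies on.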

Staged Async Task Flow

The staged tasks currently implemented in tasks/workflow.py are:

  1. extract_document
  2. translate_batches
  3. one or more translate_batch
  4. verify_batches
  5. reassemble_job when the job reaches reassembly_pending
  6. maintenance_tick for cleanup and integrity work

For multi-file console batches, there is one extra task before the normal per-job flow:

  1. submit_batch_item
  2. extract_document
  3. translate_batches
  4. one or more translate_batch
  5. verify_batches
  6. reassemble_job when the job reaches reassembly_pending
  7. maintenance_tick for cleanup and integrity work

Important current nuances:

  • process_job still exists as a Celery task and as the synchronous API path, but intake dispatches the extraction-first staged flow
  • retrieval happens inside translation processing; there is no checked-in tasks.workflow.retrieve_context task implementation
  • maintenance_tick is not part of the per-job pipeline; it is an operator/scheduler task for cleanup and integrity checks

Queue Routing

settings.py routes tasks to these queue names:

  • job_control
  • docx_extract
  • retrieve_context
  • translate_batch
  • qa_verify
  • review_io
  • docx_reassemble
  • maintenance

The checked-in Docker worker listens on all of them with a single process.

submit_batch_item is routed to job_control.
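The routing above corresponds to a Celery task_routes mapping roughly like the following. The queue names come from the list above; the exact task module paths (beyond tasks/workflow.py, which is named in this document) are assumptions:

```python
# Celery task routing sketch; queue names match settings.py as listed above,
# task paths are illustrative.
CELERY_TASK_ROUTES = {
    "tasks.workflow.submit_batch_item": {"queue": "job_control"},
    "tasks.workflow.extract_document": {"queue": "docx_extract"},
    "tasks.workflow.translate_batch": {"queue": "translate_batch"},
    "tasks.workflow.verify_batches": {"queue": "qa_verify"},
    "tasks.workflow.reassemble_job": {"queue": "docx_reassemble"},
    "tasks.workflow.maintenance_tick": {"queue": "maintenance"},
}
```

A single worker subscribed to all queues (as the checked-in Docker worker is) consumes everything; splitting queues across workers later requires no routing changes.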

Maintenance Workflow

tasks.workflow.maintenance_tick and python manage.py maintenance_tick run the same maintenance service.

That workflow currently:

  • deletes expired artifacts whose retention_expires_at has passed
  • deletes stale non-promoted CandidateMemoryEntry rows tied to terminal jobs
  • refreshes daily ProjectReviewCoverageSnapshot rows
  • reports integrity issues for missing artifact objects and completed jobs missing final_docx or reassembly_manifest

The maintenance flow is idempotent for repeated runs against the same state. It is separate from job intake/translation dispatch and is intended to be scheduled externally.

Batch Planning

Translation work is batched over translatable translation units.

  • settings.py uses 25 as the fallback batch size
  • .env.example sets IRIS_TRANSLATION_BATCH_SIZE=10
  • job metadata stores translation_batches state, including completed batch indexes and QA dispatch/completion flags

State Transitions During Processing

Extraction

Extraction:

  • loads the source_docx artifact
  • creates TranslationUnit rows if they do not already exist
  • runs document language enrichment
  • writes an extraction_manifest artifact
  • updates job metadata with enrichment and extraction summary data
  • moves the job to queued + translation

Translation And Segment Verification

Translation:

  • loads or plans translation batches
  • builds retrieval context
  • renders prompts
  • calls the selected provider or approved memory path
  • writes TranslationResult and VerificationResult rows

QA Finalization

Finalization writes a qa_report artifact and then sets the job status according to verification results:

  • blocked if any unit is blocked
  • needs_review if any unit still requires review or is inconsistent with glossary or memory policy
  • reassembly_pending otherwise
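The three-way decision above can be sketched as a fold over per-unit verification outcomes. The state names other than blocked and needs_review are illustrative stand-ins for whatever the verification layer records:

```python
def finalize_status(verification_states: list[str]) -> str:
    """Map per-unit verification outcomes to the job status, following
    the QA Finalization rules above: blocked wins over needs_review,
    which wins over reassembly_pending."""
    if any(s == "blocked" for s in verification_states):
        return "blocked"
    if any(s in {"needs_review", "glossary_inconsistent", "memory_inconsistent"}
           for s in verification_states):
        return "needs_review"
    return "reassembly_pending"
```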

Reassembly

Reassembly:

  • reads the original source_docx
  • applies translated text by stable anchor
  • writes a final_docx artifact and a reassembly_manifest
  • marks the job completed and stage delivery
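"Applies translated text by stable anchor" can be illustrated with a simplified stand-in that swaps segment text by anchor while leaving untranslated segments untouched (real reassembly works on the DOCX XML, not a flat dict):

```python
def apply_translations(source_segments: dict[str, str],
                       translations: dict[str, str]) -> dict[str, str]:
    """Replace segment text by stable anchor; segments without a
    translation keep their source text. The source mapping is not
    mutated, mirroring how reassembly re-reads the original source_docx."""
    return {anchor: translations.get(anchor, text)
            for anchor, text in source_segments.items()}
```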

Review Workflow

TMX Export

POST /api/v1/jobs/{job_id}/tmx-export:

  • requires the job to already have translation results
  • creates a review_tmx artifact
  • creates a ReviewSession with status exported

TMX Import

POST /api/v1/jobs/{job_id}/tmx-import:

  • requires an uploaded TMX file in the file field
  • validates job id, document version id, language pair, and anchors against the exported job
  • stores the uploaded review file as review_import_tmx
  • marks changed segments as human_verified
  • preserves the existing verification state for unchanged segments unless the TMX explicitly carries verificationStatus=human_verified
  • creates reviewed CandidateMemoryEntry rows from the imported targets
  • moves the job to reassembly_pending only when no review-required or blocked states remain after import
  • can reassemble immediately in sync mode or dispatch reassembly in async mode

Control And Recovery Actions

Cancel

POST /api/v1/jobs/{job_id}/cancel:

  • is allowed unless the job is already in a terminal completed, blocked, or failed state
  • records cancellation metadata on the job and emits an audit event using the authenticated actor identity

Rerun

POST /api/v1/jobs/{job_id}/rerun:

  • reuses the source artifact from the original job
  • re-runs preflight against the current active language pair policy
  • creates a new job linked back to the source job in metadata and audit events
  • attributes the rerun request to the authenticated user
  • dispatches extraction on commit when async is enabled

Startup Recovery

On Django startup, the app also performs a small recovery pass for abandoned jobs:

  • jobs stuck in in_progress with a stale or missing heartbeat are marked failed
  • this is intended to prevent permanently stuck jobs after process crashes or hard kills
  • recovery is best-effort and depends on the jobs table already existing

Replay Package

POST /api/v1/jobs/{job_id}/replay:

  • requires translation results and a TMX artifact
  • creates a ZIP replay package containing TMX plus JSON sidecars

Comparison

GET /api/v1/jobs/{job_id}/compare?other_job_id=... returns a structured comparison report between two jobs.

Current comparison behavior:

  • comparison is anchored on extracted segment identifiers, not on a binary DOCX diff
  • it reports changed translated text, verification_state, result_source, provider model, and review status deltas
  • the file-history console uses the same comparison service to compare two runs from the same filename timeline
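The anchor-keyed comparison can be sketched as a field-by-field diff over two segment maps. The input and report shapes here are illustrative, not the API's actual response schema:

```python
def compare_runs(left: dict[str, dict], right: dict[str, dict],
                 fields=("translated_text", "verification_state",
                         "result_source")) -> dict:
    """Segment-level comparison keyed by extracted anchor: report field
    deltas for shared anchors and list anchors present on only one side."""
    report = {"changed": {}, "only_left": [], "only_right": []}
    for anchor, seg in left.items():
        other = right.get(anchor)
        if other is None:
            report["only_left"].append(anchor)
            continue
        deltas = {f: (seg.get(f), other.get(f))
                  for f in fields if seg.get(f) != other.get(f)}
        if deltas:
            report["changed"][anchor] = deltas
    report["only_right"] = [a for a in right if a not in left]
    return report
```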

Artifacts Produced Along The Way

A typical successful end-to-end run can create:

  1. source_docx
  2. preflight_report
  3. extraction_manifest
  4. qa_report
  5. optional review_tmx
  6. optional review_import_tmx
  7. optional replay_package
  8. reassembly_manifest
  9. final_docx

The exact set depends on whether the job is blocked, reviewed, replayed, or directly completed. For artifact retention and download behavior, see Storage And Artifacts.