william le roux edited this page 2026-04-01 16:23:05 +03:00

Jobs And Workflow

This document describes the job lifecycle implemented by the current repository.

Workflow At A Glance

flowchart TD
	Intake[Intake] --> Preflight[Preflight]
	Preflight -->|blocked| Blocked[Blocked]
	Preflight -->|accepted| Queued[Queued]
	Queued --> Extraction[Extraction]
	Extraction --> Translation[Translation]
	Translation --> QA[QA Finalization]
	QA -->|blocked| Blocked
	QA -->|needs review| Review[TMX Review Loop]
	QA -->|clean| ReassemblyPending[Reassembly Pending]
	Review --> Export[TMX Export]
	Export --> Import[TMX Import]
	Import --> ReassemblyPending
	ReassemblyPending --> Reassembly[Reassembly]
	Reassembly --> Completed[Completed]

API Access And Roles

The workflow endpoints now sit behind authenticated /api/v1/* access.

  • operator access covers job intake, job actions, artifact listing, review coverage, domain packs, glossary listing, and memory listing
  • admin access is required for configuration endpoints, glossary imports, memory promotion, audit-event search, and governance deletion endpoints
  • /health/live, /health/ready, and signed artifact downloads remain public
  • the session-authenticated /console/ UI reuses the same intake, control, review, replay, and reporting services for the main operator workflow
  • the session-authenticated /console/batches/* UI groups many normal jobs into one batch run so operators can watch per-file progress, logs, and retries
  • the session-authenticated /console/files/* UI groups runs by filename, exposes per-run final DOCX and TMX access, and wraps run-to-run comparison plus TMX-style inspection

Audit actor identity now comes from the authenticated user. Compatibility fields such as submitted_by, requested_by, cancelled_by, and imported_by no longer override the recorded actor or source IP.

Intake

Jobs are created through POST /api/v1/jobs.

Required form fields:

  • file (DOCX upload)
  • project_code
  • domain_pack_code
  • source_language
  • target_language

Optional intake fields include:

  • external_reference
  • document_title
  • version_label
  • family_key
  • revision_strategy
  • submitted_by
  • idempotency_key
  • metadata, a JSON object encoded as a string in the form field

Current intake behavior:

  • if idempotency_key matches an existing job, the existing job is returned instead of creating a new one
  • if external_reference is omitted, the service derives one from the uploaded filename stem
  • if version_label is omitted, the service uses a timestamp string
  • the persisted job.submitted_by value and job_created audit actor now come from the authenticated API user
  • intake creates a Document, DocumentVersion, source artifact, preflight artifact, job policy snapshot, and job statistics snapshot
  • if preflight blocks the document, the job stops in blocked status
  • if preflight passes, the job is set to queued

Batch Intake

The operator console also supports multi-file submission at /console/batches/new/.

Current batch behavior:

  • each uploaded DOCX is stored immediately and registered as a JobBatchItem
  • in async mode the request queues a submit_batch_item task per file on job_control
  • the worker creates the underlying Job, runs preflight, and then hands the file to the same extraction-first workflow as single-job intake
  • batch detail views aggregate file state, extracted counts, translated counts, review-required counts, and verification-state distributions
  • batch file detail views show both batch-item logs and job audit events
  • failed or cancelled underlying jobs can be retried from the batch file view
  • if intake itself fails before a Job exists, the batch item remains visible with its submission error and can be retried without re-uploading the file

File History

The operator console also exposes filename-based tracking under /console/files/.

Current file-history behavior:

  • grouping is by project + source_filename
  • the grouping spans newly uploaded versions and reruns of earlier versions
  • the file detail view shows each DocumentVersion, the runs attached to that version, and download links for the latest final_docx and available TMX artifact per run
  • the file detail view also shows a TMX-style segment table backed by extracted units and TranslationResult rows, including source text, translated output, verification_state, and result_source
  • operators can compare one run against another with the same filename through the existing job-comparison service
  • that comparison is segment-level over extracted units and translation results; final DOCX outputs are versioned and downloadable per run, but the repo does not currently do a structural DOCX diff
  • the segment table remains available even when no exported TMX artifact exists, because it is built from stored workflow rows
  • file history is application-managed and does not depend on MinIO/S3 bucket versioning
  • if the same filename maps to multiple Document.external_reference values inside one project, the UI shows a warning because those histories are intentionally collapsed for now
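The project + source_filename grouping rule can be sketched as follows; the input row shape and function name are illustrative, not the console's actual query code:

```python
from collections import defaultdict


def group_runs_by_file(runs: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group job runs into filename timelines keyed by
    (project_code, source_filename), matching the grouping rule above."""
    timelines: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for run in runs:
        timelines[(run["project_code"], run["source_filename"])].append(run)
    # newest run first inside each timeline
    for key in timelines:
        timelines[key].sort(key=lambda r: r["created_at"], reverse=True)
    return dict(timelines)
```

Note that this key deliberately collapses multiple Document.external_reference values under one filename, which is why the UI shows the warning mentioned above.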

Job Statuses

apps/jobs/models.py defines these job statuses:

Status              Meaning in the current code
------------------  -------------------------------------------------------
queued              Accepted and waiting for the next processing step
blocked             Stopped by preflight or QA blockers
in_progress         Active extraction or translation work is underway
needs_review        Automated checks completed but human review is required
reassembly_pending  Ready for DOCX reassembly
completed           Final DOCX reassembly finished
failed              Processing raised an exception after work started
cancelled           Operator cancelled the job
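The status table implies a split between terminal and active states, plus the cancellation rule documented under Control And Recovery Actions. A minimal sketch (constant and helper names are illustrative, not the names in apps/jobs/models.py):

```python
# Status strings mirror the table above.
TERMINAL_STATUSES = {"completed", "blocked", "failed", "cancelled"}
ACTIVE_STATUSES = {"queued", "in_progress", "needs_review", "reassembly_pending"}


def is_cancellable(status: str) -> bool:
    """Cancel is allowed unless the job is already completed, blocked,
    or failed (see Control And Recovery Actions below)."""
    return status not in {"completed", "blocked", "failed"}
```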

Job Stages

apps/jobs/models.py defines these stages:

Stage        Meaning
-----------  --------------------------------------------------
intake       Record creation
preflight    DOCX inspection before extraction
extraction   Source DOCX is being parsed into translation units
translation  Segment translation is underway
qa           Automated QA and verification phase
review       Human review is expected or in progress
reassembly   Final DOCX is being generated
delivery     Final artifact already exists

Preflight Statuses

Preflight status        Meaning
----------------------  ----------------------------------------------
accepted                Preflight passed
accepted_with_warnings  Preflight passed but diagnostics were recorded
blocked                 Preflight rejected the input

Synchronous And Asynchronous Paths

Default Local Path

With .env.example copied unchanged, async orchestration is disabled. In that mode:

  • job creation still performs intake and preflight
  • the job is not auto-dispatched to Celery
  • operators can call POST /api/v1/jobs/{job_id}/process to execute extraction, translation, QA, and reassembly synchronously

Async Path

When IRIS_ASYNC_WORKFLOW_ENABLED=true:

  • intake dispatches extraction after the surrounding database transaction commits
  • work is routed through Celery queues defined in settings.py
  • reassembly can also be dispatched after successful QA or review import
  • multi-file batch intake persists uploads first and then queues per-file submit_batch_item tasks so request time does not scale with preflight or job creation
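The "dispatch after the transaction commits" rule is what Django's django.db.transaction.on_commit provides. The sketch below simulates it with a plain callback queue so it runs standalone; the FakeTransaction class and the dispatched list are test scaffolding, not project code:

```python
class FakeTransaction:
    """Stand-in for django.db.transaction so the sketch runs standalone."""
    def __init__(self):
        self._callbacks = []

    def on_commit(self, fn):
        self._callbacks.append(fn)

    def commit(self):
        for fn in self._callbacks:
            fn()
        self._callbacks.clear()


dispatched = []
transaction = FakeTransaction()


def create_job_with_async_dispatch(job_id: str, async_enabled: bool = True):
    # ... intake + preflight happen inside the transaction ...
    if async_enabled:
        # queue extraction only after the surrounding transaction commits,
        # so the worker never sees a job row that was rolled back
        transaction.on_commit(
            lambda: dispatched.append(("extract_document", job_id)))
```

Nothing is dispatched until commit, which is the property the async path relies on.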

Staged Async Task Flow

The staged tasks currently implemented in tasks/workflow.py are:

  1. extract_document
  2. translate_batches
  3. one or more translate_batch
  4. verify_batches
  5. reassemble_job when the job reaches reassembly_pending
  6. maintenance_tick for cleanup and integrity work

For multi-file console batches, there is one extra task before the normal per-job flow:

  1. submit_batch_item
  2. extract_document
  3. translate_batches
  4. one or more translate_batch
  5. verify_batches
  6. reassemble_job when the job reaches reassembly_pending
  7. maintenance_tick for cleanup and integrity work

Important current nuances:

  • process_job still exists as a Celery task and as the synchronous API path, but intake dispatches the extraction-first staged flow
  • retrieval happens inside translation processing; there is no checked-in tasks.workflow.retrieve_context task implementation
  • maintenance_tick is not part of the per-job pipeline; it is an operator/scheduler task for cleanup and integrity checks

Queue Routing

settings.py routes tasks to these queue names:

  • job_control
  • docx_extract
  • retrieve_context
  • translate_batch
  • qa_verify
  • review_io
  • docx_reassemble
  • maintenance

The checked-in Docker worker listens on all of them with a single process.

submit_batch_item is routed to job_control.
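The routing above corresponds to a Celery task_routes mapping roughly like the following. The queue names come from the list above; the exact task module paths (beyond tasks/workflow.py, which is named in this document) are assumptions:

```python
# Celery task routing sketch; queue names match settings.py as listed above,
# task paths are illustrative.
CELERY_TASK_ROUTES = {
    "tasks.workflow.submit_batch_item": {"queue": "job_control"},
    "tasks.workflow.extract_document": {"queue": "docx_extract"},
    "tasks.workflow.translate_batch": {"queue": "translate_batch"},
    "tasks.workflow.verify_batches": {"queue": "qa_verify"},
    "tasks.workflow.reassemble_job": {"queue": "docx_reassemble"},
    "tasks.workflow.maintenance_tick": {"queue": "maintenance"},
}
```

A single worker subscribed to all queues (as the checked-in Docker worker is) consumes everything; splitting queues across workers later requires no routing changes.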

Maintenance Workflow

tasks.workflow.maintenance_tick and python manage.py maintenance_tick run the same maintenance service.

That workflow currently:

  • deletes expired artifacts whose retention_expires_at has passed
  • deletes stale non-promoted CandidateMemoryEntry rows tied to terminal jobs
  • refreshes daily ProjectReviewCoverageSnapshot rows
  • reports integrity issues for missing artifact objects and completed jobs missing final_docx or reassembly_manifest

The maintenance flow is idempotent for repeated runs against the same state. It is separate from job intake/translation dispatch and is intended to be scheduled externally.

Batch Planning

Translation work is batched over translatable translation units.

  • settings.py uses 25 as the fallback batch size
  • .env.example sets IRIS_TRANSLATION_BATCH_SIZE=10
  • job metadata stores translation_batches state, including completed batch indexes and QA dispatch/completion flags

State Transitions During Processing

Extraction

Extraction:

  • loads the source_docx artifact
  • creates TranslationUnit rows if they do not already exist
  • runs document language enrichment
  • writes an extraction_manifest artifact
  • updates job metadata with enrichment and extraction summary data
  • moves the job to queued + translation

Translation And Segment Verification

Translation:

  • loads or plans translation batches
  • builds retrieval context
  • renders prompts
  • calls the selected provider or approved memory path
  • writes TranslationResult and VerificationResult rows

QA Finalization

Finalization writes a qa_report artifact and then sets the job status according to verification results:

  • blocked if any unit is blocked
  • needs_review if any unit still requires review or is inconsistent with glossary or memory policy
  • reassembly_pending otherwise
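The three-way decision above can be sketched as a fold over per-unit verification outcomes. The state names other than blocked and needs_review are illustrative stand-ins for whatever the verification layer records:

```python
def finalize_status(verification_states: list[str]) -> str:
    """Map per-unit verification outcomes to the job status, following
    the QA Finalization rules above: blocked wins over needs_review,
    which wins over reassembly_pending."""
    if any(s == "blocked" for s in verification_states):
        return "blocked"
    if any(s in {"needs_review", "glossary_inconsistent", "memory_inconsistent"}
           for s in verification_states):
        return "needs_review"
    return "reassembly_pending"
```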

Reassembly

Reassembly:

  • reads the original source_docx
  • applies translated text by stable anchor
  • writes a final_docx artifact and a reassembly_manifest
  • marks the job completed and stage delivery
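"Applies translated text by stable anchor" can be illustrated with a simplified stand-in that swaps segment text by anchor while leaving untranslated segments untouched (real reassembly works on the DOCX XML, not a flat dict):

```python
def apply_translations(source_segments: dict[str, str],
                       translations: dict[str, str]) -> dict[str, str]:
    """Replace segment text by stable anchor; segments without a
    translation keep their source text. The source mapping is not
    mutated, mirroring how reassembly re-reads the original source_docx."""
    return {anchor: translations.get(anchor, text)
            for anchor, text in source_segments.items()}
```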

Review Workflow

TMX Export

POST /api/v1/jobs/{job_id}/tmx-export:

  • requires the job to already have translation results
  • creates a review_tmx artifact
  • creates a ReviewSession with status exported

TMX Import

POST /api/v1/jobs/{job_id}/tmx-import:

  • requires an uploaded TMX file in the file field
  • validates job id, document version id, language pair, and anchors against the exported job
  • stores the uploaded review file as review_import_tmx
  • marks changed segments as human_verified
  • preserves the existing verification state for unchanged segments unless the TMX explicitly carries verificationStatus=human_verified
  • creates reviewed CandidateMemoryEntry rows from the imported targets
  • moves the job to reassembly_pending only when no review-required or blocked states remain after import
  • can reassemble immediately in sync mode or dispatch reassembly in async mode

Control And Recovery Actions

Cancel

POST /api/v1/jobs/{job_id}/cancel:

  • is allowed unless the job is already in a terminal completed, blocked, or failed state
  • records cancellation metadata on the job and emits an audit event using the authenticated actor identity

Rerun

POST /api/v1/jobs/{job_id}/rerun:

  • reuses the source artifact from the original job
  • re-runs preflight against the current active language pair policy
  • creates a new job linked back to the source job in metadata and audit events
  • attributes the rerun request to the authenticated user
  • dispatches extraction on commit when async is enabled

Startup Recovery

On Django startup, the app also performs a small recovery pass for abandoned jobs:

  • jobs stuck in in_progress with a stale or missing heartbeat are marked failed
  • this is intended to prevent permanently stuck jobs after process crashes or hard kills
  • recovery is best-effort and depends on the jobs table already existing

Replay Package

POST /api/v1/jobs/{job_id}/replay:

  • requires translation results and a TMX artifact
  • creates a ZIP replay package containing TMX plus JSON sidecars

Comparison

GET /api/v1/jobs/{job_id}/compare?other_job_id=... returns a structured comparison report between two jobs.

Current comparison behavior:

  • comparison is anchored on extracted segment identifiers, not on a binary DOCX diff
  • it reports changed translated text, verification_state, result_source, provider model, and review status deltas
  • the file-history console uses the same comparison service to compare two runs from the same filename timeline
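The anchor-keyed comparison can be sketched as a field-by-field diff over two segment maps. The input and report shapes here are illustrative, not the API's actual response schema:

```python
def compare_runs(left: dict[str, dict], right: dict[str, dict],
                 fields=("translated_text", "verification_state",
                         "result_source")) -> dict:
    """Segment-level comparison keyed by extracted anchor: report field
    deltas for shared anchors and list anchors present on only one side."""
    report = {"changed": {}, "only_left": [], "only_right": []}
    for anchor, seg in left.items():
        other = right.get(anchor)
        if other is None:
            report["only_left"].append(anchor)
            continue
        deltas = {f: (seg.get(f), other.get(f))
                  for f in fields if seg.get(f) != other.get(f)}
        if deltas:
            report["changed"][anchor] = deltas
    report["only_right"] = [a for a in right if a not in left]
    return report
```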

Artifacts Produced Along The Way

A typical successful end-to-end run can create:

  1. source_docx
  2. preflight_report
  3. extraction_manifest
  4. qa_report
  5. optional review_tmx
  6. optional review_import_tmx
  7. optional replay_package
  8. reassembly_manifest
  9. final_docx

The exact set depends on whether the job is blocked, reviewed, replayed, or directly completed. For artifact retention and download behavior, see Storage And Artifacts.