Table of Contents

Implementation Plan: Verified Next Work

Related Docs
What Is Already In Place
Recent Plan Items That Are No Longer Gaps
Highest-Value Verified Gaps

1. File Identity And Lineage Contract
2. File-Centric Comparison And Monitoring
3. Production Runtime Baseline
4. Supported Worker And Scheduler Topology
5. Retrieval Contract Cleanup
6. Database And Index Strategy For Retrieval
7. Large Artifact Ingress Hardening
8. Delivery Integrity Hardening
9. Observability And Operability
10. Product Contract Clarification

Suggested Work Order
Phase 12: File Identity And Lineage
Phase 13: File-Centric Comparison And Monitoring
Phase 14: Production Runtime Baseline
Phase 15: Worker Topology And Retrieval Contract
Phase 16: Artifact Transfer And Delivery Hardening
Phase 17: Observability And Operability
Phase 18: Product Contract Closure
Explicit Non-Goals Unless The Product Docs Change
Documentation Rule For Future Changes

Implementation Plan: Verified Next Work

This file is not a speculative roadmap. It records the next delivery work that is justified by the current repository state and the current repo-truth docs.

What Is Already In Place

The checked-in repo already has:

DOCX intake and preflight
extraction, translation, QA, and reassembly workflow code
TMX export/import review flow
glossary and memory APIs
project, provider, and policy configuration APIs
artifact retention metadata and signed download behavior
optional OpenSearch-backed glossary retrieval
Celery-based staged async workflow dispatch
maintenance execution for retention cleanup, stale-candidate cleanup, review-coverage refresh, and integrity checks
job comparison, audit search, and review-coverage reporting
authenticated API access with operator/admin role enforcement
governance deletion/deactivation controls for artifacts and memory records
a session-authenticated Django-template operator console for the core workflow
background batch intake through queued submit_batch_item tasks
filename-based file history with per-run artifact access and run-to-run comparison
startup recovery for stale in_progress jobs
an enforced 80% coverage floor for the maintained app surface plus an opt-in live integration smoke harness

Recent Plan Items That Are No Longer Gaps

The older version of this plan treated these as pending work. They are now part of the baseline and should not be planned again unless they need follow-on refinement:

API authentication and authorization
audit attribution hardening
governance deletion controls
maintenance task implementation
the minimum operator console
queued batch intake and monitoring
the file-history console and run comparison baseline

Highest-Value Verified Gaps

These are the most important remaining gaps after reviewing Current Product Scope and Runtime Architecture.

1. File Identity And Lineage Contract

The repo now ships a useful file-history console, but it is still a filename-based grouping layered over DocumentVersion, Job, and Artifact rows.

What is missing:

an explicit supported identity contract for "the same file across runs" beyond project + source_filename
a stronger collision policy than "collapse and warn" when two unrelated files share the same filename inside one project
a rename story if the same logical file later arrives under a different filename
optional storage provenance capture such as object etag or provider version_id, without making bucket versioning the primary lineage key
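
A minimal sketch of what an intake-time collision policy could look like, assuming identity stays project + source_filename for now. The `identity_key` normalization (NFC, trimmed, case-folded), the `IntakeDecision` names, and the `confirmed_same_file` flag are all hypothetical and do not describe shipped behavior.

```python
import unicodedata
from enum import Enum


class IntakeDecision(Enum):
    NEW_TIMELINE = "new_timeline"          # first file seen under this key
    APPEND_VERSION = "append_version"      # operator confirmed it is the same file
    REJECT_AMBIGUOUS = "reject_ambiguous"  # same key, no confirmation: fail closed


def identity_key(project_id: int, source_filename: str) -> tuple:
    """Build the (assumed) identity key: project plus a normalized filename."""
    normalized = unicodedata.normalize("NFC", source_filename).strip().casefold()
    return (project_id, normalized)


def decide_intake(existing_keys: set, project_id: int, source_filename: str,
                  confirmed_same_file: bool) -> IntakeDecision:
    """Decide at intake instead of collapsing and warning after grouping."""
    key = identity_key(project_id, source_filename)
    if key not in existing_keys:
        return IntakeDecision.NEW_TIMELINE
    if confirmed_same_file:
        return IntakeDecision.APPEND_VERSION
    return IntakeDecision.REJECT_AMBIGUOUS
```

The point of the sketch is the third branch: a filename match without explicit confirmation is surfaced as a decision rather than silently merged into an existing history.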

2. File-Centric Comparison And Monitoring

The console can already inspect segments, compare runs, and expose final DOCX and TMX artifacts per run. The remaining gap is turning that into a first-class, supportable file-centric contract rather than a helpful operator-only view.

What is missing:

file-centric APIs that mirror the /console/files/ lineage and comparison views
richer diff filtering and pagination for large TMX-style comparisons
a clearer version summary for extracted, translated, review-required, and delivered state across many runs
explicit retention and missing-artifact behavior when older versions age out
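
As an illustration of the version summary item, the following sketch computes counts and percentages per state for one run. It assumes each segment carries exactly one state label; the label strings and the `summarize_run` name are hypothetical.

```python
from collections import Counter

# Assumed state labels; the real model may name or derive these differently.
STATES = ("extracted", "translated", "review_required", "delivered")


def summarize_run(segment_states: list) -> dict:
    """Return count and percentage per state for a single run."""
    total = len(segment_states)
    counts = Counter(segment_states)
    summary = {}
    for state in STATES:
        count = counts.get(state, 0)
        percent = round(100 * count / total, 1) if total else 0.0
        summary[state] = {"count": count, "percent": percent}
    return summary
```

Running the same function over every run of a file would give the many-run view; an empty run reports 0.0 for every state rather than raising.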

3. Production Runtime Baseline

The current product and architecture docs still describe a development-first runtime:

PostgreSQL is available and now used by the checked-in host-local path, but the production database contract is still thin
the web process still runs through Django runserver
migrations still run on every web-container start in the checked-in Docker path
the repo does not ship a production WSGI/ASGI contract

This is now the biggest gap between the current implementation and a supportable non-dev deployment.
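
For orientation only, a production web-process contract often starts from a small server config like the one below. This sketch assumes Gunicorn as the WSGI server, which the repo has not chosen; the file name and the `GUNICORN_BIND` variable are hypothetical, and the project module path is deliberately left as a placeholder.

```python
# gunicorn.conf.py (hypothetical file; Gunicorn itself is an assumption here)
#
# Intended use, with migrations run as a separate one-off step first:
#   python manage.py migrate
#   gunicorn -c gunicorn.conf.py <project>.wsgi:application
import multiprocessing
import os

# Address the web process listens on; GUNICORN_BIND is a made-up variable name.
bind = os.environ.get("GUNICORN_BIND", "0.0.0.0:8000")

# Worker count; the (2 x cores) + 1 default is a common starting point, not a tuned value.
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))

# Seconds before an unresponsive worker is restarted.
timeout = 60

# "-" sends access logs to stdout and error logs to stderr for container log collection.
accesslog = "-"
errorlog = "-"
```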

4. Supported Worker And Scheduler Topology

The architecture docs still say the checked-in stack ships one generic worker and no dedicated scheduler.

What is missing:

a supported split-worker deployment pattern
a checked-in scheduler path for maintenance
a clear production broker recommendation and validation path

This is a runtime-contract gap, not just a docs gap.

5. Retrieval Contract Cleanup

The docs and settings still expose a retrieve_context queue even though retrieval happens inside translation processing and there is no separate checked-in task.

What is missing:

either a real retrieval task and queue boundary, or
removal of the dead queue route and matching docs cleanup

In parallel, the memory/glossary retrieval strategy is still too lightly defined for a production-supportable contract.

6. Database And Index Strategy For Retrieval

The older plan correctly anticipated this gap, and it still exists:

there is no PostgreSQL-backed retrieval strategy shipped in code
there is no checked-in indexing/reindexing workflow for richer memory retrieval
the current docs do not yet define whether approved-memory lookup should stay simple, move to pg_trgm, add embeddings, or adopt pgvector

This needs to become an explicit supported strategy rather than a future-looking placeholder.
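
To make the pg_trgm option easier to weigh, here is a pure-Python approximation of trigram similarity. It follows the documented idea (lower-case, pad each word with two leading spaces and one trailing space, compare trigram sets), but it is an illustration only and is not guaranteed to match PostgreSQL's results for every input.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm trigram extraction for a string."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

This kind of score tolerates small spelling and inflection differences in approved-memory lookup, which exact matching does not, but it carries no semantic signal; that is the trade-off against the embeddings or pgvector options.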

7. Large Artifact Ingress Hardening

The product and architecture docs still describe the web process as the entry point for uploads.

What is missing:

signed direct-to-object-storage upload/finalize flows for large DOCX and TMX artifacts
explicit finalize validation and failure behavior
a documented upload contract for large files

This is the clearest remaining storage/runtime scaling gap after signed downloads were added.
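
The finalize step is where a direct upload becomes trusted input, so a sketch of the checks may help. It assumes the client declares size and SHA-256 when requesting the signed upload; `UploadIntent`, `validate_finalize`, and the size ceiling are hypothetical, and a real implementation would stream from object storage rather than hold the bytes in memory.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadIntent:
    """What the client declared when it asked for a signed upload (assumed shape)."""
    expected_size: int
    expected_sha256: str
    max_size: int = 200 * 1024 * 1024  # illustrative ceiling, not a product limit


def validate_finalize(intent: UploadIntent, stored_bytes: bytes) -> list:
    """Return a list of problems; an empty list means the upload may enter the workflow."""
    errors = []
    size = len(stored_bytes)
    if size == 0:
        errors.append("empty upload")
    if size > intent.max_size:
        errors.append("upload exceeds maximum size")
    if size != intent.expected_size:
        errors.append("size does not match declared size")
    if hashlib.sha256(stored_bytes).hexdigest() != intent.expected_sha256:
        errors.append("checksum does not match declared checksum")
    return errors
```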

8. Delivery Integrity Hardening

The workflow already reassembles and delivers final DOCX files, but the older plan's delivery-hardening gap remains:

stronger post-reassembly structural validation is not shipped
delivery is not yet blocked on those deeper structural failures

That matters because the product promise is not merely artifact generation, but reliable translated DOCX delivery.
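
A minimal sketch of a post-reassembly structural check, using only the standard library. It assumes the conventional part names `[Content_Types].xml` and `word/document.xml`; the packaging format allows the main document part to live elsewhere, so a real validator would resolve it through the package relationships and would likely go deeper than well-formedness.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Conventional part names; treated here as an assumption, not a guarantee.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def structural_errors(docx_bytes: bytes) -> list:
    """Return structural problems found in a reassembled DOCX; empty means none found."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(docx_bytes))
    except zipfile.BadZipFile:
        return ["not a valid ZIP container"]
    errors = []
    with archive:
        names = set(archive.namelist())
        for part in REQUIRED_PARTS:
            if part not in names:
                errors.append(f"missing part: {part}")
                continue
            try:
                ET.fromstring(archive.read(part))  # well-formedness only
            except ET.ParseError as exc:
                errors.append(f"malformed XML in {part}: {exc}")
    return errors
```

Blocking delivery would then amount to refusing to mark the job completed whenever this list is non-empty.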

9. Observability And Operability

The runtime docs still do not describe OpenTelemetry-based traces, metrics, or end-to-end correlation across web and worker flows.

What is missing:

trace propagation across web -> Celery -> provider/storage paths
operational metrics for queue latency, job-stage durations, and failure rates
consistent correlation IDs for logs and audit-friendly diagnostics

This is now one of the main supportability gaps rather than a product-feature gap.
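
Correlation IDs are the cheapest of the three items to prototype. The sketch below uses only the standard library: a context variable holds the ID, a logging filter stamps it on every record, and two helpers move it across a process boundary. The helper names and the `correlation_id` metadata key are hypothetical; how the value actually travels with a Celery message is left open.

```python
import contextvars
import logging
import uuid

# Holds the ID for the current request or task; "-" means none has been set.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def new_correlation_id() -> str:
    """Start a new ID, for example at the beginning of a web request."""
    value = uuid.uuid4().hex
    correlation_id.set(value)
    return value


def task_metadata() -> dict:
    """Metadata the web side would send along with a queued task."""
    return {"correlation_id": correlation_id.get()}


def adopt_from_metadata(metadata: dict) -> None:
    """Restore the ID on the worker side before the task body runs."""
    correlation_id.set(metadata.get("correlation_id", "-"))
```

With the filter attached to a handler, a format string such as `"%(correlation_id)s %(message)s"` puts the same ID on web and worker log lines for one job.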

10. Product Contract Clarification

The remaining open questions are still meaningful enough to block a final deployment/support story:

production database target
supported worker topology
private deployment minimum, including whether the target promise is private-network deployment only or something closer to fully disconnected execution

The product docs are now intentionally narrower than a marketing roadmap, so these decisions should be resolved explicitly rather than left implicit.

Suggested Work Order

Based on the current product and runtime docs, the next work should start by making the new file-centric workflow contract explicit and only then continue with runtime hardening.

1. define the supported file-identity and collision contract for version history
2. expand file-centric comparison, monitoring, and API coverage
3. establish a production runtime baseline: PostgreSQL, production web process, explicit migration step
4. ship a supported worker/scheduler topology and a production broker recommendation
5. resolve the retrieval queue mismatch and define the real retrieval/index strategy
6. harden large artifact upload and finalize flows
7. add stronger post-reassembly structural validation
8. add observability and correlation across web and worker processes
9. close the remaining product/deployment contract questions in docs and code

Phase 12: File Identity And Lineage

Goal: Turn the current filename-based history view into an explicit supported lineage contract.

Tasks:

Decide whether the supported near-term identity remains project + source_filename or moves to an explicit tracked-file key.
If filename-based identity remains for now, enforce and document collision behavior at intake instead of only warning after grouping.
If a tracked-file key is introduced, backfill existing DocumentVersion rows into deterministic file timelines.
Persist optional storage provenance such as checksum, etag, and provider version_id when available, but keep lineage resolution application-side.
Document how reruns, reuploads, and renamed files are represented in the lineage model.

Checks:

every DocumentVersion belongs to one deterministic file timeline
collisions are either impossible or explicit to operators
object-storage versioning remains supplementary provenance rather than the primary lineage key
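
The first check and the backfill idempotency test can be illustrated together. This sketch assumes a tracked-file key is introduced and derives it deterministically with UUIDv5 from project and filename; `VersionRow`, `TIMELINE_NAMESPACE`, and the function names are hypothetical stand-ins, not the DocumentVersion model.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Fixed namespace so the same inputs always yield the same timeline ID (made-up name).
TIMELINE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "file-timeline.example.invalid")


@dataclass
class VersionRow:
    """Stand-in for a document-version row during backfill."""
    id: int
    project_id: int
    source_filename: str
    timeline_id: Optional[uuid.UUID] = None


def timeline_id_for(project_id: int, source_filename: str) -> uuid.UUID:
    """Derive a deterministic timeline ID from the current identity fields."""
    return uuid.uuid5(TIMELINE_NAMESPACE, f"{project_id}:{source_filename}")


def backfill(rows: list) -> int:
    """Assign timeline IDs to rows that lack one; return how many rows changed."""
    changed = 0
    for row in rows:
        if row.timeline_id is None:
            row.timeline_id = timeline_id_for(row.project_id, row.source_filename)
            changed += 1
    return changed
```

Because already-assigned rows are skipped, a second run changes nothing, and later renames or manual reassignments are not overwritten by a repeated backfill.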

Tests to add:

filename-collision behavior tests
lineage backfill and idempotency tests
rename/history continuity tests if renamed-file support is added
artifact provenance persistence tests

Exit criteria:

operators can rely on file history as a supported contract rather than a best-effort filename grouping

Phase 13: File-Centric Comparison And Monitoring

Goal: Make version inspection and run comparison first-class at the file level.

Tasks:

Add file-history and comparison API endpoints that mirror the /console/files/ views.
Add comparison filters for changed text, verification_state, result_source, provider model, and review status.
Add per-run version summaries for extracted, translated, review-required, and delivered state, including percentages.
Add pagination or similar controls so large segment tables and comparisons stay usable.
Keep DOCX access versioned and downloadable per run; do not promise structural DOCX diffing until such a feature is actually shipped.
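
The filtering and pagination tasks could be prototyped independently of any endpoint. In this sketch, `verification_state` and `result_source` come from the task list above, while `SegmentDiff`, the remaining field names, and the page envelope are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentDiff:
    """One compared segment between two runs (assumed shape)."""
    segment_id: int
    old_text: str
    new_text: str
    verification_state: str
    result_source: str


def filter_diffs(diffs: list, *, changed_only: bool = False,
                 verification_state: Optional[str] = None,
                 result_source: Optional[str] = None) -> list:
    """Apply the requested filters; a filter left at its default is ignored."""
    result = []
    for diff in diffs:
        if changed_only and diff.old_text == diff.new_text:
            continue
        if verification_state is not None and diff.verification_state != verification_state:
            continue
        if result_source is not None and diff.result_source != result_source:
            continue
        result.append(diff)
    return result


def paginate(items: list, page: int, page_size: int) -> dict:
    """Return one 1-indexed page plus enough metadata to request the next one."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    end = start + page_size
    return {"items": items[start:end], "page": page,
            "total": len(items), "has_next": end < len(items)}
```

In a real endpoint both steps would be pushed into the database query so that large comparisons are never fully materialized; the in-memory version only fixes the contract.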

Checks:

users can answer "what changed between runs?" without falling back to raw database inspection
per-run DOCX and TMX access remains available from the file-centric surface
large files remain inspectable without loading every segment at once

Tests to add:

file-history API tests
comparison filtering and pagination tests
artifact-summary regression tests
large-history performance smoke tests

Exit criteria:

file/version monitoring is clearly file-centric instead of primarily job-centric

Phase 14: Production Runtime Baseline

Goal: Close the gap between the current development-first runtime and a supportable production deployment baseline.

Tasks:

Add a supported PostgreSQL configuration path and migration/test guidance.
Replace the implicit runserver deployment assumption with a documented production web-process contract.
Move migrations out of normal web-container startup and define a one-off migration step.
Define the supported storage/broker/database matrix for non-dev deployments.
Update Compose or add a separate checked-in deployment example that accurately reflects the supported baseline.

Checks:

the app boots cleanly on PostgreSQL
the web process runs without runserver
migrations can be executed independently of normal web startup

Tests to add:

PostgreSQL smoke tests
deployment-startup smoke tests for the production web-process path
migration compatibility tests against PostgreSQL

Exit criteria:

the runtime docs no longer depend on development-only assumptions for the baseline deployment story

Phase 15: Worker Topology And Retrieval Contract

Goal: Make the queue topology and retrieval behavior match what the runtime docs claim and what operations can actually support.

Tasks:

Decide whether retrieve_context becomes a real task or is removed from routing.
Ship a supported worker-role topology, including scheduler execution for maintenance.
Validate and document the recommended broker path for production use.
Define the approved-memory and glossary retrieval strategy on PostgreSQL.
Add the indexing/reindexing workflow required by the chosen retrieval strategy.

Checks:

every routed queue corresponds to a real supported contract
the worker split and scheduler story can be run repeatedly without undocumented steps
retrieval behavior is explicit in code and docs
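
The first check lends itself to an automated test. The sketch below takes a routing table in the exact-name `{task_name: {"queue": ...}}` form and a set of registered task names, and reports queues that no registered task feeds. The task names in the usage example are hypothetical, and glob or regex routes are not handled.

```python
def queues_without_tasks(task_routes: dict, registered_tasks: set) -> list:
    """Return queues whose every routed task name is missing from the registry."""
    routed = {}
    for task_name, route in task_routes.items():
        routed.setdefault(route["queue"], []).append(task_name)
    return sorted(
        queue
        for queue, names in routed.items()
        if not any(name in registered_tasks for name in names)
    )


# Example with made-up task names: one route points at a task that does not exist.
example_routes = {
    "workflow.translate_segments": {"queue": "translate"},
    "workflow.retrieve_context": {"queue": "retrieve_context"},
}
example_registered = {"workflow.translate_segments"}
dead = queues_without_tasks(example_routes, example_registered)
```

In a Celery project the registered names could presumably be read from the app's task registry, which would let this run as a regular unit test and fail whenever a route outlives its task.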

Tests to add:

split-worker and scheduler smoke tests
retrieval integration tests on the chosen database backend
indexing/reindexing idempotency tests

Exit criteria:

the runtime architecture no longer contains queue contracts or retrieval promises that are only partially implemented

Phase 16: Artifact Transfer And Delivery Hardening

Goal: Reduce web-tier bottlenecks for large files and tighten the correctness bar for final delivery artifacts.

Tasks:

Add signed direct-to-object-storage upload and finalize flows for large DOCX and TMX files.
Define server-side finalize validation for uploaded artifacts before they enter the workflow.
Add stronger post-reassembly structural validation for delivered DOCX files.
Block delivery when those structural checks fail.

Checks:

large uploads do not require the web tier to proxy the entire payload
finalize validation rejects incomplete or invalid uploads
structurally invalid reassemblies fail closed instead of shipping as completed jobs

Tests to add:

signed upload/finalize flow tests
invalid finalize-path tests
post-reassembly structural validation regression tests

Exit criteria:

the delivery path is materially safer and more scalable than the current proxy-through-web model

Phase 17: Observability And Operability

Goal: Make the system diagnosable and supportable across web, worker, storage, and provider boundaries.

Tasks:

Add OpenTelemetry-based tracing for web requests and Celery tasks.
Add metrics for queue latency, job-stage durations, throughput, and failure classes.
Add correlation IDs across web logs, worker logs, and task execution.
Define the minimum operational dashboards or log queries needed to support the system.
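
Before choosing a metrics backend, it can help to pin down what the numbers mean. This sketch treats queue latency as start time minus enqueue time and stage duration as finish time minus start time, grouped per queue; `TaskTiming` and the output keys are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class TaskTiming:
    """Timestamps for one task execution, in seconds (assumed shape)."""
    queue: str
    enqueued_at: float
    started_at: float
    finished_at: float
    succeeded: bool


def summarize(timings: list) -> dict:
    """Per-queue median queue latency, median duration, and failure rate."""
    by_queue = {}
    for timing in timings:
        by_queue.setdefault(timing.queue, []).append(timing)
    summary = {}
    for queue, items in by_queue.items():
        summary[queue] = {
            "median_queue_latency_s": median(t.started_at - t.enqueued_at for t in items),
            "median_duration_s": median(t.finished_at - t.started_at for t in items),
            "failure_rate": sum(not t.succeeded for t in items) / len(items),
        }
    return summary
```

Medians are used here only to keep the example short; a dashboard would more likely track high percentiles, since queue backlogs show up in the tail first.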

Checks:

one job can be traced from intake through worker execution and delivery
failures can be correlated across logs and task boundaries

Tests to add:

tracing/metrics smoke tests where practical
correlation-ID propagation tests

Exit criteria:

operators can diagnose real workflow failures without relying on ad hoc database inspection

Phase 18: Product Contract Closure

Goal: Resolve the remaining open decisions that still affect deployment claims and future plan stability.

Tasks:

Decide the supported production database target.
Decide the supported worker topology and scheduler story.
Decide the private deployment minimum, including whether fully disconnected execution is a real target.
Update Open Questions to remove decisions that become resolved.
Re-check Current Product Scope and Runtime Architecture after those decisions land.

Checks:

the product and architecture docs stop carrying critical deployment ambiguity

Exit criteria:

the next implementation-plan revision can focus on concrete delivery work rather than unresolved platform promises

Explicit Non-Goals Unless The Product Docs Change

Do not put these back into the next plan without first changing the repo-truth docs:

a separate SPA or standalone frontend beyond the Django-template console
Django admin as a supported operator workflow
non-DOCX source intake as if it were already on the near-term product path

Documentation Rule For Future Changes

When any of the gaps above are implemented, update the scoped docs in this order:

1. Runtime Architecture
2. Setup And Configuration
3. Operations And Scaling
4. the workflow, storage, API, or product docs affected by the change

If a change is still undecided, put it in Open Questions instead of documenting it as current behavior.
