feat(backend): add automatic DB migrations

Add a lightweight migration runner with schema_migrations tracking, run pending migrations during backend startup before the scheduler, and keep a manual backend-migrate entrypoint.

The change also moves the existing lockout and task-thread-ID schema steps into shared migration modules, updates docs, and archives the OpenSpec change.
2026-05-05 01:36:58 +08:00
parent e243dccfd7
commit 3ab845798d
21 changed files with 911 additions and 145 deletions
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-04
@@ -0,0 +1,95 @@
## Context
The backend boot path currently creates tables with `Base.metadata.create_all(bind=engine)` and then starts runtime services. Two schema evolution scripts already live under `apps/backend/scripts/`, but they must be run manually and are not coordinated with service startup. That means a fresh or upgraded database can drift out of sync until an operator remembers to run the right script.
This change is scoped to the current backend stack and its SQLite-first runtime behavior. It should improve deployment safety without introducing a full migration framework or changing the database access layer.
## Goals / Non-Goals
**Goals:**
- Apply pending backend schema migrations automatically before the service becomes ready.
- Keep an explicit record of applied migrations in the database.
- Preserve a manual command for running migrations outside normal startup.
- Reuse the existing migration logic for account lockout and task thread ID rather than duplicating it.
- Fail fast if migration execution fails.
**Non-Goals:**
- Introducing Alembic or database-version autogeneration.
- Designing a generic cross-project migration framework.
- Changing business API behavior beyond startup readiness and migration execution.
- Reworking unrelated schema definitions that are not part of the existing manual migrations.
## Decisions
### 1. Add a small migration runner with a metadata table
Create a backend migration module that owns a fixed ordered registry of migration entries. Each entry should have a stable identifier, a short description, and a callable that receives a database connection or session-bound connection. The runner should create and consult a `schema_migrations` table before executing anything.
Why this over relying on ad hoc script execution: the database itself becomes the source of truth for what has already been applied, so startup and manual execution use the same contract. This is lighter than Alembic and better aligned with the current hand-written SQL style.
Alternatives considered:
- Keep only standalone scripts. Rejected because there is no durable applied-state record and startup cannot know whether a migration still needs to run.
- Adopt Alembic now. Rejected because it is larger than the current need and would require a broader migration model than the repo currently uses.
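A minimal sketch of the runner described in this decision, assuming the standard-library `sqlite3` module rather than the project's actual database layer. The `Migration` dataclass, the function names, the migration IDs in the usage note, and the `schema_migrations` column names (`id`, `applied_at`) are illustrative assumptions, not the committed implementation.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Sequence


@dataclass(frozen=True)
class Migration:
    """One registry entry: stable identifier, description, and callable (names are hypothetical)."""
    id: str
    description: str
    apply: Callable[[sqlite3.Connection], None]


def ensure_metadata_table(conn: sqlite3.Connection) -> None:
    # Assumed layout: one row per applied migration, keyed by its stable ID.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "id TEXT PRIMARY KEY, applied_at TEXT NOT NULL)"
    )


def applied_ids(conn: sqlite3.Connection) -> set[str]:
    return {row[0] for row in conn.execute("SELECT id FROM schema_migrations")}


def run_pending(conn: sqlite3.Connection, registry: Sequence[Migration]) -> list[str]:
    """Apply unapplied migrations in registry order and return the IDs applied in this run."""
    ensure_metadata_table(conn)
    done = applied_ids(conn)
    newly_applied = []
    for migration in registry:
        if migration.id in done:
            continue  # already recorded in the database, so skip it
        migration.apply(conn)
        # Record the migration only after its work has succeeded.
        conn.execute(
            "INSERT INTO schema_migrations (id, applied_at) VALUES (?, ?)",
            (migration.id, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        newly_applied.append(migration.id)
    return newly_applied
```

Under these assumptions, calling `run_pending(conn, registry)` twice against the same database applies each entry once; the second call returns an empty list because every ID is already recorded.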
### 2. Run migrations during startup before the scheduler starts
Call the migration runner from the FastAPI lifespan startup path, after the database base tables are ensured and before `start_scheduler()` is called.
Why this order: the app should not begin processing scheduled work against a partially migrated database. Running before the scheduler also keeps the failure surface small and makes startup failures explicit.
Alternatives considered:
- Run migrations lazily on first request. Rejected because it defers failure until user traffic arrives and allows the scheduler to start against the wrong schema.
- Run migrations after the scheduler starts. Rejected because scheduler jobs may read or write schema fields that are not yet present.
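The ordering above can be sketched without FastAPI itself, since a lifespan handler is an async context manager that receives the app. In this sketch `build_lifespan`, `boot`, and the injected callables are hypothetical stand-ins; only `start_scheduler()` is named in the source.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Callable


def build_lifespan(
    init_tables: Callable[[], None],
    run_migrations: Callable[[], None],
    start_scheduler: Callable[[], None],
    stop_scheduler: Callable[[], None],
):
    """Return a lifespan handler that enforces tables -> migrations -> scheduler."""

    @asynccontextmanager
    async def lifespan(app):
        init_tables()
        run_migrations()   # an exception here aborts startup before the scheduler starts
        start_scheduler()
        try:
            yield
        finally:
            stop_scheduler()

    return lifespan


async def boot(run_migrations: Callable[[list], None]) -> list[str]:
    """Drive the lifespan once and return the recorded event order (demo helper)."""
    events: list[str] = []
    lifespan = build_lifespan(
        init_tables=lambda: events.append("create_all"),
        run_migrations=lambda: run_migrations(events),
        start_scheduler=lambda: events.append("scheduler_start"),
        stop_scheduler=lambda: events.append("scheduler_stop"),
    )
    try:
        async with lifespan(None):
            events.append("serving")
    except RuntimeError:
        events.append("startup_failed")
    return events
```

With a real application the handler would be passed as `FastAPI(lifespan=lifespan)`. When the migration callable raises, the scheduler start and stop callables are never reached, which is the behavior this decision asks for.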
### 3. Keep the existing scripts as thin wrappers or registered migration implementations
The existing `migrate_add_account_lockout.py` and `migrate_add_task_thread_id.py` logic should be reused through the registry rather than remaining as separate one-off flows. A manual runner entrypoint can call the same registry used at startup.
Why this choice: it avoids duplicate migration behavior and keeps the manual/operator path and the automatic path consistent.
Alternatives considered:
- Rewrite the scripts into a new CLI-only tool and leave startup separate. Rejected because the automatic path would still need a second implementation.
- Delete the scripts entirely. Rejected because operators still need a manual escape hatch and the scripts already document the schema history.
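One possible shape for the manual entrypoint, as a sketch: a `main` function that delegates to whatever shared runner startup uses and converts the outcome into a process exit code. The `main` signature and the messages are assumptions for illustration; the source does not specify the command's interface.

```python
import sys
from typing import Callable, Sequence


def main(run_pending: Callable[[], Sequence[str]]) -> int:
    """Run the shared migration runner and return a process exit code.

    `run_pending` is a hypothetical zero-argument callable that applies pending
    migrations and returns the IDs it applied.
    """
    try:
        applied = run_pending()
    except Exception as exc:
        # Report the failure and signal it to the calling shell.
        print(f"migration failed: {exc}", file=sys.stderr)
        return 1
    if applied:
        print("applied: " + ", ".join(applied))
    else:
        print("no pending migrations")
    return 0
```

A wrapper script would then reduce to `sys.exit(main(shared_runner))`, so the manual path and the startup path cannot diverge in behavior.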
### 4. Treat migration failure as a startup blocker
If any migration raises, the runner should stop immediately and surface the failing migration ID and error. A migration should only be marked applied after its work succeeds.
Why this choice: schema drift is a correctness problem, not a recoverable warning. Marking a failed migration as applied would hide a broken database state and make recovery harder.
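A sketch of the "mark only after success" rule, assuming `sqlite3` and a `schema_migrations(id, applied_at)` table. `MigrationError` and `apply_one` are hypothetical names introduced for this example.

```python
import sqlite3
from typing import Callable


class MigrationError(RuntimeError):
    """Raised when a migration fails; carries the failing migration ID."""

    def __init__(self, migration_id: str, cause: Exception):
        super().__init__(f"migration {migration_id!r} failed: {cause}")
        self.migration_id = migration_id


def apply_one(
    conn: sqlite3.Connection,
    migration_id: str,
    apply: Callable[[sqlite3.Connection], None],
) -> None:
    try:
        apply(conn)
    except Exception as exc:
        conn.rollback()  # discard any uncommitted partial work
        raise MigrationError(migration_id, exc) from exc
    # Reached only when the migration body succeeded.
    conn.execute(
        "INSERT INTO schema_migrations (id, applied_at) VALUES (?, datetime('now'))",
        (migration_id,),
    )
    conn.commit()
```

Because the insert sits after the `try` block, a raising migration leaves no row behind, and the next run will attempt it again.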
### 5. Keep migrations idempotent and validation-first where needed
The runner should skip already-applied migrations based on metadata. Individual migration functions should still check the current schema when they need to perform safe DDL or data backfills, because SQLite DDL and older databases can require defensive inspection.
Why this choice: metadata protects the common case, but some migrations already need schema checks and data validation before altering tables or creating indexes.
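For the defensive schema inspection mentioned here, a minimal sketch using SQLite's `PRAGMA table_info`. The helper names and the column type in the usage note are assumptions. Table and column names are interpolated into SQL, so they must be trusted constants from the migration code, never user input.

```python
import sqlite3


def column_names(conn: sqlite3.Connection, table: str) -> set[str]:
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add the column unless it already exists; return True if DDL was executed."""
    if column in column_names(conn, table):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True
```

For example, `add_column_if_missing(conn, "users", "failed_login_attempts", "INTEGER NOT NULL DEFAULT 0")` returns `True` on the first call and `False` on a rerun, so the step stays safe even if the metadata row was lost.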
## Risks / Trade-offs
- [Risk] Manual SQL migrations can still be database-specific and brittle. → Mitigation: keep the registry small, ordered, and explicit; avoid pretending this is a generic migration framework.
- [Risk] Startup failures may block the entire service when a migration encounters bad legacy data. → Mitigation: fail fast by design, surface the failing migration clearly, and keep validation in the migration itself so operators can fix the data before retrying.
- [Risk] SQLite DDL behavior can make transactional guarantees uneven. → Mitigation: use metadata updates only after successful execution and keep migration steps idempotent so reruns are safe.
- [Risk] The manual scripts and the registry could drift apart. → Mitigation: make the scripts call the shared migration functions or runner so there is one source of truth.
## Migration Plan
1. Add the migration metadata table and runner module.
2. Register the two existing migrations in execution order.
3. Wire the runner into backend startup before the scheduler begins.
4. Add a manual CLI entrypoint that invokes the same runner.
5. Add tests for first-run execution, repeat-run skipping, failure handling, and startup ordering.
6. Update the developer/deployment notes to mention automatic startup migrations and the manual command.
Rollback strategy:
- If a migration blocks startup, fix the underlying migration or data issue and rerun startup or the manual command.
- Because applied migrations are tracked explicitly, repeated runs should remain safe once the issue is resolved.
## Open Questions
- Should the manual runner live as a dedicated `backend.scripts.run_migrations` module, or should the startup helper be the only public entrypoint and scripts import it directly?
- Should migration IDs be semantic strings based on the change name, or timestamp-prefixed identifiers to make ordering obvious?
@@ -0,0 +1,30 @@
## Why
The backend currently relies on `Base.metadata.create_all()` plus manually executed SQL scripts, so existing databases do not reliably receive schema changes during deployment or restart. This is risky now because recent backend changes already added schema evolution scripts that are easy to forget before the scheduler starts using the database.
## What Changes
- Add a lightweight backend migration capability that tracks applied migrations in the database and runs pending migrations in deterministic order.
- Run backend migrations automatically during FastAPI startup after base table creation and before the scheduler starts.
- Preserve a manual migration entrypoint for operators and developers who want to apply or inspect migrations outside normal service startup.
- Adapt the existing account-lockout and task-thread-id migration scripts into the registered migration path while keeping them safe to skip after they have already run.
- Fail startup clearly when a migration fails, and never mark a failed migration as applied.
- Do not add Alembic or a broad migration framework in this change.
## Capabilities
### New Capabilities
- `backend-auto-migrations`: automatic and manual execution contract for ordered backend database migrations.
### Modified Capabilities
None.
## Impact
- Affected backend startup path: `apps/backend/main.py`.
- Affected database code: `apps/backend/models/database.py` and a new backend migration module or service.
- Affected scripts: existing migration scripts under `apps/backend/scripts/` plus a consolidated manual runner.
- Affected tests: backend migration runner tests and startup-order coverage.
- Affected documentation: developer/deployment guidance for automatic migrations and the manual migration command.
@@ -0,0 +1,72 @@
## ADDED Requirements
### Requirement: Ordered backend migration registry
The backend SHALL define a deterministic registry of database migrations with stable identifiers and execution order.
#### Scenario: Registry order is stable
- **WHEN** the migration runner loads available migrations
- **THEN** it SHALL evaluate them in the registered order using stable migration identifiers.
#### Scenario: Existing migrations are registered
- **WHEN** the backend migration registry is built
- **THEN** it SHALL include migrations for the existing account-lockout fields and task thread ID schema changes.
### Requirement: Applied migration tracking
The backend SHALL store applied migration records in the application database and use those records to skip completed migrations.
#### Scenario: Migration metadata is initialized
- **WHEN** the migration runner starts against a database without migration metadata
- **THEN** it SHALL create the migration metadata table before checking pending migrations.
#### Scenario: Pending migration is marked after success
- **WHEN** a pending migration completes successfully
- **THEN** the backend SHALL record that migration as applied with its stable identifier and applied timestamp.
#### Scenario: Completed migration is skipped
- **WHEN** a migration identifier is already present in the applied migration metadata
- **THEN** the backend SHALL skip that migration instead of executing it again.
#### Scenario: Failed migration is not marked
- **WHEN** a migration fails during execution
- **THEN** the backend SHALL NOT record that migration as applied.
### Requirement: Automatic startup migration execution
The backend SHALL run pending database migrations automatically during API startup before runtime background work begins.
#### Scenario: Startup applies migrations before scheduler
- **WHEN** the FastAPI lifespan startup runs
- **THEN** it SHALL initialize base database tables, run pending migrations, and only then start the scheduler.
#### Scenario: Startup stops on migration failure
- **WHEN** any pending migration fails during startup
- **THEN** the backend SHALL fail startup and SHALL NOT start the scheduler.
#### Scenario: Startup logs migration activity
- **WHEN** startup migration execution runs
- **THEN** the backend SHALL log whether migrations were applied, skipped, or failed with the relevant migration identifier.
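The logging scenario could look roughly like the following sketch, which uses the standard `logging` module. The logger name, message wording, and `run_with_logging` helper are assumptions, and the applied-state is held in an in-memory set here to keep the example short.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("backend.migrations")  # hypothetical logger name


def run_with_logging(
    registry: Sequence[tuple[str, Callable[[], None]]],
    applied: set[str],
) -> list[tuple[str, str]]:
    """Run (migration_id, callable) pairs in order and log one line per outcome."""
    outcomes = []
    for migration_id, apply in registry:
        if migration_id in applied:
            logger.info("migration skipped (already applied): %s", migration_id)
            outcomes.append((migration_id, "skipped"))
            continue
        try:
            apply()
        except Exception:
            # logger.exception records the traceback alongside the migration ID.
            logger.exception("migration failed: %s", migration_id)
            raise
        applied.add(migration_id)
        logger.info("migration applied: %s", migration_id)
        outcomes.append((migration_id, "applied"))
    return outcomes
```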
### Requirement: Manual migration execution
The backend SHALL provide a manual command path that runs the same registered migrations used by automatic startup.
#### Scenario: Operator runs migrations manually
- **WHEN** an operator executes the documented backend migration command
- **THEN** the command SHALL apply pending registered migrations and skip already-applied migrations.
#### Scenario: Manual failure exits unsuccessfully
- **WHEN** a migration fails during manual execution
- **THEN** the command SHALL exit unsuccessfully after reporting the failing migration.
### Requirement: Existing migration behavior is preserved
The backend SHALL preserve the behavior of existing schema changes when they move into the automatic migration path.
#### Scenario: Account lockout fields are added
- **WHEN** the account-lockout migration runs against a database missing its fields
- **THEN** it SHALL add `failed_login_attempts`, `locked_until`, and `last_failed_login` to the users table.
#### Scenario: Task thread identity is backfilled
- **WHEN** the task thread ID migration runs against valid existing task payloads
- **THEN** it SHALL add or maintain `check_in_tasks.thread_id`, backfill it from `payload_config.ThreadId`, and ensure the per-user thread ID uniqueness index exists.
#### Scenario: Invalid legacy task payload blocks migration
- **WHEN** the task thread ID migration finds missing or duplicate `ThreadId` values that would make the schema invalid
- **THEN** it SHALL fail with a clear validation error instead of silently creating inconsistent task identity data.
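The validation-first behavior in the last two scenarios can be sketched as a pure function that runs before any DDL. It assumes `payload_config` is stored as JSON text and that uniqueness is checked per `(user_id, ThreadId)` pair. The row shape and the `validate_thread_ids` name are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def validate_thread_ids(
    rows: Iterable[tuple[int, int, Optional[str]]],
) -> dict[int, str]:
    """Map task_id -> thread_id, or raise ValueError listing every offending row.

    Each row is assumed to be (task_id, user_id, payload_config_json).
    """
    resolved: dict[int, str] = {}
    seen: Counter = Counter()
    missing: list[int] = []
    for task_id, user_id, payload in rows:
        try:
            config = json.loads(payload) if payload else {}
        except json.JSONDecodeError:
            config = {}
        thread_id = config.get("ThreadId") if isinstance(config, dict) else None
        if thread_id in (None, ""):
            missing.append(task_id)
            continue
        resolved[task_id] = str(thread_id)
        seen[(user_id, str(thread_id))] += 1
    duplicates = sorted(key for key, count in seen.items() if count > 1)
    if missing or duplicates:
        raise ValueError(
            f"invalid legacy task payloads: missing ThreadId for tasks {missing}; "
            f"duplicate (user_id, ThreadId) pairs {duplicates}"
        )
    return resolved
```

Collecting all problems before raising gives operators the full list of rows to fix in one pass rather than one failure per restart.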
@@ -0,0 +1,19 @@
## 1. Migration Runner Foundation
- [x] 1.1 Add a backend migration metadata table and helper utilities for reading and writing applied migration records.
- [x] 1.2 Implement a deterministic migration registry with stable identifiers and ordered execution.
- [x] 1.3 Extract the existing account-lockout and task-thread-id migration logic into shared callable migration units.
- [x] 1.4 Add a manual migration entrypoint that invokes the shared registry.
## 2. Startup Integration
- [x] 2.1 Wire the migration runner into backend FastAPI startup after base table creation and before scheduler startup.
- [x] 2.2 Make startup fail fast when a migration fails and ensure the scheduler does not start afterward.
- [x] 2.3 Add clear logs for applied, skipped, and failed migrations.
## 3. Verification and Documentation
- [x] 3.1 Add backend tests for first-run execution, repeat-run skipping, and failure-not-marked behavior.
- [x] 3.2 Add startup-order coverage to prove migrations run before the scheduler.
- [x] 3.3 Update developer/deployment documentation with automatic startup migration behavior and the manual migration command.
- [x] 3.4 Run the repository checks for backend code and OpenSpec validation before archiving or implementation handoff.
@@ -0,0 +1,75 @@
# backend-auto-migrations Specification
## Purpose
Backend database migration contract for applying ordered schema changes during startup and through a manual operator command.
## Requirements
### Requirement: Ordered backend migration registry
The backend SHALL define a deterministic registry of database migrations with stable identifiers and execution order.
#### Scenario: Registry order is stable
- **WHEN** the migration runner loads available migrations
- **THEN** it SHALL evaluate them in the registered order using stable migration identifiers.
#### Scenario: Existing migrations are registered
- **WHEN** the backend migration registry is built
- **THEN** it SHALL include migrations for the existing account-lockout fields and task thread ID schema changes.
### Requirement: Applied migration tracking
The backend SHALL store applied migration records in the application database and use those records to skip completed migrations.
#### Scenario: Migration metadata is initialized
- **WHEN** the migration runner starts against a database without migration metadata
- **THEN** it SHALL create the migration metadata table before checking pending migrations.
#### Scenario: Pending migration is marked after success
- **WHEN** a pending migration completes successfully
- **THEN** the backend SHALL record that migration as applied with its stable identifier and applied timestamp.
#### Scenario: Completed migration is skipped
- **WHEN** a migration identifier is already present in the applied migration metadata
- **THEN** the backend SHALL skip that migration instead of executing it again.
#### Scenario: Failed migration is not marked
- **WHEN** a migration fails during execution
- **THEN** the backend SHALL NOT record that migration as applied.
### Requirement: Automatic startup migration execution
The backend SHALL run pending database migrations automatically during API startup before runtime background work begins.
#### Scenario: Startup applies migrations before scheduler
- **WHEN** the FastAPI lifespan startup runs
- **THEN** it SHALL initialize base database tables, run pending migrations, and only then start the scheduler.
#### Scenario: Startup stops on migration failure
- **WHEN** any pending migration fails during startup
- **THEN** the backend SHALL fail startup and SHALL NOT start the scheduler.
#### Scenario: Startup logs migration activity
- **WHEN** startup migration execution runs
- **THEN** the backend SHALL log whether migrations were applied, skipped, or failed with the relevant migration identifier.
### Requirement: Manual migration execution
The backend SHALL provide a manual command path that runs the same registered migrations used by automatic startup.
#### Scenario: Operator runs migrations manually
- **WHEN** an operator executes the documented backend migration command
- **THEN** the command SHALL apply pending registered migrations and skip already-applied migrations.
#### Scenario: Manual failure exits unsuccessfully
- **WHEN** a migration fails during manual execution
- **THEN** the command SHALL exit unsuccessfully after reporting the failing migration.
### Requirement: Existing migration behavior is preserved
The backend SHALL preserve the behavior of existing schema changes when they move into the automatic migration path.
#### Scenario: Account lockout fields are added
- **WHEN** the account-lockout migration runs against a database missing its fields
- **THEN** it SHALL add `failed_login_attempts`, `locked_until`, and `last_failed_login` to the users table.
#### Scenario: Task thread identity is backfilled
- **WHEN** the task thread ID migration runs against valid existing task payloads
- **THEN** it SHALL add or maintain `check_in_tasks.thread_id`, backfill it from `payload_config.ThreadId`, and ensure the per-user thread ID uniqueness index exists.
#### Scenario: Invalid legacy task payload blocks migration
- **WHEN** the task thread ID migration finds missing or duplicate `ThreadId` values that would make the schema invalid
- **THEN** it SHALL fail with a clear validation error instead of silently creating inconsistent task identity data.