feat: health consistency agent — 4 data integrity checks, GET /health/consistency

Accusys
2026-05-19 02:17:27 +08:00
parent c95de97762
commit 538eea6406
4 changed files with 355 additions and 0 deletions


@@ -0,0 +1,187 @@
<!-- module: health -->
<!-- description: Health check endpoints -->
<!-- depends: 01_auth -->
## Health Check
### `GET /health`
**Auth**: Public
**Scope**: system-level
Returns basic server health status. Intended for load balancers and monitoring systems.
#### Example
```bash
curl "$API/health" | jq '{status, version}'
```
#### Response (200)
```json
{
"status": "ok",
"version": "1.0.0",
"build_git_hash": "3a6c1865",
"build_timestamp": "2026-05-16T13:38:15Z",
"uptime_ms": 3015
}
```
| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `ok` or `degraded` |
| `version` | string | Semver version |
| `build_git_hash` | string | Git commit hash |
| `build_timestamp` | string | Binary build time |
| `uptime_ms` | integer | Milliseconds since server start |
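
A monitoring probe might reduce this payload to the few values it acts on. The following is a minimal Python sketch, assuming only the field names documented above; `summarize_health` is a hypothetical helper and not part of the server.

```python
import json


def summarize_health(body: str) -> dict:
    """Parse a /health response body and keep what a probe typically needs."""
    payload = json.loads(body)
    return {
        # Only the literal string "ok" is treated as healthy.
        "healthy": payload.get("status") == "ok",
        "version": payload.get("version"),
        # uptime_ms is milliseconds since server start; convert to seconds.
        "uptime_s": payload.get("uptime_ms", 0) / 1000,
    }


# Sample body copied from the response example above.
sample = (
    '{"status": "ok", "version": "1.0.0", "build_git_hash": "3a6c1865", '
    '"build_timestamp": "2026-05-16T13:38:15Z", "uptime_ms": 3015}'
)
summary = summarize_health(sample)
print(summary)
```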
---
### `GET /health/detailed`
**Auth**: Required
**Scope**: system-level
Returns full system health including each service status, resource utilization, pipeline readiness, schema migration status, identity file sync status, and external integrations.
> Requires authentication (JWT, session cookie, or API key). The basic `/health` endpoint remains public for load balancer checks.
#### Example
```bash
curl -s "$API/health/detailed" -H "X-API-Key: $KEY" | jq '{status, services, resources: {cpu: .resources.cpu_used_percent, memory: .resources.memory_used_percent}}'
```
#### Response (200)
```json
{
"status": "ok",
"version": "1.0.0",
"services": {
"postgres": {"status": "ok", "latency_ms": 3},
"redis": {"status": "ok", "latency_ms": 1},
"qdrant": {"status": "ok", "latency_ms": 5}
},
"resources": {
"cpu_used_percent": 12.5,
"memory_available_mb": 32768,
"memory_used_percent": 31.7
},
"pipeline": {
"scripts_ready": true,
"scripts_count": 345,
"processors": {
"asr": true,
"yolo": true,
"face": true,
"pose": true,
"ocr": true,
"cut": true,
"scene": true,
"asrx": true,
"visual_chunk": true
},
"models_ready": true,
"models_count": 42,
"scripts_integrity": {"matched": 332, "total": 345, "ok": false},
"ffmpeg": true
},
"schema": {
"table_exists": true,
"applied": [{"filename": "migrate_add_users_table.sql"}],
"required": [],
"ok": true
},
"identities": {
"directory_exists": true,
"files_count": 3481,
"index_ok": true,
"db_count": 3481,
"synced": true
},
"integrations": {
"tmdb": {
"api_key_configured": false,
"enabled": false,
"api_reachable": null
}
}
}
```
#### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `ok` if all essential services are healthy |
| `services` | object | Per-service status (postgres, redis, qdrant) |
| `services.*.status` | string | `ok`, `error`, or `degraded` |
| `services.*.latency_ms` | integer | Response time in milliseconds |
| `resources` | object | CPU and memory usage |
| `pipeline.scripts_ready` | boolean | Scripts directory is accessible |
| `pipeline.scripts_count` | integer | Number of Python processor scripts |
| `pipeline.processors` | object | Per-processor availability |
| `pipeline.models_ready` | boolean | Models directory accessible |
| `pipeline.scripts_integrity` | object | SHA256 checksum verification results |
| `schema.ok` | boolean | All required migrations applied |
| `identities.synced` | boolean | Identity file count matches DB count |
| `config` | object | Runtime toggle states (cache, auto-pipeline, watcher); not shown in the example response above |
| `integrations.tmdb` | object | TMDB API key config and reachability |
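
A client may want to flatten this response into a list of concrete problems. The sketch below is one way to do that in Python, assuming the field names documented above; `find_problems` is a hypothetical helper, and which fields count as problems is an assumption of this example, not a rule stated by the server. In the sample response the top-level `status` is `ok` even though `scripts_integrity.ok` is `false`, so a client that cares about script checksums would need to inspect that field itself.

```python
def find_problems(report: dict) -> list:
    """Collect human-readable problems from a /health/detailed payload."""
    problems = []
    for name, svc in report.get("services", {}).items():
        if svc.get("status") != "ok":
            problems.append(f"service {name}: {svc.get('status')}")
    pipeline = report.get("pipeline", {})
    for name, available in pipeline.get("processors", {}).items():
        if not available:
            problems.append(f"processor {name} unavailable")
    integrity = pipeline.get("scripts_integrity", {})
    if integrity and not integrity.get("ok", False):
        problems.append(
            f"scripts_integrity: {integrity.get('matched')}/"
            f"{integrity.get('total')} checksums matched"
        )
    if not report.get("schema", {}).get("ok", True):
        problems.append("schema: required migrations missing")
    if not report.get("identities", {}).get("synced", True):
        problems.append("identities: file count differs from DB count")
    return problems


# Trimmed copy of the example response above.
report = {
    "status": "ok",
    "services": {
        "postgres": {"status": "ok", "latency_ms": 3},
        "redis": {"status": "ok", "latency_ms": 1},
        "qdrant": {"status": "ok", "latency_ms": 5},
    },
    "pipeline": {
        "processors": {"asr": True, "yolo": True, "face": True},
        "scripts_integrity": {"matched": 332, "total": 345, "ok": False},
    },
    "schema": {"ok": True},
    "identities": {"synced": True},
}
for problem in find_problems(report):
    print(problem)
```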
---
### `GET /health/consistency`
**Auth**: Required
**Scope**: system-level
Scans the database for data consistency issues. Reports anomalies without modifying any data.
#### Example
```bash
curl -s "$API/health/consistency" -H "X-API-Key: $KEY" | jq '.checks[] | {check, severity, count}'
```
#### Response (200)
```json
{
"status": "degraded",
"checked_at": "2026-05-18T17:30:00Z",
"checks": [
{
"check": "stale_processing",
"severity": "warn",
"count": 3,
"files": [
{"file_name": "video.mp4", "file_uuid": "abc123...", "status": "processing", "detail": "job_id is null"}
]
}
]
}
```
| Check | Description | Severity |
|-------|-------------|---------|
| `stale_processing` | File status is `processing` but `job_id` is null | `warn` |
| `orphaned_processing` | File status is `processing` but there is no active `monitor_job` | `warn` |
| `processing_job_done` | File status is `processing` but the job has already completed | `warn` |
| `unregistered_with_uuid` | File status is `unregistered` but the row is still in the DB (migration residue) | `info` |
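
For alerting, the per-check counts can be rolled up by severity. This is a minimal Python sketch, assuming the response shape shown above; `summarize_checks` is a hypothetical client-side helper. It relies on `count` rather than the length of `files`, since the sample shows a count of 3 with a single file entry.

```python
from collections import Counter


def summarize_checks(report: dict) -> dict:
    """Sum the anomaly counts of a /health/consistency payload per severity."""
    totals = Counter()
    for check in report.get("checks", []):
        totals[check["severity"]] += check["count"]
    return dict(totals)


# Sample payload copied from the response example above.
report = {
    "status": "degraded",
    "checked_at": "2026-05-18T17:30:00Z",
    "checks": [
        {
            "check": "stale_processing",
            "severity": "warn",
            "count": 3,
            "files": [
                {
                    "file_name": "video.mp4",
                    "file_uuid": "abc123...",
                    "status": "processing",
                    "detail": "job_id is null",
                }
            ],
        }
    ],
}
print(summarize_checks(report))
```

A caller could then page on a non-zero `warn` total and only log `info` findings; that policy is a suggestion, not something the endpoint enforces.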
#### Health status rules
| Condition | status |
|-----------|--------|
| All services ok | `ok` |
| Any service error | `degraded` |
| Postgres or Redis error | `degraded` (server still responds) |
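
The rules table can be read as a single predicate over the per-service statuses. The Python sketch below illustrates that reading; `overall_status` is a hypothetical function, not the server's implementation. The table does not say what happens when a service itself reports `degraded`, so treating any non-`ok` service as `degraded` is an assumption of this example.

```python
def overall_status(services: dict) -> str:
    """Return "ok" only when every service reports "ok", else "degraded"."""
    if all(svc.get("status") == "ok" for svc in services.values()):
        return "ok"
    # Assumption: any non-ok service status maps to "degraded".
    return "degraded"


all_ok = {
    "postgres": {"status": "ok", "latency_ms": 3},
    "redis": {"status": "ok", "latency_ms": 1},
    "qdrant": {"status": "ok", "latency_ms": 5},
}
redis_down = dict(all_ok, redis={"status": "error"})

print(overall_status(all_ok))
print(overall_status(redis_down))
```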
---
### Stats Endpoints
| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | `/api/v1/stats/sftpgo` | No | SFTPGo service status |