feat: add appearance feature system with coordinate/scale fixes

- Add Appearance_Feature_System_V1.0.md design doc
- Add proportion_calculator.py for body proportions (height, body shape)
- Add feature_extractor.py for hierarchical feature extraction
- Add tkg_level1_builder.py for TKG person_trace nodes
- Fix mediapipe_holistic_processor.py to output Top-Left pixels
- Add MediaPipe format conversion in proportion_calculator

Coordinate system alignment:
- Swift Pose: Top-Left pixels (Y-flip done in swift_pose.swift)
- MediaPipe: Top-Left pixels (norm→pixel conversion added)
Accusys
2026-06-22 02:27:03 +08:00
parent 97180aa7cd
commit 606f31f13c
5 changed files with 2397 additions and 29 deletions

@@ -0,0 +1,664 @@
---
title: Appearance Feature System V1.0
version: 1.0.0
date: 2025-06-22
author: OpenCode
status: Draft
---
# Appearance Feature System V1.0
## Overview
### Purpose
Lock onto a target person and track them continuously across frames using appearance features.
### Architecture
```
Face (identification) → Pose (tracking) → Appearance (tracking)
        ↓                      ↓                    ↓
  identity_uuid              bbox          features + proportions
```
### Data Sources
| Source | Provides | Output |
|--------|----------|--------|
| Face | identity, landmarks | face.json |
| Pose | bbox, keypoints | pose.json |
| MediaPipe | detailed landmarks, hands | mediapipe.json |
---
## Keypoint Systems
### Swift Pose (Apple Vision) - 19 Keypoints
| Index | Keypoint | Vision Framework Joint |
|-------|----------|------------------------|
| 0 | nose | .nose (head_joint) |
| 1 | left_eye | .leftEye (left_eye_joint) |
| 2 | right_eye | .rightEye (right_eye_joint) |
| 3 | left_ear | .leftEar (left_ear_joint) |
| 4 | right_ear | .rightEar (right_ear_joint) |
| 5 | neck | .neck (neck_1_joint) |
| 6 | root | .root (center_hip_joint) |
| 7 | left_shoulder | .leftShoulder |
| 8 | right_shoulder | .rightShoulder |
| 9 | left_elbow | .leftElbow |
| 10 | right_elbow | .rightElbow |
| 11 | left_wrist | .leftWrist (left_hand_joint) |
| 12 | right_wrist | .rightWrist (right_hand_joint) |
| 13 | left_hip | .leftHip |
| 14 | right_hip | .rightHip |
| 15 | left_knee | .leftKnee |
| 16 | right_knee | .rightKnee |
| 17 | left_ankle | .leftAnkle |
| 18 | right_ankle | .rightAnkle |
### MediaPipe Pose - 33 Landmarks
| Index | Name | Index | Name |
|-------|------|-------|------|
| 0 | nose | 17 | left_pinky |
| 1 | left_eye_inner | 18 | right_pinky |
| 2 | left_eye | 19 | left_index |
| 3 | left_eye_outer | 20 | right_index |
| 4 | right_eye_inner | 21 | left_thumb |
| 5 | right_eye | 22 | right_thumb |
| 6 | right_eye_outer | 23 | left_hip |
| 7 | left_ear | 24 | right_hip |
| 8 | right_ear | 25 | left_knee |
| 9 | mouth_left | 26 | right_knee |
| 10 | mouth_right | 27 | left_ankle |
| 11 | left_shoulder | 28 | right_ankle |
| 12 | right_shoulder | 29 | left_heel |
| 13 | left_elbow | 30 | right_heel |
| 14 | right_elbow | 31 | left_foot_index |
| 15 | left_wrist | 32 | right_foot_index |
| 16 | right_wrist | | |
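
The commit message mentions a norm→pixel conversion so that MediaPipe output uses Top-Left pixels like Swift Pose. A minimal sketch of such a conversion is shown here. The function name, the input layout (a list of 33 dicts with normalized `x`/`y`), and the shoulder-midpoint `neck` are assumptions for illustration, not the actual code in `proportion_calculator.py`.

```python
# Subset of MediaPipe Pose indices, taken from the table above.
MEDIAPIPE_POSE_INDEX = {
    "nose": 0,
    "left_eye": 2,
    "right_eye": 5,
    "left_shoulder": 11,
    "right_shoulder": 12,
    "left_hip": 23,
    "right_hip": 24,
    "left_ankle": 27,
    "right_ankle": 28,
}


def mediapipe_to_pixels(landmarks, frame_width, frame_height):
    """Convert normalized landmarks to named Top-Left pixel keypoints.

    `landmarks` is assumed to be a list of 33 dicts with normalized
    'x' and 'y' in [0, 1], measured from the top-left image corner.
    """
    keypoints = {}
    for name, index in MEDIAPIPE_POSE_INDEX.items():
        lm = landmarks[index]
        keypoints[name] = {
            "x": lm["x"] * frame_width,
            "y": lm["y"] * frame_height,
        }
    # MediaPipe has no neck landmark. Using the shoulder midpoint is an
    # assumption of this sketch, not something the design specifies.
    ls, rs = keypoints["left_shoulder"], keypoints["right_shoulder"]
    keypoints["neck"] = {
        "x": (ls["x"] + rs["x"]) / 2,
        "y": (ls["y"] + rs["y"]) / 2,
    }
    return keypoints


# Example: every landmark at the same normalized position in a 1920x1080 frame.
example_landmarks = [{"x": 0.5, "y": 0.25} for _ in range(33)]
example_keypoints = mediapipe_to_pixels(example_landmarks, 1920, 1080)
```

The resulting dict uses the same keypoint names as the Swift Pose table, so downstream proportion code can stay source-agnostic.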
### MediaPipe Hand - 21 Landmarks
| Index | Name | Finger |
|-------|------|--------|
| 0 | wrist | - |
| 1-4 | thumb_cmc/mcp/ip/tip | thumb |
| 5-8 | index_mcp/pip/dip/tip | index |
| 9-12 | middle_mcp/pip/dip/tip | middle |
| 13-16 | ring_mcp/pip/dip/tip | ring |
| 17-20 | pinky_mcp/pip/dip/tip | pinky |
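
The grouped index ranges above can be expanded into a flat list of 21 names when individual landmarks are needed, for example the ring-finger range used later for ring detection. This is a small sketch; the names follow the table in this document, and `finger_indices` is a hypothetical helper.

```python
# Joint names per finger, in index order, as listed in the table above.
FINGERS = {
    "thumb": ("cmc", "mcp", "ip", "tip"),
    "index": ("mcp", "pip", "dip", "tip"),
    "middle": ("mcp", "pip", "dip", "tip"),
    "ring": ("mcp", "pip", "dip", "tip"),
    "pinky": ("mcp", "pip", "dip", "tip"),
}

# Index 0 is the wrist, followed by four joints per finger.
HAND_LANDMARKS = ["wrist"] + [
    f"{finger}_{joint}"
    for finger, joints in FINGERS.items()
    for joint in joints
]


def finger_indices(finger):
    """Return the landmark indices that belong to one finger."""
    prefix = finger + "_"
    return [i for i, name in enumerate(HAND_LANDMARKS) if name.startswith(prefix)]


ring_indices = finger_indices("ring")
```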
### YOLOv8 Pose (Fallback) - 17 Keypoints
| Index | Name |
|-------|------|
| 0 | nose |
| 1 | left_eye |
| 2 | right_eye |
| 3 | left_ear |
| 4 | right_ear |
| 5 | left_shoulder |
| 6 | right_shoulder |
| 7 | left_elbow |
| 8 | right_elbow |
| 9 | left_wrist |
| 10 | right_wrist |
| 11 | left_hip |
| 12 | right_hip |
| 13 | left_knee |
| 14 | right_knee |
| 15 | left_ankle |
| 16 | right_ankle |
---
## Body Proportions Calculation
### Reference Unit
```python
# Eye distance as reference unit
eye_width = distance(left_eye, right_eye)
```
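The `distance` helper is not defined in this document. A plausible version, assuming keypoints are dicts with pixel `x` and `y` fields as in the measurement code that follows, is the Euclidean distance:

```python
import math


def distance(p, q):
    """Euclidean distance between two keypoints given as {'x': ..., 'y': ...}."""
    return math.hypot(p["x"] - q["x"], p["y"] - q["y"])


# Illustrative pixel coordinates (not taken from real data).
left_eye = {"x": 430.0, "y": 210.0}
right_eye = {"x": 380.0, "y": 210.0}
eye_width = distance(left_eye, right_eye)
```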
### Body Measurements
```python
# Full body height (nose to ankle)
nose_y = keypoints['nose']['y']
ankle_y = max(keypoints['left_ankle']['y'], keypoints['right_ankle']['y'])
body_height = ankle_y - nose_y
# Upper body (neck to hip)
neck_y = keypoints['neck']['y']
hip_y = (keypoints['left_hip']['y'] + keypoints['right_hip']['y']) / 2
torso_height = hip_y - neck_y
# Lower body (hip to ankle)
leg_height = ankle_y - hip_y
# Shoulder width
shoulder_width = distance(left_shoulder, right_shoulder)
```
### Proportion Ratios
```python
proportions = {
    'eye_width': eye_width,
    'body_height': body_height,
    'torso_height': torso_height,
    'leg_height': leg_height,
    'shoulder_width': shoulder_width,
    'head_ratio': eye_width / body_height,
    'torso_ratio': torso_height / body_height,
    'leg_ratio': leg_height / body_height,
}
```
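The measurements and ratios above can be combined into one function. The sketch below assumes Top-Left pixel keypoints in a dict keyed by name (including a `neck` entry) and returns `None` when a ratio would be undefined; the function name and the guard are illustrative choices, not part of the design.

```python
import math


def distance(p, q):
    """Euclidean distance between two {'x', 'y'} keypoints."""
    return math.hypot(p["x"] - q["x"], p["y"] - q["y"])


def compute_proportions(kp):
    """Compute body measurements and ratios from named pixel keypoints."""
    eye_width = distance(kp["left_eye"], kp["right_eye"])
    # Y grows downward in Top-Left coordinates, so the lower ankle has the larger y.
    ankle_y = max(kp["left_ankle"]["y"], kp["right_ankle"]["y"])
    body_height = ankle_y - kp["nose"]["y"]
    hip_y = (kp["left_hip"]["y"] + kp["right_hip"]["y"]) / 2
    torso_height = hip_y - kp["neck"]["y"]
    leg_height = ankle_y - hip_y
    shoulder_width = distance(kp["left_shoulder"], kp["right_shoulder"])
    if body_height <= 0:
        return None  # degenerate pose, ratios are undefined
    return {
        "eye_width": eye_width,
        "body_height": body_height,
        "torso_height": torso_height,
        "leg_height": leg_height,
        "shoulder_width": shoulder_width,
        "head_ratio": eye_width / body_height,
        "torso_ratio": torso_height / body_height,
        "leg_ratio": leg_height / body_height,
    }


# Illustrative keypoints chosen to give round numbers.
keypoints = {
    "nose": {"x": 400, "y": 200},
    "left_eye": {"x": 425, "y": 190},
    "right_eye": {"x": 375, "y": 190},
    "neck": {"x": 400, "y": 300},
    "left_shoulder": {"x": 475, "y": 310},
    "right_shoulder": {"x": 325, "y": 310},
    "left_hip": {"x": 450, "y": 500},
    "right_hip": {"x": 350, "y": 500},
    "left_ankle": {"x": 440, "y": 800},
    "right_ankle": {"x": 360, "y": 795},
}
proportions = compute_proportions(keypoints)
```

With these inputs the measurements come out as 50 px eye width, 600 px body height, 200 px torso, 300 px legs, and 150 px shoulder width.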
### Body Shape Calculation (Bust/Waist/Hip Measurements)
```python
# Chest width (approximated by shoulder width)
chest_width = distance(left_shoulder, right_shoulder)
# Waist width (approximated by hip width, since there is no waist keypoint)
waist_width = distance(left_hip, right_hip)
# Hip width
hip_width = distance(left_hip, right_hip)
# Ratios used by the classification below
chest_waist_ratio = chest_width / waist_width
waist_hip_ratio = waist_width / hip_width
# Body shape classification
if chest_waist_ratio > 1.0 and waist_hip_ratio < 0.9:
    shape_type = "hourglass"  # chest and hips both wider than the waist
elif chest_waist_ratio > 1.2:
    shape_type = "inverted_triangle"  # chest clearly wider than the waist
elif waist_hip_ratio > 1.1:
    shape_type = "triangle"
elif abs(chest_width - hip_width) < 0.1 * max(chest_width, hip_width):
    shape_type = "rectangle"  # chest and hip widths within 10% of each other
else:
    shape_type = "oval"
```
### Height Estimation
```python
# Use eye_width as reference (≈6cm)
height_ratio = body_height / eye_width
estimated_height_cm = height_ratio * 6.0
# Height category
if estimated_height_cm < 150:
    height_category = "short"
elif estimated_height_cm < 170:
    height_category = "medium"
elif estimated_height_cm < 180:
    height_category = "tall"
else:
    height_category = "very_tall"
```
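The estimation above divides by `eye_width`, which can be zero or tiny when the face is in profile or the eyes are not detected. A hedged sketch of the same logic as a function with that guard added follows; the function name, the guard, and the `eye_width_cm` parameter are assumptions for illustration.

```python
EYE_WIDTH_CM = 6.0  # reference eye distance used in this document


def estimate_height(body_height_px, eye_width_px, eye_width_cm=EYE_WIDTH_CM):
    """Estimate height in cm from pixel measurements, or None if undefined."""
    if eye_width_px <= 0:
        return None  # eyes not visible or overlapping, ratio is undefined
    height_ratio = body_height_px / eye_width_px
    estimated_height_cm = height_ratio * eye_width_cm
    if estimated_height_cm < 150:
        height_category = "short"
    elif estimated_height_cm < 170:
        height_category = "medium"
    elif estimated_height_cm < 180:
        height_category = "tall"
    else:
        height_category = "very_tall"
    return {
        "estimated_height_cm": estimated_height_cm,
        "height_ratio": height_ratio,
        "height_category": height_category,
    }


# Illustrative values: a 1420 px body height with a 50 px eye distance.
estimate = estimate_height(1420, 50)
```

Here the ratio is 28.4, which gives roughly 170.4 cm and the `tall` category.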
---
## Appearance Feature Location Mapping
### Environment Factors
| Feature | Location | Detection Method |
|---------|----------|------------------|
| Light type | Frame background | HSV H distribution |
| Light direction | Shadow analysis | Shadow orientation |
| Light intensity | Overall brightness | HSV V mean |
### Head Features
#### Hair Style
| Feature | Keypoints Range |
|---------|-----------------|
| Short hair | head_top → ear/neck |
| Long hair | head_top → shoulder/back |
| Ponytail | head_top → neck (tied) |
| Braids | head_top → shoulder (braided) |
| Curly hair | hair region texture |
| Straight hair | hair region texture |
#### Hair Accessories
| Feature | Keypoints |
|---------|-----------|
| Hair band | eye_distance (head top) |
| Hair clip | ear/head |
| Hair wrap | ear_distance |
| Hair tie | neck (ponytail position) |
| Hair pin | head |
#### Head Accessories
| Feature | Keypoints |
|---------|-----------|
| Hat | head_top → eye |
| Headscarf | ear_distance (wrapped) |
| Hood | head_top → neck (full head) |
#### Hair Color
| Feature | Detection |
|---------|-----------|
| Hair color HSV | hair region HSV histogram |
### Face Features
#### Eye Accessories
| Feature | Keypoints |
|---------|-----------|
| Glasses | eye_distance |
| Sunglasses | eye_distance (larger) |
#### Ear Accessories
| Feature | Keypoints |
|---------|-----------|
| Earrings | ear_position |
| Headphones (over-ear) | ear_distance (wrapped) |
| Earphones (in-ear) | ear_position |
| Earphones (ear-hook) | ear_position |
#### Face Accessories
| Feature | Keypoints |
|---------|-----------|
| Blush | cheeks (below eye) |
| Lipstick | lips (nose + eye_width * 0.5) |
| Mask | ear_distance, eye → neck |
#### Skin Tone
| Feature | Detection |
|---------|-----------|
| Skin color HSV | face region HSV histogram |
### Neck Features
#### Neck Accessories
| Feature | Keypoints |
|---------|-----------|
| Collar | neck |
| Bow tie | neck → chest |
| Tie | neck → hip |
| Scarf | neck → shoulder |
| Necklace | neck |
#### Hanging Accessories
| Feature | Keypoints |
|---------|-----------|
| Pendant (necklace) | neck → chest |
| Charm (bag) | bag_position |
| Charm (phone) | phone_position |
### Upper Body Features
#### Clothing
| Feature | Keypoints |
|---------|-----------|
| Shirt color | neck → hip |
| Shirt material | clothing texture (LBP) |
| Clothing pattern | pattern detection |
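
Keypoint ranges such as `neck → hip` have to be turned into a pixel region before color or texture can be extracted. One way to do that is sketched below, assuming Top-Left pixel keypoints. The function name and the `half_width` parameter (for example, half the shoulder width) are hypothetical.

```python
def region_between(top, bottom, half_width, frame_width, frame_height):
    """Return an (x, y, w, h) box spanning two keypoints, clamped to the frame.

    The box is centred horizontally on the midpoint of the two keypoints
    and extends `half_width` pixels to each side.
    """
    cx = (top["x"] + bottom["x"]) / 2
    x0 = max(0, int(cx - half_width))
    x1 = min(frame_width, int(cx + half_width))
    y0 = max(0, int(min(top["y"], bottom["y"])))
    y1 = min(frame_height, int(max(top["y"], bottom["y"])))
    # Width and height are never negative, even for off-frame keypoints.
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


# Illustrative upper-body region for a shirt-color lookup.
neck = {"x": 400, "y": 300}
hip = {"x": 400, "y": 500}
shirt_box = region_between(neck, hip, half_width=75, frame_width=1920, frame_height=1080)
```

With a NumPy image, the region would then be cropped as `frame[y:y + h, x:x + w]` before being passed to a color or texture extractor.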
#### Sleeves
| Feature | Keypoints |
|---------|-----------|
| Long sleeve | shoulder → wrist |
| Short sleeve | shoulder → elbow |
| Arm sleeve | elbow → wrist |
#### Back Features
| Feature | Keypoints |
|---------|-----------|
| Back exposed | shoulder → hip (view angle) |
| Back tattoo | back exposed skin |
### Bags
| Feature | Keypoints |
|---------|-----------|
| Handbag | hand_position |
| Shoulder bag | shoulder_position |
| Backpack | shoulder → hip (back) |
| Waist bag | hip_position |
### Hand Features
#### Hand Accessories
| Feature | Keypoints |
|---------|-----------|
| Watch | wrist |
| Bracelet | wrist → hand |
| Ring | finger (MediaPipe hand landmarks 13-16) |
| Gloves | wrist → hand |
| Nail polish | finger tips |
#### Handheld Objects
| Feature | Keypoints |
|---------|-----------|
| Phone | hand + object detection |
| Handbag | hand + object detection |
### Lower Body Features
#### Pants
| Feature | Keypoints |
|---------|-----------|
| Long pants | hip → ankle |
| Shorts | hip → knee |
#### Waist Accessories
| Feature | Keypoints |
|---------|-----------|
| Belt | hip |
### Foot Features
#### Foot Accessories
| Feature | Keypoints |
|---------|-----------|
| Anklet | ankle |
| Socks | ankle → foot |
| Shoes | ankle |
### Skin Features
| Feature | Detection |
|---------|-----------|
| Tattoo | exposed skin anomaly color block |
### Exposed Skin Detection
| Location | Coverage Detection |
|----------|-------------------|
| Face | always exposed |
| Arms | exposed if short sleeve |
| Legs | exposed if shorts |
| Hands | exposed if no gloves |
| Feet | exposed if no socks |
---
## Mobility Aids / Vehicles
### Walking Aids (Object Detection)
| Feature | Keypoints |
|---------|-----------|
| Cane | hand + object |
| Wheelchair | hip + object |
| Walker | both hands + object |
### Mobility Tools (Object Detection)
| Feature | Keypoints |
|---------|-----------|
| Roller skates | ankle + object |
| Skateboard | ankle + object |
| Scooter | hand + ankle + object |
### Vehicles (Object Detection)
| Feature | Keypoints |
|---------|-----------|
| Motorcycle | hip + ankle + object |
| Bicycle | hip + ankle + object |
| Tricycle | hip + ankle + object |
| Car | hip + object |
---
## Feature Extraction Techniques
### Color Extraction (HSV Histogram)
```python
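import cv2

def normalize(hist):
    # Assumed helper: scale the histogram so that its bins sum to 1
    return cv2.normalize(hist, None, alpha=1.0, norm_type=cv2.NORM_L1)

def extract_color(roi):
    # roi is a BGR image; OpenCV hue spans [0, 180), saturation and value [0, 256)
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    h_hist = cv2.calcHist([hsv], [0], None, [30], [0, 180])
    s_hist = cv2.calcHist([hsv], [1], None, [32], [0, 256])
    v_hist = cv2.calcHist([hsv], [2], None, [32], [0, 256])
    return {
        'h_histogram': normalize(h_hist),
        's_histogram': normalize(s_hist),
        'v_histogram': normalize(v_hist),
    }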
```
### Dominant Color (K-means)
```python
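import cv2
import numpy as np

def extract_dominant_colors(roi, k=5):
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    pixels = hsv.reshape(-1, 3).astype(np.float32)
    # Assumed termination criteria: 10 iterations or center shift below 1.0
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    # minlength keeps one count per cluster even if a cluster ends up empty
    counts = np.bincount(labels.flatten(), minlength=k)
    return centers[np.argsort(-counts)[:k]]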
```
### Texture Extraction (LBP)
```python
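import cv2
import numpy as np
from skimage.feature import local_binary_pattern  # assumed source of this function

def extract_texture(roi):
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # 8 neighbours at radius 1 give LBP codes in [0, 255]
    lbp = local_binary_pattern(gray, P=8, R=1)
    return {
        'lbp_variance': np.var(lbp),
        # Fixed range so histograms from different ROIs share the same bins
        'lbp_histogram': np.histogram(lbp, bins=256, range=(0, 256))[0],
    }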
```
### Shininess Detection
```python
def detect_shininess(roi):
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    v_mean = np.mean(hsv[:,:,2])
    v_std = np.std(hsv[:,:,2])
    return {
        'brightness': v_mean,
        'brightness_variance': v_std,
    }
```
---
## Tracking Flow
### Feature Storage Strategy
| Level | Storage | Reason |
|-------|---------|--------|
| **Level 1** | TKG nodes | Stable features for tracking |
| **Level 2** | Dynamic | On-demand calculation |
| **Level 3** | Dynamic | On-demand calculation |
### Level 1 in TKG
```sql
-- New node_type: person_trace
-- properties is shown as a JSON document; "..." marks elided values
INSERT INTO tkg_nodes (node_type, external_id, file_uuid, properties)
VALUES (
    'person_trace',
    'person_{frame}_{index}',
    'xxx',
    '{
        "frame_count": 100,
        "frames": [1, 30, 60, ...],
        "avg_bbox": {...},
        "height_estimate": {
            "estimated_height_cm": 170.5,
            "height_ratio": 28.4,
            "height_category": "tall"
        },
        "body_shape": {
            "chest_width": 150.2,
            "waist_width": 100.5,
            "hip_width": 120.3,
            "chest_waist_ratio": 1.49,
            "waist_hip_ratio": 0.84,
            "body_shape": "hourglass"
        },
        "level1_features": {
            "body": {...},
            "head_top": {...},
            "upper_body": {...},
            "lower_body": {...}
        }
    }'
);
```
### Level 2/3 Dynamic Calculation
```python
# Level 2: computed on query
face_features = extractor.extract_level2(frame, regions)
# Level 3: computed on query
accessory_features = extractor.extract_level3(frame, keypoints, eye_width)
```
### Matching Strategy
```
Frame N → Frame N+1:
  1. Pose bbox IoU → same person position
  2. Level 1 similarity (TKG) → same feature combination
  3. Level 2/3 dynamic → detailed verification
  4. Face identity → final confirmation (if face detected)
Result: Continuous tracking of same identity
```
### IoU Calculation
```python
def calculate_iou(bbox1, bbox2):
    # Boxes are (x, y, width, height) with (x, y) at the top-left corner
    x1, y1, w1, h1 = bbox1
    x2, y2, w2, h2 = bbox2
    xi1 = max(x1, x2)
    yi1 = max(y1, y2)
    xi2 = min(x1 + w1, x2 + w2)
    yi2 = min(y1 + h1, y2 + h2)
    inter_area = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    union_area = w1 * h1 + w2 * h2 - inter_area
    return inter_area / union_area if union_area > 0 else 0
```
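Step 1 of the matching strategy needs a way to pair boxes between consecutive frames once pairwise IoU is available. A simple greedy pairing is sketched below. It is one possible approach, not necessarily the one used in the implementation, and the function name and the 0.3 threshold are assumptions.

```python
def calculate_iou(bbox1, bbox2):
    """IoU of two (x, y, width, height) boxes with a top-left origin."""
    x1, y1, w1, h1 = bbox1
    x2, y2, w2, h2 = bbox2
    inter_w = max(0, min(x1 + w1, x2 + w2) - max(x1, x2))
    inter_h = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))
    inter_area = inter_w * inter_h
    union_area = w1 * h1 + w2 * h2 - inter_area
    return inter_area / union_area if union_area > 0 else 0


def match_by_iou(prev_boxes, curr_boxes, threshold=0.3):
    """Greedily pair boxes across frames, highest IoU first.

    Returns {index in prev_boxes: index in curr_boxes}. Boxes whose best
    IoU is below `threshold` stay unmatched.
    """
    candidates = sorted(
        (
            (calculate_iou(p, c), i, j)
            for i, p in enumerate(prev_boxes)
            for j, c in enumerate(curr_boxes)
        ),
        reverse=True,
    )
    matches, used_prev, used_curr = {}, set(), set()
    for iou, i, j in candidates:
        if iou < threshold:
            break  # remaining candidates overlap even less
        if i in used_prev or j in used_curr:
            continue  # each box is matched at most once
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches


# Two people who swap detection order between frames.
prev_boxes = [(100, 200, 400, 600), (900, 200, 300, 600)]
curr_boxes = [(910, 205, 300, 600), (110, 200, 400, 600)]
matches = match_by_iou(prev_boxes, curr_boxes)
```

Greedy pairing is easy to reason about but not globally optimal. In crowded scenes an assignment solver such as the Hungarian algorithm may give better pairings.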
### Feature Similarity
```python
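def calculate_similarity(features1, features2):
    # HSV histogram similarity (correlation in [-1, 1]; histograms must be float32)
    h_sim = cv2.compareHist(features1['h_histogram'], features2['h_histogram'], cv2.HISTCMP_CORREL)
    # Dominant color distance over all k x 3 center values
    diff = np.asarray(features1['dominant_colors'], dtype=np.float32) - \
        np.asarray(features2['dominant_colors'], dtype=np.float32)
    color_dist = np.linalg.norm(diff)
    # Largest possible distance, so the color term below stays within [0, 1]
    max_dist = 255 * np.sqrt(diff.size)
    # Combined score
    return {
        'color_similarity': h_sim,
        'color_distance': color_dist,
        'overall_score': h_sim * 0.7 + (1 - color_dist / max_dist) * 0.3,
    }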
```
---
## Output Format
### appearance.json Structure
```json
{
  "frame_count": 100,
  "fps": 30.0,
  "frames": [
    {
      "frame": 1,
      "timestamp": 0.033,
      "persons": [
        {
          "person_index": 0,
          "bbox": {"x": 100, "y": 200, "width": 400, "height": 600},
          "identity_uuid": "xxx-xxx-xxx",
          "proportions": {
            "eye_width": 50.0,
            "body_height": 600.0,
            "torso_height": 200.0,
            "leg_height": 300.0,
            "shoulder_width": 150.0,
            "head_ratio": 0.08,
            "torso_ratio": 0.33,
            "leg_ratio": 0.50
          },
          "features": {
            "hair": {
              "color": {"h_histogram": [...], "dominant_colors": [...]},
              "length": "long",
              "style": "straight"
            },
            "skin": {
              "color": {"h_histogram": [...], "dominant_colors": [...]}
            },
            "clothing": {
              "upper": {
                "color": {...},
                "material": "cotton",
                "pattern": "solid",
                "sleeve": "short"
              },
              "lower": {
                "color": {...},
                "length": "long"
              }
            },
            "accessories": {
              "earring": true,
              "watch": true,
              "shoes_color": {...}
            }
          }
        }
      ]
    }
  ]
}
```
---
## Dependencies
### Processor Dependencies
| Processor | Depends On | Reason |
|-----------|------------|--------|
| Appearance | Pose | bbox for region extraction |
| Appearance | Face | identity matching + face landmarks |
| Appearance | MediaPipe | hand landmarks + detailed pose |
### Data Flow
```
pose.json → bbox + keypoints
face.json → identity + face landmarks
mediapipe.json → hand landmarks + pose landmarks
appearance.json → features + proportions + tracking
```
---
## Implementation Phases
### Phase 1: Design Document
- Create this design document
- Define all feature mappings
- Define output format
### Phase 2: Appearance Processor Refactor
- Add proportion calculation module
- Add feature extraction module
- Integrate Pose + MediaPipe + Face data
- Add IoU matching for pose-face
### Phase 3: Output Format Update
- Update appearance.json structure
- Update Rust structs
- Update DB schema
### Phase 4: Testing
- Unit tests for proportion calculation
- Integration tests for full pipeline
- Real video tracking validation
---
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0.0 | 2025-06-22 | OpenCode | Initial design document |