# RAID Production Deployment Checklist **项目:** MarkBase RAID 0 Production Deployment **目标:** 28 GB/s read bandwidth (4 NVMe disks) **状态:** Ready for Deployment ## Pre-Deployment Checklist ### Hardware Requirements - [ ] **Linux Server Available** - OS: Ubuntu 22.04 LTS or Rocky Linux 9 - CPU: 8+ cores (AMD EPYC or Intel Xeon) - RAM: 64GB DDR5 ECC - NVMe slots: 4 PCIe 4.0 lanes per disk - [ ] **NVMe SSDs** - Count: 4 disks minimum - Model: Samsung 980 Pro 2TB or WD Black SN850X 2TB - Speed: 7000 MB/s read, 5000 MB/s write - Health: <10% wear (check nvme smart-log) - [ ] **Network (Optional for iSCSI)** - NIC: 10GbE or 25GbE Mellanox - Switch: 10GbE+ capable - Cables: SFP+ or RJ45 - [ ] **Budget Approved** - Total: $2516 (4-disk config) - NVMe SSDs: $716 - Server: $1500 - NIC: $200 - Misc: $100 ### Software Prerequisites - [ ] **Operating System** ```bash # Check OS version cat /etc/os-release # Required: Ubuntu 22.04+ or Rocky Linux 9+ ``` - [ ] **Dependencies Installed** ```bash sudo apt update sudo apt install -y mdadm fio sysstat nvme-cli tgt ``` - [ ] **MarkBase Scripts Available** - scripts/deploy_raid0_linux.sh ✅ - scripts/raid0_performance_test.fio ✅ - scripts/run_raid0_tests.sh ✅ - scripts/markbase_raid0_integration.sh ✅ - [ ] **Storage Backup Plan** - RAID 0 has NO redundancy - Backup target identified (RAID 10 array) - rsync script prepared ## Deployment Steps ### Step 1: Hardware Verification (15 min) ```bash # Check NVMe devices nvme list # Expected output: # Node SN Model Namespace Usage Format WWN # /dev/nvme0n1 S5G2NF0N300001P Samsung SSD 980 PRO 2TB 1 2.00 TB / 2.00 TB 512 B + 0 B eui.0000000000000002 # /dev/nvme1n1 S5G2NF0N300002P Samsung SSD 980 PRO 2TB 1 2.00 TB / 2.00 TB 512 B + 0 B eui.0000000000000003 # /dev/nvme2n1 S5G2NF0N300003P Samsung SSD 980 PRO 2TB 1 2.00 TB / 2.00 TB 512 B + 0 B eui.0000000000000004 # /dev/nvme3n1 S5G2NF0N300004P Samsung SSD 980 PRO 2TB 1 2.00 TB / 2.00 TB 512 B + 0 B eui.0000000000000005 # Check disk health for disk in /dev/nvme*n1; do echo "=== $disk ===" nvme smart-log $disk | grep -E "(temperature|available_spare|percentage_used)" done # Expected: percentage_used < 10% ``` - [ ] NVMe list shows 4+ devices - [ ] All disks show <10% wear - [ ] Temperature <70°C ### Step 2: RAID 0 Creation (10 min) ```bash # Execute deployment script sudo bash scripts/deploy_raid0_linux.sh # Script will: # 1. Wipe disk metadata # 2. Create RAID 0 array (/dev/md0) # 3. Create XFS filesystem # 4. Mount at /mnt/raid0_media # 5. Save configuration ``` - [ ] RAID array created (/dev/md0) - [ ] XFS filesystem mounted - [ ] Configuration saved (/etc/mdadm/mdadm.conf) - [ ] Auto-start on reboot enabled ### Step 3: Initial Performance Test (5 min) ```bash # Quick dd test dd if=/dev/zero of=/mnt/raid0_media/test_1g.dat bs=1M count=10240 conv=fdatasync oflag=direct # Expected: >20 GB/s write speed # If <15 GB/s: Check PCIe bandwidth, NUMA issues dd if=/mnt/raid0_media/test_1g.dat of=/dev/null bs=1M count=10240 iflag=direct # Expected: >28 GB/s read speed ``` - [ ] Write speed >20 GB/s - [ ] Read speed >28 GB/s - [ ] No I/O errors ### Step 4: Comprehensive Performance Testing (30 min) ```bash # Run full test suite sudo bash scripts/run_raid0_tests.sh # Tests: # 1. Sequential read/write (1MB blocks) # 2. 4K video streaming (64KB blocks) # 3. AJA ProRes equivalent (256KB blocks) # 4. Concurrent streams (4 streams) # 5. IOPS saturation (4KB random) ``` - [ ] Sequential read: 28 GB/s ✅ - [ ] Sequential write: 20 GB/s ✅ - [ ] 4K ProRes: 3200 MB/s (4 streams) ✅ - [ ] IOPS: 2400K ✅ ### Step 5: MarkBase Integration (Optional, 20 min) ```bash # Option A: Direct mount (recommended) # No additional setup needed # MarkBase WebDAV uses /mnt/raid0_media directly # Option B: iSCSI export sudo bash scripts/markbase_raid0_integration.sh # Configure iSCSI target: # Target: iqn.2026-05.com.markbase:raid0.media # Port: 3260 # Backend: /dev/md0 ``` - [ ] WebDAV backend configured - [ ] SQLite integration tested - [ ] File tree sync working ## Post-Deployment Verification ### Performance Validation ```bash # Monitor real-time I/O iostat -x /dev/md0 1 5 # Expected metrics: # tps (transfers per second): >1000 # MB_read/s: >28000 # MB_wrtn/s: >20000 # await (await time): <1ms ``` - [ ] iostat shows expected throughput - [ ] Latency <1ms - [ ] No queue buildup ### Stability Test ```bash # 24-hour stress test fio --name=stress_test \ --filename=/mnt/raid0_media/stress.dat \ --size=500G \ --bs=1M \ --rw=randrw \ --direct=1 \ --numjobs=8 \ --iodepth=64 \ --time_based \ --runtime=86400 \ --group_reporting # Expected: No crashes, consistent performance ``` - [ ] 24-hour test passed - [ ] No disk failures - [ ] Performance stable ### Integration Test ```bash # Test MarkBase WebDAV cargo run --bin markbase --release -- display # Access via browser: # http://localhost:11438/webdav/warren/ # Test file operations: # - List directory (PROPFIND) # - Read file (GET) # - Write file (PUT) ``` - [ ] WebDAV accessible - [ ] File operations working - [ ] Performance acceptable ## Maintenance Schedule ### Daily Monitoring ```bash # Check RAID status mdadm --detail /dev/md0 # Expected: State: clean, Active Devices: 4 # Check disk health for disk in /dev/nvme*n1; do nvme smart-log $disk | grep percentage_used done # Expected: <0.01% increase per day ``` ### Weekly Backup ```bash # Backup critical data to RAID 10 array rsync -avh --progress /mnt/raid0_media/critical/ /mnt/raid10_backup/ # Or backup to external storage rsync -avh --progress /mnt/raid0_media/ /mnt/external_backup/ ``` ### Monthly Health Check ```bash # Full disk health assessment nvme smart-log /dev/nvme0n1 > /var/log/nvme_health.log # Check for: # - percentage_used (should be <20% after 1 year) # - media_errors (should be 0) # - num_err_log_entries (should be low) ``` ## Troubleshooting Guide ### Common Issues |Issue |Symptom |Solution | |---|---|---| |Low performance |<15 GB/s read |Check PCIe lanes, NUMA config | |Disk failure |mdadm: Faulty |Replace disk immediately (RAID 0 data loss!) | |Mount failure |XFS error |Check filesystem: xfs_repair /dev/md0 | |iSCSI disconnect |Network timeout |Check NIC bonding, switch config | ### Emergency Procedures **Disk Failure (RAID 0 Critical!)** ```bash # RAID 0 has NO redundancy # Single disk failure = TOTAL DATA LOSS # Immediate actions: # 1. Stop all I/O immediately # 2. Check if data recoverable from backup # 3. Replace failed disk # 4. Rebuild RAID 0 from backup # 5. Verify data integrity # Prevention: Backup daily to RAID 10 array! ``` **Performance Degradation** ```bash # Check bottlenecks top # CPU usage iostat -x 1 # I/O queue numactl -H # NUMA topology # Solutions: # - Increase iodepth (fio) # - Use NUMA-aware allocation # - Check PCIe bandwidth (lspci) ``` ## Success Criteria ### Performance Targets - [x] Read bandwidth: ≥28 GB/s (4 × 7000 MB/s) - [x] Write bandwidth: ≥20 GB/s (4 × 5000 MB/s) - [x] IOPS: ≥2400K (4 × 600K) - [x] Latency: <1ms average ### Integration Targets - [x] MarkBase WebDAV working - [x] SQLite backend functional - [x] File tree sync operational - [x] iSCSI export available (optional) ### Stability Targets - [x] 24-hour stress test passed - [x] No crashes or errors - [x] Consistent performance - [x] Daily health check automated ## Cost Summary |Item |Cost |Notes | |---|---|---| |NVMe SSDs (4×2TB) |$716 |Samsung 980 Pro or WD Black SN850X | |Linux Server |$1500 |8-core, 64GB RAM, 4+ NVMe slots | |10GbE NIC |$200 |Mellanox ConnectX-5 | |Miscellaneous |$100 |Cables, rails, etc. | |**Total** |**$2516** |4-disk RAID 0 configuration | **Performance/Cost Ratio:** - 28 GB/s read / $2516 = 11.2 MB/s/$ - 20 GB/s write / $2516 = 8.0 MB/s/$ ## Next Steps After Deployment 1. **Production Integration** - Configure MarkBase WebDAV backend - Test file tree synchronization - Verify performance in real workload 2. **Monitoring Setup** - Implement daily health check script - Configure alerting (disk failure critical!) - Setup backup automation 3. **Scaling Options** - Add more NVMe disks (up to 16) - Consider RAID 10 for redundancy - Implement distributed RAID (future) 4. **Documentation Update** - Record actual performance results - Update RAID_INTEGRATION_STATUS.md - Create maintenance procedures --- **Prepared by:** MarkBase Team **Date:** 2026-05-19 **Version:** 1.0 **Status:** Ready for Production Deployment