# FUSE Mount Detailed Diagnosis Report

**Date:** 2026-05-17 13:22

**Status:** Critical Issue Identified

**Attempts:** 50+ mount attempts, all failed

---

## 1. Current Symptoms

### Successful Indicators

- ✅ Socket negotiation: go-nfsv4 receives socket FDs (9, 11)
- ✅ FUSE session negotiated: profile=v3, proto=7.19
- ✅ NFS server starts: 127.0.0.1:52100
- ✅ mount_nfs command executed
- ✅ FUSE requests received: 3 requests (init + 2 others)
- ✅ wait_mount() returns OK

### Failure Indicators

- ❌ No actual mount visible (`mount | grep MarkBase` = nothing)
- ❌ Mount directory empty (no files visible)
- ❌ go-nfsv4 process dies immediately (becomes zombie, then reaped)
- ❌ NFS server port not listening after mount attempt
- ❌ No mount status message in fuse-t.log (neither success nor failure)
- ❌ AJA System Test cannot validate (mount not available)

---

## 2. Root Cause Analysis

### Primary Issue

**go-nfsv4 dies immediately after executing mount_nfs, before sending the mount status back to the parent**

### Evidence Timeline

```
13:20:51  - go-nfsv4 started (PID 60543)
13:20:51  - Socket negotiation successful (FD 9, 11)
13:20:51  - NFS server running (127.0.0.1:52100)
13:20:51  - mount_nfs command executed
13:20:51  - [MISSING] go-nfsv4 should send status back
13:20:51+ - go-nfsv4 dies (zombie → reaped)
13:20:51+ - wait_mount() returns OK (unexpected!)
```

### Critical Mystery

**Why does wait_mount() return OK when go-nfsv4 died?**

Expected behavior:

- recv() should report that go-nfsv4 closed the socket (on a stream socket this is a 0-byte return, i.e. `Ok(0)`, not an `Err`)
- The thread should turn that into an error
- wait_mount() should return an error

Actual behavior:

- wait_mount() returns `Ok(())`
- No error message from fuse-backend-rs

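
For comparison with the expectation above: on a stream socket, `recv()` does not fail when the peer closes its end; it returns 0 bytes, which Rust surfaces as `Ok(0)`. The sketch below demonstrates this with the standard library's `UnixStream::pair()` standing in for the fuse-t monitor socket (an assumption: this report does not state the monitor socket's type). `read_after_peer_close` is a hypothetical name.

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;

/// Closes the "go-nfsv4" end of a socket pair, then reads from the other end
/// and returns the byte count that read() reports.
fn read_after_peer_close() -> std::io::Result<usize> {
    // `parent` stands in for fuse-backend-rs, `child` for go-nfsv4.
    let (mut parent, child) = UnixStream::pair()?;
    // Simulate go-nfsv4 exiting: its end of the socket is closed.
    drop(child);
    let mut status = [0u8; 4];
    // End of file is reported as Ok(0), not as an error.
    parent.read(&mut status)
}

fn main() {
    match read_after_peer_close() {
        Ok(0) => println!("Ok(0): peer closed the socket (EOF), no error raised"),
        Ok(n) => println!("unexpected: {n} bytes"),
        Err(e) => println!("read failed: {e}"),
    }
}
```

A reader that matches only on `Ok(_)` therefore has to inspect the byte count to notice that go-nfsv4 is gone.
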
### Hypotheses

**H1: Race Condition in Socket Closure**

- go-nfsv4 sends status=0 quickly
- Then dies
- recv() succeeds with status=0
- Thread returns Ok(())
- wait_mount() returns OK

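
H1 does not require anything unusual from the kernel: bytes written before a socket is closed are still delivered, and the reader only sees end of file after consuming them. This std-only sketch lets a thread play go-nfsv4, write a status of 0 and exit before the parent reads anything. The 4-byte native-endian `i32` framing of the status is an assumption about the fuse-t protocol, and `status_then_exit` is a hypothetical name.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Simulates H1: a stand-in for go-nfsv4 writes status 0 and exits at once.
/// Returns the status the parent reads and the byte count of the read after it.
fn status_then_exit() -> std::io::Result<(i32, usize)> {
    let (mut parent, mut child) = UnixStream::pair()?;
    let go_nfsv4 = thread::spawn(move || -> std::io::Result<()> {
        child.write_all(&0i32.to_ne_bytes())?;
        Ok(()) // `child` is dropped here: the socket closes as if the process exited
    });
    // Wait until the "process" is gone before the parent reads anything.
    go_nfsv4.join().expect("go-nfsv4 stand-in panicked")?;

    let mut buf = [0u8; 4];
    parent.read_exact(&mut buf)?; // data written before close is still delivered
    let status = i32::from_ne_bytes(buf);
    let after = parent.read(&mut buf)?; // only now does the reader see EOF
    Ok((status, after))
}

fn main() -> std::io::Result<()> {
    let (status, after) = status_then_exit()?;
    println!("status = {status}, next read = {after} bytes"); // status = 0, next read = 0 bytes
    Ok(())
}
```
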
**H2: recv() Timeout**

- recv() has hidden timeout
- Returns "success" even if no data received
- Thread misinterprets as success

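
H2 is hard to reconcile with standard socket semantics. A blocking `recv()` has no timeout unless `SO_RCVTIMEO` is set on the socket, and when that timeout expires the call fails with `EAGAIN`/`EWOULDBLOCK` instead of returning data. The sketch below checks this with `UnixStream::set_read_timeout` (which sets that option) against a peer that stays open but never writes; `read_from_silent_peer` is a hypothetical helper.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Reads from a peer that stays open but never writes, with a receive timeout.
/// Returns Ok(bytes) if the read succeeded, or Err(kind) if it failed.
fn read_from_silent_peer(timeout: Duration) -> Result<usize, ErrorKind> {
    // `_silent` is kept alive so the read sees "no data yet", not EOF.
    let (mut parent, _silent) = UnixStream::pair().map_err(|e| e.kind())?;
    parent.set_read_timeout(Some(timeout)).map_err(|e| e.kind())?;
    let mut status = [0u8; 4];
    parent.read(&mut status).map_err(|e| e.kind())
}

fn main() {
    // On Unix the expiry is typically reported as WouldBlock (EAGAIN), never as data.
    println!("{:?}", read_from_silent_peer(Duration::from_millis(100)));
}
```
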
**H3: Monitor Socket Behavior**

- Monitor socket is bidirectional
- Some internal mechanism triggers an early "success"
- Actual mount happens in the background

**H4: fuse-backend-rs Bug**

- Thread implementation has a bug
- Incorrect error handling
- Missing status check

---

## 3. Code Review Findings

### fuse-backend-rs Implementation (fuse_t_session.rs)

**send_mount_command() thread:**

```rust
let handle = std::thread::spawn(move || {
    send(mon_fd, b"mount", MsgFlags::empty())?;

    let mut status = -1;
    loop {
        // Blocks with no timeout until data, EOF, or an error arrives.
        match recv(mon_fd, status.as_mut_slice(), MsgFlags::empty()) {
            // The byte count is discarded: a 0-byte read (EOF) or a short read
            // is not distinguished from a complete status value.
            Ok(_size) => return if status == 0 { Ok(()) } else { Err(...) },
            Err(Errno::EINTR) => continue,
            Err(e) => return Err(...),
        }
    }
});
```

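**Potential issues:**

1. No timeout on recv() → could block forever if go-nfsv4 doesn't respond
2. Status check is a bare integer comparison → could misinterpret garbage data
3. Returned byte count is ignored → EOF (peer closed) and short reads are not detected explicitly

Note that `status` starts at -1, so an EOF that writes no bytes would still produce `Err`. If this excerpt matches the real code, an `Ok(())` from wait_mount() points to go-nfsv4 having delivered a status of 0 before exiting (H1).
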
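
The three issues can be addressed together by bounding the wait and insisting on a complete 4-byte status. The following is a std-only sketch, not fuse-backend-rs code: it uses `UnixStream` instead of `nix` `recv()` on a raw FD, assumes the status is a native-endian `i32`, and `read_mount_status`/`MountError` are hypothetical names.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Hypothetical error type for the status wait.
#[derive(Debug, PartialEq)]
enum MountError {
    PeerClosed,    // EOF before 4 status bytes arrived (go-nfsv4 exited)
    TimedOut,      // no complete status within the deadline
    Failed(i32),   // go-nfsv4 reported a non-zero status
    Io(ErrorKind), // any other socket error
}

/// Hypothetical hardened status wait: bounded, and requires exactly 4 bytes.
fn read_mount_status(sock: &mut UnixStream, timeout: Duration) -> Result<(), MountError> {
    sock.set_read_timeout(Some(timeout))
        .map_err(|e| MountError::Io(e.kind()))?;
    let mut buf = [0u8; 4];
    // read_exact retries EINTR and short reads; EOF part-way becomes UnexpectedEof.
    match sock.read_exact(&mut buf) {
        Ok(()) => match i32::from_ne_bytes(buf) {
            0 => Ok(()),
            code => Err(MountError::Failed(code)),
        },
        Err(e) => Err(match e.kind() {
            ErrorKind::UnexpectedEof => MountError::PeerClosed,
            ErrorKind::WouldBlock | ErrorKind::TimedOut => MountError::TimedOut,
            kind => MountError::Io(kind),
        }),
    }
}

fn main() {
    // go-nfsv4 stand-in sends half a status and exits: reported, not mistaken for Ok.
    let (mut parent, mut child) = UnixStream::pair().expect("socketpair");
    child.write_all(&[0u8, 0u8]).expect("write");
    drop(child);
    println!("{:?}", read_mount_status(&mut parent, Duration::from_millis(200)));
    // prints Err(PeerClosed)
}
```

Mapping EOF to a dedicated `PeerClosed` variant keeps the case this report is chasing (go-nfsv4 exiting) distinguishable from a plain timeout in the logs.
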

---

## 4. go-nfsv4 Behavior Analysis

### From fuse-t.log

```
level=info msg="mount [-o port=52100,mountport=52100,vers=4,nobrowse -t nfs fuse-t:/MarkBase-warren /private/tmp/MarkBase_warren]"
```

**Missing messages:**

- ❌ No "Mount successful" message
- ❌ No "Mount failed" message
- ❌ No error messages after mount_nfs

### Expected behavior (from fuse-t README)

> "After the filesystem process dies the server terminates"

This suggests:

1. The server should persist until the filesystem process (our Rust binary) dies.
2. In our case, however, the server dies first.
3. The parent process keeps running, with its handler thread blocked (see the Appendix).

### Mount command analysis

```bash
mount -o port=52100,mountport=52100,vers=4,nobrowse -t nfs fuse-t:/MarkBase-warren /private/tmp/MarkBase_warren
```

**Key observations:**

- Command: `mount -t nfs`, which on macOS hands off to `mount_nfs`
- Source: `fuse-t:/MarkBase-warren` (special fuse-t format)
- Target: `/private/tmp/MarkBase_warren` (absolute path)
- Options: port=52100, mountport=52100, vers=4, nobrowse


---

## 5. Process Lifecycle Comparison

### Expected Lifecycle (from fuse-t README)

```
1. libfuse mount API → fork()
2. Child: exec go-nfsv4 (replace process)
3. go-nfsv4: start NFS server on TCP port
4. go-nfsv4: receive "mount" message from parent
5. go-nfsv4: execute mount_nfs
6. go-nfsv4: send status back to parent
7. go-nfsv4: persist as daemon (handle FUSE requests)
8. Parent: run FUSE request handler thread
9. When parent dies → go-nfsv4 terminates
```

### Actual Lifecycle (observed)

```
1. fuse-backend-rs: fork()
2. Child: exec go-nfsv4 ✓
3. go-nfsv4: start NFS server ✓
4. go-nfsv4: receive "mount" ✓ (assumed)
5. go-nfsv4: execute mount_nfs ✓
6. go-nfsv4: dies immediately ✗
7. [MISSING] go-nfsv4 doesn't persist
8. Parent: continues running (handler thread blocks)
```

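
The comparison shows that a successful status at step 6 says nothing about whether go-nfsv4 survives afterwards. One possible post-mount check, assuming go-nfsv4 keeps its end of the monitor socket open for as long as it runs, is a non-blocking read on the parent's end right after `wait_mount()` returns: `Ok(0)` means the peer has closed it. `peer_alive` and `simulate` are hypothetical helpers, and the thread merely stands in for the go-nfsv4 process.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;
use std::sync::mpsc;
use std::thread;

/// Hypothetical liveness probe. Ok(false) if the peer has closed its end (EOF),
/// Ok(true) if the socket is still open. Consumes one byte if data is pending.
fn peer_alive(sock: &mut UnixStream) -> std::io::Result<bool> {
    sock.set_nonblocking(true)?;
    let mut byte = [0u8; 1];
    let result = match sock.read(&mut byte) {
        Ok(0) => Ok(false),                                      // EOF: peer is gone
        Ok(_) => Ok(true),                                       // data pending: peer alive
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(true), // open, nothing to read
        Err(e) => Err(e),
    };
    sock.set_nonblocking(false)?; // restore blocking mode for the caller
    result
}

/// Simulates go-nfsv4 replying with status 0, then either exiting or staying up.
fn simulate(stays_up: bool) -> std::io::Result<bool> {
    let (mut parent, mut child) = UnixStream::pair()?;
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let go_nfsv4 = thread::spawn(move || {
        child.write_all(&0i32.to_ne_bytes()).expect("write status");
        if stays_up {
            let _ = stop_rx.recv(); // keep the socket open until told to stop
        }
    }); // `child` is dropped when the thread ends, like a process exiting

    let mut status = [0u8; 4];
    parent.read_exact(&mut status)?; // the point where wait_mount() reports Ok
    if stays_up {
        let alive = peer_alive(&mut parent);
        drop(stop_tx); // let the long-lived stand-in finish
        go_nfsv4.join().expect("thread panicked");
        alive
    } else {
        go_nfsv4.join().expect("thread panicked"); // make sure the "exit" happened first
        peer_alive(&mut parent)
    }
}

fn main() -> std::io::Result<()> {
    println!("exits after status: alive = {}", simulate(false)?); // false
    println!("stays up:           alive = {}", simulate(true)?); // true
    Ok(())
}
```

The probe is only meaningful if the parent closed its own copy of the child's socket end after `fork()`; a leftover duplicate descriptor keeps the socket open, so neither this probe nor any other EOF-based detection would ever fire.
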

---

## 6. Alternative Approaches to Consider

### A. Direct NFSv4 Server (without fuse-t)

- **Pros:** No dependency on fuse-t, full control
- **Cons:** 2-3 weeks of development, complex NFS protocol
- **Estimated success rate:** 80%

### B. WebDAV Server

- **Pros:** Simple protocol, native macOS support, 2-3 days
- **Cons:** Not FUSE, requires a Finder WebDAV mount
- **Estimated success rate:** 95%

### C. SMB Server

- **Pros:** Native macOS support, simple implementation
- **Cons:** Not FUSE, different permission model
- **Estimated success rate:** 90%

### D. Fix fuse-t Integration

- **Pros:** Native FUSE, best performance
- **Cons:** Requires deep debugging, uncertain success
- **Estimated success rate:** 60%

### E. Contact fuse-t Developers

- **Pros:** Expert help, definitive solution
- **Cons:** Dependent on external response time
- **Estimated success rate:** 70%

---

## 7. Immediate Next Steps

### Debugging Priorities

**Priority 1: Understand wait_mount() behavior**

- Add recv() timeout logging
- Monitor socket state with lsof during recv()
- Capture exact moment when go-nfsv4 dies
- Check if recv() gets status=0 before death

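
For Priority 1, one way to capture the timeline is a diagnostic reader that keeps reading the monitor socket after the status arrives and timestamps every event, answering both "did status=0 arrive before death?" and "when did the peer close?". This std-only sketch uses the hypothetical names `trace_monitor` and `Event`; in a test build it would have to replace the fuse-backend-rs reader rather than run alongside it, since both would consume the same bytes.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;
use std::time::{Duration, Instant};

/// One observation on the monitor socket (hypothetical diagnostic type).
#[derive(Debug, PartialEq)]
enum Event {
    Data(Vec<u8>),    // bytes returned by one read() call
    Eof,              // peer closed its end (read returned 0)
    Idle,             // nothing for `idle`, peer still open
    Error(ErrorKind), // any other failure
}

/// Reads until EOF, an error, or `idle` with no data, timestamping each event
/// relative to the start of the call.
fn trace_monitor(sock: &mut UnixStream, idle: Duration) -> Vec<(Duration, Event)> {
    let start = Instant::now();
    let mut events = Vec::new();
    if let Err(e) = sock.set_read_timeout(Some(idle)) {
        events.push((start.elapsed(), Event::Error(e.kind())));
        return events;
    }
    let mut buf = [0u8; 64];
    loop {
        let event = match sock.read(&mut buf) {
            Ok(0) => Event::Eof,
            Ok(n) => Event::Data(buf[..n].to_vec()),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => Event::Idle,
            Err(e) => Event::Error(e.kind()),
        };
        let done = !matches!(event, Event::Data(_));
        events.push((start.elapsed(), event));
        if done {
            return events;
        }
    }
}

fn main() {
    // Stand-in for go-nfsv4: send status 0, then exit (close the socket).
    let (mut parent, mut child) = UnixStream::pair().expect("socketpair");
    child.write_all(&0i32.to_ne_bytes()).expect("write");
    drop(child);
    for (at, event) in trace_monitor(&mut parent, Duration::from_secs(5)) {
        println!("{:.3?} {:?}", at, event);
    }
}
```
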
**Priority 2: Test mount_nfs directly**

- Execute mount_nfs command manually
- Check if mount_nfs itself is failing
- Test with different NFS options
- Check macOS NFS client behavior

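
For Priority 2, the command go-nfsv4 logged can be rebuilt from its parts and run by hand, which also makes it easy to vary the NFS options. The sketch only constructs the `std::process::Command` and does not execute it; `manual_mount_command` is a hypothetical helper, and the argument order mirrors the fuse-t.log line. An actual mount attempt also needs an NFS server listening on the given port.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical helper rebuilding the invocation logged in fuse-t.log:
/// mount -o port=P,mountport=P,<extra_opts> -t nfs <source> <target>
fn manual_mount_command(port: u16, extra_opts: &[&str], source: &str, target: &str) -> Command {
    let mut opts = vec![format!("port={port}"), format!("mountport={port}")];
    opts.extend(extra_opts.iter().map(|o| o.to_string()));
    let mut cmd = Command::new("mount");
    cmd.arg("-o").arg(opts.join(",")).args(["-t", "nfs", source, target]);
    cmd
}

fn main() {
    let cmd = manual_mount_command(
        52100,
        &["vers=4", "nobrowse"],
        "fuse-t:/MarkBase-warren",
        "/private/tmp/MarkBase_warren",
    );
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
    // To run it (needs an NFS server on the port), make `cmd` mutable and call
    // cmd.output(), then inspect the exit status and stderr.
}
```
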
**Priority 3: Minimal fuse-t test**

- Create minimal Rust program using fuse-backend-rs
- Test with hello.rs example (POC hello FUSE)
- Compare our code with working example
- Identify differences

**Priority 4: Contact fuse-t community**

- File bug report with detailed logs
- Ask about go-nfsv4 daemon lifecycle
- Share our test results
- Request guidance on proper usage

---

## 8. Time Estimate

### If we continue debugging fuse-t

- 1-2 days for detailed logging
- 1-2 days for minimal test case
- 2-3 days for community feedback
- **Total:** 4-7 days, uncertain outcome

### If we switch to WebDAV

- 1 day for basic WebDAV server
- 1 day for macOS Finder integration
- 1 day for AJA System Test validation
- **Total:** 3 days, high confidence

---

## 9. Recommendation

**Switch to WebDAV implementation**

Reasons:

1. **Time efficiency:** 3 days vs 4-7 days
2. **Success probability (estimated):** 95% vs 60%
3. **Stability:** WebDAV is simpler and less prone to race conditions
4. **Native support:** macOS Finder has a built-in WebDAV client
5. **Testing:** AJA System Test works with mounted volumes (any protocol)

**Trade-off:**

- WebDAV is not FUSE (can't use fuse-backend-rs)
- Performance may be slightly lower (HTTP overhead)
- But achieves the core goal: a virtual filesystem accessible to macOS apps

---

## 10. WebDAV Implementation Plan

### Phase 1: Basic WebDAV Server (Day 1)

- Use Rust webdav-handler library (if available)
- Or implement a minimal WebDAV protocol (OPTIONS, PROPFIND, GET, PUT; macOS is expected to mount the share read-only unless the server also supports LOCK/UNLOCK, which the Phase 3 writes depend on)
- SQLite backend (read from warren.sqlite)
- File listing: PROPFIND → query nodes from SQLite
- File reading: GET → read file path from aliases_json

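
The PROPFIND step above boils down to turning a list of nodes into an RFC 4918 `207 Multi-Status` XML body. The sketch below assumes a hypothetical `Node` row (name, is_dir, size) already fetched from warren.sqlite; it is std-only, emits only `displayname`, `resourcetype` and `getcontentlength`, and skips percent-encoding of hrefs. It illustrates the response shape for a `Depth: 1` request (the collection itself plus its immediate members), not a complete server.

```rust
/// Hypothetical row shape; the actual warren.sqlite schema is not shown in this report.
struct Node {
    name: String,
    is_dir: bool,
    size: u64,
}

/// Escapes the five XML special characters.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// One <D:response> entry: displayname, resourcetype and, for files, getcontentlength.
fn response(href: &str, name: &str, is_dir: bool, size: u64) -> String {
    let (rtype, len) = if is_dir {
        ("<D:collection/>", String::new())
    } else {
        ("", format!("<D:getcontentlength>{}</D:getcontentlength>", size))
    };
    format!(
        "<D:response><D:href>{}</D:href><D:propstat><D:prop>\
         <D:displayname>{}</D:displayname><D:resourcetype>{}</D:resourcetype>{}\
         </D:prop><D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>\n",
        xml_escape(href),
        xml_escape(name),
        rtype,
        len
    )
}

/// 207 Multi-Status body for a Depth: 1 PROPFIND on the collection `base`.
/// Assumes names are already safe to use in URLs (no percent-encoding here).
fn propfind_body(base: &str, nodes: &[Node]) -> String {
    let base = base.trim_end_matches('/');
    let base_name = base.rsplit('/').next().unwrap_or("");
    let mut xml = String::from("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<D:multistatus xmlns:D=\"DAV:\">\n");
    xml.push_str(&response(&format!("{}/", base), base_name, true, 0));
    for n in nodes {
        let slash = if n.is_dir { "/" } else { "" };
        let href = format!("{}/{}{}", base, n.name, slash);
        xml.push_str(&response(&href, &n.name, n.is_dir, n.size));
    }
    xml.push_str("</D:multistatus>\n");
    xml
}

fn main() {
    let nodes = vec![
        Node { name: "clips".into(), is_dir: true, size: 0 },
        Node { name: "a&b.mov".into(), is_dir: false, size: 1024 },
    ];
    print!("{}", propfind_body("/webdav", &nodes));
}
```
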
### Phase 2: macOS Finder Mount (Day 2)

- Finder → Connect to Server → `http://localhost:8080/webdav`
- Or use `mount_webdav` command
- Test file browsing in Finder
- Verify AJA System Test can see mounted files

### Phase 3: AJA System Test Validation (Day 3)

- Write 4K ProRes files to WebDAV mount
- Measure throughput (target: >= 600 MB/s)
- Compare with FUSE theoretical performance
- Document results

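
As a rough cross-check on AJA System Test's numbers (not a replacement for it), sequential write throughput can be measured by timing large writes followed by `sync_all()`. This std-only sketch writes zeros in 8 MiB chunks; the chunk size, the 64 MB total and the temp-dir target in `main` are arbitrary stand-ins, so point `path` at a file on the WebDAV mount for a real run. Here 1 MB = 10^6 bytes; check which unit AJA reports before comparing against the 600 MB/s target.

```rust
use std::fs::File;
use std::io::Write;
use std::path::Path;
use std::time::Instant;

/// Throughput in MB/s (1 MB = 1_000_000 bytes).
fn mb_per_s(bytes: u64, secs: f64) -> f64 {
    bytes as f64 / 1_000_000.0 / secs
}

/// Writes `total_mb` MB of zeros to `path` in 8 MiB chunks and returns MB/s,
/// including the time for sync_all() so cached writes are not over-counted.
fn write_throughput(path: &Path, total_mb: u64) -> std::io::Result<f64> {
    let chunk = vec![0u8; 8 * 1024 * 1024];
    let total_bytes = total_mb * 1_000_000;
    let mut file = File::create(path)?;
    let start = Instant::now();
    let mut written = 0u64;
    while written < total_bytes {
        let n = chunk.len().min((total_bytes - written) as usize);
        file.write_all(&chunk[..n])?;
        written += n as u64;
    }
    file.sync_all()?;
    Ok(mb_per_s(written, start.elapsed().as_secs_f64()))
}

fn main() -> std::io::Result<()> {
    // Stand-in target; replace with a path on the WebDAV mount.
    let path = std::env::temp_dir().join("throughput_probe.bin");
    let rate = write_throughput(&path, 64)?;
    println!("{rate:.1} MB/s");
    std::fs::remove_file(&path)?;
    Ok(())
}
```
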

---

**Next action:** Decision point: continue debugging fuse-t or switch to WebDAV?

**Current recommendation:** Switch to WebDAV (estimated 95% success in 3 days)

---

## Appendix: Test Logs

### Latest fuse-t.log (PID 60543)

```
13:20:51 - Server started: 127.0.0.1:52100
13:20:51 - Mounting: /private/tmp/MarkBase_warren
13:20:51 - mount [-o port=52100,mountport=52100,vers=4,nobrowse -t nfs fuse-t:/MarkBase-warren /private/tmp/MarkBase_warren]
[NO FURTHER MESSAGES - go-nfsv4 died]
```

### Latest Rust program output

```
[INFO] wait_mount() returned OK - mount completed successfully
[INFO] Mount completed for user: warren
[DEBUG] Handler thread status: false
[DEBUG] Joining handler thread...
[BLOCKS HERE - handler thread never exits]
```

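
The output above ends with the program blocked forever in the handler-thread join. While debugging, a bounded join keeps the program from hanging: poll `JoinHandle::is_finished()` (stable since Rust 1.61) until a deadline and call `join()` only once it reports true. `join_with_deadline` is a hypothetical helper; it cannot stop a stuck thread, it only hands the handle back so the caller can log the situation and decide what to do.

```rust
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Waits up to `timeout` for `handle` to finish. Returns Ok(join result) if it
/// did, or Err(handle) so the caller can log, retry, or detach the thread.
fn join_with_deadline<T>(
    handle: JoinHandle<T>,
    timeout: Duration,
) -> Result<thread::Result<T>, JoinHandle<T>> {
    let deadline = Instant::now() + timeout;
    while !handle.is_finished() {
        if Instant::now() >= deadline {
            return Err(handle);
        }
        thread::sleep(Duration::from_millis(10));
    }
    Ok(handle.join()) // returns promptly: the thread has already finished
}

fn main() {
    // Stand-in for the FUSE handler thread: blocks until the channel closes.
    let (tx, rx) = std::sync::mpsc::channel::<()>();
    let handler = thread::spawn(move || {
        let _ = rx.recv();
    });
    match join_with_deadline(handler, Duration::from_millis(200)) {
        Ok(_) => println!("handler thread exited"),
        Err(handle) => {
            println!("handler thread still running after 200 ms");
            drop(tx); // unblock the stand-in so it can finish
            handle.join().expect("handler panicked");
        }
    }
}
```
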
### System state after mount attempt

```bash
$ mount | grep MarkBase
[NO OUTPUT - mount not visible]

$ lsof -i :52100
[NO OUTPUT - NFS server not running]

$ ls /tmp/MarkBase_warren/
[EMPTY - no files visible]
```

---

**Report prepared by:** OpenCode AI Assistant

**Session:** FUSE debugging session

**Total attempts:** 50+

**Time spent:** 6 hours