FUSE Mount Detailed Diagnosis Report

Date: 2026-05-17 13:22
Status: Critical Issue Identified
Attempts: 50+ mount attempts, all failed


1. Current Symptoms

Successful Indicators

  • Socket negotiation: go-nfsv4 receives socket FDs (9, 11)
  • FUSE session negotiated: profile=v3, proto=7.19
  • NFS server starts: 127.0.0.1:52100
  • mount_nfs command executed
  • FUSE requests received: 3 requests (init + 2 others)
  • wait_mount() returns OK

Failure Indicators

  • No actual mount visible (mount | grep MarkBase prints nothing)
  • Mount directory empty (no files visible)
  • go-nfsv4 process dies immediately (becomes zombie, then reaped)
  • NFS server port not listening after mount attempt
  • No mount status message in fuse-t.log (neither success nor failure)
  • AJA System Test cannot validate (mount not available)
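The first failure indicator can also be checked from code rather than by eye. The following is a minimal sketch, assuming that `mount` prints one entry per line in the form `<source> on <target> (<options>)`; the helper name `is_mounted` is hypothetical and not part of fuse-backend-rs or fuse-t.

```rust
/// Returns true if `mount_output` (the text printed by `mount`) contains an
/// entry whose mount point is exactly `target`.
/// Assumption: each line has the form "<source> on <target> (<options>)".
fn is_mounted(mount_output: &str, target: &str) -> bool {
    mount_output.lines().any(|line| {
        line.split_once(" on ")
            // Drop the trailing " (options)" part, if present.
            .map(|(_source, rest)| match rest.rfind(" (") {
                Some(idx) => &rest[..idx],
                None => rest,
            })
            .map_or(false, |mount_point| mount_point == target)
    })
}
```

In practice the input would come from `std::process::Command::new("mount").output()`, and the target would be /private/tmp/MarkBase_warren. Matching the mount point exactly avoids false positives from paths that merely contain "MarkBase".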

2. Root Cause Analysis

Primary Issue

go-nfsv4 exits immediately after executing mount_nfs, apparently without reporting the mount status to the parent (no status message is logged).

Evidence Timeline

13:20:51 - go-nfsv4 started (PID 60543)
13:20:51 - Socket negotiation successful (FD 9, 11)
13:20:51 - NFS server running (127.0.0.1:52100)
13:20:51 - mount_nfs command executed
13:20:51 - [MISSING] go-nfsv4 should send status back
13:20:51+ - go-nfsv4 dies (zombie → reaped)
13:20:51+ - wait_mount() returns OK (unexpected!)

Critical Mystery

Why does wait_mount() return OK when go-nfsv4 died?

Expected behavior:

  • recv() should return 0 bytes (end of stream) once go-nfsv4 closes the socket, leaving status at its initial -1
  • Thread should return error
  • wait_mount() should return error

Actual behavior:

  • wait_mount() returns Ok(())
  • No error message from fuse-backend-rs

Hypotheses

H1: Race Condition in Socket Closure

  • go-nfsv4 sends status=0 quickly
  • Then dies
  • recv() succeeds with status=0
  • Thread returns Ok(())
  • wait_mount() returns OK

H2: recv() Timeout

  • recv() has hidden timeout
  • Returns "success" even if no data received
  • Thread misinterprets as success

H3: Monitor Socket Behavior

  • Monitor socket is bidirectional
  • Some internal mechanism triggers early "success"
  • Actual mount happens in background

H4: fuse-backend-rs Bug

  • Thread implementation has bug
  • Incorrect error handling
  • Missing status check

3. Code Review Findings

fuse-backend-rs Implementation (fuse_t_session.rs)

send_mount_command() thread:

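let handle = std::thread::spawn(move || {
    // Ask go-nfsv4 to perform the mount.
    send(mon_fd, b"mount", MsgFlags::empty())?;
    
    // Pre-set to failure; recv() overwrites these bytes with the reply.
    let mut status = -1;
    loop {
        match recv(mon_fd, status.as_mut_slice(), MsgFlags::empty()) {
            // The byte count is ignored; only the resulting value is checked.
            Ok(_size) => return if status == 0 { Ok(()) } else { Err(...) },
            // Retry if the call was interrupted by a signal.
            Err(Errno::EINTR) => continue,
            Err(e) => return Err(...),
        }
    }
});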

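Potential issues:

  1. No timeout on recv() → blocks forever if go-nfsv4 never responds
  2. Byte count is discarded (Ok(_size)) → a short read is not distinguished from a complete status value
  3. No validation of socket state → a partially written reply is checked as if it were complete

If the snippet above is accurate, Ok(()) is reachable only when status == 0 after recv() returns. A closed socket delivers 0 bytes and leaves status at -1, and a receive timeout would surface as an error, so both end in Err. Of the hypotheses in Section 2, H1 (go-nfsv4 wrote status 0, then exited) is the one most consistent with the observed Ok(()).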
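A stricter status read than the single recv() in send_mount_command() could report how many bytes actually arrived. The sketch below is illustrative only: the names `MountStatus` and `read_mount_status` are hypothetical, and the 4-byte native-endian layout is an assumption inferred from `let mut status = -1`, not something confirmed from the fuse-t protocol.

```rust
use std::io::{self, Read};

/// Outcome of one read from the monitor socket.
#[derive(Debug, PartialEq)]
enum MountStatus {
    /// The peer closed the socket without sending anything (read returned 0).
    PeerClosed,
    /// Fewer than 4 bytes arrived, so the value cannot be trusted.
    ShortRead(usize),
    /// A complete 4-byte status value (assumed native-endian i32).
    Status(i32),
}

/// Performs one read (retrying on EINTR) and classifies the result.
fn read_mount_status<R: Read>(sock: &mut R) -> io::Result<MountStatus> {
    let mut buf = [0u8; 4];
    let n = loop {
        match sock.read(&mut buf) {
            Ok(n) => break n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    };
    Ok(match n {
        0 => MountStatus::PeerClosed,
        4 => MountStatus::Status(i32::from_ne_bytes(buf)),
        n => MountStatus::ShortRead(n),
    })
}
```

Because the function is generic over `Read`, it can be exercised with byte slices in a unit test and with a `std::os::unix::net::UnixStream` in a real harness. Logging which variant occurs would show directly whether go-nfsv4 wrote a zero status before exiting.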

4. go-nfsv4 Behavior Analysis

From fuse-t.log

level=info msg="mount [-o port=52100,mountport=52100,vers=4,nobrowse -t nfs fuse-t:/MarkBase-warren /private/tmp/MarkBase_warren]"

Missing messages:

  • No "Mount successful" message
  • No "Mount failed" message
  • No error messages after mount_nfs

Expected behavior (from fuse-t README)

"After the filesystem process dies the server terminates"

This suggests:

  1. Server should persist until filesystem process (our Rust binary) dies
  2. But in our case, server dies first
  3. Parent process keeps running (blocked joining the handler thread, per the program output in the appendix)

Mount command analysis

mount -o port=52100,mountport=52100,vers=4,nobrowse -t nfs fuse-t:/MarkBase-warren /private/tmp/MarkBase_warren

Key observations:

  • Source: fuse-t:/MarkBase-warren (special fuse-t format)
  • Target: /private/tmp/MarkBase_warren (absolute path)
  • Options: port=52100, mountport=52100, vers=4, nobrowse

5. Process Lifecycle Comparison

Expected Lifecycle (from fuse-t README)

1. libfuse mount API → fork()
2. Child: exec go-nfsv4 (replace process)
3. go-nfsv4: start NFS server on TCP port
4. go-nfsv4: receive "mount" message from parent
5. go-nfsv4: execute mount_nfs
6. go-nfsv4: send status back to parent
7. go-nfsv4: persist as daemon (handle FUSE requests)
8. Parent: run FUSE request handler thread
9. When parent dies → go-nfsv4 terminates

Actual Lifecycle (observed)

1. fuse-backend-rs: fork()
2. Child: exec go-nfsv4 ✓
3. go-nfsv4: start NFS server ✓
4. go-nfsv4: receive "mount" ✓ (assumed)
5. go-nfsv4: execute mount_nfs ✓
6. go-nfsv4: dies immediately ✗
7. [MISSING] go-nfsv4 doesn't persist
8. Parent: continues running (handler thread blocks)
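One way to put a timestamp on step 6 is to poll the NFS port and note when it stops accepting connections. This is a rough sketch under stated assumptions: the helper names are hypothetical, the address would be 127.0.0.1:52100 as seen in fuse-t.log, and each probe opens a real TCP connection to go-nfsv4, which may itself perturb the server.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::Duration;

/// Returns true if something accepts a TCP connection on `addr` within `timeout`.
/// `timeout` must be non-zero; a zero timeout is rejected and reported as "closed".
fn port_is_listening(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Polls `addr` every `interval`, up to `max_polls` times.
/// Returns the index of the first poll that found the port closed,
/// or None if it was still listening after the last poll.
fn first_closed_poll(addr: SocketAddr, interval: Duration, max_polls: u32) -> Option<u32> {
    for i in 0..max_polls {
        if !port_is_listening(addr, interval) {
            return Some(i);
        }
        sleep(interval);
    }
    None
}
```

The address can be built with `"127.0.0.1:52100".parse::<SocketAddr>()`. Multiplying the returned poll index by the interval gives an approximate time of death relative to the start of polling.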

6. Alternative Approaches to Consider

A. Direct NFSv4 Server (without fuse-t)

  • Pros: No dependency on fuse-t, full control
  • Cons: 2-3 weeks development, complex NFS protocol
  • Success rate: 80%

B. WebDAV Server

  • Pros: Simple protocol, macOS native support, 2-3 days
  • Cons: Not FUSE, requires Finder WebDAV mount
  • Success rate: 95%

C. SMB Server

  • Pros: macOS native support, simple implementation
  • Cons: Not FUSE, different permission model
  • Success rate: 90%

D. Fix fuse-t Integration

  • Pros: Native FUSE, best performance
  • Cons: Requires deep debugging, uncertain success
  • Success rate: 60%

E. Contact fuse-t Developers

  • Pros: Expert help, definitive solution
  • Cons: Dependent on external response time
  • Success rate: 70%

7. Immediate Next Steps

Debugging Priorities

Priority 1: Understand wait_mount() behavior

  • Add recv() timeout logging
  • Monitor socket state with lsof during recv()
  • Capture exact moment when go-nfsv4 dies
  • Check if recv() gets status=0 before death
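The timeout item above can be prototyped outside fuse-backend-rs with a socket pair standing in for the monitor socket. This is a minimal sketch, not the library's code: `read_with_timeout` is a hypothetical helper, and the 4-byte buffer is an assumption about the status size.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Reads up to 4 bytes, waiting at most `timeout` (which must be non-zero).
/// Returns Ok(None) on timeout, Ok(Some(bytes)) otherwise; an empty vector
/// means the peer closed the socket.
fn read_with_timeout(
    sock: &mut UnixStream,
    timeout: Duration,
) -> std::io::Result<Option<Vec<u8>>> {
    sock.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 4];
    match sock.read(&mut buf) {
        Ok(n) => Ok(Some(buf[..n].to_vec())),
        // The error kind reported for a timeout is platform-dependent.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => Ok(None),
        Err(e) => Err(e),
    }
}
```

The three outcomes (timeout, data, peer closed) are all distinct here, which is the distinction the debugging session needs to log. A pair for experiments can be created with `UnixStream::pair()`.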

Priority 2: Test mount_nfs directly

  • Execute mount_nfs command manually
  • Check if mount_nfs itself is failing
  • Test with different NFS options
  • Check macOS NFS client behavior
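The manual test can be scripted so that the exit code and stderr are captured for each option set. The sketch below rebuilds the argument list exactly as it appears in fuse-t.log; the function names are hypothetical, and the run is only meaningful while an NFS server is actually listening on the given port.

```rust
use std::process::Command;

/// Builds the argument list that go-nfsv4 logged for `mount`.
fn mount_args(port: u16, source: &str, target: &str) -> Vec<String> {
    vec![
        "-o".to_string(),
        format!("port={0},mountport={0},vers=4,nobrowse", port),
        "-t".to_string(),
        "nfs".to_string(),
        source.to_string(),
        target.to_string(),
    ]
}

/// Runs `mount` with those arguments and returns (exit code, stderr text).
fn run_mount(port: u16, source: &str, target: &str) -> std::io::Result<(Option<i32>, String)> {
    let out = Command::new("mount")
        .args(mount_args(port, source, target))
        .output()?;
    Ok((
        out.status.code(),
        String::from_utf8_lossy(&out.stderr).into_owned(),
    ))
}
```

Calling `run_mount(52100, "fuse-t:/MarkBase-warren", "/private/tmp/MarkBase_warren")` reproduces the logged command. A non-zero exit code or any stderr text would indicate that the mount command itself fails, independent of the status handshake.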

Priority 3: Minimal fuse-t test

  • Create minimal Rust program using fuse-backend-rs
  • Test with hello.rs example (POC hello FUSE)
  • Compare our code with working example
  • Identify differences

Priority 4: Contact fuse-t community

  • File bug report with detailed logs
  • Ask about go-nfsv4 daemon lifecycle
  • Share our test results
  • Request guidance on proper usage

8. Time Estimate

If we continue debugging fuse-t

  • 1-2 days for detailed logging
  • 1-2 days for minimal test case
  • 2-3 days for community feedback
  • Total: 4-7 days, uncertain outcome

If we switch to WebDAV

  • 1 day for basic WebDAV server
  • 1 day for macOS Finder integration
  • 1 day for AJA System Test validation
  • Total: 3 days, high confidence

9. Recommendation

Switch to WebDAV implementation

Reasons:

  1. Time efficiency: 3 days vs 4-7 days (estimates from Section 8)
  2. Success probability: 95% vs 60% (estimates from Section 6)
  3. Stability: WebDAV is simpler, less prone to race conditions
  4. Native support: macOS Finder has built-in WebDAV client
  5. Testing: AJA System Test works with mounted volumes (any protocol)

Trade-off:

  • WebDAV is not FUSE (can't use fuse-backend-rs)
  • Performance may be lower (HTTP overhead); whether it reaches the 600 MB/s target is unknown until Phase 3
  • But achieves core goal: virtual filesystem accessible to macOS apps

10. WebDAV Implementation Plan

Phase 1: Basic WebDAV Server (Day 1)

  • Use Rust webdav-handler library (if available)
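  • Or implement minimal WebDAV protocol (OPTIONS, PUT, GET, PROPFIND); note that macOS typically mounts a WebDAV volume read-only unless the server also advertises LOCK/UNLOCK support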
  • SQLite backend (read from warren.sqlite)
  • File listing: PROPFIND → query nodes from SQLite
  • File reading: GET → read file path from aliases_json
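The PROPFIND step amounts to turning rows from SQLite into a 207 Multi-Status XML body. The following is a simplified sketch using only the standard library: the `Node` struct and its fields are hypothetical stand-ins for whatever the nodes query returns, and hrefs are assumed to be already URL-encoded paths.

```rust
/// One entry as it might come back from the SQLite query (hypothetical shape).
struct Node {
    href: String,
    is_dir: bool,
    size: u64,
}

/// Escapes the characters that are special in XML text content.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Builds the body of a 207 Multi-Status response for a PROPFIND request.
fn propfind_body(nodes: &[Node]) -> String {
    let mut xml = String::from(
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<D:multistatus xmlns:D=\"DAV:\">\n",
    );
    for n in nodes {
        // Directories are marked as collections; files report their length.
        let props = if n.is_dir {
            "<D:resourcetype><D:collection/></D:resourcetype>".to_string()
        } else {
            format!(
                "<D:resourcetype/><D:getcontentlength>{}</D:getcontentlength>",
                n.size
            )
        };
        xml.push_str(&format!(
            "<D:response><D:href>{}</D:href><D:propstat><D:prop>{}</D:prop>\
             <D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>\n",
            xml_escape(&n.href),
            props
        ));
    }
    xml.push_str("</D:multistatus>\n");
    xml
}
```

A real server would also honor the Depth header and return further properties such as getlastmodified; this sketch covers only the listing shape.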

Phase 2: macOS Finder Mount (Day 2)

  • Finder → Connect to Server → http://localhost:8080/webdav
  • Or use mount_webdav command
  • Test file browsing in Finder
  • Verify AJA System Test can see mounted files

Phase 3: AJA System Test Validation (Day 3)

  • Write 4K ProRes files to WebDAV mount
  • Measure throughput (target: >= 600 MB/s)
  • Compare with FUSE theoretical performance
  • Document results
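For cross-checking the numbers reported by AJA System Test, throughput can be computed from a byte count and a wall-clock duration. This is a small sketch with hypothetical function names, and it assumes MB means 10^6 bytes; the source does not say whether the 600 MB/s target is decimal or binary.

```rust
use std::time::Duration;

/// Converts a byte count and elapsed time into decimal megabytes per second.
fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    (bytes as f64 / 1_000_000.0) / elapsed.as_secs_f64()
}

/// Returns true if the measured rate reaches the target (e.g. 600.0 MB/s).
fn meets_target(bytes: u64, elapsed: Duration, target_mb_per_s: f64) -> bool {
    throughput_mb_per_s(bytes, elapsed) >= target_mb_per_s
}
```

For example, writing 1.2 GB in 2 seconds works out to exactly 600 MB/s, which would just meet the target. Timing would typically wrap the write with `std::time::Instant::now()` and `elapsed()`.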

Next action: Decision point - continue debugging fuse-t or switch to WebDAV?

Current recommendation: Switch to WebDAV (95% success in 3 days)


Appendix: Test Logs

Latest fuse-t.log (PID 60543)

13:20:51 - Server started: 127.0.0.1:52100
13:20:51 - Mounting: /private/tmp/MarkBase_warren
13:20:51 - mount [-o port=52100,mountport=52100,vers=4,nobrowse -t nfs fuse-t:/MarkBase-warren /private/tmp/MarkBase_warren]
[NO FURTHER MESSAGES - go-nfsv4 died]

Latest Rust program output

[INFO] wait_mount() returned OK - mount completed successfully
[INFO] Mount completed for user: warren
[DEBUG] Handler thread status: false
[DEBUG] Joining handler thread...
[BLOCKS HERE - handler thread never exits]

System state after mount attempt

$ mount | grep MarkBase
[NO OUTPUT - mount not visible]

$ lsof -i :52100
[NO OUTPUT - NFS server not running]

$ ls /tmp/MarkBase_warren/
[EMPTY - no files visible]

Report prepared by: OpenCode AI Assistant
Session: FUSE debugging session
Total attempts: 50+
Time spent: 6 hours