Pipeline = training → produces momentry model per video Core = inference engine → serves APIs from model Phase 1 = tiny model (sentence chunks) Phase 2 = full model (complete + 5W1H)
Pipeline = training → produces momentry model per video Core = inference engine → serves APIs from model Phase 1 = tiny model (sentence chunks) Phase 2 = full model (complete + 5W1H)