Actions: microsoft/DeepSpeed

nv-lightning-v100

5,003 workflow runs

nv-lightning-v100
nv-lightning-v100 #13861: Scheduled
December 26, 2024 00:20 5m 52s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13857: Pull request #6909 synchronize by hj-wei
December 25, 2024 02:18 Action required hj-wei:dev_hjwei
Add the missing view operations from sequence parallel(async).
nv-lightning-v100 #13856: Pull request #6750 synchronize by inkcherry
December 25, 2024 01:50 Action required inkcherry:ds_overlap_fix
nv-lightning-v100
nv-lightning-v100 #13855: Scheduled
December 25, 2024 00:20 5m 35s master
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
nv-lightning-v100 #13854: Pull request #6909 opened by hj-wei
December 24, 2024 07:38 Action required hj-wei:dev_hjwei
[inf] Add config var to enable keeping module on host
nv-lightning-v100 #13853: Pull request #6846 synchronize by oelayan7
December 24, 2024 06:49 6m 33s oelayan7:keep_module_on_host
nv-lightning-v100
nv-lightning-v100 #13852: Scheduled
December 24, 2024 00:21 6m 51s master
Tecorigin sdaa accelerator
nv-lightning-v100 #13851: Pull request #6903 synchronize by tjruwase
December 23, 2024 23:13 Action required siqi654321:Tecorigin-SDAA-accelerator
Tecorigin sdaa accelerator
nv-lightning-v100 #13848: Pull request #6903 opened by siqi654321
December 23, 2024 02:21 3m 12s siqi654321:Tecorigin-SDAA-accelerator
nv-lightning-v100
nv-lightning-v100 #13847: Scheduled
December 23, 2024 00:21 5m 51s master
nv-lightning-v100
nv-lightning-v100 #13846: Scheduled
December 22, 2024 00:23 5m 45s master
nv-lightning-v100
nv-lightning-v100 #13845: Scheduled
December 21, 2024 00:20 5m 46s master
Stage3: Use new torch grad accumulation hooks API
nv-lightning-v100 #13843: Pull request #6773 synchronize by tohtana
December 20, 2024 18:16 6m 15s deepcharm:stage3-use-new-grad-acc-api
Fix error caused by all_reduce call in domino
nv-lightning-v100 #13841: Pull request #6880 synchronize by tjruwase
December 20, 2024 02:22 1h 52m 55s hongwei/fix_domino_allreduce
Change compile for pipeline module torch.compile
nv-lightning-v100 #13840: Pull request #6478 synchronize by loadams
December 20, 2024 00:56 1h 47m 34s NirSonnenschein:torch_compile_micro_offset_fix
Fix checkpointable_layers Logic
nv-lightning-v100 #13839: Pull request #6881 synchronize by loadams
December 20, 2024 00:55 1h 19m 46s Quentin-Anthony:qanthony/fix-act-recomp
Stage3: Use new torch grad accumulation hooks API
nv-lightning-v100 #13838: Pull request #6773 synchronize by loadams
December 20, 2024 00:55 18m 33s deepcharm:stage3-use-new-grad-acc-api
nv-lightning-v100
nv-lightning-v100 #13837: Scheduled
December 20, 2024 00:21 38m 27s master
Fix error caused by all_reduce call in domino
nv-lightning-v100 #13836: Pull request #6880 synchronize by loadams
December 19, 2024 23:23 26m 29s hongwei/fix_domino_allreduce
Fix checkpointable_layers Logic
nv-lightning-v100 #13835: Pull request #6881 synchronize by loadams
December 19, 2024 20:32 5m 58s Quentin-Anthony:qanthony/fix-act-recomp