Why are the detailed training logs missing when I train a model on multiple GPUs? (Does YOLOv8 multi-GPU training lose logs?) #13492

Closed
1 task done
BGMer7 opened this issue Jan 17, 2025 · 5 comments
Labels
detect (Object Detection issues, PR's), question (Further information is requested)

Comments

@BGMer7

BGMer7 commented Jan 17, 2025

Search before asking

Question

I used YOLOv8 to train a model to extract facial features; the task is here: https://www.kaggle.com/datasets/osmankagankurnaz/facial-feature-extraction-dataset

This is not a fatal error, just a small question, but I could not find a YOLOv8 repo, so I posted here for help.
Kaggle provides 2 Tesla T4 GPUs for acceleration. Many people just set device='0' so that only one GPU is used; I use both GPUs, and it works. The only difference, and the problem, is that the detailed training logs are missing.

This is the code which uses only one GPU:

results = self.model.train(
    data=DATA_YAML,       # data.yaml file from Roboflow
    epochs=20,            # number of epochs
    imgsz=640,            # image size
    batch=32,             # batch size
    name='yolov8_custom', # folder name for training results
    device='0',           # '0' for GPU, 'cpu' for CPU
    patience=50,          # early stopping patience
    save=True,            # save best model
    pretrained=True,      # use pretrained weights
    plots=True,           # save training plots
    cache=True,           # enable caching
    verbose=True,         # enable verbose logging
    workers=2,
)

and this is the code which uses two GPUs:

# `cuda` is presumably pycuda.driver (import pycuda.driver as cuda); the import is not shown in the post
cuda.init()
device_count = cuda.Device.count()
# Build a device string such as '0,1' from the number of detected GPUs
device = ','.join(str(i) for i in range(device_count)) if device_count > 0 else 'cpu'

# Initialize the wandb logging before training
# wandb.init(project="yolo_training", config={"epochs": 20, "batch_size": 16})
results = self.model.train(
    data=DATA_YAML,       # data.yaml file from Roboflow
    epochs=20,            # number of epochs
    imgsz=640,            # image size
    batch=32,             # batch size
    name='yolov8_custom', # folder name for training results
    device=device,        # e.g. '0,1' for two GPUs, 'cpu' for CPU
    patience=50,          # early stopping patience
    save=True,            # save best model
    pretrained=True,      # use pretrained weights
    plots=True,           # save training plots
    cache=True,           # enable caching
    verbose=True,         # enable verbose logging
    workers=2,
)
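The device string built above can also be derived from PyTorch's own device count, which the thread mentions later (torch.cuda.device_count). The following is a minimal sketch, not the author's code: the helper names build_device_arg and detect_gpu_count are hypothetical, and the fallback to 'cpu' when torch is missing is an assumption made for illustration.

```python
def build_device_arg(gpu_count: int) -> str:
    """Return a device string such as '0,1' for two GPUs, or 'cpu' for none."""
    if gpu_count <= 0:
        return "cpu"
    return ",".join(str(i) for i in range(gpu_count))


def detect_gpu_count() -> int:
    """Count visible CUDA devices via PyTorch; report 0 if torch is unavailable."""
    try:
        import torch  # optional dependency, only needed for detection
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    # The result would be passed as device=... to model.train()
    print(build_device_arg(detect_gpu_count()))
```

Keeping the string construction separate from the detection makes it easy to check the '0,1' formatting without a GPU present.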

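With one GPU the log output is detailed, with metrics printed after every epoch; with multiple GPUs only the final model's results are printed, and the intermediate training process produces no logs.

These are the training logs with only one GPU. The output is very detailed, and the metrics are printed after every epoch: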

Ultralytics 8.3.62 🚀 Python-3.10.12 torch-2.5.1+cu121 CUDA:0 (Tesla T4, 15095MiB)
engine/trainer: task=detect, mode=train, model=yolov8m.pt, data=/kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/data.yaml, epochs=20, time=None, patience=50, batch=32, imgsz=640, save=True, save_period=-1, cache=True, device=0, workers=8, project=None, name=yolov8_custom3, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/yolov8_custom3
Overriding model.yaml nc=80 with nc=5

                   from  n    params  module                                       arguments                     
  0                  -1  1      1392  ultralytics.nn.modules.conv.Conv             [3, 48, 3, 2]                 
  1                  -1  1     41664  ultralytics.nn.modules.conv.Conv             [48, 96, 3, 2]                
  2                  -1  2    111360  ultralytics.nn.modules.block.C2f             [96, 96, 2, True]             
  3                  -1  1    166272  ultralytics.nn.modules.conv.Conv             [96, 192, 3, 2]               
  4                  -1  4    813312  ultralytics.nn.modules.block.C2f             [192, 192, 4, True]           
  5                  -1  1    664320  ultralytics.nn.modules.conv.Conv             [192, 384, 3, 2]              
  6                  -1  4   3248640  ultralytics.nn.modules.block.C2f             [384, 384, 4, True]           
  7                  -1  1   1991808  ultralytics.nn.modules.conv.Conv             [384, 576, 3, 2]              
  8                  -1  2   3985920  ultralytics.nn.modules.block.C2f             [576, 576, 2, True]           
  9                  -1  1    831168  ultralytics.nn.modules.block.SPPF            [576, 576, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  2   1993728  ultralytics.nn.modules.block.C2f             [960, 384, 2]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  2    517632  ultralytics.nn.modules.block.C2f             [576, 192, 2]                 
 16                  -1  1    332160  ultralytics.nn.modules.conv.Conv             [192, 192, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
...
TensorBoard: Start with 'tensorboard --logdir runs/detect/yolov8_custom3', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed ✅
train: Scanning /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/train/labels... 457 images, 0 backgrounds, 0 corrupt: 100%|██████████| 457/457 [00:00<00:00, 672.14it/s]
train: WARNING ⚠️ Cache directory /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/train is not writeable, cache not saved.

WARNING ⚠️ cache='ram' may produce non-deterministic training results. Consider cache='disk' as a deterministic alternative if your disk space allows.
train: Caching images (0.5GB RAM): 100%|██████████| 457/457 [00:00<00:00, 613.71it/s]
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))
/usr/local/lib/python3.10/dist-packages/albumentations/__init__.py:24: UserWarning: A new version of Albumentations is available: 2.0.0 (you have 1.4.20). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
  check_for_updates()
val: Scanning /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/valid/labels... 126 images, 0 backgrounds, 0 corrupt: 100%|██████████| 126/126 [00:00<00:00, 345.11it/s]
val: WARNING ⚠️ Cache directory /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/valid is not writeable, cache not saved.

WARNING ⚠️ cache='ram' may produce non-deterministic training results. Consider cache='disk' as a deterministic alternative if your disk space allows.
val: Caching images (0.1GB RAM): 100%|██████████| 126/126 [00:00<00:00, 150.01it/s]
Plotting labels to runs/detect/yolov8_custom3/labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.001111, momentum=0.9) with parameter groups 77 weight(decay=0.0), 84 weight(decay=0.0005), 83 bias(decay=0.0)
TensorBoard: model graph visualization added ✅
Image sizes 640 train, 640 val
Using 4 dataloader workers
Logging results to runs/detect/yolov8_custom3
Starting training for 20 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/20      13.6G      1.772      3.051      1.781         61        640: 100%|██████████| 15/15 [00:14<00:00,  1.00it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:03<00:00,  1.65s/it]
                   all        126        685      0.919      0.912      0.942      0.672


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       2/20      13.6G     0.9436     0.8229      1.096         78        640: 100%|██████████| 15/15 [00:14<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.12s/it]
                   all        126        685      0.926      0.921      0.968      0.719


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       3/20      13.6G     0.9125     0.6598      1.077         80        640: 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.09s/it]
                   all        126        685      0.819      0.856      0.875      0.618


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       4/20      13.6G      0.883     0.6061      1.074         91        640: 100%|██████████| 15/15 [00:16<00:00,  1.07s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.08s/it]
                   all        126        685      0.957       0.96      0.983      0.753


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       5/20      13.6G     0.8521     0.5585      1.061         63        640: 100%|██████████| 15/15 [00:15<00:00,  1.04s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.06s/it]
                   all        126        685       0.96      0.949      0.989      0.759


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       6/20      13.2G     0.8453     0.5445       1.05        101        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.03s/it]
                   all        126        685      0.931       0.97      0.986      0.754


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       7/20      13.6G     0.8498     0.5344      1.058         64        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.06s/it]
                   all        126        685      0.921      0.963      0.986      0.745


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       8/20      13.2G     0.8276     0.5133      1.056         47        640: 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.05s/it]
                   all        126        685       0.96      0.967      0.991      0.775


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/20      13.6G     0.8036     0.4939      1.027        106        640: 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.05s/it]
                   all        126        685      0.964      0.955      0.984      0.772


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/20      13.1G     0.7984     0.4807      1.033         60        640: 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.05it/s]
                   all        126        685      0.983      0.977      0.993      0.801

Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01, num_output_channels=3, method='weighted_average'), CLAHE(p=0.01, clip_limit=(1.0, 4.0), tile_grid_size=(8, 8))

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      11/20      13.5G      0.709     0.4234     0.9871         50        640: 100%|██████████| 15/15 [00:16<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.05s/it]
                   all        126        685      0.962      0.981      0.987      0.798


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      12/20      13.2G     0.7006     0.4037     0.9814         45        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.03it/s]
                   all        126        685      0.968      0.986      0.986       0.79


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      13/20      13.6G     0.6802     0.3952     0.9669         51        640: 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.05s/it]
                   all        126        685      0.984      0.991      0.993      0.801


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      14/20      13.1G     0.6713      0.388      0.959         50        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.06it/s]
                   all        126        685      0.988      0.984      0.994      0.812


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      15/20      13.6G     0.6565     0.3693     0.9504         47        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.04s/it]
                   all        126        685      0.989      0.995      0.995      0.826


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      16/20      13.2G     0.6381     0.3588     0.9381         48        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.03it/s]
                   all        126        685      0.983      0.998      0.995      0.836


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      17/20      13.5G     0.6197     0.3422     0.9381         45        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.05s/it]
                   all        126        685       0.99      0.996      0.995      0.832


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      18/20      13.1G     0.6092     0.3295     0.9337         45        640: 100%|██████████| 15/15 [00:15<00:00,  1.03s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.05it/s]
                   all        126        685      0.993      0.997      0.994       0.84


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      19/20      13.1G     0.5917     0.3167     0.9202         45        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.06it/s]
                   all        126        685      0.996      0.998      0.995      0.844


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      20/20      13.2G      0.569     0.3065     0.9168         51        640: 100%|██████████| 15/15 [00:15<00:00,  1.02s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:01<00:00,  1.05it/s]
                   all        126        685      0.997      0.998      0.995      0.847


20 epochs completed in 0.107 hours.
Optimizer stripped from runs/detect/yolov8_custom3/weights/last.pt, 52.0MB
Optimizer stripped from runs/detect/yolov8_custom3/weights/best.pt, 52.0MB

Validating runs/detect/yolov8_custom3/weights/best.pt...
Ultralytics 8.3.62 🚀 Python-3.10.12 torch-2.5.1+cu121 CUDA:0 (Tesla T4, 15095MiB)
Model summary (fused): 218 layers, 25,842,655 parameters, 0 gradients, 78.7 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:02<00:00,  1.34s/it]
                   all        126        685      0.997      0.998      0.995      0.848
                   eye        126        138          1      0.996      0.995      0.839
               eyebrow        126        144      0.993      0.999      0.994      0.796
                   lip        126        129      0.998          1      0.995      0.881
        mustache-beard        126        146      0.993      0.998      0.995      0.844
                  nose        126        128          1      0.997      0.995      0.879
/usr/local/lib/python3.10/dist-packages/matplotlib/colors.py:721: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
/usr/local/lib/python3.10/dist-packages/matplotlib/colors.py:721: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
Speed: 0.2ms preprocess, 12.0ms inference, 0.0ms loss, 1.1ms postprocess per image
Results saved to runs/detect/yolov8_custom3
Ultralytics 8.3.62 🚀 Python-3.10.12 torch-2.5.1+cu121 CUDA:0 (Tesla T4, 15095MiB)
Model summary (fused): 218 layers, 25,842,655 parameters, 0 gradients, 78.7 GFLOPs
val: Scanning /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/valid/labels... 126 images, 0 backgrounds, 0 corrupt: 100%|██████████| 126/126 [00:00<00:00, 604.81it/s]
val: WARNING ⚠️ Cache directory /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/valid is not writeable, cache not saved.

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 8/8 [00:04<00:00,  1.73it/s]
                   all        126        685      0.989      0.995      0.995      0.855
                   eye        126        138      0.996      0.993      0.995      0.822
               eyebrow        126        144      0.973      0.994      0.994      0.817
                   lip        126        129      0.988          1      0.995      0.884
        mustache-beard        126        146      0.986      0.992      0.995      0.869
                  nose        126        128          1      0.996      0.995      0.882
/usr/local/lib/python3.10/dist-packages/matplotlib/colors.py:721: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
/usr/local/lib/python3.10/dist-packages/matplotlib/colors.py:721: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
Speed: 0.5ms preprocess, 24.1ms inference, 0.0ms loss, 2.6ms postprocess per image
Results saved to runs/detect/val3

✨ Training completed!
💾 Best model saved at: runs/detect/yolov8_custom/weights/best.pt

📊 Model Performance Summary:
==================================================
Precision: 0.989
Recall: 0.995
mAP50: 0.995
mAP50-95: 0.855
==================================================

But when I switch to 2 GPUs, the output only contains the results for the final model:

Ultralytics 8.3.62 🚀 Python-3.10.12 torch-2.5.1+cu121 CUDA:0 (Tesla T4, 15095MiB)
                                                       CUDA:1 (Tesla T4, 15095MiB)
engine/trainer: task=detect, mode=train, model=yolov8m.pt, data=/kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/data.yaml, epochs=20, time=None, patience=50, batch=32, imgsz=640, save=True, save_period=-1, cache=True, device=0,1, workers=32, project=None, name=yolov8_custom4, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/yolov8_custom4
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100%|██████████| 755k/755k [00:00<00:00, 16.9MB/s]
Overriding model.yaml nc=80 with nc=5

                   from  n    params  module                                       arguments                     
  0                  -1  1      1392  ultralytics.nn.modules.conv.Conv             [3, 48, 3, 2]                 
  1                  -1  1     41664  ultralytics.nn.modules.conv.Conv             [48, 96, 3, 2]                
  2                  -1  2    111360  ultralytics.nn.modules.block.C2f             [96, 96, 2, True]             
  3                  -1  1    166272  ultralytics.nn.modules.conv.Conv             [96, 192, 3, 2]               
  4                  -1  4    813312  ultralytics.nn.modules.block.C2f             [192, 192, 4, True]           
  5                  -1  1    664320  ultralytics.nn.modules.conv.Conv             [192, 384, 3, 2]              
  6                  -1  4   3248640  ultralytics.nn.modules.block.C2f             [384, 384, 4, True]           
  7                  -1  1   1991808  ultralytics.nn.modules.conv.Conv             [384, 576, 3, 2]              
  8                  -1  2   3985920  ultralytics.nn.modules.block.C2f             [576, 576, 2, True]           
  9                  -1  1    831168  ultralytics.nn.modules.block.SPPF            [576, 576, 5]                 
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  2   1993728  ultralytics.nn.modules.block.C2f             [960, 384, 2]                 
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  2    517632  ultralytics.nn.modules.block.C2f             [576, 192, 2]                 
 16                  -1  1    332160  ultralytics.nn.modules.conv.Conv             [192, 192, 3, 2]              
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  2   1846272  ultralytics.nn.modules.block.C2f             [576, 384, 2]                 
 19                  -1  1   1327872  ultralytics.nn.modules.conv.Conv             [384, 384, 3, 2]              
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 21                  -1  2   4207104  ultralytics.nn.modules.block.C2f             [960, 576, 2]                 
 22        [15, 18, 21]  1   3778591  ultralytics.nn.modules.head.Detect           [5, [192, 384, 576]]          
Model summary: 295 layers, 25,859,215 parameters, 25,859,199 gradients, 79.1 GFLOPs

Transferred 469/475 items from pretrained weights
DDP: debug command /usr/bin/python3 -m torch.distributed.run --nproc_per_node 2 --master_port 50831 /root/.config/Ultralytics/DDP/_temp_4t1q3l3o139581780929232.py
Ultralytics 8.3.62 🚀 Python-3.10.12 torch-2.5.1+cu121 CUDA:0 (Tesla T4, 15095MiB)
Model summary (fused): 218 layers, 25,842,655 parameters, 0 gradients, 78.7 GFLOPs
val: Scanning /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/valid/labels... 126 images, 0 backgrounds, 0 corrupt: 100%|██████████| 126/126 [00:00<00:00, 747.79it/s]
val: WARNING ⚠️ Cache directory /kaggle/input/facial-feature-extraction-dataset/Facial Feature Extraction Dataset/valid is not writeable, cache not saved.

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 8/8 [00:04<00:00,  1.99it/s]
                   all        126        685      0.989      0.995      0.995      0.855
                   eye        126        138      0.996      0.993      0.995      0.822
               eyebrow        126        144      0.973      0.994      0.994      0.817
                   lip        126        129      0.988          1      0.995      0.884
        mustache-beard        126        146      0.986      0.992      0.995      0.869
                  nose        126        128          1      0.996      0.995      0.882
/usr/local/lib/python3.10/dist-packages/matplotlib/colors.py:721: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
/usr/local/lib/python3.10/dist-packages/matplotlib/colors.py:721: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
Speed: 0.2ms preprocess, 22.9ms inference, 0.0ms loss, 3.8ms postprocess per image
Results saved to runs/detect/val4

✨ Training completed!
💾 Best model saved at: runs/detect/yolov8_custom/weights/best.pt

📊 Model Performance Summary:
==================================================
Precision: 0.989
Recall: 0.995
mAP50: 0.995
mAP50-95: 0.855
==================================================

Has anyone run into this before?
I also noticed that with only one GPU the API seems to invoke TensorFlow, while with 2 GPUs it invokes PyTorch.

Additional

No response

@BGMer7 BGMer7 added the question Further information is requested label Jan 17, 2025
@UltralyticsAssistant UltralyticsAssistant added the detect Object Detection issues, PR's label Jan 17, 2025
@UltralyticsAssistant
Member

👋 Hello @BGMer7, thank you for your interest in YOLOv5 🚀!

For your question regarding multi-GPU training with YOLOv8, it's important to note that this repository is specifically for YOLOv5. However, since YOLOv8 shares some similarities, we might be able to assist you here!

If this is a 🐛 Bug Report, could you please provide a minimum reproducible example (MRE), including all necessary code and steps to replicate the issue? This will help us investigate the behavior further.

For custom training ❓ Questions, sharing additional details, such as setup specifics, logs, or configurations, would assist in diagnosing the issue. It might also be worth checking if your multi-GPU setup affects logging by trying configurations like adjusting verbosity or distributed training options.

Requirements

YOLOv5 and YOLOv8 require Python>=3.8.0 with all required dependencies installed. Ensure your environment is configured correctly and up to date.

Environments

Both YOLOv5 and YOLOv8 can be run in various environments, such as local setups, cloud-based GPUs, or Docker images. Verify your current setup matches recommended configurations, including ensuring that all GPUs are properly initialized and recognized.

Status

If you are receiving training logs when using a single GPU but missing them for a multi-GPU setup, it is possible that output redirection or distributed training settings are affecting the logs. When training on multiple GPUs, frameworks like PyTorch may modify how and where logs are written.

This is an automated response 🛠️, but rest assured, an Ultralytics engineer will review your issue and provide further assistance soon. Let us know if you can provide any additional information in the meantime that might help clarify this behavior! 😊

@BGMer7
Author

BGMer7 commented Jan 17, 2025

Here are some more details.
I ran the following to make the environment detect both GPUs:

import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"

!source ~/.bashrc

Otherwise, torch.cuda.device_count() returns 1.
After running this, it returns 2.
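As a rough sketch of the same idea: CUDA_VISIBLE_DEVICES is read when CUDA is initialized in a process, so it generally needs to be set before the first CUDA call. A notebook `!source ~/.bashrc` runs in a separate shell and most likely does not affect the Python kernel itself. The helper name expose_gpus is hypothetical and only illustrates the ordering.

```python
import os


def expose_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES for this process and return the value written."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    expose_gpus([0, 1])
    try:
        import torch  # imported only after the variable has been set

        print("visible GPUs:", torch.cuda.device_count())
    except ImportError:
        print("torch not installed; CUDA_VISIBLE_DEVICES =",
              os.environ["CUDA_VISIBLE_DEVICES"])
```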

@pderrenger
Member

@BGMer7 thank you for providing additional details. Setting CUDA_VISIBLE_DEVICES is a valid method to specify which GPUs are visible to your script. However, missing detailed logs during multi-GPU training is likely due to the use of torch.distributed.run, which handles Distributed Data Parallel (DDP) training. Logs are often only output from the main process. To ensure you capture detailed logs, you can:

  1. Run the training with the --verbose flag to enhance logging.
  2. Check if the logs are aggregated in the runs/ directory or in the wandb/TensorBoard integrations if enabled.
  3. Use the latest version of the repository to ensure any potential issues with logging in DDP mode are resolved.

If the issue persists, please confirm your DDP setup and training script alignment with Ultralytics' multi-GPU training guide.
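The point that logs usually come only from the main process can be illustrated with a small rank-aware logger. This is a sketch under assumptions, not Ultralytics' actual logging code: it relies on the RANK environment variable that torch.distributed.run sets for each worker, and the names get_rank and make_logger are hypothetical.

```python
import logging
import os


def get_rank() -> int:
    """Return this process's global rank, or -1 when not launched under DDP."""
    return int(os.environ.get("RANK", "-1"))


def make_logger(name: str = "train") -> logging.Logger:
    """Create a logger that emits INFO only on the main process (rank -1 or 0)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
    is_main = get_rank() in (-1, 0)
    # Workers with rank > 0 only report warnings and errors
    logger.setLevel(logging.INFO if is_main else logging.WARNING)
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("per-epoch metrics would be logged here (main process only)")
```

With two GPUs, the process with RANK=1 stays quiet at INFO level, so only one copy of each per-epoch line is produced.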

@BGMer7
Author

BGMer7 commented Jan 18, 2025

@pderrenger Thanks for your reply!
I have always kept verbose=True set.
This is my training code:

def train_model(self):
        """
        Initialize and train YOLOv8 model with specified parameters
        Returns:
            training results
        """
        cuda.init()
        device_count = cuda.Device.count()
        device = ','.join(str(i) for i in range(device_count)) if device_count > 0 else 'cpu'

        # Initialize the wandb logging before training
        # wandb.init(project="yolo_training", config={"epochs": 20, "batch_size": 16})
        results = self.model.train(
            data=DATA_YAML,       # data.yaml file from Roboflow
            epochs=20,            # number of epochs
            imgsz=640,            # image size
            batch=32,             # batch size
            name='yolov8_custom', # folder name for training results
            device=device,        # '0' for GPU, 'cpu' for CPU
            patience=50,          # early stopping patience
            save=True,            # save best model
            pretrained=True,      # use pretrained weights
            plots=True,           # save training plots
            cache=True,           # enable caching
            verbose=True,         # Enable verbose logging
            workers=32
        )
        
        return results

As for wandb/TensorBoard, I tried them, but they require logging in, so I gave up.

I didn't use torch.distributed.run, but I tried the code below:

!pip show accelerate
!pip install git+https://github.com/huggingface/accelerate

But later I found that my code runs without it, so I removed it.

Thanks for the reminder. I found the logs on the final results page in Kaggle, although the output shown during the run didn't contain this part.

The full code is on Kaggle:
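One plausible explanation, not confirmed in this thread, is that the multi-GPU run is launched as a child process (the log shows a torch.distributed.run debug command), and a child's output does not always reach the notebook cell. If live output is needed, a sketch like the following relays a child process's output line by line. The helper name stream_child_output is hypothetical, and train.py in the comment is an assumed script name.

```python
import subprocess
import sys


def stream_child_output(cmd):
    """Run cmd, echo each output line as it arrives, and return the lines."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on the parent side
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            print(line, flush=True)
            lines.append(line)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines


if __name__ == "__main__":
    # Stand-in child process; a real run might pass [sys.executable, "train.py"],
    # where train.py is a hypothetical training script.
    stream_child_output([sys.executable, "-c", "print('epoch 1/20 done')"])
```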
https://www.kaggle.com/code/caijinyang/facial-feature-extraction-yolo/log

Thanks! :)

@BGMer7 BGMer7 closed this as completed Jan 18, 2025
@pderrenger
Member

It seems your training process is functioning as intended, with logs available in the Kaggle results page, even if they are not displayed during runtime. This behavior is typical when using multi-GPU training, as certain logging outputs may only appear in the final results or the primary process. To ensure consistent logging:

  1. If using torch.distributed, ensure the primary process handles verbose output, as secondary processes often suppress detailed logs.
  2. For real-time monitoring, tools like wandb or TensorBoard are ideal, though they require login/authentication. If logging into these services is a barrier, you might explore saving custom logs locally during training.
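For local logs without any login, one option is to read the per-epoch metrics file from the run directory after (or during) training. The sketch below assumes the run directory contains a results.csv with one row per epoch; the file name and its column names are assumptions here, and read_epoch_metrics is a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path


def read_epoch_metrics(csv_path):
    """Return one dict per epoch row, with whitespace stripped from keys and values."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = [name.strip() for name in next(reader)]
        return [
            dict(zip(header, (value.strip() for value in row)))
            for row in reader
            if row
        ]


if __name__ == "__main__":
    # Stand-in file for demonstration; the column names are illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "results.csv"
        path.write_text("epoch, train/box_loss\n1, 1.772\n2, 0.9436\n")
        for row in read_epoch_metrics(path):
            print(row)
```

For the two-GPU run in this issue, the path to try would be the save_dir from the log (runs/detect/yolov8_custom4) plus the assumed file name.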

For further refinement, you can explore the official Multi-GPU Training Guide to ensure optimal usage of resources. Let us know if you need more support!
