Smart Tech Information: From Concept to Coding


YOLOv5: Object Detection Made Easy with PyTorch Hub


This article is an introductory tutorial in which we download the pre-trained YOLOv5 model from PyTorch Hub and perform object detection on sample images. YOLO (You Only Look Once) is a well-known family of object detection models. Joseph Redmon of the University of Washington developed the original versions (YOLOv1, v2, and v3) before he stopped working on them. Others have continued the development of YOLO since then. Alexey Bochkovskiy developed YOLOv4 by forking and improving Darknet from Joseph Redmon’s GitHub. Independently, Ultralytics developed YOLOv5 using PyTorch, which is the version we use in this article.

1. Python Environment Setup

We’ll use venv to set up a Python environment as shown below. Alternatively, we can follow the instructions from PyTorch Hub.

# Create a project folder and move there
mkdir yolov5
cd yolov5

# Create and activate a Python environment using venv
python3 -m venv venv
source venv/bin/activate

# Upgrade pip first: the bundled version is usually outdated
# and may have stale information about libraries
pip install --upgrade pip

# We install PyTorch and related libraries
pip install torch torchvision matplotlib

# Install the required libraries for YOLOv5
pip install -qr

We are ready to download a pre-trained YOLOv5 to perform object detection on sample images.
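Before moving on, it can help to confirm that the installed packages are visible from the active environment. The following is a minimal sketch that uses only the standard library. The helper name `missing_packages` is made up for this illustration and is not part of PyTorch or YOLOv5.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages installed in the setup steps above
required = ["torch", "torchvision", "matplotlib"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, the virtual environment is probably not activated, or the corresponding `pip install` step did not complete.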

2. Object Detection with YOLOv5

Let’s download the small version of the pre-trained YOLOv5 model (yolov5s). A single line is all we need to download the model from PyTorch Hub.

import torch

# Download YOLOv5 from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

The source code of the model is stored under the folder ~/.cache/torch/hub/ultralytics_yolov5_master/, so the next call to torch.hub.load loads it from the local cache, which is fast. If an outdated version in the cache folder causes a problem, we may need to remove the folder manually. The pre-trained model weights are stored in our project folder, the yolov5 folder we created at the beginning.
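Removing the cached folder by hand can also be scripted. The sketch below assumes the default cache location mentioned above (the path differs if the torch hub directory has been relocated). The helper `remove_cached_repo` is a hypothetical name used only for this example.

```python
import shutil
from pathlib import Path

# Cache location mentioned above (assumes the default torch hub directory)
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "hub" / "ultralytics_yolov5_master"


def remove_cached_repo(cache_dir):
    """Delete the cached repository folder; return True if something was removed."""
    cache_dir = Path(cache_dir)
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


# Only report here; call remove_cached_repo(DEFAULT_CACHE) to actually delete it
print(DEFAULT_CACHE, "exists:", DEFAULT_CACHE.is_dir())
```

After the folder is removed, the next torch.hub.load call has to download the source code again.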

Let’s use a sample image from the Ultralytics GitHub. We can directly specify the image URL:

# Sample image URL (BASE_URL is the base URL of the sample images on the Ultralytics GitHub)
image_url = BASE_URL + 'zidane.jpg'

If you have images locally on your computer, you can simply specify the path to your images.
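For local files, one way to build the list of paths is to scan a folder. This is only a sketch: the folder name `images` and the helper `collect_images` are assumptions made for the example, and the extension list is not exhaustive.

```python
from pathlib import Path


def collect_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted paths (as strings) of image files directly inside folder."""
    folder = Path(folder)
    return sorted(
        str(path) for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Example: gather images from a hypothetical local folder, if it exists
image_dir = Path("images")
local_imgs = collect_images(image_dir) if image_dir.is_dir() else []
print(local_imgs)
```

The resulting list of path strings can be used in place of the image URL when building the batch.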

We create a batch of images (albeit only one image entry) and feed it to the model.

# A batch of images (only one entry here)
imgs = [image_url]

# Inference
results = model(imgs)

We can show the result with one line:

# Display the results
results.show()

It detected two person objects and two tie objects.

We can save it with one line:

# Save the results
results.save()

It creates folders and saves the resulting image: runs/detect/exp/zidane.jpg. If we run the experiment again, it’ll add a new folder exp2 and save the image there (the result will be the same if we are using the same input image).
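The folder naming pattern (exp, exp2, exp3, and so on) can be illustrated with a short function. This is a sketch of the behaviour described above, not the code YOLOv5 itself uses. The name `next_run_dir` is hypothetical.

```python
from pathlib import Path


def next_run_dir(base="runs/detect", name="exp"):
    """Return the first unused folder in the sequence exp, exp2, exp3, ..."""
    base = Path(base)
    candidate = base / name
    n = 2
    while candidate.exists():
        candidate = base / f"{name}{n}"
        n += 1
    return candidate


# Shows where a new run would be saved under this naming scheme
print(next_run_dir())
```

Because each run gets a fresh folder, earlier results are never overwritten.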

3. Non-Maximum Suppression (NMS)

We can also print results to see which classes were detected along with other information:

# Print the results
results.print()

The printed results are as follows:

image 1/1: 720x1280 2 persons, 2 ties
Speed: 317.8ms pre-process, 200.2ms inference, 10.8ms NMS per image at shape (1, 3, 384, 640)

We can confirm that it detected two person objects and two tie objects. The output also breaks down how much time (in milliseconds) was spent in each stage:

  • Pre-processing Time: 317.8 ms
  • Inference (Model) Time: 200.2 ms
  • Post-processing (NMS) Time: 10.8 ms

Note: the above timing values don’t include the time spent on generating result images. If there are multiple images, each value is the average per image.
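To put the three numbers in perspective, we can add them up and convert the total into a rough throughput figure. The snippet below is simple arithmetic on the values printed above and does not call the model.

```python
# Per-image timings reported above, in milliseconds
timings_ms = {"pre-process": 317.8, "inference": 200.2, "NMS": 10.8}

total_ms = sum(timings_ms.values())
images_per_second = 1000.0 / total_ms

# Show each stage with its share of the total time
for stage, ms in timings_ms.items():
    print(f"{stage:>12}: {ms:6.1f} ms ({ms / total_ms:.0%})")
print(f"{'total':>12}: {total_ms:6.1f} ms, about {images_per_second:.2f} images/s")
```

In this particular run, pre-processing takes the largest share of the time and NMS the smallest.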

Pre-processing converts an image from NumPy to PyTorch and normalizes the pixel values from 0-255 to 0.0-1.0. The inference time is the time spent within the model. Post-processing is the time spent on NMS (Non-Maximum Suppression). The model predicts many bounding boxes with different positions, sizes, and confidence levels, and many of them overlap. NMS keeps the boxes in which the model is most confident and eliminates the others that overlap with them. The figure below shows an example in which NMS eliminates overlapping bounding boxes and leaves the most confident prediction.
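The NMS idea can be sketched in plain Python as a greedy loop: take the most confident box, drop every remaining box that overlaps it too much, and repeat. This is a simplified illustration, not the implementation used inside YOLOv5. The box coordinates, scores, and the 0.45 overlap threshold are example values chosen for this sketch.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of the boxes to keep, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard boxes that overlap the chosen box more than the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two heavily overlapping boxes and one separate box (made-up example values)
boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first almost entirely and has a lower score, so it is suppressed, while the third box does not overlap anything and is kept. Real detectors typically apply this per class and use vectorized operations for speed.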


4. Bounding Boxes

We can print the bounding box values for the first image in the batch (we only have one image):

# Print the bounding boxes of the first image
print(results.xyxy[0])

xyxy means each bounding box is given by the x and y coordinates of its top-left corner followed by the x and y coordinates of its bottom-right corner. The model detected four bounding boxes, so four rows are printed:

tensor([[7.43290e+02, 4.83436e+01, 1.14176e+03, 7.20000e+02, 8.79861e-01, 0.00000e+00],
        [4.41990e+02, 4.37337e+02, 4.96585e+02, 7.10036e+02, 6.75119e-01, 2.70000e+01],
        [1.23051e+02, 1.93238e+02, 7.14691e+02, 7.19771e+02, 6.66693e-01, 0.00000e+00],
        [9.78990e+02, 3.13579e+02, 1.02530e+03, 4.15526e+02, 2.61517e-01, 2.70000e+01]])

There are six numbers per row.

  • The first four values are for xyxy
  • The fifth value is the confidence score
  • The sixth value is the predicted object class
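Corner coordinates are easy to convert into a center point plus width and height, which some tools expect instead. The sketch below does this for the first row of the output above. The function name `xyxy_to_xywh` is chosen for this example.

```python
def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) corners to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    return (x1 + width / 2, y1 + height / 2, width, height)


# First detection from the output above: corners, confidence, class index
row = [743.290, 48.3436, 1141.76, 720.000, 0.879861, 0.0]
cx, cy, w, h = xyxy_to_xywh(row[:4])
print(f"center=({cx:.1f}, {cy:.1f}), size={w:.1f} x {h:.1f}")
```

The confidence score and class index in the last two columns are unaffected by this conversion.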

For better readability, I rounded the values as follows:

[743,  48, 1142, 720, 88%,  0], #  0 - person
[442, 437,  497, 710, 68%, 27], # 27 - tie
[123, 193,  715, 720, 67%,  0], #  0 - person
[979, 314, 1025, 416, 26%, 27]  # 27 - tie
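The rounding can also be done in code. The sketch below copies the printed tensor values into a plain list so that it runs without the model. The `class_names` dictionary holds only the two names needed here and is a stand-in for the model's full class list.

```python
# Detections as printed above: x1, y1, x2, y2, confidence, class index
rows = [
    [743.290, 48.3436, 1141.76, 720.000, 0.879861, 0.0],
    [441.990, 437.337, 496.585, 710.036, 0.675119, 27.0],
    [123.051, 193.238, 714.691, 719.771, 0.666693, 0.0],
    [978.990, 313.579, 1025.30, 415.526, 0.261517, 27.0],
]

# Only the two class names needed here (stand-in for the full class list)
class_names = {0: "person", 27: "tie"}


def format_row(row):
    """Round the corners, show confidence as a percentage, and name the class."""
    x1, y1, x2, y2, conf, cls = row
    corners = [round(v) for v in (x1, y1, x2, y2)]
    return f"{corners} {conf:.0%} {class_names[int(cls)]}"


for row in rows:
    print(format_row(row))
```

With a live model, the same formatting could be applied to each row of the bounding box tensor after converting it to a list.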

We can compare the above confidence scores and detected classes in the resulting image to confirm they match:


The pre-trained YOLOv5 model can detect 80 object classes (those of the COCO dataset). We can print all supported classes as follows:

# Supported classes
print(model.names)

Below is the output:

['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

We can confirm model.names[0] is person, and model.names[27] is tie.

5. Source Code

The source code below performs everything we discussed, with two images in a batch.

import torch

# Download YOLOv5 from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Supported classes
print("Supported classes")
print(model.names)

# Sample image URLs (BASE_URL is the base URL of the sample images on the Ultralytics GitHub)
FILE_NAMES = ['zidane.jpg', 'bus.jpg']

# A batch of images
imgs = [BASE_URL + file_name for file_name in FILE_NAMES]

# Inference
results = model(imgs)

# Display the results
results.show()

# Save the results
results.save()

# Print the results
results.print()

# Print the bounding box values for each image
print("Bounding box")
for i, file_name in enumerate(FILE_NAMES):
    print(file_name)
    print(results.xyxy[i])

Below is the detection result for the second image.


6. References

6.1. YOLOv5 Transfer Learning In Simple Steps Without Losing Your Mind

6.2. You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

6.3. YOLO9000: Better, Faster, Stronger

Joseph Redmon, Ali Farhadi

6.4. YOLOv3: An Incremental Improvement

Joseph Redmon, Ali Farhadi

6.5. YOLOv4: Optimal Speed and Accuracy of Object Detection

Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao

6.6. YOLOv5

