Yolo V5 Object Detection Model on Flo Edge One

Overview of the Yolo V5 object detection model

Yolo V5 is a popular object detection model used for image and video analysis. It is capable of detecting multiple objects in a single image simultaneously and has been trained on large annotated datasets to recognize a wide range of everyday object categories.

Description of the COCO dataset

The Yolo V5 model is trained on the Common Objects in Context (COCO) dataset, which contains more than 330,000 images with over 2.5 million object instances labeled across 80 object categories. This dataset is widely used for training object detection models and gives the model a diverse range of objects to recognize.

Explanation of the model’s capabilities and limitations

The Yolo V5 model can detect people, bicycles, cars, trucks, and other common objects found in daily life. However, the quality of the output depends on the confidence threshold set for the detections. A low threshold can let false positives through, while a higher threshold reduces false positives at the risk of missing some genuine objects.

Performance of the model

 

Inference time and FPS on Flo Edge One GPU

The Yolo V5 model performs well on the Flo Edge One GPU, with an inference time of around 47 milliseconds per frame, which works out to roughly 20-21 frames per second (FPS). For an object detection model running on an edge device, this is considered a good result.
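Figures like these can be reproduced with a simple timing loop. The sketch below shows one way to measure average inference time and the implied FPS; run_inference is a hypothetical wrapper around whatever inference call the deployed runtime actually provides.

import time

def measure_fps(frames, run_inference):
    # Time the forward pass for every frame and report the averages.
    timings = []
    for frame in frames:
        start = time.perf_counter()
        run_inference(frame)                      # forward pass on one frame
        timings.append(time.perf_counter() - start)
    avg_ms = 1000 * sum(timings) / len(timings)   # average inference time in ms
    fps = 1000 / avg_ms                           # frames per second implied by it
    return avg_ms, fps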

[Image: object detection with Yolo V5]

Comparison with more powerful systems

Even when compared with more powerful systems, such as a gaming laptop with an NVIDIA GeForce GTX 1650 that reaches around 40 FPS, the Yolo V5 model on the Flo Edge One holds up well, delivering roughly half that frame rate on a much lighter system.

Discussion of accuracy and false positives

The detection rate of the Yolo V5 model depends on the confidence threshold set for the output. The higher the threshold, the fewer false positives appear, but it can also lead to more false negatives. It is essential to find the right balance between the two for each use case; a minimal sketch of this filtering step is shown below.
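The sketch assumes each detection is a (box, class_id, score) tuple; the format and names are illustrative, not the exact structure used on the device.

def filter_detections(detections, threshold=0.5):
    # Keep only detections whose confidence score meets the threshold.
    # Raising the threshold removes low-confidence false positives,
    # but may also drop genuine objects the model was unsure about.
    return [d for d in detections if d[2] >= threshold]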

How the model works

Input format and output parameters

The Yolo V5 model takes each video frame as input, converted to a float32 tensor. The output of the model is an array with four parameters, including the class, which is the label that corresponds to the detected object, and the score, which represents the model's confidence in the detection.
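As a rough illustration, the snippet below prepares one frame and runs it through a TensorFlow Lite version of the model; the model file name, input size, and runtime are assumptions for the sketch and may differ from the actual Flo Edge One deployment.

import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="yolov5.tflite")   # hypothetical model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = cv2.imread("frame.jpg")                                  # one video frame
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)                     # OpenCV loads BGR
resized = cv2.resize(rgb, (640, 640))                            # assumed model input size
tensor = resized.astype(np.float32) / 255.0                      # float32, scaled to [0, 1]

interpreter.set_tensor(inp["index"], tensor[np.newaxis, ...])
interpreter.invoke()

# Raw output: one row per candidate box with coordinates, an objectness
# score, and per-class probabilities, to be filtered and rendered later.
predictions = interpreter.get_tensor(out["index"])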

Post-processing and rendering of detections

Based on the values in the output tensors, the detections are rendered onto the input image as bounding boxes with their labels and scores. Once all post-processing is done, the annotated output can be seen on the screen.
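A minimal sketch of this rendering step is shown below, assuming each detection carries a pixel-coordinate box, a label, and a score; the exact detection format on the device may differ.

import cv2

def draw_detections(frame, detections):
    # detections: iterable of ((x1, y1, x2, y2), label, score) tuples (assumed format)
    for (x1, y1, x2, y2), label, score in detections:
        # Draw the bounding box and a "label: score" caption above it.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{label}: {score:.2f}", (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame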

Visual examples of model output

The Yolo V5 model's output can be seen visually, as the detections are rendered onto the image being analyzed. The model is capable of detecting multiple objects in an image simultaneously and highlights each of them with a bounding box.

 

Use case 1: Car and truck detection

Description of data set

For the first use case, the Yolo V5 model was run on footage of a street with cars and trucks moving around.

Model performance on car and truck detection

The Yolo V5 model was able to detect cars and trucks accurately on this footage. However, there were some false positives due to the threshold set for the output. A higher threshold could reduce false positives, but it would also increase the likelihood of false negatives.

Limitations of the model on non-standard car images

Although the Yolo V5 model can detect cars and trucks accurately, it may not be 100% accurate on non-standard car images. This limitation can be mitigated by training the model on a more diverse dataset that includes a wider variety of car types.
