
The problem with the standard R-CNN method was that it was painfully slow and not a complete end-to-end object detector. Girshick published a second paper in 2015, entitled Fast R-CNN. The Fast R-CNN algorithm made considerable improvements to the original R-CNN, namely increasing accuracy and reducing the time it took to perform a forward pass; however, the model still relied on an external region proposal algorithm. It wasn’t until Girshick et al.’s follow-up 2015 paper, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, that R-CNNs became a true end-to-end deep learning object detector by removing the Selective Search requirement and instead relying on a Region Proposal Network (RPN) that (1) is fully convolutional and (2) can predict the object bounding boxes and “objectness” scores (i.e., a score quantifying how likely it is that a region of an image contains an object).

When it comes to deep learning-based object detection, there are three primary object detectors you’ll encounter:
R-CNN and their variants, including the original R-CNN, Fast R-CNN, and Faster R-CNN.

Figure 1: A simplified illustration of the YOLO object detector pipeline (source).

We’ll use YOLO with OpenCV in this blog post.
