How to deploy YOLOv12 on NVIDIA Jetson devices? 👁️🧠

YOLOv12 on Jetson

Deploy any YOLO model on NVIDIA Jetson devices using Ultralytics and Flask.

Written by Henry Navarro

Introduction 📚

Video in Spanish here 🇪🇸

In our last article, we delved into the features and capabilities of the latest YOLOv12. In this follow-up, we’re taking things a step further by deploying this model on the popular NVIDIA Jetson devices. Our aim is to tackle the critical question: is YOLOv12 truly a state-of-the-art model in real-world scenarios? 🤔

We’ll explore the intricacies of setting up YOLOv12 on a compact yet powerful platform like NVIDIA Jetson, analyze its performance, and compare it to its predecessor, YOLO11.

NVIDIA Jetson devices. Image taken from this link.

Deployment using Docker Over Virtualenvs 🐳

Why don’t I use virtualenvs for quick deployments? When working with NVIDIA Jetson devices, one of the first challenges you’ll encounter is setting up a proper environment for deep learning models like YOLO. These compact yet powerful edge computing devices run on ARM architecture, which creates significant compatibility issues with standard Python packages, particularly PyTorch.

In my case, I have an NVIDIA Jetson Xavier NX with 8 GB of shared RAM running JetPack 5.1, and only certain PyTorch versions can be installed on it, according to the compatibility matrix in the NVIDIA documentation.

The ARM Architecture Challenge 🔧

NVIDIA Jetson devices use ARM architecture, which is fundamentally different from the x86 architecture found in most desktop and server computers. This architectural difference means that not all Python packages are directly compatible, and installing frameworks like PyTorch becomes particularly challenging.

If you attempt the typical approach of creating a virtual environment and running pip install ultralytics, you’ll quickly run into problems. The installed version of PyTorch won’t be compatible with the Jetson GPU, resulting in models running on CPU only—dramatically slowing down inference.

# This won’t work properly on Jetson devices
pip install ultralytics # Will install incompatible PyTorch version

When working with a Jetson device, installing PyTorch outside a container presents several issues. For example, you will need to compile torchvision from source, which takes a long time and makes it easy to end up with a torchvision build that doesn’t match your torch version.
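You can see the symptom for yourself with a quick sanity check. The snippet below is a minimal sketch: if the wheel installed by pip was not built for the Jetson’s CUDA stack, torch.cuda.is_available() returns False and every model silently falls back to the CPU.

# Quick sanity check inside your environment (virtualenv or container)
import torch

print(torch.__version__)                   # PyTorch build string
print(torch.cuda.is_available())           # False means the build cannot use the Jetson GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. the Xavier's integrated GPU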

Docker: The Superior Solution 🚢

NVIDIA JetPack ships with a preinstalled, fully compatible version of Docker, so this approach is both easier and quicker for our purposes. Using Docker containers provides several advantages:

  1. Pre-configured environments: The Ultralytics team maintains Docker images specifically designed for different JetPack versions
  2. GPU compatibility: These containers come with PyTorch versions correctly compiled to leverage the Jetson’s GPU
  3. Isolation: Your deployment remains isolated from the system, preventing conflicts
  4. Reproducibility: The same container will work consistently across different Jetson devices with the same JetPack version

So, after connecting to our Jetson device over SSH, we need to identify which Docker image matches our JetPack version, which is easy if you follow this question. Then we clone the repository we will use for this tutorial. For JetPack 5.x, we would use:

git clone https://github.com/hdnh2006/ultralyticsAPI.git
t=ultralytics/ultralytics:latest-jetson-jetpack5 &&
sudo docker pull $t &&
sudo docker run -it --network=host --ipc=host --runtime=nvidia -v "$(pwd)/ultralyticsAPI:/ultralyticsAPI" $t

Inside this container, PyTorch is already properly configured to work with the Jetson’s GPU, which you can verify by running a simple inference:

cd /ultralyticsAPI
yolo predict detect

You will see something similar to this:

root@JetsonXavierNX:/ultralyticsAPI yolo predict detect
WARNING ⚠️ 'source' argument is missing, using default 'source=/ultralytics/ultralytics/assets'.
Ultralytics 8.3.4 🚀 Python-3.8.10 CUDA:0 (Xavier, 6857MB)
YOLOv11 summary (fused): 180 layers, 2,616,240 parameters, 0 gradients, 6.5 GFLOPs

image 1/2 /ultralytics/ultralytics/assets/bus.jpg: 640x480 4 persons, 1 bus, 370.8ms
image 2/2 /ultralytics/ultralytics/assets/zidane.jpg: 384x640 2 persons, 1 tie, 389.7ms
Speed: 95.1ms preprocess, 380.2ms inference, 8.3ms postprocess per image at shape (1, 3, 384, 640)
Results saved to /ultralytics/runs/detect/predict2
💡 Learn more at https://docs.ultralytics.com/modes/predict

When the inference runs, you should see “CUDA” mentioned in the output log, confirming that the GPU is being utilized.
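If you prefer Python over the CLI, the same check can be done with the Ultralytics Python API. This is a minimal sketch, assuming the yolo11s weights and the sample image shipped inside the container:

# Minimal Python equivalent of the CLI check above (run inside the container)
from ultralytics import YOLO

model = YOLO("yolo11s.pt")  # weights are downloaded automatically if missing
results = model.predict("/ultralytics/ultralytics/assets/bus.jpg", device=0)
print(results[0].speed)     # per-stage times in ms: preprocess, inference, postprocess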

To use the Flask API, just run python3 predict_api.py --weights yolo12s.pt. Once you open http://nvidiajetsonip:5000 in your browser, you will see an interface like this:

Flask interface for the Ultralytics framework.

And voilà, you can upload a video and you will see something like this:

Logs in NVIDIA Jetson Xavier NX for YOLOv12.
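For reference, predict_api.py follows the same idea as the sketch below: a Flask endpoint that receives an image and returns YOLO detections. This is not the repository’s actual code, just an illustration; the /predict route and field names are hypothetical.

# Illustrative sketch of a Flask + Ultralytics endpoint (not the repo's predict_api.py)
from flask import Flask, request, jsonify
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("yolo12s.pt")  # same weights passed via --weights above

@app.route("/predict", methods=["POST"])
def predict():
    request.files["image"].save("/tmp/upload.jpg")   # image sent as multipart form data
    results = model.predict("/tmp/upload.jpg", device=0)
    return jsonify(results[0].boxes.data.tolist())   # each row: x1, y1, x2, y2, confidence, class

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)               # same port used in the tutorial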

YOLO11 vs YOLOv12: Performance Comparison on NVIDIA Jetson 🔍

After successfully setting up our environment with Docker, it’s time to examine how YOLO11 and YOLOv12 perform on the NVIDIA Jetson Xavier NX. This comparison reveals some interesting insights about model efficiency that might contradict popular claims.

Initial Benchmark Results 📊

Testing both models with the same configuration, we observed:

YOLOv12-S (without optimization):

  • Inference time: ~55 ms in the best case
  • Frame rate: ~18 FPS
  • High variability in processing time (55–200ms)

YOLO11-S (without optimization):

  • Inference time: ~36 ms in the best case
  • Frame rate: ~27 FPS
  • More consistent processing times

This is particularly noteworthy since YOLOv12 was promoted as being faster than its predecessor. However, our real-world testing on the Jetson device showed YOLO11 performing significantly better: approximately 9 FPS faster than YOLOv12 under the same conditions.
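These FPS figures come from the per-frame inference times reported by Ultralytics. If you want to reproduce the comparison on your own device, a simple timing loop like this sketch is enough (test.mp4 is a placeholder for whatever clip you use):

# Rough FPS measurement over a video, averaging the reported inference time
from ultralytics import YOLO

model = YOLO("yolo12s.pt")  # swap in yolo11s.pt to compare
times = []
for result in model.predict("test.mp4", device=0, stream=True):
    times.append(result.speed["inference"])   # inference time in ms for this frame

avg_ms = sum(times) / len(times)
print(f"average inference: {avg_ms:.1f} ms -> ~{1000 / avg_ms:.0f} FPS")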

Doubling Your Inference Speed with TensorRT 🚀

After comparing the baseline performance of YOLO11 and YOLOv12, it became clear that optimization would be crucial for real-time applications on NVIDIA Jetson devices. This is where NVIDIA TensorRT comes into play — a powerful SDK designed specifically to optimize neural network models for NVIDIA hardware.

Understanding TensorRT Optimization 🧠

TensorRT is not just another library; it’s a high-performance neural network inference optimizer that can dramatically improve inference speeds. It works by:

  1. Model pruning and weight precision calibration
  2. Layer fusion to reduce computational overhead
  3. Memory optimization for the specific GPU architecture
  4. Kernel auto-tuning for the exact hardware you’re using

The key advantage of TensorRT is that it tailors your model to the exact GPU it will run on.
The downside? A model optimized for one device cannot be transferred to another — it’s specifically compiled for your hardware.

Exporting YOLO to TensorRT 🔄

Thankfully, Ultralytics has made the once-complex process of TensorRT conversion remarkably straightforward. Here’s how to export your YOLO model to TensorRT format:

yolo export format=engine imgsz=480,640 half=True simplify=True device=0 batch=1 model=yolo11s.pt

The parameters explained:

  • format=engine: Specifies TensorRT as the output format
  • imgsz=480,640: Sets the input image dimensions (height, width)
  • half=True: Enables FP16 precision for faster inference
  • simplify=True: Applies ONNX graph simplification to streamline the model architecture
  • device=0: Targets the GPU
  • batch=1: Optimizes for single image processing (ideal for real-time applications)
  • model=yolo11s.pt: The model to be optimized

This export process is computationally intensive, especially on Jetson devices. In our testing on a Jetson Xavier NX, the export took ~34 minutes (2,057 seconds).

For larger models like YOLO-X variants, expect this process to take even longer. However, this one-time investment yields substantial performance gains.
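Once the export finishes, the optimized engine sits next to the original weights (e.g. yolo11s.engine) and loads through the same Ultralytics API; just keep the image size you used at export time. A minimal sketch:

# Run inference with the TensorRT engine produced by the export step
from ultralytics import YOLO

trt_model = YOLO("yolo11s.engine")                                    # engine built on this exact device
results = trt_model.predict("bus.jpg", imgsz=(480, 640), device=0)    # any test image
print(results[0].speed["inference"], "ms")                            # should drop well below the PyTorch baseline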

The Remarkable Results 📈

After optimization with TensorRT, we tested YOLO11-S again, and the results were outstanding:

YOLO11-S (before TensorRT):

  • ~27 FPS
  • High variability in processing time (30–200ms)
1/0.036 s ≈ 27 frames per second for yolo11s without TensorRT optimization

YOLO11-S (with TensorRT):

  • ~55 FPS
  • Consistent processing times under 20ms
  • Occasional spikes to 43ms, but overall much more stable
1/0.018 s ≈ 55 frames per second for yolo11s with TensorRT optimization

This represents a 100% increase in performance with just a few lines of code!

GIF taken from this video: https://youtu.be/m1WqJ-sEBM8

Conclusion ✨

Our journey of deploying YOLO models on NVIDIA Jetson devices has revealed important insights that challenge conventional wisdom. While YOLOv12 has been promoted as the new state-of-the-art model, our real-world testing showed that YOLO11 actually outperformed it on NVIDIA Jetson hardware — delivering approximately 27 FPS compared to YOLOv12’s 18 FPS. This highlights how published benchmarks often reflect performance on specific hardware configurations that may not translate to edge device deployments.

The most significant discovery was the transformative effect of TensorRT optimization, which doubled our inference speed from 27 FPS to 55 FPS while maintaining detection accuracy. This optimization step, combined with Docker containers to solve ARM architecture compatibility challenges, provides a clear pathway for practitioners looking to deploy computer vision models on edge devices. As YOLO architectures continue to evolve, successful edge deployment will require both theoretical understanding and practical engineering know-how, with rigorous testing on target hardware remaining essential for optimal performance.

Happy Coding! 💻🚀

Keywords: YOLOv12 detection, YOLO 12 architecture, YOLO v12 inference, YOLOv12 training, YOLOv12 performance, YOLOv12 installation, YOLOv12, YOLO v12, YOLO 12


Originally published on Medium on March 17, 2025.
