Hire the Best CUDA Developers

More than 3,000 reviews on G2

4.5/5 rating of Upwork by G2 peer reviewers


Izzatullokh M.

Incheon, South Korea

$20/hr

5.0

21 jobs

I design, optimize, and integrate real-time object detection and tracking pipelines for NVIDIA Jetson, RK3588, and cloud environments, with measurable gains in FPS, latency, and deployment stability. From YOLO training to TensorRT optimization and production integration, I build computer vision systems that work reliably on real hardware. If you need more than just a trained model, such as a working AI system integrated into hardware or software, I deliver complete, production-ready solutions.

🎯 WHAT I DELIVER
• End-to-End Deep Learning Pipelines: model architecture → dataset optimization → training → evaluation → deployment
• Real-Time Object Detection & Multi-Object Tracking: optimized YOLO pipelines with stable tracking (DeepSORT / ByteTrack / BoT-SORT)
• ⚡ Edge AI Acceleration & Performance Optimization: TensorRT conversion, CUDA acceleration, latency reduction, memory tuning, FPS improvements
• 🔗 AI Integration into Production Systems: Jetson deployment, inference APIs, embedded integration, debugging, monitoring, and system optimization

🏆 PROVEN EXPERIENCE
• 5★ Upwork reviews for NVIDIA Jetson Nano, AGX, and Orin deployments
• Deployed Frigate and real-time CV pipelines on Jetson Orin NX
• Optimized inference performance using TensorRT for improved real-time execution
• Installed and configured LLM environments with secure key management and usage tracking
• Research and production AI deployment experience at HBrain

🛠 TECH STACK
Deep Learning: PyTorch, CNN architectures, YOLO variants
Computer Vision: OpenCV, real-time video processing, detection and tracking
Optimization: TensorRT, CUDA
Systems: Linux, embedded systems, NVIDIA Jetson
Languages: Python, C++

If you share your hardware target, dataset sample, or performance goal, I can propose a clear technical architecture and a realistic delivery plan.

Computer Vision
Artificial Intelligence
Edge AI
Object Detection & Tracking
NVIDIA Jetson
YOLO
Deep Learning
Image Segmentation
Anomaly Detection
Python
AI Model Integration

László B.

Alboraya, Spain

$100/hr

4.9

91 jobs

🏆 Top Rated for more than 10 years on Upwork | 🏆 70+ projects | 🏆 100% Job Success Score | 🏆 16 years of experience | 🏆 Working with 3 enterprise companies and many start-ups/individual clients

Production-grade computer vision systems for real-world environments: edge deployment, system integration, and measurable business outcomes.

I graduated as an Electrical Engineer, M.Sc. (Embedded Software Developer faculty). I am an expert in AI/machine learning (object detection) and IoT/edge computing (Raspberry Pi and Jetson Orin NX/Nano, DeepStream 6.x and 7.x, Triton/TAO Toolkit). I prefer long-term projects, but you can also hire me for shorter ones as a consultant.

Strengths:
- Computer vision/machine learning project architecture and design: technical stack, long-term usage cost, refining user requirements, etc.
- Python machine learning projects (TensorFlow, Keras, PyTorch), especially computer vision: object detection (YOLO), image segmentation and pose tracking (OpenPose), shape recognition (dlib), and action recognition (X3D, MMAction, etc.)
- Developing, maintaining, and upgrading your existing DeepStream/Triton-based application
- Embedded image processing solutions (edge computing): computer vision on Jetson Orin (NX/Nano) with DeepStream 7.0 (and earlier 6.x/7.x versions, Triton, TAO Toolkit), or on Raspberry Pi using the Intel Movidius Neural Compute Stick (NCS) + OpenVINO
- Training a model for a specific purpose, including dataset collection/preparation/preprocessing task management; I specialize in object detection (up to YOLOv12)
- AWS and GCP (Google Cloud Platform) VM setup for training and inference, plus execution/deployment
- NVIDIA GPU experience: Tesla T4, L4, A100, and H100 GPUs for training and inference/server production
- Home automation/smart home: lighting, building engineering, heating/cooling; Home Assistant integration with Amazon Alexa, Philips Hue, Sonos, Apple TV, Google Nest, and custom solutions
- Raspberry Pi (Linux/Raspbian): programming and teaching in Python (RS232, RS485, TCP communication), Wi-Fi settings, Home Assistant

Services:
- Short (30-60 minute) consultation
- AI architect tasks: responsible for the architecture of AI solutions, including planning, implementing, and managing AI technologies within an organization or project

Languages: English, Hungarian (magyar), and a little Spanish (castellano, español)

AI Computer Vision | MLOps | Edge Computing | AI Automation | AI Strategy | AI Consultant

AI Consulting
AI Development
Rapid Prototyping
AI Model Integration
Computer Vision
NVIDIA Jetson
NVIDIA Triton
AI Model Development
Machine Learning
Raspberry Pi
Python
OpenCV
Automation
ML Automation
Prototype

Jay D.

Gandhinagar, India

$35/hr

5.0

12 jobs

I help startups and AI companies build high-performance AI systems, GPU-accelerated applications, and scalable AI infrastructure using CUDA, Rust, and Python. My expertise focuses on performance engineering, AI infrastructure, parallel computing, and low-latency systems designed for production-scale workloads. I work on optimizing compute-heavy applications, improving inference performance, and building reliable backend systems for modern AI products.

I specialize in:
• CUDA & GPU Optimization
• AI Inference Optimization
• Rust-based High Performance Systems
• AI Infrastructure Engineering
• Parallel Computing
• Low-Latency Backend Systems
• Python-based AI & Automation Systems
• LLM Infrastructure
• RAG Architecture & AI Search
• Scalable API & System Architecture

I can help with:
✅ GPU acceleration and CUDA optimization
✅ AI inference performance tuning
✅ Rust backend systems for high-performance workloads
✅ AI infrastructure and deployment pipelines
✅ Memory and compute optimization
✅ Parallel processing systems
✅ AI search and RAG pipelines
✅ Python automation and backend development
✅ Scalable distributed systems
✅ Production-ready AI architecture

Tech Stack & Tools: CUDA, Rust, Python, PyTorch, TensorRT, ONNX, Docker, FastAPI, PostgreSQL, Vector Databases, REST APIs, WebSockets, AWS, Azure, and cloud-native architectures.

I focus on building systems that are:
* performant
* scalable
* efficient
* reliable
* production-ready

If you are building GPU-intensive applications, AI infrastructure, inference systems, or performance-critical software, I’d be happy to discuss your project. Send me a message, and let’s discuss how we can make it happen!

CUDA
Generative AI
AI Agent Development
Chatbot Development
GPU
Python
MERN Stack
Mobile App Development
Web Application Development
AI Chatbot
Vector Database
DevOps
AWS Development
LLM Prompt Engineering

HIT K.

Surat, India

$5/hr

5.0

4 jobs

Most AI developers can build a prototype. Very few can optimize and deploy multimodal LLMs on Jetson Orin Nano at real-time speeds, architect enterprise-grade RAG systems across 3,000+ SQL tables, fine-tune state-of-the-art vision-language models, or serve high-throughput inference with vLLM and SGLang at production scale. That’s what I do.

I’m a Senior AI/ML Engineer specializing in:
• Production LLM Systems & Agentic AI
• Edge AI & High-Performance Model Optimization
• Computer Vision & Real-Time Multimodal AI
• Distributed Inference & AI Infrastructure

I don’t just wrap APIs; I engineer AI systems from the model architecture and optimization layer all the way to scalable production deployment.

What makes my work different:
→ I build multi-agent AI systems that feel invisible to users. I designed a dual-LLM tutoring architecture where a speech-to-speech AI tutor interacts live with students while a hidden orchestration LLM performs real-time prompt injection, memory routing, retrieval planning, and response control, completely transparently.
→ I optimize and deploy AI where most teams fail. I fine-tuned PaliGemma, converted it to ONNX, and applied INT8/FP16 quantization and TensorRT optimization, reducing model size by 65% with <2% accuracy loss, then deployed it on Jetson Orin Nano, Raspberry Pi, edge devices, and mobile hardware for real-time inference with extremely limited compute.
→ I build enterprise RAG systems that reason, not just retrieve. I architected a multi-stage multilingual RAG pipeline using HyDE, recursive retrieval, schema-aware chunking, metadata filtering, hybrid search, reranking, graph-based retrieval, chain-of-table reasoning, and long-context orchestration across 3,000+ enterprise SQL tables and 150+ warehouses.
→ I engineer high-performance LLM inference infrastructure. I built scalable inference pipelines using vLLM, SGLang, ONNX Runtime, TensorRT-LLM, DeepSpeed, FlashAttention, speculative decoding, paged attention, KV-cache optimization, and distributed GPU serving for low-latency, high-concurrency AI systems.
→ I deliver video AI and lip-sync systems beyond standard open-source baselines. I fine-tuned LatentSync on a 4,000-sample dataset, outperforming Wav2Lip, GAN-Wav2Lip, and Wav2LipHD baselines in multilingual real-time video translation with highly coherent visual speech synchronization.
→ I design AI systems for real-world production environments. I built AI backends with FastAPI, asyncio, WebSockets, Redis queues, Kafka streaming, GPU worker orchestration, autoscaling inference services, and Kubernetes-based deployments capable of handling real-time concurrent workloads.

What I build for clients:
• Production-grade RAG systems with hybrid, recursive, graph, and agentic retrieval
• High-throughput LLM serving using vLLM, SGLang, TensorRT-LLM, and Triton Inference Server
• Edge AI deployment with ONNX, TensorRT, OpenVINO, Core ML, TensorFlow Lite, and ONNX Runtime
• Fine-tuning pipelines (LoRA, QLoRA, PEFT, RLHF, DPO) with evaluation and benchmarking
• AI agents with LangGraph, LangChain, LlamaIndex, CrewAI, AutoGen, and MCP architectures
• Multimodal AI systems combining vision, audio, speech, and language models
• Text-to-SQL systems for complex enterprise schemas with multilingual querying
• Computer vision pipelines: YOLOv8, YOLO11, SAM2, GroundingDINO, DETR, Mask R-CNN
• Real-time speech AI: Whisper, XTTS, RVC, voice cloning, speech-to-speech pipelines
• Video AI: lip sync, face reenactment, multilingual dubbing, avatar systems
• Distributed AI infrastructure with Docker, Kubernetes, Ray, Celery, Redis, and Kafka
• AI observability, evaluation, tracing, and monitoring with LangSmith, MLflow, and Weights & Biases
• Vector databases and retrieval systems: FAISS, Pinecone, ChromaDB, Milvus, Qdrant, Weaviate
• AI surveillance and geospatial intelligence systems with real-time threat detection

Core Stack: Python · PyTorch · TensorFlow · HuggingFace · Transformers · LangChain · LangGraph · LlamaIndex · OpenAI · Anthropic · Gemini · FastAPI · vLLM · SGLang · TensorRT-LLM · Triton · ONNX Runtime · TensorRT · OpenVINO · TensorFlow Lite · CUDA · OpenCV · MediaPipe · Whisper · YOLO · SAM · GroundingDINO · FAISS · Pinecone · Qdrant · MongoDB · PostgreSQL · Redis · Kafka · Docker · Kubernetes · GCP · Vertex AI · AWS · Azure · WebSockets

If you need an AI engineer who can go deeper than most teams, whether that means deploying optimized multimodal AI on constrained edge hardware, architecting enterprise RAG systems that minimize hallucinations, building scalable LLM inference infrastructure, or shipping production-grade multi-agent AI, send me a message. I’ll tell you honestly what’s possible, what’s overhyped, and exactly how I’d build it.

MLOps
Web Development
Python
Deep Learning
AI Development
Machine Learning
Model Deployment
Retrieval Augmented Generation
Build Automation

Andrii P.

Zaporizhzhia, Ukraine

$55/hr

5.0

46 jobs

Experienced in:

1. Computer vision
• C++, Python, OpenCV, CUDA, Git, Linux, Qt, Boost, OpenGL, PCL, strong math background, neural networks
- road segmentation for unmanned vehicles (ENet, Caffe, OpenCV, C++, Linux)
- car tracking (YOLOv3, OpenCV, C++, Linux)
- wagon number identification (YOLOv4, Python)
- implementation of real-time 360°/perspective camera transformation in CUDA (C++, CUDA, OpenCV, Linux, Jetson Nano)
- distance calculation to a point in a 2D camera frame (C++, OpenCV, Linux)
- automated grading system for handwritten answer sheets (computer vision part; OpenCV, Java, Android)
Key stack: Linux, C++ (Qt), Python, Java, OpenCV, YOLO (Darknet)

2. Machine learning
Machine learning research projects in the following domains:
- person segmentation (MODNet, RVM, TDNet, UCTransNet, XMem, etc.)
- image inpainting (PEN-Net, DeepFill v2, Shift-Net, ViNet, etc.)
- image upscaling (RDN, RRDN, Stable Diffusion, ISR, etc.)
- image relighting (Total Relighting, DPR, RelightNet, etc.)
- road segmentation for unmanned vehicles (ENet, Caffe, OpenCV, C++, Linux)
- car tracking (YOLOv3, OpenCV, C++, Linux)
- wagon number identification (YOLOv4, Python)
- implementation of real-time 360°/perspective camera transformation in CUDA (C++, CUDA, OpenCV, Linux, Jetson Nano)
- distance calculation to a point in a 2D camera frame (C++, OpenCV, Linux)
- automated grading system for handwritten answer sheets (computer vision part; OpenCV, Java/Android, iOS/Swift)
Key stack: Linux, Python, PyTorch, TensorFlow, OpenCV, Pillow, NumPy, C++, CUDA, Darknet, SegNet

3. Robotics and embedded development
- OCPP protocol, Linux, Modbus, Raspberry Pi, CAN
- ROS (Robot Operating System)
- Skilled in SLAM, localization, and mapping
- Experienced in path planning algorithms, obstacle avoidance, holonomic and non-holonomic motion planning, and trajectory planning for robotic arms
- Worked with Bayesian/Kalman filters and sensor fusion (LiDAR, IMU, visual, odometry, radar, GPS)

4. Android development (Kotlin, Java, Android Studio, Eclipse, Firebase)

Has expert colleagues in:
• .NET Framework (C#, VB.NET, ASP.NET, .NET Core, WPF, UWP, WCF, ADO.NET)
• Java (J2SE, J2EE, servlets, JavaBeans, Maven)
• JavaScript (Node.js, Express.js, Vue.js, Element.js, Angular.js, D3.js)
• C++ (TCP/IP, HTTP, HTTPS, WebSocket, Modbus)
• Python (MAVLink, WebSocket)
• Step7 (S7 Communication, OCPP, Modbus, CANopen, PROFINET)

Database
• PostgreSQL • MySQL • MongoDB • Microsoft SQL • Oracle Database • Neo4j

Software development for mobile platforms
• Cross-platform: React Native, Flutter, Xamarin
• Android (Kotlin, Java, Android Studio, Eclipse)
• iOS (Objective-C, Swift)

Mobile app development:
• Cross-platform: Flutter, React Native, Xamarin
• Android (Java, Kotlin)
• iOS (Objective-C, Swift)
1. Native development
- Kotlin/Java
- Swift/Objective-C
- iOS/macOS/tvOS/watchOS
- Firebase, CloudKit, Core Data
2. Cross-platform and hybrid app development
- React Native/React
- Flutter/Dart
- Xamarin.iOS/Xamarin.Android/Xamarin.Forms

Tech stack
● Android Studio, Gradle, Kotlin DSL, KSP
● Kotlin, Java programming languages
● AndroidX, Android Jetpack libraries, Android Architecture Components
● Jetpack Compose
● Material Design Components
● Clean Architecture, SOLID design principles
● MVVM, MVI, GoF design patterns
● Modularization (multi-module projects)
● Kotlin Coroutines + Flow, RxJava, RxBinding
● REST API / networking: OkHttp, Retrofit 2, Socket.IO
● Room Database, SQLite, DataStore
● Kotlinx Serialization, Protobuf, Moshi, Gson
● Dependency injection (Hilt, Dagger 2, Koin)
● Git
● Firebase products, Google Cloud APIs, HMS Services
● AdMob, Google Play Billing Library (in-app purchases), Samsung/Huawei IAP
● Unit / instrumented (UI) tests
● Agile Scrum development methodology
● CI/CD (GitHub Actions)

AR/VR: Vuforia, ARKit/ARCore/AR Foundation, Wikitude, Oculus Integration, OpenXR, XR Interaction Toolkit, VR Walkthrough, UltimateXR, VR Interaction Framework

Hardware expert:
• NVIDIA Jetson Nano, TX2, Xavier
• Raspberry Pi
• Arduino, STM32
• Depth cameras: Intel RealSense D435i, Stereolabs ZED
• LiDARs, radars

Charging stations for electric vehicles: development of software for managing stations and networks (server and user applications):
• C#, SQL, PostgreSQL, .NET Core, REST API, WebSockets
• OCPP protocol, Linux, Modbus, Raspberry Pi

Software development
• .NET Framework (C#, VB.NET, ASP.NET, .NET Core, WPF, UWP, WCF, ADO.NET)
• Java (J2SE, J2EE, servlets, JavaBeans, Maven)
• JavaScript (Node.js, Express.js, Vue.js, Element.js, Angular.js, D3.js)
• C++ (TCP/IP, HTTP, HTTPS, WebSocket, Modbus)
• Python (MAVLink, WebSocket)
• Step7 (S7 Communication, OCPP, Modbus, CANopen, PROFINET)

Database
• PostgreSQL • MySQL • MongoDB • Microsoft SQL • Oracle Database • Neo4j

CUDA
Computer Vision
Machine Learning
Data Science
OpenCV
Deep Learning
YOLO
Object Detection
Neural Network
Android App Development
Mobile App Development
English
DNN
OCR Algorithm
Kotlin
Xamarin
Front-End Development
Desktop Application
.NET Framework

Aleksandr K.

Warsaw, Poland

$35/hr

4.8

44 jobs

Hello! My name is Aleksandr Kalinin. I have a double major in Math and CS, and I have been programming professionally since 2006.

Skills:
C++11/14/17/20/23, STL, SIMD (SSE, AVX, AVX-512), CMake, vcpkg, Conan, Boost, perf, VTune.
Python, Data Science, NumPy, Pandas, Matplotlib.
Rust, Tokio.
Algorithms, Data Structures, Computational Geometry, Optimization, Linear Algebra, Numerical Methods.
Concurrent Programming, Multithreading, Lock-free Programming.
CUDA, HIP, OpenCL, Compute Shaders, Halide, TBB, oneAPI.
MATLAB, SageMath, Mathematica.
Graphics Programming: Vulkan, Metal, DirectX 12, OpenGL, HLSL, GLSL, MSL, Slang, Real-time Rendering, Custom Engine Development.
Unreal Engine 4/5 (UE5: 3D, PCG, Geometry Scripting, Control Rig, Niagara, UMG, Slate, GAS, Chaos Cloth), Godot, Cocos2d-x.
Machine Learning, Deep Learning: PyTorch, TensorFlow, scikit-learn, ONNX, TensorRT.
Computer Vision, Image Processing: OpenCV, Roboflow, Triton, YOLO, YOLOv11, DETR, RT-DETR, Faster R-CNN, R-CNN, U-Net, DeepLabV3+, SAM, SAM2, SegFormer, ResNet, ConvNeXt, ViT, Swin.
Optimization / Solvers: OR-Tools (CP-SAT), Z3, MiniSAT, SCIP, HiGHS, IPOPT, MiniZinc.
Point Cloud Processing, Collision Detection, Physics Simulation.
Maya API, Houdini API (HDK, VEX, HScript), Blender Python API, Pipeline Tools Development.
Software Porting, Game Development.
Qt, QML, PySide6, wxWidgets, VTK.
iOS Development: Swift, Objective-C, UIKit, AVFoundation, Metal.
Android Development, NDK.
FFmpeg, WebRTC, RTMP, RTSP, SRT, HLS, NDI, MPEG-DASH, CMAF, WebTransport, QUIC, UDP/RTP/RTCP, TCP, WebSockets, HTTP Live Streaming pipelines, Video Processing, Real-time Streaming Systems.
JavaScript, Node.js, Three.js, Pixi.js.
Docker, Kubernetes, AWS, EKS, CI/CD, n8n.
PostgreSQL, Supabase.
Windows, Mac, Linux.
AI, Claude Code.

OpenCV
C++
Python
Machine Learning
Mathematica
Mathematics
MATLAB
Image Processing
DirectX
OpenGL
Microsoft Windows
Geometry
3D Graphics Framework
CMake
Computer Graphics

How it works

Post a job for free

Tell us what you need. Create your own job post or generate one with AI, then filter talent matches.

Hire top talent fast

Consult, interview, and hire quickly, so you can meet the freelancers you're excited about.

Collaborate easily

Use Upwork to chat or video call, share files, and track project progress right from the app.

Payment simplified

Manage payments in one place with flexible billing options. Only pay for approved work, hourly or by milestone.

Don't just take our word for it

“Upwork provides an umbrella-level of security. I can see a talent’s work history and ratings. I can hold payments in escrow. I can communicate through Upwork Messages instead of working through my email address.”

Kim Darling

Emerald Tiger

“Upwork is the best platform to hire skilled professionals when we're not looking for a full-time employee. All the companies in our portfolio use Upwork to find talent across a wide range of fields.”

David Merry

Kinetic Investments

“Our very specific requirements can be a challenge—With Upwork, we’re able to access a bigger community to ensure the success of our projects.”

Katja Krohn

Summa Linguae

At A Glance: CUDA

Graphics processing units (GPUs) are used by top programmers and designers to build detailed applications that improve the user experience. For businesses, GPUs can accelerate a range of needs, such as market analysis, data processing, ad creation and placement, and much more. CUDA (Compute Unified Device Architecture), NVIDIA's parallel computing platform and programming model, makes it possible to harness this power: it lets developers of every level write general-purpose programs that run on NVIDIA GPUs, using familiar languages such as C and C++, with less effort and confusion. Maybe you don’t have time to spare from your many duties, or perhaps you are not fluent in the skills required to use CUDA. There’s no need to miss out on its advantages: the CUDA specialists on Upwork can help your business thrive, with competitive rates, diverse skill sets, and flexible hours.
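
The core idea behind CUDA is that one kernel function runs once per GPU thread, and each thread works out which data element it owns from its block and thread indices. The sketch below emulates that model on the CPU in plain Python so it runs anywhere; `vector_add_kernel` and `launch` are hypothetical names standing in for a `__global__` CUDA kernel and a `kernel<<<blocks, threads>>>(...)` launch. It illustrates the indexing pattern (a grid-stride loop built from `blockIdx.x`, `blockDim.x`, `threadIdx.x`, and `gridDim.x`) and is not GPU code.

```python
# CPU emulation of CUDA's thread-indexing model. Illustration only: this runs
# sequentially in Python and is not GPU code. All names here are hypothetical.

def vector_add_kernel(block_idx, block_dim, thread_idx, grid_dim, a, b, out):
    """Work done by one emulated CUDA thread.

    Mirrors the grid-stride loop commonly written in CUDA C++ as:
        i = blockIdx.x * blockDim.x + threadIdx.x
        stride = blockDim.x * gridDim.x
    """
    i = block_idx * block_dim + thread_idx  # this thread's global index
    stride = block_dim * grid_dim           # total number of threads in the grid
    while i < len(out):                     # guard: skip indices past the array end
        out[i] = a[i] + b[i]
        i += stride                         # jump ahead by the whole grid's width


def launch(grid_dim, block_dim, kernel, *args):
    """Emulates kernel<<<grid_dim, block_dim>>>(*args).

    Every (block, thread) pair runs one after another here; a real GPU runs
    them in parallel, so the kernel must not depend on execution order.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, grid_dim, *args)


if __name__ == "__main__":
    n = 10
    a = list(range(n))
    b = [10 * x for x in a]
    out = [0] * n
    # 2 blocks x 4 threads = 8 threads; the grid-stride loop covers all 10 elements.
    launch(2, 4, vector_add_kernel, a, b, out)
    print(out)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

The grid-stride loop is one common design choice: it lets a fixed-size launch handle arrays of any length. The simplest CUDA kernels instead launch at least one thread per element and rely on a single `if (i < n)` bounds check.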

The freelancers on Upwork bring many years of experience across a diverse range of jobs, which has helped them become proficient in a wide range of specialties. This lets them apply their expertise to your project and offer unique insights and suggestions, as well as fixes for bugs and technical issues. These experts are also familiar with the online workplace, so they can collaborate remotely with other teams or work independently with little supervision. With thousands of experts to choose from on Upwork, you can find a professional with the experience, education level, and work ethic your project needs.
