Hire the Best CUDA Developers
Alboraya, Spain
Top Rated for more than 10 years on Upwork | 70+ projects | 100% Job Success Score | 16 years of experience | Working with 3 enterprise companies and many start-ups and individual clients

Production-grade computer vision systems for real-world environments: edge deployment, system integration, and measurable business outcomes.

I graduated as an Electrical Engineer, M.Sc. (Embedded Software Developer faculty). I am an expert in AI/machine learning (object detection) and IoT/edge computing (Raspberry Pi and Jetson Orin NX/Nano, DeepStream 6.x and 7.x, Triton/TAO Toolkit). I prefer long-term projects, but you can also hire me for shorter ones as a consultant.

Strengths:
- Computer vision/machine learning project architecture and design: technical stack, cost of long-term usage, refining user requirements, etc.
- Python machine learning projects (TensorFlow, Keras, PyTorch), especially computer vision: object detection (YOLO), image segmentation and pose tracking (OpenPose), shape recognition (dlib), and action recognition (X3D, mmaction, etc.)
- Developing, maintaining, and upgrading existing DeepStream/Triton-based applications
- Embedded image processing solutions (edge computing): computer vision on Jetson Orin (NX/Nano) with DeepStream 7.0 (and earlier 6.x/7.x versions, Triton, TAO Toolkit) or on Raspberry Pi, using the Intel Movidius Neural Compute Stick (NCS) + OpenVINO
- Training models for a specific purpose, including managing dataset collection, preparation, and preprocessing; I specialize in object detection (up to YOLOv12)
- AWS and GCP (Google Cloud Platform) VM inference and training setup, execution, and deployment
- NVIDIA GPU experience: Tesla T4, L4, A100, and H100 GPUs for training and inference/server production
- Home automation/smart home: lighting, building engineering, heating/cooling, Home Assistant; integrating Amazon Alexa, Philips Hue, Sonos, Apple TV, Google Nest, and custom solutions
- Raspberry Pi (Linux, Raspbian): programming and teaching in Python (RS232, RS485, TCP communication), Wi-Fi settings, Home Assistant

Services:
- Short (30-60 minute) consultations
- AI Architect tasks: responsibility for the architecture of AI solutions, which includes planning, implementing, and managing AI technologies within an organization or project

I speak English and Hungarian (magyar) and a little Spanish (castellano, español).

AI Computer Vision | MLOps | Edge Computing | AI Automation | AI Strategy | AI Consultant
- AI Consulting
- AI Development
- Rapid Prototyping
- AI Model Integration
- Computer Vision
- NVIDIA Jetson
- NVIDIA Triton
- AI Model Development
- Machine Learning
- Raspberry Pi
- Python
- OpenCV
- Automation
- ML Automation
- Prototype
Incheon, South Korea
I design, optimize, and integrate real-time object detection and tracking pipelines for NVIDIA Jetson, RK3588, and cloud environments, with measurable gains in FPS, latency, and deployment stability. From YOLO training to TensorRT optimization and production integration, I build computer vision systems that work reliably on real hardware. If you need more than just a trained model, if you need a working AI system integrated into hardware or software, I deliver complete, production-ready solutions.

WHAT I DELIVER
- End-to-End Deep Learning Pipelines: model architecture → dataset optimization → training → evaluation → deployment
- Real-Time Object Detection & Multi-Object Tracking: optimized YOLO pipelines with stable tracking (DeepSORT / ByteTrack / BoT-SORT)
- Edge AI Acceleration & Performance Optimization: TensorRT conversion, CUDA acceleration, latency reduction, memory tuning, FPS improvements
- AI Integration into Production Systems: Jetson deployment, inference APIs, embedded integration, debugging, monitoring, and system optimization

PROVEN EXPERIENCE
- 5-star Upwork reviews for NVIDIA Jetson Nano, AGX, and Orin deployments
- Deployed Frigate and real-time CV pipelines on Jetson Orin NX
- Optimized inference performance using TensorRT for improved real-time execution
- Installed and configured LLM environments with secure key management and usage tracking
- Research and production AI deployment experience at HBrain

TECH STACK
- Deep Learning: PyTorch, CNN architectures, YOLO variants
- Computer Vision: OpenCV, real-time video processing, detection and tracking
- Optimization: TensorRT, CUDA
- Systems: Linux, Embedded Systems, NVIDIA Jetson
- Languages: Python, C++

If you share your hardware target, dataset sample, or performance goal, I can propose a clear technical architecture and a realistic delivery plan.
- Computer Vision
- Artificial Intelligence
- Edge AI
- Object Detection & Tracking
- NVIDIA Jetson
- YOLO
- Deep Learning
- Image Segmentation
- Anomaly Detection
- Python
- AI Model Integration
Gandhinagar, India
I help startups and AI companies build high-performance AI systems, GPU-accelerated applications, and scalable AI infrastructure using CUDA, Rust, and Python. My expertise focuses on performance engineering, AI infrastructure, parallel computing, and low-latency systems designed for production-scale workloads. I work on optimizing compute-heavy applications, improving inference performance, and building reliable backend systems for modern AI products.

I specialize in:
- CUDA & GPU Optimization
- AI Inference Optimization
- Rust-based High-Performance Systems
- AI Infrastructure Engineering
- Parallel Computing
- Low-Latency Backend Systems
- Python-based AI & Automation Systems
- LLM Infrastructure
- RAG Architecture & AI Search
- Scalable API & System Architecture

I can help with:
- GPU acceleration and CUDA optimization
- AI inference performance tuning
- Rust backend systems for high-performance workloads
- AI infrastructure and deployment pipelines
- Memory and compute optimization
- Parallel processing systems
- AI search and RAG pipelines
- Python automation and backend development
- Scalable distributed systems
- Production-ready AI architecture

Tech Stack & Tools: CUDA, Rust, Python, PyTorch, TensorRT, ONNX, Docker, FastAPI, PostgreSQL, vector databases, REST APIs, WebSockets, AWS, Azure, and cloud-native architectures.

I focus on building systems that are performant, scalable, efficient, reliable, and production-ready.

If you are building GPU-intensive applications, AI infrastructure, inference systems, or performance-critical software, I'd be happy to discuss your project. Send me a message, and let's discuss how we can make it happen!
- CUDA
- Generative AI
- AI Agent Development
- Chatbot Development
- GPU
- Python
- MERN Stack
- Mobile App Development
- Web Application Development
- AI Chatbot
- Vector Database
- DevOps
- AWS Development
- LLM Prompt Engineering
Surat, India
Most AI developers can build a prototype. Very few can optimize and deploy multimodal LLMs on Jetson Orin Nano at real-time speeds, architect enterprise-grade RAG systems across 3,000+ SQL tables, fine-tune state-of-the-art vision-language models, or serve high-throughput inference with vLLM and SGLang at production scale. That's what I do.

I'm a Senior AI/ML Engineer specializing in:
- Production LLM Systems & Agentic AI
- Edge AI & High-Performance Model Optimization
- Computer Vision & Real-Time Multimodal AI
- Distributed Inference & AI Infrastructure

I don't just wrap APIs; I engineer AI systems from the model architecture and optimization layer all the way to scalable production deployment.

What makes my work different:
- I build multi-agent AI systems that feel invisible to users. Designed a dual-LLM tutoring architecture where a speech-to-speech AI tutor interacts live with students while a hidden orchestration LLM performs real-time prompt injection, memory routing, retrieval planning, and response control, completely transparently.
- I optimize and deploy AI where most teams fail. Fine-tuned PaliGemma, converted it to ONNX, applied INT8/FP16 quantization and TensorRT optimization, reducing model size by 65% with <2% accuracy loss, then deployed it on Jetson Orin Nano, Raspberry Pi, edge devices, and mobile hardware for real-time inference with extremely limited compute.
- I build enterprise RAG systems that reason, not just retrieve. Architected a multi-stage multilingual RAG pipeline using HyDE, recursive retrieval, schema-aware chunking, metadata filtering, hybrid search, reranking, graph-based retrieval, chain-of-table reasoning, and long-context orchestration across 3,000+ enterprise SQL tables and 150+ warehouses.
- I engineer high-performance LLM inference infrastructure. Built scalable inference pipelines using vLLM, SGLang, ONNX Runtime, TensorRT-LLM, DeepSpeed, FlashAttention, speculative decoding, paged attention, KV-cache optimization, and distributed GPU serving for low-latency, high-concurrency AI systems.
- I deliver video AI and lip sync systems beyond standard open-source baselines. Fine-tuned LatentSync on a 4,000-sample dataset, outperforming Wav2Lip, GAN-Wav2Lip, and Wav2LipHD baselines in multilingual real-time video translation with highly coherent visual speech synchronization.
- I design AI systems for real-world production environments. Built AI backends with FastAPI, asyncio, WebSockets, Redis queues, Kafka streaming, GPU worker orchestration, autoscaling inference services, and Kubernetes-based deployments capable of handling real-time concurrent workloads.

What I build for clients:
- Production-grade RAG systems with hybrid, recursive, graph, and agentic retrieval
- High-throughput LLM serving using vLLM, SGLang, TensorRT-LLM, Triton Inference Server
- Edge AI deployment with ONNX, TensorRT, OpenVINO, CoreML, TensorFlow Lite, ONNX Runtime
- Fine-tuning pipelines (LoRA, QLoRA, PEFT, RLHF, DPO) with evaluation and benchmarking
- AI agents with LangGraph, LangChain, LlamaIndex, CrewAI, AutoGen, MCP architectures
- Multimodal AI systems combining vision, audio, speech, and language models
- Text-to-SQL systems for complex enterprise schemas with multilingual querying
- Computer vision pipelines: YOLOv8, YOLO11, SAM2, GroundingDINO, DETR, Mask R-CNN
- Real-time speech AI: Whisper, XTTS, RVC, voice cloning, speech-to-speech pipelines
- Video AI: lip sync, face reenactment, multilingual dubbing, avatar systems
- Distributed AI infrastructure with Docker, Kubernetes, Ray, Celery, Redis, Kafka
- AI observability, evaluation, tracing, and monitoring with LangSmith, MLflow, Weights & Biases
- Vector databases and retrieval systems: FAISS, Pinecone, ChromaDB, Milvus, Qdrant, Weaviate
- AI surveillance and geospatial intelligence systems with real-time threat detection

Core Stack: Python · PyTorch · TensorFlow · HuggingFace · Transformers · LangChain · LangGraph · LlamaIndex · OpenAI · Anthropic · Gemini · FastAPI · vLLM · SGLang · TensorRT-LLM · Triton · ONNX Runtime · TensorRT · OpenVINO · TensorFlow Lite · CUDA · OpenCV · MediaPipe · Whisper · YOLO · SAM · GroundingDINO · FAISS · Pinecone · Qdrant · MongoDB · PostgreSQL · Redis · Kafka · Docker · Kubernetes · GCP · Vertex AI · AWS · Azure · WebSockets

If you need an AI engineer who can go deeper than most teams, whether that means deploying optimized multimodal AI on constrained edge hardware, architecting enterprise RAG systems that minimize hallucinations, building scalable LLM inference infrastructure, or shipping production-grade multi-agent AI, send me a message. I'll tell you honestly what's possible, what's overhyped, and exactly how I'd build it.
- MLOps
- Web Development
- Python
- Deep Learning
- AI Development
- Machine Learning
- Model Deployment
- Retrieval Augmented Generation
- Build Automation
Longjumeau, France
Many computer vision models work in a notebook but fail in production. I build OCR, detection, and tracking systems that actually run in real environments so your team can automate workflows and extract real business value from visual data. I work with companies and startups who need robust AI pipelines for document automation, traffic monitoring, retail analytics, or custom visual AI applications. If you want a low-budget experiment with no clear success criteria, I'm probably not the best fit.

How I Work

1. In-Depth Discovery
I begin each project by understanding your goals, constraints, and success metrics to ensure the solution meets your unique needs.

2. Modular Problem-Solving
I break complex tasks into smaller parts and test multiple approaches to find the most effective solution. This ensures reliable results you can trust.

3. Leveraging Advanced & Custom Methods
My toolkit covers all major computer vision tasks: classification, detection, tracking, and segmentation. I combine off-the-shelf models with custom-built methods to achieve top-notch performance.

4. Clear, Frequent Communication
I keep you updated at every step, provide realistic timelines, and immediately address any hurdles. Even if you're not technical, I'll explain everything in plain language so you always know where your project stands.

5. Always Putting You First
My clients consistently give me 5-star ratings and glowing feedback. If your requirements stretch beyond my skill set, I'll be transparent and let you know right away.

Advanced Computer Vision & 3D Perception
For projects involving robotics, autonomous systems, or spatial analytics, I also build 3D perception pipelines using LiDAR, stereo cameras, and point clouds. This includes 3D object detection, Bird's Eye View (BEV) transformations, and point cloud processing using deep learning.

Recent Projects

1. Multi-API OCR Pipeline: Combined Google Vision, OpenAI, and AWS Rekognition to increase document extraction accuracy across noisy images.
2. Scanned Document to Excel: Parsed key fields using Python + QwenVL and auto-generated Excel reports.
3. License Plate Detection: End-to-end system with YOLOv8 trained on a custom dataset and deployed for inference.
4. Real-Time LiDAR Object Detection & Tracking: Built a 3D detection and tracking pipeline using LiDAR and camera data with 3D bounding boxes, frame-to-frame association, and 2D/3D visualizations for autonomous navigation.
5. Deep Learning on Point Clouds: Trained PointNet and voxel-based 3D CNNs on ShapeNet Core for point cloud segmentation and classification, including full preprocessing and model visualization.

Tech Stack
I work with Python, PyTorch, TensorFlow, OpenCV, YOLO, Open3D, and modern vision APIs (Google, AWS, OpenAI) to build detection, tracking, and OCR systems.

What Clients Say
"Yacine is reliable, very good at his job, and very informative. He was able to set up a POC, identify the main pitfalls, and propose solutions independently."
"Yacine is committed to provide high quality work. He knows what he's doing. It's a pleasure to work together. I recommend him for data mining and vision work."
"Yacine always does a great job on any computer vision related task, he delivered the project very quickly. I will definitely rehire him again whenever needed."

Let's Talk
Send a message describing your computer vision problem and the data you're working with. If it's a good fit, we'll discuss the next steps.
- CUDA
- Computer Vision
- Object Detection
- Object Detection & Tracking
- OCR Algorithm
- Deep Learning
- Python
- PyTorch
- Image Segmentation
- OpenCV
- TensorFlow
- Image Processing
- Image Recognition
- Machine Learning
- Image Classification
Tbilisi, Georgia
Experience in:
- Speech-to-Text Conversion
- Text-to-Speech Synthesis
- Natural Language Processing (NLP)
- Computer Vision and Neural Networks
- CUDA, PyCUDA

I'm a Machine Learning Engineer passionate about innovation and eager to work in Japan's technology sector. With expertise in areas such as speech processing and GPU computing, I am looking for opportunities to contribute my skills in the Japanese market. I would also appreciate assistance with the work visa process for potential relocation. Feel free to reach out for collaboration or inquiries.
- CUDA
- Python
- Docker
- Deep Learning
- PyTorch
- Multimodal Large Language Model
- Large Language Model
- Computer Vision
- C
- Retrieval Augmented Generation
- Vector Database
- Dask
How it works
Post a job for free
Tell us what you need. Create your own job post or generate one with AI then filter talent matches.
Hire top talent fast
Consult, interview, and hire quickly, so you can meet the freelancers you're excited about.
Collaborate easily
Use Upwork to chat or video call, share files, and track project progress right from the app.
Payment simplified
Manage payments in one place with flexible billing options. Only pay for approved work, hourly or by milestone.
Don't just take our word for it
"Upwork provides an umbrella-level of security. I can see a talent's work history and ratings. I can hold payments in escrow. I can communicate through Upwork Messages instead of working through my email address."
Kim Darling
Emerald Tiger
"Upwork is the best platform to hire skilled professionals when we're not looking for a full-time employee. All the companies in our portfolio use Upwork to find talent across a wide range of fields."
David Merry
Kinetic Investments
"Our very specific requirements can be a challenge... With Upwork, we're able to access a bigger community to ensure the success of our projects."
Katja Krohn
Summa Linguae
At A Glance: CUDA
Top programmers and designers use graphics processing units (GPUs) to build detailed applications that improve the user experience. For businesses, GPUs support a range of needs, such as market analysis, data processing, ad creation and placement, and much more. CUDA, which stands for Compute Unified Device Architecture, makes this power accessible: it lets experts of every level work with GPUs and use their many advantages with less effort and confusion. Maybe you don't have time to spare from your many duties, or perhaps you are not fluent in the skills required to use CUDA. There's no need to miss out on its advantages; the CUDA specialists on Upwork are here to help your business thrive, offering competitive rates, diverse skill sets, and flexible hours.
The freelancers on Upwork have many years of experience in a diverse range of jobs, which has made them proficient in a wide range of specialties. This enables them to apply their expertise to any project and provide unique insights and suggestions, as well as solutions to bugs and technical issues. These experts are also very familiar with the online workplace, which allows them to collaborate remotely with other teams or work independently with little supervision. With thousands of experts to choose from on Upwork, you're sure to find a professional with the experience, education level, and work ethic you need for your project.