Senior AI/ML Engineer (Computer Vision, Multimodal, LLM, PyTorch)
Worldwide
Senior AI and ML Engineer for Computer Vision, Multimodal, LLM, and PyTorch

About the role
We are an AI product company building a multimodal perception engine that fuses video, image, voice, and text into one conversational, uncertainty-aware pipeline. We are looking for a senior AI and ML engineer to own the research and development of the core models across computer vision, conversational NLP, and multimodal fusion. This is hands-on research and development. You will design, build, fine-tune, and quantitatively validate the models that power the engine.

What you will do
- Build the multimodal fusion pipeline that combines video, image, voice, and text.
- Develop the computer vision components, including segmentation, recognition, and portion or 3D estimation.
- Build the conversational, uncertainty-aware disambiguation layer using LLM and Transformer fine-tuning.
- Adapt open multimodal foundation models through transfer learning, rather than training from scratch.
- Define and run quantitative validation against measurable accuracy targets.

Critical skills, must have
- Strong Python and deep learning with PyTorch.
- Computer vision for image and video, including segmentation, detection, and recognition.
- Multimodal AI and vision-language models that fuse multiple modalities into one model.
- NLP and LLM fine-tuning with Transformers and Hugging Face.
- Experience adapting and fine-tuning open foundation models through transfer learning.

Good to have
- Video analysis and temporal models, 3D reconstruction, or monocular depth and volume estimation.
- ASR and speech-to-text, for example Whisper.
- Uncertainty quantification and model calibration.
- Applied statistics and data science, including time-series correlation and multiple-comparison control.
- MLOps, including experiment tracking, model serving, and cloud GPU on AWS or GCP.

Requirements
- Residency in an EU country is required.
- A PhD in AI, computer vision, machine learning, or a closely related field is a big plus.
- Able to work independently and own the model research and development end-to-end.

To apply
Briefly describe one multimodal or computer vision project you led, including the problem, the models and frameworks you used, and your specific role. Links to a portfolio, GitHub, or papers are welcome.
- Hourly: More than 30 hrs/week
- Duration: 1-3 months
- Experience Level: Expert
- Hourly rate: $30.00 - $60.00
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 15 to 20
- Last viewed by client: last week
- Interviewing: 10
- Invites sent: 30
- Unanswered invites: 5
About the client
- Cyprus, Paphos, 3:18 AM
- $16K total spent; 16 hires, 1 active
- 401 hours