Scaling AI Models for Better Work Outcomes

For decades, organizations around the globe have relied on Upwork to find the best talent to tackle their toughest business problems. Recently, the new age of advanced machine learning and AI has unlocked a better way to get work done, harnessing the very best in terms of both talent and technology. For businesses on Upwork, AI can power a more streamlined and intuitive experience so hiring managers can find the solutions and people they need faster and get to final work products more effectively. For freelancers, AI innovations help them increase their productivity, deliver high-quality business outcomes, and ultimately earn more on the platform.
This year, we introduced Uma, Upwork’s Mindful AI, to underpin our entire platform and to increasingly serve as a conversational Upwork companion to our customers. We’re committed to developing AI responsibly and “Mindful AI” on Upwork means AI experiences designed and developed in alignment with our AI principles. Uma powers a number of key experiences in our hiring and matching processes that are critical to clients and freelancers discovering each other, getting started, and completing more work together. We continue to train Uma to serve as a constant, intelligent companion for businesses and freelancers, helping them every step of the way across the entire Upwork experience.
Our AI and machine learning team, internally known as Umami, is not only building Uma to serve a wide breadth of customers, but also to serve a wide range of very specific but high-impact use cases across our work marketplace including marketplace-specific functions like discovery, search, and match.
Designing AI models for our marketplace is a challenging technology problem. We must serve a large and growing customer base that expects AI solutions today, but also need to provide language models that address specific customer needs with precision to ensure we’re creating real value for the freelancers and clients on Upwork.
The foundation behind effective AI and a productive Uma product is a combination of high-quality data and a multi-AI model approach that prioritizes customization. Let’s go under the hood.
Understanding data, the backbone of AI
Large language models (LLMs) are only as strong as the data on which they’ve been trained. The quality, diversity, and volume of data that a model is trained on directly influences its ability to generate accurate, contextually aware, and nuanced responses. The companies that will win in the age of AI will have the highest-quality data as their moat.
Data not only has to be high-quality, but in the age of LLMs, needs to be reflective of the rich interactions customers can be expected to have with these models. Models trained on richer, more varied, and more complex conversations will outperform those trained on lower-quality snippets scraped from random corners of the internet. Not only that, but web-based data tends to be simple and non-conversational, which often makes it inadequate for powering LLMs meant to deal with complex customer issues. Careful curation of high-quality data will always beat high-volume automated data collection when training LLMs for the real world.
At Upwork, we are in an advantageous position to drive successful work outcomes with our AI not only because we are developing novel synthetic data generation algorithms (techniques used to create artificial data that mimics real-world data) to power our LLMs at scale, but also because of our access to both a tremendous cache of rich historical signals collected on our platform and to top-caliber creative writing talent to help create gold-standard LLM data from scratch. Here are a few of the types of datasets that we use to train our models:
- Platform Data: Upwork has been around in various forms for over 20 years. That’s 20+ years of internal data—trillions of tokens of highly relevant interactions and moments across a range of work-specific engagements that we can feed into our models. As a dual-sided marketplace with both freelancers and businesses, we leverage insights from productive and successful work interactions on both sides of the equation, everything from freelancer proposals that won work with Fortune 500 companies to job posts that received high engagement from top-rated freelancers to freelancer profiles of Expert-Vetted talent. The freelancers on Upwork also span an increasingly large and diverse set of work categories from chemistry to screenwriting to software engineering, representing an unmatched breadth of subject matters and data points that inform our models. Specific data points from work interactions that had successful outcomes can be used to train our AI models to suggest solutions and advice to our customers that correlate with effective performance and solutions. *We’re committed to developing and using AI responsibly to help our customers reach their full potential and only use platform data in accordance with our Privacy Policy and customers’ user settings.
- Synthetic Data: Our real data then becomes the fuel for our synthetic data machine. We use proprietary algorithms developed internally by our AI R&D experts to easily generate tens of thousands of natural and accurate conversations that are anchored in our historical data and cover a diverse set of possible situations.
- Human-generated Data: Importantly, we also have unique access to top talent on the Upwork platform that we work with to create high-quality data to train our models on. We can hire top-rated screenwriters or copywriters to create full scripts of what an ideal conversation between a customer and Uma would look like for various work scenarios, like hiring a freelancer for building a website or for project management of a go-to-market campaign. This route provides increased control over the data that goes into our models so we can guarantee that our models are seeing novel and relevant scenarios that represent accurate interactions that freelancers and businesses might want to have with Uma as they progress through their Upwork journey.
Many large language models lack precision because they are trained on broad, general data that may not be fully accurate, relevant, or up-to-date. Data quality can be low if models pull from diverse sources like websites and social media that potentially contain errors. All modern LLMs generate responses based on dataset pattern matching rather than logical reasoning and therefore, increased control over data, training, and evaluation pipelines is crucial. These considerations motivate our very specific approach to curating the data that powers our Upwork models.
Our multi-AI model approach
Uma is the AI that powers workflows across our platform. Under the hood, Uma is made up of various LLMs that serve different purposes across Upwork. We’re moving towards a multiple use case-specific model structure in order to deliver outsized outcomes across specific business use cases.
In the simplest form, you can think of AI models in two buckets – standard AI workflows that leverage massive pretrained LLMs like GPT-4 or Claude and custom AI workflows that leverage smaller, but more precise datasets to create use case-specific LLMs. There are pros and cons to both and currently we leverage both approaches, augmented by our extensive marketplace knowledge and newly formed bench of deep AI expertise, to curate the best experience for our customers.
Standard AI Workflows
As LLMs gained steam last year and our customers expected AI solutions, we knew we had to move quickly and partnered with OpenAI to begin harnessing AI to tackle key customer challenges, starting with the difficulty of creating effective job posts for complex projects. This initiative led to the launch of Job Post Generator, Upwork’s first OpenAI-based product. Leveraging GPT-3.5, it streamlines job post creation, reducing time to posting by 80% for our clients. Building on this success, we introduced additional unique customer experiences leveraging OpenAI, augmented by Upwork’s extensive marketplace data set, like Upwork Chat Pro, a general-purpose work app powered by GPT-4o that offers tailored interactions to help freelancers tackle challenging and repetitive tasks faster, boost productivity, increase their earning potential, and improve work quality.
In these scenarios, we saw the benefit of leveraging a pretrained model from OpenAI for quick setup and testing, knowing we’d be able to fine-tune these existing models with our data to provide a more tailored experience. Using pretrained models offers significantly reduced resource requirements, as training LLMs from scratch requires massive computational power and extensive datasets. They also are highly effective at a wide range of tasks because they are trained on a broad and diverse dataset, offering great generalization across domains. Having said that, fine-tuning is an important next step enabling task specialization in an efficient manner. We’re able to optimize the model’s performance and produce more accurate and relevant results that are concentrated on the problems that freelancers and hiring decision makers face on Upwork, while saving on time and resources up front.
Custom Uma Workflows
As we continue to develop, improve, and continually train our AI models for better outcomes, we’re moving towards a customized specific model structure or use case-specific LLMs. Rather than having one model handle all use cases, we are creating separate smaller language models, each trained on a smaller but more specific data set to power a specific use case that we’ve identified as critical for our platform. Two models that we have started supporting with this approach include one that helps freelancers create better proposals and a separate one that helps clients on Upwork select the right candidates for projects.
The benefits of this approach include higher accuracy, full debuggability and tunability, and more customization overall. And although the setup and technical complexity of this approach is higher, Upwork’s newly formed Umami AI and machine learning organization staffed by industry-leading engineers and researchers is more than up to the task.
Below is an example comparing the output of Uma when powered by a customized specific model versus a standard LLM in helping a client determine the next steps in finding a web developer for a pet store business. The custom model is built on an open source model like Llama3.1 that allows for more flexibility relative to other pretrained LLMs and a heavy level of customizable fine-tuning and architectural modification.

Uma does a few specific things well that make it more useful to our customers. First, it asks more coherent questions. It takes time to understand what the user is looking for and asks specific follow-up questions to get to the root of the problem so it can best provide an answer and guidance. It also carries a longer conversation, gathering more information from the user so it can better supply guidance on the specific task. Secondly, Uma is trained on a vast selection of rich historical signals collected on Upwork and therefore can draw from specific examples of how to go about solving a problem. In this case, it starts by sharing a recommendation to start looking for a web hosting service, knowing this has been a successful strategy to this problem in the past for customers on Upwork. On the other hand, the pretrained LLM provides a scatterplot of potential with vague responses that could be the answer to the user’s question, but does not give a precise solution. It’s also unable to have a conversation that learns more about what the user really needs and it purely draws from internet data that may or may not be a viable solution for this user.
The differences in the Uma model as compared to pretrained LLMs is stark and points to the importance of creating custom workflows that serve specific use cases for our customers.
The Future of Work with Uma
One final thought around AI: British statistician George Box once quipped that “all statistical models are wrong, but some are useful.” The intention there was to point out that although statistical models often are useful tools, they are still rather contrived and cannot faithfully model the full complexity of our reality. As LLMs and AI models are fundamentally statistical models, we at Upwork take the same fundamental view on them: these models are not actually “intelligent” and cannot “reason,” but they can still be extremely useful tools that offer productivity multipliers to our clients and freelancers.
Upwork’s research and development around AI is anchored to that goal of usefulness and human enablement. We’re on a quest to redesign work by combining today's most innovative technology with the world’s most highly skilled talent so that the best work solutions are available for our customers to solve difficult business problems.
We see a tremendous opportunity to extend this mission with AI, specifically Uma. We’re uniquely positioned with our platform experience, reach, and technical knowhow to develop and train mindful AI that will enable an intuitive, streamlined, and high-quality work experience for businesses and professionals that progresses our ways of working forward.
The future of work is about harnessing the power of human-AI collaboration to achieve new levels of productivity, creativity, and collective success. With Uma working alongside high-quality talent, we are paving the way for that future.










.png)
.png)




