Driving Performance and Efficiency: Upwork's Path to Operational Excellence

August 11, 2023

Operational excellence drives organizational success by optimizing processes, minimizing waste, and enhancing customer satisfaction. It requires a commitment to excellence throughout an entire organization and has a direct impact on strategy, processes, people, technology, and culture.

The benefits of operational excellence enable an organization to achieve the following:

  • Gain competitive edge
  • Achieve higher levels of performance
  • Adapt more effectively to evolving market and technical conditions

Adopting operational excellence at Upwork

Before exploring Upwork’s operational excellence culture, it is important to identify the challenges the Engineering organization faced over the years. Probably the most important issue was that Upwork’s open bug count increased by about 200 percent from 2019 to 2021.

Three specific issues were identified as contributing to the situation:

  1. The absence of a centralized engineering dashboard hindered the organization’s ability to monitor crucial quality metrics.
  2. There was no mechanism to view hierarchical data easily, which made top-down analysis and problem identification difficult.
  3. It was a challenge to hold a team accountable without regular, weekly peer reviews.

Operational excellence has now become an Upwork focus emphasizing continuous improvement, learning, and organizational excellence. Upwork incorporates the principles of the Toyota Production System (TPS) into its operational excellence approach. Specific actions that have been prioritized by Engineering include:

  • Identification and elimination of bottlenecks, defects, and waste
  • Adoption of weekly operations reviews and ad-hoc audits
  • Coaching staff through training and one-on-one guidance
  • Celebration of successes

Upwork determined that to achieve operational excellence, an organization needs to be aligned with core values, metrics, tools, and rituals.

Aligning Engineering to core values

By aligning the Engineering organization to Upwork’s core values, a culture of teamwork, innovation, customer focus, and operational excellence is valued and encouraged. Specific values in Engineering can be summarized as follows:

Play to win as a team: By emphasizing collaboration and teamwork, individual success is directly related to the team’s success. Engineers are encouraged to work together, leverage each other’s strengths, and support one another to achieve common goals. 

Build and break fearlessly: To promote a culture of innovation and continuous improvement, engineers are encouraged to take calculated risks, think “outside the box,” and explore new ideas without fear of failure. This fosters a working environment where experimentation is valued, and learning from failure becomes a growth opportunity. 

Be customer zero: Engineers who put themselves in the “customer’s shoes” benefit by monitoring their software’s impact (both positive and negative) on typical users. Tracking a user’s workflow and recognizing pain points can motivate the creation of better solutions that meet customer needs and foster high customer satisfaction.

Commit to excellence: Last but not least, engineers at Upwork are encouraged to uphold high standards and produce quality engineering work. An engineer’s commitment to excellence can be achieved by delivering their best work (no shortcuts!), paying attention to detail, continuously learning, and striving to meet or exceed expectations.

Adopting engineering metrics

There are four metrics identified in Engineering to track for operational excellence:

  • Bugs (defects) service level objective
  • Services availability
  • Engineering velocity
  • Site and app page speed performance

Tracking bugs

Bug service level objectives (SLO) are measured within engineering to ensure organizational service level agreements (SLA) are being met. If not, Engineering is expected to react and mitigate bugs quickly.

An SLO should define the acceptable (or desirable) level of bugs, and it sets expectations for the Engineering team on bug resolution, bug response time, and bug backlog management.

There are two key engineering metrics used to effectively track bugs:

  • All unresolved bugs: These are defects that either impact customer experience or negatively impact the business. This metric represents all unresolved bugs at the end of some period of time (usually at the end of a quarter or year).
  • Percentage of bugs meeting SLO: This is the percentage of defects resolved within the time period the SLO defines for their severity. (Different priorities may have different SLOs.)

For example, a bug SLO can mandate that all critical bugs be resolved within 48 hours, high-priority bugs within a week, and medium-priority bugs within a month. A bug SLO should also define bug severity criteria and how many bugs at each severity level may exist at any time.
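The two bug metrics can be sketched in a few lines of Python. This is only an illustration, not Upwork's actual implementation: the `Bug` record, the function names, and the choice to treat "a month" as 30 days are all assumptions, and the percentage is computed over resolved bugs only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Resolution windows from the example in the text.
# Assumption: "a month" is treated as 30 days.
SLO_WINDOWS = {
    "critical": timedelta(hours=48),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
}


@dataclass
class Bug:
    """Hypothetical bug record; field names are illustrative."""
    severity: str                        # key into SLO_WINDOWS
    opened: datetime
    resolved: Optional[datetime] = None  # None while the bug is still open


def unresolved_count(bugs: List[Bug]) -> int:
    """Metric 1: number of bugs that are still open."""
    return sum(1 for bug in bugs if bug.resolved is None)


def percent_meeting_slo(bugs: List[Bug]) -> Optional[float]:
    """Metric 2: share of resolved bugs fixed within their severity window."""
    resolved = [bug for bug in bugs if bug.resolved is not None]
    if not resolved:
        return None  # nothing resolved yet, so the percentage is undefined
    on_time = sum(
        1 for bug in resolved
        if bug.resolved - bug.opened <= SLO_WINDOWS[bug.severity]
    )
    return 100.0 * on_time / len(resolved)
```

A stricter variant could also count open bugs that have already passed their deadline as misses; which definition applies depends on how the SLO is written.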

Since the launch of Upwork’s operational excellence program, the overall bug count has been reduced by 75 percent, and the percentage of bugs meeting SLO has increased by 40 percent:

The SLO & Open Bugs chart is updated weekly, while the Open Bugs Trends chart spans a period of about two years.

With the bug SLO’s introduction, Engineering has prioritized and addressed bugs to maintain an acceptable, if not exceptional, level of quality and stability. Engineering has been able to align bug management practices with overall service and project goals, which ultimately benefits our customers.

Ensuring service availability

Availability of services has become a critical factor in ensuring customer satisfaction, operational efficiency, and business success. Upwork strives to provide reliable and uninterrupted services to our users.

Building on the concept of service level objectives introduced in the previous section (“Tracking bugs”), service availability can be monitored with a single metric: services breaching availability.

This SLO represents the probability of a service successfully handling a request without errors or interruptions. Now a standard measurement at Upwork, the availability of a service indicates how reliable and accessible the service is.

Upwork has adopted an industry-standard service tier division (T0, T1, T2, and T3) to categorize services based on their expected availability and success rate in handling requests. T0 services (99.99 percent availability) are of utmost importance with near-flawless reliability, while T1 (99.9 percent availability), T2 (99.5 percent availability), and T3 (98 percent availability) services follow in decreasing order of importance. Each successive tier allows for a higher tolerance of potential failures and service disruptions.
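As a rough sketch of how the tier targets translate into a breach check, the following Python compares each service's observed request success rate against its tier's target and counts the services that fall short. The tier percentages come from the text; the function names, the shape of the input data, and the rule that a service with no traffic does not count as breaching are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Availability targets per tier, in percent, as listed in the text.
TIER_TARGET = {"T0": 99.99, "T1": 99.9, "T2": 99.5, "T3": 98.0}


def availability_percent(successful: int, total: int) -> float:
    """Observed availability: share of requests handled without error."""
    if total == 0:
        return 100.0  # assumption: no traffic means no breach
    return 100.0 * successful / total


def breaches_target(tier: str, successful: int, total: int) -> bool:
    """True if the observed availability is below the tier's target."""
    return availability_percent(successful, total) < TIER_TARGET[tier]


def services_breaching(stats: Dict[str, Tuple[str, int, int]]) -> int:
    """Count services below target.

    stats maps a service name to (tier, successful_requests, total_requests).
    """
    return sum(
        1 for tier, ok, total in stats.values()
        if breaches_target(tier, ok, total)
    )
```

The same 99.98 percent success rate breaches a T0 target but comfortably meets a T1 target, which is why the tier assignment matters as much as the raw measurement.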

As the name “services breaching availability” implies, this metric captures accidental or unauthorized loss of access to a service, and Upwork wants it to be as small as possible. Since the launch of the operational excellence program, this metric has been reduced by 16 percent over the past year:

Improving engineering velocity with continuous deployment

When an Engineering organization operates in a “perfect flow,” a high engineering velocity reduces the time and effort required to deliver products or services to the customer. Continuous deployment (CD) automates the release process, resulting in rapid and frequent deployment of updates, bug fixes, and new features.

There are two metrics used to measure engineering velocity by employing CD throughout an Engineering organization:

  • CD pipelines: This metric represents the percentage of deployment pipelines supporting continuous delivery and integration.
  • Production releases deployed by CD pipelines: This metric represents the percentage of production releases deployed automatically using the pipelines supporting continuous delivery and integration. 

A higher value in both metrics represents improved software quality and increased engineering velocity. Achieving an enhanced velocity has many advantages, including reduced time-to-market, increased customer satisfaction, and faster feedback loops.
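Both velocity metrics are simple percentages, as the following Python sketch shows. The `Pipeline` and `Release` records and their fields are hypothetical, and the sketch assumes a release counts toward the second metric only when it was deployed automatically through a pipeline that supports CD.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pipeline:
    """Hypothetical deployment pipeline record."""
    service: str
    supports_cd: bool


@dataclass
class Release:
    """Hypothetical production release record."""
    service: str
    automated: bool  # True if deployed without manual steps


def percent(part: int, whole: int) -> float:
    """Percentage helper that returns 0.0 for an empty population."""
    return 100.0 * part / whole if whole else 0.0


def cd_pipeline_percent(pipelines: List[Pipeline]) -> float:
    """Metric 1: percentage of pipelines that support CD."""
    return percent(sum(1 for p in pipelines if p.supports_cd), len(pipelines))


def cd_release_percent(releases: List[Release], pipelines: List[Pipeline]) -> float:
    """Metric 2: percentage of releases deployed automatically via a CD pipeline."""
    cd_services = {p.service for p in pipelines if p.supports_cd}
    via_cd = sum(1 for r in releases if r.automated and r.service in cd_services)
    return percent(via_cd, len(releases))
```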

Adopting a faster speed of delivery offers the business a competitive edge while fostering innovation and enabling teams to respond quickly to changing market demands. Within Upwork Engineering, there is now a stronger culture of collaboration, accountability, and transparency. CD requires close coordination between development, testing, and operations teams, ultimately leading to higher productivity and efficiency in software development processes.

Since the launch of the operational excellence program at Upwork, the number of services that have adopted a CD pipeline has grown by more than 250 percent, while the overall number of CD pipeline deployments has increased by 43 percent across the company:

For real-life examples, the book Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations provides extensive research and analysis on the benefits of adopting CD.

Site and app page speed performance

In a prior engineering blog post, “State of Site Speed Performance at Upwork in 2021,” Upwork described site speed improvements made by establishing better culture, processes, and tools. Since then, the primary metric measuring page speed performance has evolved with the following changes:

  • Desktop and mobile web app metrics were separated
  • New mobile app metrics were introduced
  • Primary web vitals (LCP, FID, and CLS) were extended with secondary metrics (FCP and TBT)

Inspired by Google’s Lighthouse Performance Score and Airbnb’s Page Performance Score, an Upwork Page Speed Score (UPSS) was introduced. The Engineering organization needed simpler, understandable, and actionable page speed metrics. As a result, metrics are now calculated for all three device types: desktop, mobile, and native apps:

  • Desktop website’s UPSS: A performance measurement system that tracks multiple performance metrics from customers visiting upwork.com on the desktop website.
  • Mobile website’s UPSS: A performance measurement system that tracks multiple performance metrics from customers visiting upwork.com on the mobile website.
  • Native apps’ UPSS: A performance measurement system that tracks multiple performance metrics from customers using Upwork’s iOS and Android native mobile apps.

A score from 0 to 100 is calculated for each of the three metrics. Web vitals (with different weights) contribute to the overall score. By selecting the device type (e.g., mobile), the combined web vitals can be viewed:
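A weighted score of this kind can be sketched as a weighted average of per-metric scores. The post does not publish the actual UPSS weights or how raw measurements are converted to scores, so the weights below are purely hypothetical, and the sketch assumes each web vital has already been normalized to a 0 to 100 score.

```python
from typing import Dict

# Hypothetical weights for illustration only; the real UPSS weights are not
# given in the text. They are chosen so that they sum to 100.
WEIGHTS = {"LCP": 30, "FID": 25, "CLS": 25, "FCP": 10, "TBT": 10}


def page_speed_score(metric_scores: Dict[str, float]) -> float:
    """Combine per-metric scores (each 0 to 100) into one 0 to 100 score."""
    missing = set(WEIGHTS) - set(metric_scores)
    if missing:
        raise ValueError(f"missing metric scores: {sorted(missing)}")
    weighted = sum(WEIGHTS[name] * metric_scores[name] for name in WEIGHTS)
    return weighted / sum(WEIGHTS.values())
```

Because the result is a weighted average, a page that scores 90 on every web vital gets an overall score of 90 regardless of the weights; the weights only matter when the individual scores differ.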

The chart shows that for mobile sites, the overall UPSS was 90, which is good. Since the launch of the operational excellence program, speed performance has improved in all cases: the desktop site by 20 percent, the mobile site by 23.9 percent, and the mobile apps by 12.8 percent:

Tools that help visualize operational excellence

Tools, in the form of visual dashboards, can present how well an organization is achieving operational excellence. Management (and executive management) at Upwork needed an easy way to visualize overall efficiency, the number of open bugs, and process efficiency.

Rather than create a dashboard presentation tool from scratch, a “no-code” platform called Retool was selected. A dedicated team was assembled to build Upwork’s internal Engineering dashboard. The goal was to build a tool that would present factual information supporting overall department and specific group needs.

Some key attributes of Retool include:

  • Dashboard view consisting of a set of widgets
  • Metrics displayed using a variety of chart types
  • Comprehensive overview from a variety of different “angles”
  • User interface consisting of one or more tabs for viewing metrics as “buckets”
  • Ability to aggregate metrics based on role, product group, or department
  • Flexible search capability (e.g., by manager or by group)
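The aggregation capability in the list above amounts to rolling team-level numbers up an organizational hierarchy. The Python sketch below illustrates the idea with open bug counts; it is not how Retool or Upwork's dashboard works internally, and the function name, the team and group names, and the `parent_of` mapping are all hypothetical. It assumes the hierarchy has no cycles.

```python
from collections import defaultdict
from typing import Dict


def roll_up(open_bugs_by_team: Dict[str, int], parent_of: Dict[str, str]) -> Dict[str, int]:
    """Sum each team's count into every ancestor in the hierarchy.

    parent_of maps a node to its parent; top-level nodes have no entry.
    Assumes the hierarchy is acyclic.
    """
    totals: Dict[str, int] = defaultdict(int)
    for team, count in open_bugs_by_team.items():
        node = team
        while node is not None:
            totals[node] += count
            node = parent_of.get(node)  # None once the top is reached
    return dict(totals)
```

With totals available at every level, a top-down review can start from the department figure and drill into the group or team that contributes most to it.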

The Upwork Engineering dashboard is organized into tabbed sections, each with its own set of charts and tables. The primary metrics show a summary of bugs managed and tracked by the team:

Engineering’s velocity tracks how well cycle time is progressing:

A visual Engineering dashboard has become critical so that everyone at Upwork is aware of the progress towards operational excellence.

The importance of rituals

In addition to mastering values, metrics, and tools, operational excellence depends on rituals. At Upwork, mandatory operational excellence sync meetings are held weekly to report and offer guidance on metrics being tracked.

During the first year of implementation, each department’s engineering lead presented highlights using an engineering dashboard. As each team became more comfortable with this ritual and performance improved, the focus transitioned from detailed reviews to metric categories and progression.

As a result, each team clarifies its objectives and identifies ways to improve underperforming metrics to discuss at the next sync meeting. Upwork Engineering has found that these rituals act as a reinforcement loop to make continuous improvements in the quest for operational excellence.

In summary

Upwork has experienced remarkable results since prioritizing and investing in operational excellence. Significant improvements in core engineering metrics have had a far-reaching impact across the entire organization. This newfound awareness of the direct relationship between operational excellence and engineering goals has been a game-changer.

In addition to productivity improvements, automated test coverage has become more important. By automating our testing procedures, we can significantly reduce the time and effort required for manual testing while ensuring comprehensive coverage. This accelerates our development cycles and enhances the quality and reliability of our products or services. Automated test coverage can identify and address potential issues early, minimizing risks and maximizing customer satisfaction.

Upwork’s investment in operational excellence has been more than justified. Core engineering metrics have improved, enabling the entire organization to be aware of the impact of operational excellence on engineering goals. Keeping to the spirit of continuous improvement and learning, initiatives like improved team productivity metrics and automated test coverage have been identified going forward.
