How Upwork improves site performance

January 24, 2024

“Web performance is all about making websites fast, including making slow processes seem fast.” — MDN

Have you ever wondered how Upwork maintains consistent page performance and responsiveness while continually improving and releasing new features? Upwork Engineering uses its own in-house monitoring tool to ensure everything is running smoothly for clients and freelancers.

Let’s review some of the tools and techniques that Upwork Engineering uses to monitor system performance.

RUM vs. LUM

In addition to real user metrics (RUM), which are collected from real users to measure their experience, Upwork uses lab user metrics (LUM). LUM is a synthetic test that Upwork runs in the staging environment, on dedicated servers that simulate a real user, before a change reaches end users. This helps ensure that new features don’t degrade the user’s experience.

Upwork embraces WebPageTest

Upwork’s measurement work relies on WebPageTest, an open-source web performance tool that provides diagnostic information about how a web page performs under a variety of conditions (e.g., different locations in the world, different devices used to access Upwork’s app, different networks, and so on). Upwork Engineering enhanced WebPageTest to collect additional data that enables more in-depth analysis.

What is analyzed

Upwork has more than 600 tests that run every 30 minutes on 25 dedicated test agents. This results in approximately 30k test runs per day, and each page can be tested with multiple parameters (e.g., different devices, internet connectivity, configuration, and so on).
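
The daily volume follows directly from those figures. The short calculation below is only a back-of-the-envelope check; the per-agent numbers assume runs are spread evenly across agents and that each agent executes one run at a time, which the article does not state.

```python
# Back-of-the-envelope check of the daily test volume quoted above.
TESTS = 600             # "more than 600 tests", so this is a lower bound
INTERVAL_MINUTES = 30   # each test runs every 30 minutes
AGENTS = 25             # dedicated test agents

MINUTES_PER_DAY = 24 * 60

runs_per_test_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES  # 48
runs_per_day = TESTS * runs_per_test_per_day                 # 28,800, i.e. roughly 30k

# Assumption (not from the article): even spread, one run at a time per agent.
runs_per_agent_per_day = runs_per_day / AGENTS
minutes_per_run_budget = MINUTES_PER_DAY / runs_per_agent_per_day

print(runs_per_day, runs_per_agent_per_day, minutes_per_run_budget)
# prints: 28800 1152.0 1.25
```

Under those assumptions, each agent has a budget of a little over a minute per run, which gives a sense of how tightly the test schedule is packed.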

Upwork collects 232 performance metrics for each test run, including Largest Contentful Paint (LCP), Total Blocking Time (TBT), and Speed Index.

Upwork also tracks the branch each run was executed against, the methods of the backend calls used (to diagnose backend performance issues), Lighthouse report results, and a full Chrome DevTools extract.

An extra parameter is sent with the request to help diagnose possible backend performance issues. In response, the backend returns a “snapshot” of the call it received and how long the internal request took. This information enables Upwork to measure the performance of its servers separately from the behavior of users’ browsers.

Validating the results

At this scale, tests can quietly go stale: some pages become obsolete, pages change over time, the test user becomes outdated, and so on.

Upwork employs automatic validation for each test run. After a run finishes, an automated check confirms that the test landed on the intended page rather than on an error such as a “404,” a login page, or an unexpected redirect. The metrics from the results are saved in the InfluxDB time-series database.
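
To make the idea concrete, here is a minimal sketch of what such a validation step could look like. It is not Upwork’s implementation: the `RunResult` type, its fields, the `validate_run` function, and the `/login` path are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class RunResult:
    """Hypothetical summary of one finished test run."""
    requested_url: str
    final_url: str
    status_code: int


def validate_run(result: RunResult, login_path: str = "/login") -> list[str]:
    """Return a list of problems; an empty list means the run looks valid."""
    problems = []

    # HTTP errors such as 404 mean the intended page was never rendered.
    if result.status_code >= 400:
        problems.append(f"HTTP error {result.status_code}")

    requested = urlparse(result.requested_url)
    final = urlparse(result.final_url)

    # Landing on the login page usually means the test user's session is invalid.
    if final.path.startswith(login_path):
        problems.append("landed on login page")
    # Any other change of host or path is treated as an unexpected redirect.
    elif (requested.netloc, requested.path) != (final.netloc, final.path):
        problems.append(f"redirected to {result.final_url}")

    return problems


run = RunResult("https://example.com/jobs", "https://example.com/login?next=/jobs", 200)
print(validate_run(run))  # prints: ['landed on login page']
```

Returning a list of problems rather than a single pass/fail flag keeps the reason for a rejected run available to whoever has to fix the test.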

Regression analysis

A weekly regression analysis compares a page’s key performance metrics with previous results. These metrics include Speed Index, TTFB, LCP, CLS, TTI, and TBT. The median over a given timeframe is calculated and compared with the previous week’s results. If degradation is detected, the relevant performance data is readily available to investigate the cause. Issues are usually identified and resolved before they are deployed to users.
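
The median comparison can be sketched in a few lines. This is an illustrative example rather than Upwork’s actual analysis: the function name, the data layout, and the 10% tolerance are assumptions. It relies on the fact that every metric listed above is “lower is better,” so an increase in the median counts as a degradation.

```python
from statistics import median


def weekly_regressions(this_week, last_week, tolerance=0.10):
    """Return {metric: relative_change} for metrics whose median got worse.

    Both arguments map a metric name to a list of samples. The 10% tolerance
    is an assumed value, not one taken from the article.
    """
    regressions = {}
    for metric, samples in this_week.items():
        previous = last_week.get(metric)
        if not samples or not previous:
            continue  # nothing to compare against
        previous_median = median(previous)
        if previous_median == 0:
            continue  # avoid dividing by zero for metrics such as a perfect CLS
        change = (median(samples) - previous_median) / previous_median
        if change > tolerance:
            regressions[metric] = change
    return regressions


last_week = {"LCP": [2000, 2100, 1900], "TBT": [150, 160, 140]}
this_week = {"LCP": [2500, 2600, 2400], "TBT": [150, 155, 145]}
print(weekly_regressions(this_week, last_week))  # prints: {'LCP': 0.25}
```

Using the median rather than the mean keeps a single slow outlier run from triggering a false alarm.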

These calculations are done automatically through Upwork’s dashboard. Viewing a side-by-side comparison of Chrome DevTools extracts or backend call stacks takes just a click.

Page ownership

The developers and QA testers responsible for the pages being tested are involved throughout the entire process. You could even call these individuals "page owners." By injecting specific parameter values into the tests to make the page behave differently, page owners can evaluate a large number of scenarios against each other.
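
One way to picture parameter injection is as a matrix of test URLs, one per combination of parameter values. The sketch below is purely illustrative: the `scenario_urls` helper and the `variant` and `cache` parameter names are hypothetical, and the article does not say how Upwork passes parameters to its tests.

```python
from itertools import product
from urllib.parse import urlencode


def scenario_urls(base_url, parameters):
    """Build one URL per combination of the given parameter values."""
    names = list(parameters)
    urls = []
    for values in product(*(parameters[name] for name in names)):
        query = urlencode(dict(zip(names, values)))
        urls.append(f"{base_url}?{query}")
    return urls


# Hypothetical parameters: two page variants, each tested with and without a cache.
urls = scenario_urls(
    "https://example.com/page",
    {"variant": ["a", "b"], "cache": ["on", "off"]},
)
print(len(urls))  # prints: 4
print(urls[0])    # prints: https://example.com/page?variant=a&cache=on
```

Because the combinations multiply, even a handful of parameters yields enough scenarios to compare against each other.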

Evaluating options in the pipeline

Owners can add special tests during their pipeline build to compare the performance of an old build with that of a new one. By viewing the numbers side by side, owners can make a well-informed decision about whether the new build needs to be modified before being pushed to the staging environment.

Automatically detecting any deviations

To help minimize the amount of effort required by Engineering, Upwork has developed ways to automatically detect if a page starts to behave differently than its stable baseline. If a different behavior is detected, the page owner is automatically notified of what went wrong. This level of automation makes it possible to resolve any issues quickly and efficiently.
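
The article does not describe how deviations are detected. As one possible approach, the sketch below compares recent runs with a stable baseline using the median and the median absolute deviation (MAD), a robust measure of spread. The function name and the threshold of 3 are assumptions made for illustration.

```python
from statistics import median


def deviates_from_baseline(baseline, recent, threshold=3.0):
    """Return True if the recent median is far from the baseline median.

    Distance is measured in units of the baseline's median absolute
    deviation (MAD). The threshold of 3 is an assumed value.
    """
    center = median(baseline)
    mad = median([abs(sample - center) for sample in baseline])
    recent_center = median(recent)
    if mad == 0:
        # A perfectly flat baseline: any shift at all counts as a deviation.
        return recent_center != center
    return abs(recent_center - center) / mad > threshold


baseline_lcp = [1000, 1020, 980, 1010, 990]  # stable history, in milliseconds
print(deviates_from_baseline(baseline_lcp, [1200, 1180, 1220]))  # prints: True
print(deviates_from_baseline(baseline_lcp, [1005, 995, 1010]))   # prints: False
```

A check like this could run after every batch of tests, with a `True` result triggering the notification to the page owner.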

In summary

With so much data available for review and analysis, Upwork’s engineering teams can focus on feature releases while relying on state-of-the-art tools to collect all the necessary performance information to analyze whenever needed.
