An Engineering Approach to Solving SEO Problems | Upwork

October 7, 2021


An engineering approach to solving SEO problems in a serverless way

With a site as big as Upwork.com, gathering, cataloging, and centralizing content can be a real challenge, especially if you need to serve it in a way that's comprehensible to bots and crawlers. It's unnecessary to get into the details of how vital Search Engine Optimization (SEO) is for any company that lives or has pieces of its services in a digital environment. Upwork is no exception. We don't take anything for granted, which means continuously working to improve and optimize the site, and SEO ranking and positioning are critical to achieving that. With that in mind, any problem around such matters is a genuine concern and should be treated with the utmost urgency.

We have been building Upwork with a fully distributed team for years. Considering the situation we were facing, the Site Reliability Engineering (SRE) team was entrusted to deliver a swift and optimal solution, with support from the SEO and QA teams. This project was delivered by a group of engineers from around the world (Mexico, Russia, Ukraine, and the United States).

The problem

When search engine bots see the same content appear on different pages, it creates conflicts for the ranking algorithm, negatively impacting a website's ranking, traffic, and ultimately revenue. As Upwork continued to grow and scale, so did the content we were creating. As the business evolved, we needed to consolidate the ranking signals of similar content into a single version of each page through HTTP 301 redirects.

From a technical perspective, we had tens of thousands of URLs that needed to be redirected following SEO best practices. The solution also had to meet some standards: being scalable across resources and geolocations, adding minimal latency, and offering very high availability. Below are some of the initial options we explored and chose to discard.

Cloudflare Page Rules: We used Cloudflare as our content delivery network (CDN) provider. Page Rules are a valuable resource on the edge that could do the trick, but only a very limited, strictly capped number of them is available (120). They are a very good solution for applying quick fixes around caching, security, and performance, but since the number of redirects was in the thousands, this option could not scale for our needs.

Reverse Proxy Load Balancer (LB): We route millions of HTTPS requests per hour through the reverse proxy on our load balancers. This is useful and efficient for mapping where each request should go, but configuring such a large number of redirects on our LB is not good practice. Thousands of redirects living in the configuration are costly and hard to maintain. Additionally, this layer serves the vast majority of the traffic on the site, so making it carry the redirects is not recommended.

App/Web servers: Our application layer is where requests are served to our clients. Redirections could happen here, but with hundreds of microservices and stacks, this would require a lot of effort and cross-functional collaboration. The solution also needs to be as global as possible; otherwise, it adds a lot of complexity to the deployment and integration process.

Since we didn't have a quick and efficient solution available internally for this specific case, we looked into the Cloudflare CDN for a feature that could adapt to what we needed. We discovered Cloudflare Workers, a service that deploys code in a serverless way on the CDN. This was a brand new tool for our teams, which presented a number of nuances to explore.

The solution

Cloudflare Worker Script

Cloudflare Workers provides a serverless execution environment that allows you to create entirely new applications or augment existing ones without configuring or maintaining infrastructure. (ref)

First and foremost, we needed to make sure that we could set redirects in an optimal way for search engines and with no relevant overhead. There's no point in improving the website's rankings if user experience suffers. So, we began by implementing something simple to start redirecting and checking how this would behave. Thus, we developed this initial piece of code:

--CODE language-markup language-js line-numbers--
async function bulkRedirects(request) {
    // Get the incoming request URL and path for parsing
    let url = new URL(request.url)
    let path = url.pathname + url.search;
    let location = await redirectMap.get(path)
    // If it exists, create a redirect and return it
    if (location) {
        console.log("Redirecting to: " + location)
        return Response.redirect(location, 301)
    }
    // If not, return the original request
    return fetch(request);
}

addEventListener('fetch', async event => {
    event.respondWith(bulkRedirects(event.request))
})

const redirectMap = new Map([
    ['/redirect1', 'https://www.upwork.com/'],
    ['/redirect2', 'https://www.upwork.com/'],
    ['/redirect3', 'https://www.upwork.com/hire/'],
])

So far so good: we could do a simple redirection without much effort, but we were still a long way from where we ultimately needed to be. This allowed us to confirm that the solution did redirect in an SEO-optimal way, with minimal redirection hops. The next step was to verify that this didn't impact performance. The SEO team synced with Quality Assurance engineers to run thorough testing and verify that the redirection was indeed transparent and efficient. We were glad to see how this service leveraged the CDN edge nodes to serve the content seamlessly for users and crawlers.
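One way to keep redirection hops minimal is to collapse chains before the map is deployed, so that a source never points at another source. The sketch below is only an illustration of that idea, not something the original Worker did; the `flattenRedirects` name and the `origin` parameter are assumptions introduced here.

```javascript
// Hypothetical helper (not part of the original Worker): collapse redirect
// chains so every source points straight at its final destination.
function flattenRedirects(redirectMap, origin) {
    const base = new URL(origin).origin
    const flattened = new Map()
    for (const [source, firstTarget] of redirectMap) {
        const seen = new Set([source])
        let target = firstTarget
        // Follow the chain while the target is itself a source on the same site
        while (true) {
            const url = new URL(target, base)
            const nextKey = url.pathname + url.search
            if (url.origin !== base || !redirectMap.has(nextKey)) break
            if (seen.has(nextKey)) {
                throw new Error('Redirect loop detected starting at ' + source)
            }
            seen.add(nextKey)
            target = redirectMap.get(nextKey)
        }
        flattened.set(source, target)
    }
    return flattened
}
```

With a map where `/a` points to `https://www.upwork.com/b` and `/b` points to `https://www.upwork.com/hire/`, the flattened map sends `/a` directly to `https://www.upwork.com/hire/`, and a loop raises an error instead of being deployed.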

We were off to a good start but still far from our goal. The next obstacle to overcome was the significant number of redirections that we needed to configure. The SEO team worked exhaustively to gather and consolidate thousands of indexed items that needed to be appropriately resolved. As a result, we had an extensive list that needed to be hosted and constantly queried. By that time, it became clear that we would need a database (DB). Hosting all of those records in the same place as the Worker code would be difficult to maintain and would create unnecessary complexity.

Cloudflare Key/Value Store

Workers KV is a global, low-latency, key-value data store. It supports exceptionally high read volumes with low-latency, making it possible to build highly dynamic APIs and websites which respond as quickly as a cached static file would (ref.)

That's when Cloudflare Key/Value (KV) store came into play: as a part of that same solution, it was easily integrated with the code we had. More importantly, it was very low latency and achieved the purpose of hosting the needed redirections. Example below:

UPWORK_COM_REDIRECTS_301

KEY (Source URL)    VALUE (Target URL)
/redirect1          https://www.upwork.com/
/redirect2          https://www.upwork.com/
/redirect3          https://www.upwork.com/hire/

Our code evolved to this:

--CODE language-markup language-js line-numbers--
async function bulkRedirects(request) {
    // Get the incoming request URL and path for parsing
    let url = new URL(request.url)
    let path = url.pathname + url.search;
    // UPWORK_COM_REDIRECTS_301 is the name of the KV
    let location = await UPWORK_COM_REDIRECTS_301.get(path)
    // If it exists, create a redirect and return it
    if (location) {
        console.log("Redirecting to: " + location)
        return Response.redirect(location, 301)
    }
    // If not, return the original request
    return fetch(request);
}

addEventListener('fetch', async event => {
    event.respondWith(bulkRedirects(event.request))
})
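The lookup step can also be exercised outside the CDN. The following is a minimal sketch, assuming the lookup is pulled out into a pure function; `resolveRedirect` and `makeFakeKV` are hypothetical names introduced here, and the fake store only imitates the one behavior the Worker relies on: an asynchronous `get` that resolves to the stored string, or to `null` when the key is missing.

```javascript
// Pure lookup step, separated from the fetch handler so it can be tested.
// `kv` is any object with an async get(key) resolving to a string or null.
async function resolveRedirect(requestUrl, kv) {
    const url = new URL(requestUrl)
    const path = url.pathname + url.search
    const location = await kv.get(path)
    return location || null
}

// Hypothetical in-memory stand-in for the UPWORK_COM_REDIRECTS_301 namespace
function makeFakeKV(entries) {
    const store = new Map(entries)
    return {
        async get(key) {
            return store.has(key) ? store.get(key) : null
        },
    }
}
```

Because the key is built from both the path and the query string, `/redirect3` and `/redirect3?ref=x` are distinct keys. A request for the second one is not redirected unless it has its own entry.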

The team added a couple of records and confirmed that the database was working correctly and met the required availability, scalability, and latency standards. Since this DB is hosted and not accessible through the usual TCP connection, the team needed to set up a driver that interacts with the vendor's API to make the necessary requests. There were two outstanding issues. First, we could upload only one record per request, at around 3 seconds per request; for 10,000 records, that added up to more than 8 hours to upload the entries into the database.

Second, we needed to send the URL records via the GET method. As you might imagine, this was challenging because the data was encoded in transit in a way that made it look different once stored in the KV store. For example, when we sent /some+URL, it could end up as /some%2BURL in the database.
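The mismatch is easy to reproduce with standard JavaScript URL functions. This is an illustrative sketch rather than the vendor's actual handling; `encodePathKey` and `decodePathKey` are hypothetical helpers that encode each path segment while keeping the slashes.

```javascript
// Encode every path segment but keep the "/" separators,
// so "/some+URL" becomes "/some%2BURL".
function encodePathKey(key) {
    return key.split('/').map(encodeURIComponent).join('/')
}

// Reverse of encodePathKey: decode each segment exactly once.
function decodePathKey(stored) {
    return stored.split('/').map(decodeURIComponent).join('/')
}

// In a query string, a raw "+" is read back as a space, which is one reason
// a key sent unencoded over GET can arrive changed.
function readKeyFromQuery(queryString) {
    return new URLSearchParams(queryString).get('key')
}
```

Here `encodePathKey('/some+URL')` yields `/some%2BURL`, the form from the example above, while `readKeyFromQuery('key=/some+URL')` yields `/some URL`. A lookup only matches when the writer and the Worker agree on the form of the key, so the stored key has to be decoded exactly once or the lookup key encoded the same way.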

To fix both situations, we jumped into discussions with the vendor. Shortly after that, we had some patches and documentation updates that addressed those issues by implementing a bulk update API and upload via POST method. Keep in mind that this was a service still in beta and problems were expected, so we worked with the owner of these solutions to improve the overall product. Win-win.
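To put the first issue in numbers, here is a back-of-the-envelope estimate. The function name is hypothetical, and the bulk figure assumes, purely for illustration, that one bulk request takes about as long as a single-record request did.

```javascript
// Rough estimate of sequential upload time, in hours.
function estimateUploadHours(recordCount, secondsPerRequest, recordsPerRequest) {
    const requests = Math.ceil(recordCount / recordsPerRequest)
    return (requests * secondsPerRequest) / 3600
}

// One record per request: 10,000 requests * 3 s = 30,000 s, about 8.3 hours
const oneByOne = estimateUploadHours(10000, 3, 1)

// One bulk request carrying all 10,000 records (assumed to take about 3 s)
const bulk = estimateUploadHours(10000, 3, 10000)
```

Under that assumption, the upload drops from hours to seconds, which is what makes a repeatable deployment process practical.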

Now we moved the scenario from something like this:

[Diagram: request flow before the change]

To this:

[Diagram: request flow after the change]

With this change, we didn't just return the pre-processed URL that helps ranking; we also removed the need to involve Upwork origin servers in these requests, which turned this into a serverless solution for us.

However, our work wasn't complete. We had architected a setup that did what we needed in an efficient way, but the challenge now was to make this a repeatable process that was reliable and standardized. To ensure that we didn't cause more problems than the ones we fixed, it was essential to comply with our service standards and guarantee availability.

At that stage, applying changes was a manual, error-prone process. There's no point in solving an SEO problem to achieve higher rankings if that's going to compromise the availability of our site. And since we are talking about a CDN, which sits in front of all of our traffic, that was a real possibility.

Bulk Updater

To provide a solution to the previous point, we implemented a Python script that does all the needed pre-processing and uploading of the records. This script interacts with an API method to upload the necessary redirections into the KV Store, handling bulk uploads to minimize the deployment time.

This component is an essential part of the solution because it manages all of the preparation steps to transform and verify the information. We used a Python library to structure the data so it could be validated prior to upload. The script makes sure that no duplicates pollute the database, transforms all the redirects into something readable by browsers (URL encoded), and leverages the vendor's API to upload 10,000 records at a time.
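The original script was written in Python and is not shown in this article. The sketch below only illustrates the same preparation steps in JavaScript, to stay consistent with the Worker code. `prepareBatches`, the `{ source, target }` row shape, and the `{ key, value }` record shape are assumptions made for this example.

```javascript
const MAX_BULK_RECORDS = 10000 // batch size mentioned in the article

// Validate rows, drop exact duplicates, and split the result into batches.
// Each row is assumed to look like { source: '/path', target: 'https://...' }.
function prepareBatches(rows, batchSize = MAX_BULK_RECORDS) {
    const seen = new Map()
    for (const { source, target } of rows) {
        if (!source.startsWith('/')) {
            throw new Error('Source must be a path: ' + source)
        }
        new URL(target) // throws if the target is not a valid absolute URL
        if (seen.has(source) && seen.get(source) !== target) {
            throw new Error('Conflicting targets for ' + source)
        }
        seen.set(source, target) // exact duplicates collapse into one entry
    }
    const records = [...seen].map(([key, value]) => ({ key, value }))
    const batches = []
    for (let i = 0; i < records.length; i += batchSize) {
        batches.push(records.slice(i, i + batchSize))
    }
    return batches
}
```

Rejecting conflicting duplicates, rather than silently keeping the last one, is a design choice in this sketch: it surfaces mistakes in the input before anything reaches the KV store.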

Input Source

Last but not least, we needed reliable and controlled procedures that empower the SEO team to make changes to this database in an auditable way, with the ability to revert changes in case there are issues. To solve this, we created a Git repository that hosts the list of redirect URLs and a deployment process that includes the Bulk Updater and pushes the changes to the required environment.

Deployment Process

Now it is time to tie all of those pieces together. We have the components that provide the needed functionality, but we also have to ensure that they are correctly connected and secured, so there's no room for incidents. The next step is to define a process that interacts with and updates all the needed parts:

1. The SEO team updates the Input Source with previously validated redirects.
2. The SRE team runs the Bulk Updater. This reads the Input Source, creates a new database, and sends all the entries to the KV store.
3. The CI/CD pipeline changes the Cloudflare Workers script to use the newly created namespace and deploys it.
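The ordering of these steps matters: the Worker should only be pointed at the new namespace once every record has been written. A minimal sketch of that ordering follows. All four injected functions (`loadInput`, `createNamespace`, `writeBatch`, `pointWorkerAt`) are hypothetical placeholders for the real tooling, which this article does not show.

```javascript
// Orchestrates a deployment with injected, hypothetical dependencies so the
// ordering can be tested without touching the CDN.
async function deployRedirects(deps, namespaceName, batchSize = 10000) {
    const { loadInput, createNamespace, writeBatch, pointWorkerAt } = deps
    // Step 1 output: the validated records from the Input Source
    const records = await loadInput()
    // Step 2: create a fresh namespace and fill it batch by batch
    const namespaceId = await createNamespace(namespaceName)
    for (let i = 0; i < records.length; i += batchSize) {
        await writeBatch(namespaceId, records.slice(i, i + batchSize))
    }
    // Step 3: switch the Worker only after every batch succeeded; if a write
    // throws, this line is never reached and the old namespace keeps serving
    await pointWorkerAt(namespaceId)
    return namespaceId
}
```

Writing into a new namespace instead of updating the live one means a failed or partial upload never affects production traffic, and reverting amounts to pointing the Worker back at the previous namespace.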

Results

With all of these components and features, we were able to truly seize the power of the CDN, serving content to each user from a node closer to their physical location. We did this without having to manage any services, just by pushing the code and the records it needs. More importantly, all the needed redirections are validated automatically, and hundreds of thousands of entries are managed in a fast, resilient way that assures the integrity of the records and maximizes the availability of the resources. All of this is completely transparent to search engine bots, crawlers, and our users.

At the time of this publication, we have more than 500k entries (redirects) in the KV store, serving more than 8 million requests and more than 100 GB of content per month. Not too long after the solution was implemented, we saw improvements of +50% in traffic and an increase in rankings compared to the previous year.

Summary

SEO is critical for any company, and we had a difficult situation, but big problems don't always require big solutions. Having a distributed and diverse team helped us achieve the creative approach that we needed. With support from our partners and smart work, we were able to combine multiple simple components: Cloudflare Workers to serve the code that executes the redirection, the Cloudflare Key/Value store to host thousands of URLs, the Input Source to manage all of the entries in a controlled way, and the Bulk Updater to update the database quickly. All of this runs on serverless infrastructure, with high availability, resiliency, and low latency. Results were pretty much immediate. Overall, we were able to deliver a simple toolset that is easy to use, can adapt to new practices, and can solve future redirection problems.

We understand how challenging it can be to maintain good SEO health, so we hope this publication proves useful to other engineers looking for alternative solutions.