All Things Upwork

Editor’s Note: Yiota Tsakiri, a Product Architect at oDesk, led the development side of oDesk’s recent site redesign. Here is a follow-up to her previous blog post, which offered an inside look at the redesign. This post originally appeared on Yiota’s personal blog, YiotaBytes.

Since testing Web applications can be challenging, we wanted to share insight into how we approached testing for oDesk’s redesign. We focused on four types of testing:

  1. Unit testing
  2. Functional testing
  3. UI testing
  4. Performance testing

Here’s a closer look at how we approached each one.

Unit Testing

We invested quite a bit of time in covering our code with as many unit tests as possible. Since we work with Django, deciding how to write unit tests was easy: we used Django’s testing framework together with the standard library’s unittest module, as bundled in django.utils.unittest. Currently we have an overall coverage of ~85%, and here is an example of how it changes over time.
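Django’s test case classes build on Python’s unittest.TestCase, so a unit test for a small piece of site logic has the familiar unittest shape. A minimal sketch, using a hypothetical make_profile_slug helper (not part of the oDesk code base); plain unittest is used so the example runs without a Django project:

```python
import re
import unittest


def make_profile_slug(name):
    """Hypothetical helper: turn a display name into a URL slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


# In a Django project this class would typically subclass
# django.test.SimpleTestCase (itself a unittest.TestCase subclass);
# plain unittest keeps the sketch runnable without a project.
class MakeProfileSlugTests(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(make_profile_slug("Yiota Tsakiri"), "yiota-tsakiri")

    def test_strips_punctuation_at_edges(self):
        self.assertEqual(make_profile_slug("  Hello, World!  "), "hello-world")

    def test_empty_name_gives_empty_slug(self):
        self.assertEqual(make_profile_slug(""), "")
```

Saved as a test module, this runs with `python -m unittest`; in a Django project the equivalent SimpleTestCase subclass would be collected by `manage.py test`.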


Functional Testing

Selenium is the standard tool for functional testing of web applications, since it drives a real browser and simulates a user’s behavior on a site. It is also what we used for oDesk’s functional testing. Since version 1.4, Django has supported Selenium testing natively:

[…] LiveServerTestCase allows the use of automated test clients other than the Django dummy client such as, for example, the Selenium client, to execute a series of functional tests inside a browser and simulate a real user’s actions.

For every major group of pages on the oDesk visitor site, we wrote extensive functional tests that detect and report broken parts of the site.
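Django’s LiveServerTestCase starts a real server in a background thread and exposes its address as live_server_url; Selenium then drives a browser against that URL. The sketch below reproduces only that shape with the standard library so it runs standalone: a throwaway http.server with hypothetical pages stands in for the Django app, and urllib stands in for the Selenium browser, so it checks status codes and markup but not rendering or JavaScript.

```python
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical group of pages (path -> HTML); stands in for the Django app.
PAGES = {
    "/": "<h1>Home</h1>",
    "/jobs/": "<h1>Find Jobs</h1>",
    "/contractors/": "<h1>Hire Contractors</h1>",
}


class FakeSiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep test output quiet


class PageGroupSmokeTests(unittest.TestCase):
    # Like LiveServerTestCase: one server per class, on a free port,
    # serving requests from a background thread.
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.server = HTTPServer(("127.0.0.1", 0), FakeSiteHandler)
        host, port = cls.server.server_address
        cls.live_server_url = f"http://{host}:{port}"
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()
        super().tearDownClass()

    def fetch(self, path):
        with urllib.request.urlopen(self.live_server_url + path, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")

    def test_every_page_in_group_loads_with_heading(self):
        for path in ("/", "/jobs/", "/contractors/"):
            with self.subTest(path=path):
                status, body = self.fetch(path)
                self.assertEqual(status, 200)
                self.assertIn("<h1>", body)

    def test_unknown_page_reports_404(self):
        with self.assertRaises(urllib.error.HTTPError) as ctx:
            self.fetch("/no-such-page/")
        self.assertEqual(ctx.exception.code, 404)
```

In a real Django suite, the class would subclass LiveServerTestCase instead, and a Selenium driver would load self.live_server_url plus each path.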

UI Testing

This is where things got really interesting. By UI testing, we mainly mean testing the visual result the user actually sees in the browser, as well as making sure the page structure is as expected. For example, we wanted to detect:

  • broken images in the site
  • broken layout on a page
  • missing text

There are various tools that can do this, but integration is not always easy. The tools that we ended up researching were:

  1. Quality Bots
  2. Fighting Layout Bugs

Both looked pretty promising, and both are open source.

1. Quality Bots

This tool is especially promising. It was developed by Google, and its primary goal is to reduce the regression test suite and provide free web testing at scale with minimal human intervention. UI testing frameworks usually work through image comparison, which sounds promising but is not a de facto industry quality assurance methodology. As described on the Quality Bots site:

[it] will crawl the website on a given platform and browser, while crawling it will record the HTML elements rendered at each pixel of the page. Later this data will be used to compare and calculate layout score.

This approach sounded good, but integrating such a tool into our infrastructure turned out to be more time-consuming than we wanted, so we deferred it for later. However, I strongly recommend that anyone working on testing read through the Quality Bots wiki and code to understand how it works. Even if you don’t end up using the tool, you can definitely get ideas from Google’s testing procedure.
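To make the pixel-to-element idea concrete, here is a deliberately simplified sketch: each render of a page is reduced to a grid of element identifiers (one per pixel), and two renders, say from two browsers or two builds, are scored by the fraction of pixels drawn by the same element. This illustrates the idea under those assumptions; it is not Quality Bots’ actual scoring algorithm.

```python
def layout_similarity(baseline, candidate):
    """Fraction of pixels rendered by the same element in both grids.

    Each grid is a list of rows; each cell holds an identifier for the
    HTML element drawn at that pixel (e.g. an XPath). Sizes must match.
    """
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("grids must have the same dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 1.0  # two empty renders are trivially identical
    same = sum(
        a == b
        for row_a, row_b in zip(baseline, candidate)
        for a, b in zip(row_a, row_b)
    )
    return same / total
```

A score noticeably below 1.0 for a page that should render identically would be a candidate for human review.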

2. Fighting Layout Bugs (FLB)

Fighting Layout Bugs is a library for automatically detecting layout bugs. It currently detects the following problems:

  • invalid image URLs
  • text near or overlapping horizontal edge
  • text near or overlapping vertical edge
  • text with too low contrast
  • elements with invisible focus
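FLB’s detectors are written in Java and use their own heuristics. As an illustration of what “too low contrast” can mean numerically, here is the WCAG 2.0 contrast-ratio formula, a common yardstick (not necessarily the one FLB uses), applied to foreground and background colors given as 8-bit RGB tuples:

```python
def _linear_channel(value):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.0 formula)."""
    c = value / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear_channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def is_low_contrast(foreground, background, minimum=4.5):
    # 4.5:1 is the WCAG AA threshold for normal-size text.
    return contrast_ratio(foreground, background) < minimum
```

For example, mid-grey text (119, 119, 119) on white comes out just under 4.5:1 and would be flagged.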

All of these problems are common on real websites. Instead of catching them manually, we integrated FLB into our framework so that detection happens automatically. FLB is written in Java, so we connected it to Django with Py4J; the Py4J gateway server is started automatically by the Fabric script that runs the tests. FLB drives Firefox through the WebDriver implementation provided by Selenium, and the FLB checks are invoked each time the selenium.get method is executed. Here’s how this is implemented:
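A minimal sketch of the hook idea, not the oDesk implementation: a thin wrapper around a WebDriver-like object runs a list of checks after every get() and collects what they report. The checker functions and the FakeDriver stand-in are hypothetical, used only so the sketch runs without a browser.

```python
class CheckedBrowser:
    """Wraps a WebDriver-like object so that page checks run after every get().

    `driver` only needs a get(url) method and a page_source attribute, both of
    which Selenium's WebDriver provides. Each checker is a hypothetical callable
    taking (url, page_source) and returning a list of problem descriptions.
    """

    def __init__(self, driver, checkers):
        self.driver = driver
        self.checkers = list(checkers)
        self.problems = []  # (url, checker name, description) tuples

    def get(self, url):
        self.driver.get(url)
        source = self.driver.page_source
        for check in self.checkers:
            for description in check(url, source):
                self.problems.append((url, check.__name__, description))


# --- Hypothetical stand-ins so the sketch runs without a browser ---
class FakeDriver:
    """Mimics the two WebDriver members CheckedBrowser uses."""

    def __init__(self, pages):
        self.pages = pages
        self.page_source = ""

    def get(self, url):
        self.page_source = self.pages[url]


def empty_paragraphs(url, source):
    # Toy check: flags pages containing an empty <p></p>.
    return ["empty paragraph"] if "<p></p>" in source else []


browser = CheckedBrowser(
    FakeDriver({"/ok": "<p>Hello</p>", "/bad": "<p></p>"}),
    [empty_paragraphs],
)
browser.get("/ok")
browser.get("/bad")
```

With Selenium, FakeDriver would be replaced by a real driver such as webdriver.Firefox(), and each checker could call into FLB through the Py4J gateway; how the page is handed to the Java side depends on that integration and is not shown here.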


As a sanity/lint check, we also validate the structure of our HTML, since invalid HTML usually leads to ugly layout bugs. For this we chose the validator that the W3C uses for HTML5 validation. It validates HTML5, SVG, MathML, RDF, IRI and more, and it also runs as a standalone service, so using it was a no-brainer. We integrated it through a middleware whose process_response method sends the page content to a local instance of the validator. When the HTML is invalid, an HtmlValidationError is raised; in that case we add the list of HTML errors to the response and output it at the bottom of the page. Here’s an example of how it looks:
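A rough sketch of such a middleware in the old-style (Django 1.x) process_response form. The call to the validator service is a hypothetical placeholder, validate_with_local_service, because its request format is not described here; any function that returns a list of error messages can be plugged in. HtmlValidationError follows the text, while the report markup and class name are illustrative choices.

```python
from html import escape


class HtmlValidationError(Exception):
    """Raised when the validator reports errors; keeps the messages."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} HTML validation error(s)")
        self.errors = list(errors)


def validate_with_local_service(markup):
    """Hypothetical hook: send `markup` to the local validator instance and
    return its error messages. The request/response format depends on the
    validator service, so it is left unimplemented here."""
    raise NotImplementedError


def check_html(markup, validate):
    errors = validate(markup)
    if errors:
        raise HtmlValidationError(errors)


def append_error_report(markup, errors):
    """Insert an ordered list of escaped errors before the last </body>."""
    items = "".join(f"<li>{escape(e)}</li>" for e in errors)
    report = f'<ol class="html-validation-errors">{items}</ol>'
    head, sep, tail = markup.rpartition("</body>")
    if not sep:  # no </body>: append at the very end
        return markup + report
    return head + report + sep + tail


class HtmlValidationMiddleware:
    """Old-style (MIDDLEWARE_CLASSES) Django middleware sketch; assumed to be
    enabled only in development/test settings."""

    def __init__(self, validate=validate_with_local_service):
        self.validate = validate

    def process_response(self, request, response):
        if getattr(response, "streaming", False):
            return response  # streaming responses have no .content
        if not response.get("Content-Type", "").startswith("text/html"):
            return response
        markup = response.content.decode("utf-8")  # assumes UTF-8 pages
        try:
            check_html(markup, self.validate)
        except HtmlValidationError as exc:
            # Content-Length, if already set, would need updating too.
            response.content = append_error_report(markup, exc.errors).encode("utf-8")
        return response
```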

Performance Testing

We have used, and continue to use, various tools to test our site’s performance. A well-known one is Apache’s ab tool, which benchmarks Apache’s HTTP server and shows how many requests per second (RPS) an Apache installation is capable of serving.

We also use Apache JMeter and bash scripts to put heavy load on our servers and test how they hold up under different types of load. With those tests:

  • we check response codes for various groups of pages
  • we measure the minimum, maximum, and average response times for accessing these links
  • we display the success rate for accessing all of the links
  • we issue random requests to our servers with various concurrency levels
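The checks in that list can be approximated in a few lines of Python. A minimal sketch, not the JMeter or bash scripts themselves: it fires requests at randomly chosen URLs with a fixed concurrency level and summarizes response codes, response times, success rate and an ab-style requests-per-second figure. The fetch function is a hypothetical hook, so the example can run against a stub or a real HTTP client.

```python
import random
import statistics
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_load(urls, fetch, concurrency=10, total_requests=100, seed=None):
    """Send `total_requests` requests to randomly chosen URLs, `concurrency` at a time.

    fetch(url) -> HTTP status code is a hypothetical hook (for example a
    urllib call). Returns (url, status, seconds) tuples plus wall-clock time.
    """
    rng = random.Random(seed)
    targets = [rng.choice(urls) for _ in range(total_requests)]

    def timed(url):
        start = time.perf_counter()
        try:
            status = fetch(url)
        except Exception:
            status = None  # connection errors and timeouts count as failures
        return url, status, time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, targets))
    return results, time.perf_counter() - started


def summarize(results, elapsed):
    """Status-code counts, min/max/avg response time, success rate and RPS."""
    times = [seconds for _, _, seconds in results]
    ok = sum(1 for _, status, _ in results if status == 200)
    return {
        "status_counts": dict(Counter(status for _, status, _ in results)),
        "min": min(times),
        "max": max(times),
        "avg": statistics.mean(times),
        "success_rate": ok / len(results),
        # Like ab: completed requests divided by total wall-clock time.
        "rps": len(results) / elapsed if elapsed > 0 else float("inf"),
    }
```

Running run_load at several concurrency levels and comparing the summaries gives a rough picture of how response times and success rates degrade as load grows.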

Last but not least, we are currently looking into a log replay mechanism for measuring our performance. Conventional performance tests let us vary the load and target specific URLs, but the traffic they produce is not realistic. Log replay “replays” the requests recorded in Apache’s access log, so we can measure our performance under traffic produced by real users.
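A minimal sketch of log replay, assuming Apache’s Common Log Format (Combined-format lines also parse, since the pattern ignores trailing fields). It keeps only GET requests, preserves the original spacing between them (optionally compressed by a speedup factor), and hands each path to a hypothetical fetch function:

```python
import re
import time
from datetime import datetime

# Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
CLF_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (timestamp, method, path) for a log line, or None if it does not match."""
    match = CLF_LINE.match(line)
    if not match:
        return None
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return when, match["method"], match["path"]


def replay_plan(lines, speedup=1.0, methods=("GET",)):
    """Turn log lines into (offset_seconds, path) pairs relative to the first request.

    Only the listed methods are kept, since replaying POSTs against a live site
    is usually unsafe. speedup > 1 compresses the gaps between requests.
    """
    entries = [e for e in map(parse_line, lines) if e and e[1] in methods]
    if not entries:
        return []
    entries.sort(key=lambda e: e[0])
    first = entries[0][0]
    return [((when - first).total_seconds() / speedup, path) for when, _, path in entries]


def replay(plan, fetch):
    """Send each request at its planned offset; fetch(path) is a hypothetical hook."""
    start = time.monotonic()
    for offset, path in plan:
        delay = offset - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        fetch(path)
```

Because replay() sends requests one after another, a slow response delays the rest of the schedule; a fuller version would dispatch requests concurrently, for example with a thread pool.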

If you’re interested in reading about performance testing, I strongly recommend going through this very useful resource.

Testing Results Presentation

All of our tests run from a single Fabric command, which accepts arguments for disabling specific stages when we want to. Jenkins invokes this command in every build, and if any test fails, the build fails too. Code coverage, counts of failing tests, screenshots of broken layouts (found via UI testing) and, soon, performance results are all presented in Jenkins, largely as graphs. Here are a few example screenshots:
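The runner described above is a Fabric command; the sketch below shows the same shape using only the standard library (argparse and subprocess in place of Fabric) so it runs standalone. Stage names and commands are hypothetical. Each stage can be skipped from the command line, and the function returns a non-zero status if any stage fails, which is what lets Jenkins fail the build.

```python
import argparse
import subprocess
import sys

# Hypothetical stage commands; real ones would run the unit tests,
# Selenium suites, UI checks, load scripts, and so on.
STAGES = {
    "unit": [sys.executable, "-c", "print('unit tests ok')"],
    "functional": [sys.executable, "-c", "print('functional tests ok')"],
    "ui": [sys.executable, "-c", "print('ui checks ok')"],
    "performance": [sys.executable, "-c", "print('load tests ok')"],
}


def main(argv=None, stages=STAGES):
    """Run every stage except skipped ones; return 1 if any stage failed."""
    parser = argparse.ArgumentParser(description="Run all test stages.")
    parser.add_argument(
        "--skip",
        action="append",
        choices=sorted(stages),
        help="stage to skip (repeatable)",
    )
    args = parser.parse_args(argv)
    skipped = set(args.skip or [])
    failed = []
    for name, command in stages.items():
        if name in skipped:
            print(f"[skip] {name}")
            continue
        print(f"[run] {name}", flush=True)
        if subprocess.run(command).returncode != 0:
            failed.append(name)
    if failed:
        print("failed stages: " + ", ".join(failed))
    # A non-zero exit status is what makes a Jenkins shell build step fail.
    return 1 if failed else 0
```

From a script entry point, sys.exit(main()) turns the return value into the process exit status, so passing `--skip ui --skip performance` would run only the unit and functional stages.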

Screenshots of broken URLs in UI


Counts of successful vs broken tests


Example of text overlapping with edge


Testing a large web app like oDesk’s visitor site can be really challenging. We invested heavily in our testing infrastructure because we strongly believe it is essential to the project’s success. We continue to research and experiment with various tools as we strive to provide the best possible user experience.

What testing tools have you had success with? We would love to hear your experience in the comments section below!

Yiota Tsakiri

Product Architect

Yiota Tsakiri is a Product Architect at oDesk. She leads oDesk’s visitor site and blog projects, working to provide the end user with an excellent experience. Before joining oDesk, Yiota worked at Atypon Systems as a Software Engineer and graduated from Illinois Institute of Technology with an MSCS.