Productivity Paradox of Generative AI: A Conversation with Auste Simkute and Lev Tankelevitch

28 May 2024
Authors: Auste Simkute, Lev Tankelevitch, Allie Blaising, Ted Liu

Generative AI (gen AI) tools have great potential to enhance the productivity of knowledge workers and creatives. However, the anticipated boost in productivity isn’t always realized and can vary by use case. Users experience unanticipated challenges, including increased cognitive load, frustration over workflow interruptions, and excessive time spent verifying AI outputs. In short, gen AI functionality designed to enhance human productivity may paradoxically hinder it, due to the frictions users experience when adopting and integrating that functionality into their workflows.

Upwork’s User Research Team and the Upwork Research Institute recently spoke with Dr. Auste Simkute and Dr. Lev Tankelevitch, researchers at Microsoft Research. Auste studies human-AI interactions from a design perspective, with a current focus on trust, productivity, and collaboration. Lev’s work revolves around augmenting human agency in collaborative knowledge work, with a current focus on designing gen AI systems that help people think clearly and act more intentionally, and improving human-AI interaction.

Our conversation was part of Upwork’s Reimagining Work—a lecture series designed to provide a forum for expert practitioners and academics to foster the exchange of views on the present and future of work. In this conversation, we discuss the cognitive and metacognitive demands of generative AI tools, how these demands affect user productivity, and the optimal designs of AI tools to contribute to user productivity. 

[1] Allie Blaising, Lead User Researcher:

From your research, what are the main reasons and cases where gen AI systems could hinder or even reduce user productivity? 

Auste and Lev: We identified at least four key reasons why gen AI systems may interfere with users’ productivity. The first is the production-to-evaluation shift. Using gen AI tools often requires users to shift from actively producing outputs, e.g., a piece of code or text, to passively reviewing or ‘proofreading’ them. Monitoring is cognitively demanding in itself, and gen AI adds further challenges due to its poor explainability, inconsistent reliability, and high automation capacity.

These factors can reduce situational awareness and make it difficult for users to debug or proofread the gen AI outputs and predict the system’s behavior, taking users’ attention away from their primary task. This productivity loss can be evident when users need to debug AI-generated code or integrate a paragraph of AI-generated text into their writing.

The second reason is unhelpful workflow restructuring. For example, gen AI changes the workflow by introducing new tasks, such as prompting or adapting outputs. These tasks can be frustrating, take a lot of time to learn, and even lead users to abandon using gen AI and complete the task manually. Changes in the workflow also disrupt the sequence of tasks. Not being able to follow a familiar sequence of steps necessary to complete a task can leave users ineffectively shifting between various steps, feeling lost and as if they were ‘managing chaos.’

The third reason is task interruptions. In some current gen AI tools, generated outputs (e.g., large blocks of code or text) can interrupt users who are in states of ‘flow’ or ‘acceleration’, disrupting their productive work and costing them time and effort when they try to return to it.

Lastly, productivity can be hindered because, while AI can make easy tasks easier, it can also make hard tasks harder—a phenomenon that we termed task-complexity polarization (and which has been previously referred to as “clumsy automation”). 

For example, while many programmers find it easy to use gen AI for writing boilerplate code, they often struggle to integrate multiple generated code suggestions or an extended piece of code, and they even choose to turn off the tools during these complex tasks. Similar observations were made in the context of legal analysis with GPT-4, where legal workers found the tool helpful with simple legal analysis but not complex legal reasoning.

[2] Ted Liu, Economist at the Upwork Research Institute:

What is metacognition and why is it a useful framework to think about user productivity when designing and interacting with AI tools?

Auste and Lev: Metacognition is thinking about thinking. It includes things like self-awareness, our ability to recognize our own knowledge and thought processes; confidence, our ability to estimate the accuracy of that knowledge and those processes; task decomposition, our ability to break down cognitive tasks into sub-tasks; and flexibility, our ability to adapt cognitive strategies when necessary.

In our research, we find that metacognition helps us understand many of the usability challenges of current gen AI tools, which we view as imposing demands on users’ metacognition. 

For example, prompting gen AI tools requires users to be self-aware of their task goals, break them down into sub-goals, and verbalize them via prompts. Evaluating outputs from gen AI tools, such as a section of code or a written document, requires users to have an appropriate level of confidence in the domain at hand, and disentangle that from the tool’s performance in that domain. 

All of this is made difficult by gen AI systems’ non-determinism (identical prompts can produce different outputs) and by their flexibility (they respond to a remarkably wide range of user prompts), which is both a blessing and a curse. More broadly, deciding whether, when, and how to integrate gen AI tools into one’s workflow in the first place also requires self-awareness, appropriate self-confidence, and flexibility around one’s ways of working.

Importantly, a metacognitive framework also provides directions for designing gen AI tools that support users’ metacognition and offer improved usability. For example, gen AI tools can be designed to proactively increase users’ self-awareness of their task goals, help break down complex tasks into sub-tasks, and support users in assessing and adjusting their own confidence during their interactions with gen AI tools and throughout their workflows. 

Indeed, we see the impressive contextual awareness and conversational ability of cutting-edge gen AI as offering exciting opportunities to support users’ metacognition in novel ways.  
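To make this concrete, here is a minimal Python sketch of one such design: a prompt ‘scaffold’ that asks the model to restate the user’s goal, decompose the task, and flag its confidence before producing an answer. The template wording and function names are invented for illustration, not drawn from the research itself, and any chat-completion API could consume the resulting prompt.

```python
# Minimal sketch of a metacognitive "scaffold" around a user request:
# the wrapped prompt asks the model to restate the goal, decompose the
# task, and flag its confidence before doing any work. Template wording
# and names are invented for this example, not from the research.

SCAFFOLD_TEMPLATE = """Before answering, help me think clearly:
1. Restate what you believe my goal is, so I can correct it.
2. Break the task into 2-5 sub-tasks and list them.
3. For each sub-task, rate your confidence (high/medium/low) and
   note what I should verify myself.
Then address only the first sub-task and wait for my go-ahead.

My request: {request}"""


def scaffold_prompt(user_request: str) -> str:
    """Wrap a raw request with goal restatement, task decomposition,
    and confidence prompts to support the user's metacognition."""
    return SCAFFOLD_TEMPLATE.format(request=user_request)


if __name__ == "__main__":
    print(scaffold_prompt("Refactor our billing module to support proration."))
```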

[3] Allie:

Does a user’s perceived productivity loss from gen AI always correlate with actual loss? What should this mean for how we think about measuring productivity loss from gen AI? 

Auste and Lev: Productivity is notoriously difficult to measure comprehensively and accurately. Nevertheless, many studies do document productivity gains when working with gen AI, including in coding, writing, consulting, and customer support. These manifest as faster task completion without a meaningful loss in quality, and sometimes even a boost in quality. 

However, evidence suggests that people also perceive a substantially higher productivity boost than occurs in reality (e.g., people overestimate how much time they save). Moreover, in some cases, quality does in fact suffer (e.g., errors are missed or the output quality doesn’t feel human-like), leading to a trade-off between task completion speed and quality of output. It’s not clear if people are always aware of this trade-off.

Moreover, evidence of productivity gains represents average effects, but these gains also likely vary by many dimensions, including by domain, people’s pre-existing expertise levels on tasks or in working with gen AI, and the specific product in question (gen AI tools are not all made equal). 

As hinted throughout, it may also take time for users to adapt to working with gen AI tools, and for gen AI tools to adapt to user needs, with the net impact on productivity changing as these adaptations occur. In this context, understanding productivity loss is not to detract from the potential of gen AI, but rather to catch issues early on and help improve usability and productivity.

Finally, we think it’s important to rethink the concept of productivity in the context of gen AI. Productivity is not only about more work done per unit of time but also includes users’ ability to improve their thinking and be more creative, among other things, and thus ultimately provide higher-quality work. However, all of this is fuzzier and thus much harder to measure objectively.

Emerging research suggests that users find different types of outputs helpful in supporting their work. For example, some users prefer to receive AI-generated text that they can then edit, while others want feedback from AI or questions that would challenge them to think outside the box. While difficult to measure objectively, users’ feedback on whether they feel supported and inspired, or frustrated and burdened, when using these tools is a valuable indication of their effectiveness. 

More broadly, users should be able to use AI in a way that feels supportive to them and allows them to use their expertise and creative potential. Realizing the full positive impact of gen AI requires considering all these complexities. 

[4] Ted:

What are the human factors design solutions to mitigate the potential productivity challenges of AI that your research has uncovered? 

Auste and Lev: In a broad sense, Human Factors research suggests that system flexibility and tailored explainability solutions could help users interact with systems more effectively, reduce their mental workload, and help mitigate productivity losses. 

More specifically, we suggest that users should receive continuous feedback explaining a system’s behavior and the relationship between their inputs and its outputs. For example, users could receive explanations of which code and comments Copilot relies on as input, see prompt changes and the resulting output changes highlighted, or be shown alternative code solutions. This could ease the tasks of reviewing and integrating AI-generated outputs.
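As a rough illustration of this kind of input-output feedback, the sketch below uses Python’s standard difflib module to show how a change in the prompt lines up with the resulting change in the output. A real tool would render this inline in the editor; nothing here is an actual Copilot API.

```python
# Minimal sketch of input-output feedback: show the user what they
# changed in the prompt alongside what changed in the generated output
# as a result. Uses only the standard library; illustrative only.
import difflib


def explain_change(old_prompt: str, new_prompt: str,
                   old_output: str, new_output: str) -> str:
    """Summarize how the user's prompt edits map onto output changes."""
    prompt_diff = "\n".join(difflib.unified_diff(
        old_prompt.splitlines(), new_prompt.splitlines(),
        fromfile="previous prompt", tofile="current prompt", lineterm=""))
    output_diff = "\n".join(difflib.unified_diff(
        old_output.splitlines(), new_output.splitlines(),
        fromfile="previous output", tofile="current output", lineterm=""))
    return prompt_diff + "\n\n" + output_diff


print(explain_change(
    "Write a haiku about rain.",
    "Write a cheerful haiku about rain.",
    "Grey clouds gather low",
    "Rain taps a bright tune"))
```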

Human Factors research also suggests that users should have more agency in personalizing the system and its outputs. For example, users might find it easier to adapt to workflow changes and task interruptions if they could choose when to receive AI output and in what format (e.g., a chunk of text, a question, or feedback). Such control could also help match AI suggestions to users’ goals and preferences for complexity, length, and frequency, and accommodate varying levels of expertise.
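One way to picture this kind of personalization is a small preferences model that gates when and how suggestions appear. In the sketch below, the field names, formats, and thresholds are invented for this example rather than taken from any real product.

```python
# Minimal sketch of user-controlled suggestion preferences. Field
# names, formats, and thresholds are invented for this example.
from dataclasses import dataclass
from enum import Enum


class OutputFormat(Enum):
    TEXT_BLOCK = "text_block"  # a chunk of generated text to edit
    QUESTION = "question"      # a question prompting reflection
    FEEDBACK = "feedback"      # comments on the user's own draft


@dataclass
class SuggestionPreferences:
    output_format: OutputFormat = OutputFormat.TEXT_BLOCK
    max_length_words: int = 150            # preferred suggestion length
    min_seconds_between: int = 120         # preferred suggestion frequency
    deliver_only_on_request: bool = False  # user pulls vs. system pushes


def should_offer(prefs: SuggestionPreferences,
                 seconds_since_last: float,
                 user_requested: bool) -> bool:
    """Honor the user's choice of when suggestions may appear."""
    if prefs.deliver_only_on_request:
        return user_requested
    return user_requested or seconds_since_last >= prefs.min_seconds_between


prefs = SuggestionPreferences(output_format=OutputFormat.QUESTION)
print(should_offer(prefs, seconds_since_last=45, user_requested=False))  # False
```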

Interfaces and systems should also be designed to reflect work domain constraints and users’ preferred ways of perceiving information. To avoid productivity loss due to AI interruptions, AI suggestions should be carefully embedded in users’ workflows at moments when they can be helpful, e.g., during states of exploration rather than during high-productivity acceleration or flow states.

Users should be notified before being interrupted so they can briefly pause, mark where they are in their current task, and then return to it effectively after interacting with the AI. Interruptions could also be used purposefully, e.g., to direct the user to the next best step in the task sequence.
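The sketch below illustrates one possible interruption policy along these lines: suggestions flow freely during exploration but are queued, with a heads-up notice, during acceleration or flow. It assumes the system can already classify the user’s work state, which is itself an open problem, and all names are invented for illustration.

```python
# Minimal sketch of state-aware interruption handling: suggestions are
# delivered during exploration but queued during acceleration or flow
# (the caller would show a small, non-blocking notice instead). How a
# real system would detect these states, e.g., from typing cadence, is
# an open problem; names here are invented for illustration.
from collections import deque
from enum import Enum


class WorkState(Enum):
    EXPLORATION = "exploration"    # open-ended; receptive to input
    ACCELERATION = "acceleration"  # heads-down productive work
    FLOW = "flow"                  # deep focus; avoid interrupting


class SuggestionGate:
    def __init__(self) -> None:
        self.pending: deque[str] = deque()

    def offer(self, suggestion: str, state: WorkState) -> str | None:
        """Deliver during exploration; otherwise queue the suggestion."""
        if state is WorkState.EXPLORATION:
            return suggestion
        self.pending.append(suggestion)
        return None  # caller shows a heads-up notice, not the content

    def drain(self) -> list[str]:
        """Called when the user signals a natural break point."""
        items = list(self.pending)
        self.pending.clear()
        return items


gate = SuggestionGate()
gate.offer("Consider extracting this into a helper.", WorkState.FLOW)  # queued
print(gate.drain())  # delivered at a break the user chose
```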

Lastly, users should always be aware of which parts of the task the AI is performing and be able to allocate to it only those tasks they are confident it can reliably perform. They should also be able to choose which tasks they prefer to complete with or without AI support. This could help avoid productivity loss by not adding mental load during already demanding tasks.

[5] Allie:

In your research, you highlight the need to balance user expertise and confidence with deeper customization in gen AI systems. Why is this balance important and how might we better design for it? 

Auste and Lev: Customization in gen AI systems, meaning the extent of controls available to users, offers the opportunity for users to interact with and explore gen AI systems in deeper ways. However, the value of this depends on users’ expertise in a domain and with gen AI tools, e.g., whether they are expert programmers or novices, and how experienced they are with gen AI tools for programming. Whereas expert users may use customization to bootstrap their own understanding of gen AI tools, novices may end up feeling overwhelmed and confused. This issue isn’t entirely new with gen AI, but striking the right balance remains important.

While more exploration of this challenge for gen AI is necessary, there are a few promising directions. For example, as is often the case, systems can onboard users gently, while offering the option to surface more controls as users gain more experience—i.e., customization of customization. 

The conversational ability of gen AI also offers the opportunity to design systems that periodically elicit from users their nuanced expertise levels and preferences and adjust customization options accordingly. More ambitiously, the flexibility inherent to gen AI also offers the opportunity to dynamically adapt interfaces to users’ interaction patterns and expertise levels. In short, systems should meet users where they are and guide them toward their goals.
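As a rough sketch of ‘customization of customization’, the example below reveals controls progressively as a user gains experience or explicitly opts in. The control names and thresholds are invented; a real system might instead elicit expertise conversationally, as described above.

```python
# Minimal sketch of "customization of customization": a progressive-
# disclosure policy that reveals controls as experience grows or on
# explicit opt-in. Control names and thresholds are invented for this
# example; a real system might adapt dynamically instead.
BASIC_CONTROLS = ["tone", "length"]
INTERMEDIATE_CONTROLS = ["output_format", "suggestion_frequency"]
ADVANCED_CONTROLS = ["temperature", "system_prompt", "context_sources"]


def visible_controls(sessions_completed: int,
                     opted_into_advanced: bool = False) -> list[str]:
    """Start minimal; surface more controls with experience or by
    explicit opt-in, so novices aren't overwhelmed."""
    controls = list(BASIC_CONTROLS)
    if sessions_completed >= 5:
        controls += INTERMEDIATE_CONTROLS
    if sessions_completed >= 20 or opted_into_advanced:
        controls += ADVANCED_CONTROLS
    return controls


print(visible_controls(sessions_completed=2))   # basic controls only
print(visible_controls(sessions_completed=25))  # everything, incl. advanced
```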

[6] Ted:

What are key gen AI productivity questions that your teams are most excited to explore in the future? 

Auste and Lev: One exciting emerging opportunity is using gen AI to directly support and augment human thinking and collaboration. Gen AI tools have the potential to become assistants, advisors, or sounding boards, helping users bounce around ideas, get feedback, and mediate collaboration, depending on users’ or teams’ needs, enhancing their strengths and providing help where needed. Achieving this will require a deep understanding of how humans work and how to design systems that accommodate and support this. Such systems should be underpinned by flexibility, personalization, and continuous feedback.

To this end, we are excited to explore how the contextual awareness and conversational ability of gen AI can enable this. This goes beyond merely automating tasks, aiming instead to increase human agency at work, enabling users to focus more on aspects of work they enjoy or augment their own skills.

About Auste Simkute

Auste has recently finished her PhD in Design at the University of Edinburgh. Her research explored explainability in AI-driven decision-making, incorporating insights from Human Factors and Human-Computer Interaction disciplines. Currently, Auste is working with Microsoft, focusing on different generative AI topics related to knowledge work and education. Additionally, she is a junior policy fellow, contributing to a policy report concerning generative AI regulations in the UK.

About Lev Tankelevitch

Lev is a senior researcher at Microsoft Research, Cambridge UK, where he explores ways to augment human agency in collaborative work and productivity using generative AI. His research approach is mixed-methods and reflects the intersection of behavioral science, human-computer interaction, and data science. Previously, he was at the Behavioural Insights Team, where he designed and evaluated behaviorally informed interventions in health, social care, and education. He has an academic background, having completed his PhD in cognitive neuroscience at the University of Oxford. 

About Allie Blaising 

Allie Blaising is a Senior User Experience Researcher at Upwork, where she leads customer research that shapes design and business decisions across multiple verticals, with a recent focus on Generative AI product initiatives. 

About Ted Liu

Ted Liu is Research Manager at Upwork, where he focuses on how work and skills evolve in relation to technological progress such as artificial intelligence. He received his PhD in economics from the University of California, Santa Cruz.
