TL;DR
Goal: Researchers from Harvard Business School, Wharton, MIT, Warwick Business School, and Boston Consulting Group conducted field experiments to assess the impact of AI on consultant productivity and output quality.
Methodology:
- 758 consultants from Boston Consulting Group participated in randomized controlled experiments using GPT-4.
- Consultants were assigned realistic tasks inside and outside the capabilities frontier of AI.
- Performance was measured on productivity (task completion rates) and output quality (expert evaluations).
Key findings:
- For tasks inside the frontier, consultants using AI completed about 12% more tasks and produced output rated over 40% higher in quality than those working without it.
- Consultants in the bottom half of the skill distribution benefited more from AI (43% performance increase) than those in the top half (17% increase).
- For tasks outside the frontier, consultants using AI produced worse results: they were 19 percentage points less likely to reach a correct solution than those working without AI.
- Two collaboration approaches emerged: Centaurs (delegating between human and AI) and Cyborgs (intertwining efforts with AI).
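The findings above mix two kinds of numbers: relative changes (12%, 40%) and an absolute gap in percentage points (19). The sketch below shows how the two differ. The 80% control-group success rate is a hypothetical value chosen only for illustration, not a figure from the study, and the function names are made up for this example.

```python
def relative_change(baseline: float, treated: float) -> float:
    """Relative change as a fraction of the baseline (0.12 means 12% higher)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline


def percentage_point_change(baseline_rate: float, treated_rate: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (treated_rate - baseline_rate) * 100


# ASSUMPTION: a hypothetical control-group success rate of 80%.
control_rate = 0.80
# 19 percentage points lower, matching the outside-the-frontier finding.
ai_rate = control_rate - 0.19

pp_gap = round(percentage_point_change(control_rate, ai_rate), 2)
rel_gap = round(relative_change(control_rate, ai_rate), 4)

print(f"Gap in percentage points: {pp_gap}")
print(f"Relative change: {rel_gap:.2%}")
```

Under this assumed baseline, a 19 percentage point gap corresponds to a relative drop of roughly 24%, which is why the two kinds of figures should not be compared directly.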
Recommendations:
- Firms should evaluate AI integration at the task level based on the capabilities frontier rather than wholesale adoption.
- Training and oversight are crucial to ensure appropriate usage and validation for tasks near the edge of the frontier.
- Roles and workflows will need adjustment to enable fluid human-AI collaboration.
- Diversity of perspectives should be maintained as AI tends to decrease output variability.
Thinking critically
Implications:
- Adopting AI such as LLMs for knowledge work could substantially boost productivity, creativity, and quality (gains of 12% to over 40% in the study), with the potential to accelerate economic growth. However, lower-skilled workers may face displacement without reskilling opportunities.
- Organizations will need to redesign workflows, roles, training programs, and management practices to enable effective human-AI collaboration. Optimal collaboration may involve Centaur or Cyborg approaches.
- Overreliance on AI could lead to deskilling, erosion of critical thinking, and more homogenized outputs if variability is not actively maintained.
Alternative perspectives:
- The study was limited to a single consulting firm, so its findings may not generalize to other organizations and industries. More research is needed on different tasks.
- Overreliance on AI may carry longer-term risks, such as reduced innovation or unclear accountability, that a short-term study could not capture.
- The tasks tested may not fully represent the breadth and complexity of real-world knowledge work activities. More complex tasks may fall outside the capabilities frontier.
AI predictions:
- As AI capabilities expand, the share of tasks inside the frontier is likely to grow rapidly, accelerating integration. However, new challenges will arise as the frontier shifts.
- Effective human-AI collaboration will become a core competency, changing training priorities and potentially hiring criteria. Companies may need specialists in AI collaboration.
- Governance frameworks, monitoring, and controls will be imperative to ensure responsible and ethical AI usage and address risks as they emerge.
Glossary
- Jagged frontier: The uneven capabilities of AI where some tasks are easily automated while other tasks of similar complexity remain outside of AI's current skills. The frontier shifts as AI advances.
- Centaur: A human-AI collaboration model where humans strategically divide subtasks between themselves and the AI, assigning each to whichever is better suited and combining the strengths of both, like the mythical half-human, half-horse creature.
- Cyborg: A human-AI collaboration model with tight integration, where humans and AI work on the same subtasks in tandem, like a cybernetic organism.
- LLMs: Large language models, a type of AI capable of generating human-like text, such as GPT-4 (the model used in the study) and the models behind ChatGPT.
- Retainment: The degree to which subjects directly retained and copied AI-generated text into their own work.
- Semantic similarity: A measure of how similar two pieces of text are in their underlying meaning, not just word usage.
- USE: Google's Universal Sentence Encoder, which encodes text into semantic vector representations.
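The retainment and semantic similarity entries above can be made concrete with a minimal sketch. Both functions are illustrative stand-ins, not the study's actual measures: `retainment` uses Python's standard `difflib` to estimate how much of an AI draft survives in the final text, and `cosine_similarity` compares two embedding vectors. The short vectors are hypothetical placeholders for what an encoder such as USE would produce; no encoder is loaded here.

```python
import math
from difflib import SequenceMatcher


def retainment(ai_text: str, final_text: str) -> float:
    """Fraction of the AI draft's characters that reappear, in order, in the final text."""
    if not ai_text:
        return 0.0
    matcher = SequenceMatcher(None, ai_text, final_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ai_text)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# ASSUMPTION: toy texts and toy 3-dimensional "embeddings" for illustration only.
ai_draft = "Launch the shoe in three regional markets first."
final_answer = "We should launch the shoe in three regional markets first, then expand."
print(f"Retainment: {retainment(ai_draft, final_answer):.2f}")

embedding_a = [0.2, 0.7, 0.1]
embedding_b = [0.25, 0.65, 0.05]
print(f"Semantic similarity: {cosine_similarity(embedding_a, embedding_b):.3f}")
```

Character-level matching is a deliberately simple choice here. A word-level or sentence-level comparison would be less sensitive to small edits, and real sentence embeddings have hundreds of dimensions rather than three.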