
Goal: Researchers from Alibaba Group, Hupan Lab and Nanyang Technological University investigated whether GPT-4 can perform well as a data analyst. The question has drawn public attention, but opinions remain divided and no definitive conclusion has been reached. The study aims to answer “Is GPT-4 a good data analyst?” by comparing GPT-4 quantitatively with human data analysts.

Methodology:

  • Treated GPT-4 as a data analyst performing end-to-end data analysis on databases from various domains.
  • Designed prompts that guide GPT-4 to generate code, data visualizations and written analysis.
  • Evaluated GPT-4 with automatic metrics and professional human ratings on dimensions such as correctness, aesthetics and complexity.
  • Compared GPT-4 with data analysts of different seniority levels.
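The end-to-end framework described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `analyze` helper and the `fake_llm` stand-in are all hypothetical, the model call is stubbed so the sketch runs offline, and the visualization stage (where the study has GPT-4 write Python plotting code) is omitted to stay dependency-free. It shows three stages: prompting for SQL, executing that SQL against a database, and prompting for written analysis over the extracted rows.

```python
import sqlite3
from typing import Callable

# Hypothetical stand-in type for a GPT-4 call: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def build_code_prompt(question: str, schema: str) -> str:
    """Stage 1 (hypothetical wording): ask the model for SQL that extracts the needed data."""
    return (
        "You are a data analyst. Given the database schema below, write one SQLite "
        "query that extracts the data needed to answer the question.\n"
        f"Schema:\n{schema}\nQuestion: {question}\nSQL:"
    )


def build_analysis_prompt(question: str, rows: list[tuple]) -> str:
    """Stage 3 (hypothetical wording): ask for insights grounded only in the extracted rows."""
    table = "\n".join(", ".join(str(v) for v in row) for row in rows)
    return (
        f"Question: {question}\nExtracted data:\n{table}\n"
        "Write 2-3 bullet points of analysis based only on this data."
    )


def analyze(conn: sqlite3.Connection, question: str, schema: str, llm: LLM) -> dict:
    """Run SQL generation, data extraction and analysis generation in sequence."""
    sql = llm(build_code_prompt(question, schema)).strip()
    rows = conn.execute(sql).fetchall()  # Stage 2: execute the model-generated query
    analysis = llm(build_analysis_prompt(question, rows))
    return {"sql": sql, "rows": rows, "analysis": analysis}


# Usage with a toy database and canned model responses (no API key needed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 120.0), ("South", 80.0)])


def fake_llm(prompt: str) -> str:
    # Return SQL for the code-generation prompt, prose for the analysis prompt.
    if prompt.rstrip().endswith("SQL:"):
        return "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
    return "- North leads revenue."


result = analyze(conn, "Which region has the highest revenue?",
                 "sales(region TEXT, revenue REAL)", fake_llm)
print(result["rows"])
```

Executing the generated SQL, rather than asking the model to report numbers directly, keeps the analysis step grounded in data the database actually returned.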

Key findings:

  • On average, GPT-4 scored 0.78 for correctness of the extracted data, 0.99 for chart-type selection, and 2.5 out of 3 for aesthetics.
  • GPT-4 outperformed entry-level and intern data analysts on most metrics and performed comparably to senior data analysts.
  • GPT-4 was much faster, taking 34-55 seconds per instance versus 173-648 seconds for human analysts.
  • GPT-4’s cost was only 0.45-2.5% of human data analysts’ cost.
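As a rough sanity check, the reported time and cost ranges can be converted into speedup and cost multiples. This sketch assumes the range endpoints can be paired freely (fastest human against slowest GPT-4, and vice versa), so it yields only loose bounds; the study's per-seniority comparisons may differ.

```python
# Ranges as reported in the summary above.
gpt4_seconds = (34, 55)      # seconds per instance for GPT-4
human_seconds = (173, 648)   # seconds per instance for human analysts
cost_fraction = (0.0045, 0.025)  # GPT-4 cost as a fraction of human cost (0.45% and 2.5%)

# Conservative pairing: fastest human vs slowest GPT-4; favorable pairing: the reverse.
min_speedup = human_seconds[0] / gpt4_seconds[1]
max_speedup = human_seconds[1] / gpt4_seconds[0]

# A cost fraction f means GPT-4 is 1/f times cheaper.
times_cheaper = (1 / cost_fraction[1], 1 / cost_fraction[0])

print(f"speedup: {min_speedup:.1f}x to {max_speedup:.1f}x")
print(f"cheaper: {times_cheaper[0]:.0f}x to {times_cheaper[1]:.0f}x")
```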

Recommendations:

  • Preliminary results suggest GPT-4 has potential as a data analyst, achieving high performance at low cost.
  • However, concerns remain about accuracy, reasoning, and the handling of new scenarios.
  • Further research on more real-world data is needed before concluding that GPT-4 can replace human data analysts.
  • Promising applications include assisting human analysts and automating routine analytical tasks.

Implications:

  • Adopting GPT-4 for data analysis could significantly reduce talent costs for organizations, enabling them to scale data teams faster.
  • However, wide adoption could displace many human data analyst jobs and create structural unemployment in the profession.
  • Organizations that fail to adopt GPT-4 may face competitive disadvantage from data-driven decisions made faster and cheaper by competitors.

Alternative perspectives:

  • The study was limited to 1000 examples from a narrow dataset, which may not represent the full scope of real-world data analysis.
  • Human analysts likely still surpass GPT-4 in complex reasoning, creativity and handling new scenarios outside the training data.
  • Studies conducted or funded by AI developers may be incentivized to portray the technology favorably.

AI predictions:

  • GPT-4 will likely match or exceed human-level performance on a wider range of analytical tasks as training data and compute scale up.
  • Hybrid human-AI data analysis teams will emerge, with humans focusing on strategy, quality control and customer interactions.
  • As natural language interfaces improve, GPT-4 could enable self-service data analysis for non-technical users.

Glossary:

  • GPT-4 as a data analyst: Treating the AI system GPT-4 as if it were a human data analyst in order to evaluate its capabilities on analytical tasks.
  • End-to-end data analysis framework: A process created by the authors where GPT-4 is prompted to extract data, visualize it, and analyze it given a business question and database.
  • Code generation: GPT-4 generating SQL and Python code to query databases and visualize data based on instructions.
  • Analysis generation: GPT-4 producing written data analysis and insights when prompted with a question and extracted data.
  • Professional human evaluation: Rigorous scoring of GPT-4’s outputs on dimensions like correctness, alignment, and complexity by hired experts.
  • Cost per instance: A metric calculated by the authors to quantify the expense of each analytical task completed by GPT-4 vs human analysts.
  • Additional online information: An optional module that lets GPT-4 incorporate real-time external knowledge, for example from Google search, when generating analysis.
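One plausible way to compute a cost-per-instance figure is to price human time by an hourly rate and GPT-4 usage by token counts. This is an assumption-laden sketch, not the authors' formula: the `HumanTask`/`ModelTask` types, the hourly rate, the token counts and the per-1K-token prices are all hypothetical placeholders (check the provider's current pricing), and the resulting ratio is not one of the study's numbers.

```python
from dataclasses import dataclass


@dataclass
class HumanTask:
    seconds: float          # time the analyst spent on one instance
    hourly_rate_usd: float  # hypothetical fully loaded hourly cost


@dataclass
class ModelTask:
    prompt_tokens: int
    completion_tokens: int
    usd_per_1k_prompt: float      # hypothetical API price per 1K prompt tokens
    usd_per_1k_completion: float  # hypothetical API price per 1K completion tokens


def human_cost(task: HumanTask) -> float:
    """Cost of one instance = hourly rate x hours spent."""
    return task.hourly_rate_usd * task.seconds / 3600


def model_cost(task: ModelTask) -> float:
    """Cost of one instance = tokens used x per-token price, split by prompt and completion."""
    return (task.prompt_tokens / 1000 * task.usd_per_1k_prompt
            + task.completion_tokens / 1000 * task.usd_per_1k_completion)


# Illustrative numbers only; none of these come from the study.
human = HumanTask(seconds=360, hourly_rate_usd=50.0)
model = ModelTask(prompt_tokens=2000, completion_tokens=500,
                  usd_per_1k_prompt=0.03, usd_per_1k_completion=0.06)

ratio = model_cost(model) / human_cost(human)
print(f"model cost is {ratio:.2%} of human cost")
```

Splitting prompt and completion tokens matters because API providers often price them differently; long schemas and extracted tables inflate the prompt side.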
