Goal: Researchers from Alibaba Group, Hupan Lab and Nanyang Technological University analyzed whether GPT-4 can perform as a good data analyst. The question has drawn public attention, but opinions remain divided and no definitive conclusion has been reached. The study aims to answer “Is GPT-4 a good data analyst?” through quantitative comparison with human data analysts.
Methodology:
- Treated GPT-4 as a data analyst and had it conduct end-to-end data analysis on databases from various domains.
- Designed prompts for GPT-4 to generate code, data visualizations and analysis.
- Evaluated GPT-4 using automatic metrics and professional human ratings on dimensions like correctness, aesthetics and complexity.
- Compared GPT-4 with data analysts of different seniority levels.
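The methodology above can be pictured as a small pipeline: prompt for code, run the code against the database, then prompt for written analysis. The following is a minimal sketch under stated assumptions, not the authors' implementation. The prompt wording, the function names (build_sql_prompt, build_analysis_prompt, run_pipeline) and the stub_llm stand-in are all hypothetical; a real run would replace stub_llm with an actual model call, and the visualization step is omitted for brevity.

```python
import sqlite3
from typing import Callable


def build_sql_prompt(question: str, schema: str) -> str:
    """Hypothetical prompt asking the model for a single SQL query."""
    return f"Write SQL only.\nSchema: {schema}\nQuestion: {question}"


def build_analysis_prompt(question: str, rows: list) -> str:
    """Hypothetical prompt asking the model for bullet-point insights."""
    return f"Analyze the data.\nQuestion: {question}\nData: {rows}"


def run_pipeline(question: str, schema: str, conn: sqlite3.Connection,
                 llm: Callable[[str], str]) -> dict:
    # Step 1: code generation (the model returns SQL text).
    sql = llm(build_sql_prompt(question, schema))
    # Step 2: code execution against the database.
    rows = conn.execute(sql).fetchall()
    # Step 3: analysis generation from the question and the extracted data.
    analysis = llm(build_analysis_prompt(question, rows))
    return {"sql": sql, "rows": rows, "analysis": analysis}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned answers (assumption)."""
    if prompt.startswith("Write SQL"):
        return ("SELECT region, SUM(amount) FROM sales "
                "GROUP BY region ORDER BY region")
    return "- East has the highest total sales."


def demo() -> dict:
    # Tiny in-memory database with made-up rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("East", 10), ("East", 5), ("West", 7)])
    return run_pipeline("Which region sells the most?",
                        "sales(region, amount)", conn, stub_llm)
```

Calling demo() returns the generated SQL, the extracted rows and the canned analysis text. In any real setting, model-generated SQL should be validated or run in a sandbox before it touches a production database.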
Key findings:
- GPT-4 achieved 0.78 correctness on extracting data, 0.99 on selecting chart types, and 2.5/3 aesthetics rating on average.
- GPT-4 outperformed entry-level and intern data analysts on most metrics. It achieved comparable performance to senior data analysts.
- GPT-4 was much faster, taking 34-55 seconds per instance versus 173-648 seconds for human analysts.
- GPT-4’s cost was only 0.45-2.5% of human data analysts’ cost.
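To put the timing figures above in perspective, the sketch below derives the loosest and tightest speed-up implied by the two reported ranges. It assumes nothing about which GPT-4 time pairs with which human time, so the result is only a pair of bounds computed from the range endpoints, not a figure reported by the study. The function name speedup_bounds is hypothetical.

```python
def speedup_bounds(model_seconds: tuple, human_seconds: tuple) -> tuple:
    """Return (smallest, largest) speed-up implied by two (low, high) ranges."""
    model_lo, model_hi = model_seconds
    human_lo, human_hi = human_seconds
    # Smallest speed-up: fastest human versus slowest model run.
    # Largest speed-up: slowest human versus fastest model run.
    return human_lo / model_hi, human_hi / model_lo


# Ranges quoted in the key findings: 34-55 s for GPT-4, 173-648 s for humans.
low, high = speedup_bounds((34, 55), (173, 648))
print(f"{low:.1f}x to {high:.1f}x")  # prints "3.1x to 19.1x"
```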
Recommendations:
- GPT-4 shows potential as a data analyst from preliminary studies, achieving high performance at low cost.
- However, concerns remain around accuracy, reasoning, and handling new scenarios.
- Further research with more real-world data is required before concluding GPT-4 can replace human data analysts.
- Useful applications could be assisting human analysts or automating routine analytical tasks.
Implications:
- Adopting GPT-4 for data analysis could significantly reduce talent costs for organizations, enabling them to scale data teams faster.
- However, wide adoption could displace many human data analyst jobs and create structural unemployment in the profession.
- Organizations that fail to adopt GPT-4 may face competitive disadvantage from data-driven decisions made faster and cheaper by competitors.
Alternative perspectives:
- The study was limited to 1000 examples from a narrow dataset, which may not represent the full scope of real-world data analysis.
- Human analysts likely still surpass GPT-4 in complex reasoning, creativity and handling new scenarios outside the training data.
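- Studies run or funded by AI developers may be incentivized to portray the technology favorably. GPT-4 itself is an OpenAI model, whereas this study comes from Alibaba Group, Hupan Lab and Nanyang Technological University.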
AI predictions:
- GPT-4 and its successors will likely match or exceed human-level performance on a wider range of analytical tasks as training data and compute scale up.
- Hybrid human-AI data analysis teams will emerge, with humans focusing on strategy, quality control and customer interactions.
- As natural language interfaces improve, GPT-4 could enable self-service data analysis for non-technical users.
Glossary:
- GPT-4 as a data analyst: Treating the AI system GPT-4 as if it were a human data analyst in order to evaluate its capabilities on analytical tasks.
- End-to-end data analysis framework: A process created by the authors where GPT-4 is prompted to extract data, visualize it, and analyze it given a business question and database.
- Code generation: GPT-4 generating SQL and Python code to query databases and visualize data based on instructions.
- Analysis generation: GPT-4 producing written data analysis and insights when prompted with a question and extracted data.
- Professional human evaluation: Rigorous scoring of GPT-4’s outputs on dimensions like correctness, alignment, and complexity by hired experts.
- Cost per instance: A metric calculated by the authors to quantify the expense of each analytical task completed by GPT-4 vs human analysts.
- Additional online information: An optional module allowing GPT-4 to incorporate real-time external knowledge, like from Google, when generating analysis.
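The "cost per instance" entry can be made concrete with a short calculation. This is a sketch of one plausible way to compute it, not the authors' formula: human cost is taken as hourly pay times the time spent, and model cost as token counts times a per-1,000-token price. Every input below (hourly rate, token counts, prices) is a made-up illustrative value, and the function names are hypothetical.

```python
def human_cost(hourly_rate: float, seconds: float) -> float:
    """Cost of one task for a human analyst paid by the hour (assumption)."""
    return hourly_rate * seconds / 3600


def model_cost(prompt_tokens: int, completion_tokens: int,
               price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one task for a token-priced model API (assumption)."""
    return (prompt_tokens / 1000 * price_in_per_1k
            + completion_tokens / 1000 * price_out_per_1k)


# Illustrative inputs only; none of these numbers come from the study.
analyst = human_cost(hourly_rate=36.0, seconds=300)
model = model_cost(prompt_tokens=1000, completion_tokens=500,
                   price_in_per_1k=0.03, price_out_per_1k=0.06)
print(f"model cost is {model / analyst:.1%} of analyst cost")
```

Expressing the result as a ratio rather than an absolute amount makes it comparable across analysts with different pay levels, which is how the key findings report it.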