Microsoft’s AI diagnostic system outperforms doctors 4x on complex cases

Microsoft’s AI Diagnostic Orchestrator (MAI-DxO) achieved an 85% diagnostic accuracy rate on complex medical cases from the New England Journal of Medicine, more than four times higher than the 20% mean accuracy of human physicians tested. The system demonstrates how AI could enhance healthcare by improving diagnostic precision while reducing costs, though Microsoft emphasizes it’s designed to assist rather than replace doctors.

How it works: MAI-DxO transforms large language models into a collaborative diagnostic system that mimics real clinical reasoning processes.

  • The system works with multiple advanced AI models including GPT, Llama, Claude, Gemini, Grok, and DeepSeek, creating what Microsoft describes as a “virtual panel of physicians with diverse diagnostic approaches collaborating to solve diagnostic cases.”
  • Unlike traditional AI medical benchmarks, which rely on multiple-choice questions that models can answer from memorization, MAI-DxO is evaluated on Microsoft’s Sequential Diagnosis Benchmark (SD Bench), which follows the step-by-step diagnostic process real clinicians use.
  • The system shows its reasoning as it develops diagnoses, requests tests, and tracks costs, displaying a diagnostic process familiar to human physicians.
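
To make the "virtual panel" idea concrete, here is a minimal toy sketch of a sequential diagnosis loop. It is not Microsoft's implementation: the three panelist functions, the findings, the majority-vote rule, and the 0.66 agreement threshold are all hypothetical stand-ins, and a real system would call language models rather than hand-written rules.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for language models: each maps the findings gathered
# so far to one proposed diagnosis. Real panelists would be model API calls.
Panelist = Callable[[Dict[str, str]], str]


def cautious(findings: Dict[str, str]) -> str:
    # Commits only once imaging is available.
    return "pneumonia" if findings.get("chest_xray") == "infiltrate" else "unknown"


def pattern_matcher(findings: Dict[str, str]) -> str:
    # Commits as soon as the symptom pattern fits.
    if findings.get("fever") == "yes" and findings.get("cough") == "yes":
        return "pneumonia"
    return "unknown"


def skeptic(findings: Dict[str, str]) -> str:
    # Wants at least three findings, including imaging, before committing.
    if len(findings) >= 3 and findings.get("chest_xray") == "infiltrate":
        return "pneumonia"
    return "unknown"


def panel_vote(panel: List[Panelist], findings: Dict[str, str]) -> Tuple[str, float]:
    """Return the most common proposal and the share of panelists backing it."""
    votes = Counter(p(findings) for p in panel)
    diagnosis, count = votes.most_common(1)[0]
    return diagnosis, count / len(panel)


def run_case(panel: List[Panelist], case: Dict[str, str], order: List[str],
             threshold: float = 0.66) -> Tuple[str, List[str]]:
    """Reveal one finding at a time, logging the panel's reasoning at each step."""
    findings: Dict[str, str] = {}
    log: List[str] = []
    for step in order:
        findings[step] = case[step]
        diagnosis, agreement = panel_vote(panel, findings)
        log.append(f"{step}={case[step]} -> {diagnosis} ({agreement:.0%} agree)")
        if diagnosis != "unknown" and agreement >= threshold:
            return diagnosis, log
    return "unknown", log


case = {"fever": "yes", "cough": "yes", "chest_xray": "infiltrate"}
diagnosis, log = run_case([cautious, pattern_matcher, skeptic], case,
                          ["fever", "cough", "chest_xray"])
for line in log:
    print(line)
```

In this toy case the panel stays at "unknown" after the first two findings and commits only once the imaging result arrives, which mirrors the step-by-step, show-your-reasoning behavior described above.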

Key performance metrics: The AI system significantly outperformed human doctors while maintaining cost efficiency.

  • MAI-DxO boosted diagnostic performance across every model tested, with the best results achieved when paired with OpenAI’s o3 model.
  • The system was compared against 21 physicians from the UK and US, each with 5 to 20 years of experience, who reached a mean accuracy of only 20%.
  • MAI-DxO delivered both higher diagnostic accuracy and lower testing costs than either individual models or human physicians.

Cost management features: The system includes built-in financial guardrails that address healthcare’s pricing challenges.

  • MAI-DxO is configurable to run within cost limitations set by users or organizations, performing cost-benefit analyses of diagnostic tests.
  • Without these constraints, Microsoft noted the AI might “default to ordering every possible test — regardless of cost, patient discomfort, or delays in care.”
  • This feature directly addresses the high cost of US medical care, which both doctors and patients must navigate.
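
One way to picture a cost guardrail is a budgeted test selector. The sketch below is an assumption-laden illustration, not how MAI-DxO works internally: the `Test` class, the dollar figures, and the `info_value` scores are invented for the example, and the greedy value-per-dollar rule is a simple heuristic rather than an optimal planner.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Test:
    name: str
    cost: float        # illustrative dollars, not real prices
    info_value: float  # made-up score for how much the test narrows the diagnosis


def choose_tests(candidates: List[Test], budget: float) -> Tuple[List[Test], float]:
    """Greedily pick tests by value per dollar, skipping any that exceed the budget."""
    chosen: List[Test] = []
    spent = 0.0
    ranked = sorted(candidates, key=lambda t: t.info_value / t.cost, reverse=True)
    for test in ranked:
        if spent + test.cost <= budget:
            chosen.append(test)
            spent += test.cost
    return chosen, spent


candidates = [
    Test("blood_panel", 50.0, 0.30),
    Test("chest_xray", 120.0, 0.45),
    Test("ct_scan", 900.0, 0.60),
    Test("full_body_mri", 2500.0, 0.65),
]

chosen, spent = choose_tests(candidates, budget=1000.0)
print([t.name for t in chosen], spent)

# With no effective limit, the selector orders every test on the list.
everything, total = choose_tests(candidates, budget=float("inf"))
print([t.name for t in everything], total)
```

With the 1,000-dollar budget the selector keeps the two cheap, informative tests and skips the scans. Lifting the limit reproduces the "order every possible test" behavior that the quote warns about.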

The bigger healthcare context: Microsoft’s announcement comes as AI increasingly penetrates medical applications across the industry.

  • The company reports seeing “over 50 million health-related sessions every day” across its AI consumer products like Bing and Copilot.
  • “From a first-time knee-pain query to a late-night search for an urgent-care clinic, search engines and AI companions are quickly becoming the new front line in healthcare,” Microsoft stated.
  • MAI-DxO represents part of Microsoft’s broader “dedicated consumer health effort” launched last year, alongside other medical AI tools like RAD-DINO for radiology workflows and Microsoft Dragon Copilot for voice assistance.

What they’re saying: Microsoft acknowledges both the potential and limitations of AI in healthcare.

  • The company believes AI can offer “clinical reasoning capabilities that, across many aspects of clinical reasoning, exceed those of any individual physician” due to its breadth of knowledge.
  • However, Microsoft conceded that MAI-DxO has only been tested on these specialized NEJM cases, making it unclear how the system would handle routine diagnostic tasks.
  • The company maintains that the system isn’t intended to replace human doctors but rather to “reshape healthcare” by giving patients reliable self-assessment options and helping doctors with complex cases.
Microsoft AI system diagnoses complex cases better than human doctors - and for less money
