Where AI is genuinely working
Three use cases account for ~80% of measurable AI value in procurement today, and all three are unglamorous: spend classification, contract clause extraction, and anomaly detection on invoices.
- Spend classification: 94% accuracy on UNSPSC level 2, replaces ~3 FTE-weeks/year of manual coding
- Contract clause extraction: pulls renewal dates, price escalators and termination terms from PDFs at 91% accuracy
- Invoice anomaly detection: catches duplicate billing and price drift with 4–7x fewer false positives than rule-based systems
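To make the third use case concrete, here is a minimal sketch of the rule-based baseline that AI anomaly detectors are typically benchmarked against (the "4–7x fewer false positives" claim above is relative to systems like this). The field names, schema, and drift threshold are illustrative assumptions, not drawn from any specific vendor's product.

```python
def find_anomalies(invoices, drift_threshold=0.15):
    """Flag duplicate invoices and unit-price drift.

    invoices: list of dicts with hypothetical keys
    'supplier', 'invoice_no', 'item', 'unit_price'.
    Returns a list of (reason, invoice) tuples.
    """
    seen = set()       # (supplier, invoice_no) pairs already processed
    baseline = {}      # (supplier, item) -> first observed unit price
    anomalies = []
    for inv in invoices:
        key = (inv["supplier"], inv["invoice_no"])
        if key in seen:
            # Same supplier re-submitting the same invoice number
            anomalies.append(("duplicate", inv))
            continue
        seen.add(key)
        item_key = (inv["supplier"], inv["item"])
        if item_key in baseline:
            ref = baseline[item_key]
            # Price moved more than the threshold vs. the baseline price
            if abs(inv["unit_price"] - ref) / ref > drift_threshold:
                anomalies.append(("price_drift", inv))
        else:
            baseline[item_key] = inv["unit_price"]
    return anomalies


invoices = [
    {"supplier": "Acme", "invoice_no": "A1", "item": "widget", "unit_price": 10.0},
    {"supplier": "Acme", "invoice_no": "A1", "item": "widget", "unit_price": 10.0},
    {"supplier": "Acme", "invoice_no": "A2", "item": "widget", "unit_price": 12.0},
]
flagged = find_anomalies(invoices)
# flagged -> [("duplicate", ...), ("price_drift", ...)]
```

The hard-coded 15% threshold is exactly why rule-based systems over-alert: legitimate price changes (indexation, volume tiers) trip the same rule as billing errors, which is the false-positive gap the learned systems close.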
Where AI is overhyped
'Autonomous negotiation', 'AI-generated RFx', and 'predictive supplier risk scoring' dominate vendor pitches but rarely survive production. In our audit, fewer than 9% of customers who bought these features were still using them six months later.
The pattern is consistent: features that generate output (a draft email, a risk score, a negotiation message) are easy to demo and hard to trust. Procurement leaders quickly learn that an AI-generated output without provenance is worse than a blank page, because someone still has to verify it without knowing where to start.
What top-quartile teams are funding next year
The teams getting real ROI from AI are funding a narrow stack: classification, extraction, and anomaly detection. They are explicitly defunding generative features that produce supplier-facing communication.
- 78% planning to expand spend classification coverage
- 64% planning to add contract clause extraction
- 52% planning to add invoice anomaly detection
- Only 11% planning to expand generative RFx or negotiation features
How to evaluate an AI feature in 60 seconds
Ask the vendor three questions: (1) What's the input and output, in concrete terms? (2) What's the measured accuracy in production, not in a demo? (3) What does a human do with the output, and how do they verify it?
If the answers are vague, the feature is a demo, not a tool.
