| Cost |
Sustained ≥500K tok/day at ≥70% GPU util [41][42] |
Spiky or low volume; idle time is free [2] |
| Privacy |
PHI, GDPR, air-gapped — VPC self-host is clean [3][5] |
Data permitted to leave org; BAA in place |
| Capability |
"Good enough" tier — most day-to-day tasks [29] |
Hard refactors, intensive reasoning, long-horizon agents [44] |
| Latency |
Agentic loops (10–30 API calls × round-trips compound) [6] |
Single-shot tasks; no local hardware available |
| Fine-tuning |
Need full SFT / LoRA / custom RLHF [4] |
Prompt-engineering is sufficient |
| Lock-in |
Want portability; vendors swing latency 42% mid-quarter [6] |
Managed-vendor dependency is acceptable [4] |