What breaks if we switch from REST to gRPC for all internal services?
Adopt a tiered migration strategy: classify internal services into performance-critical (Class A → gRPC within 6 mo...)
Decision
Adopt a tiered migration strategy: classify internal services into performance-critical (Class A → gRPC within 6 months), moderate-performance (Class B → hybrid REST/gRPC), and integration-heavy (Class C → remain REST for 12+ months), rather than switching all services to gRPC simultaneously.. Because a blanket REST-to-gRPC migration breaks browser client compatibility, eliminates HTTP caching infrastructure, disrupts debugging workflows (curl, Postman, browser DevTools), requires protobuf schema management overhead, and forces team reskilling simultaneously across all services — tiered classification isolates these breakage points to manageable batches while capturing performance gains where they matter most (services requiring <50ms response time).. Key failure modes: Inconsistent service boundaries causing increased cognitive load for developers maintaining both communication patterns; Premature optimization of low-traffic services consuming resources that could be allocated to actual performance bottlenecks; Misclassification of services leading to wrong protocol choice — e.g., a Class C service that actually has latency-sensitive internal callers. Thresholds: Response time < 50ms for Class A services, Class A migration within 6 months, Class C remains on REST for 12+ months
Next actions
What usually goes wrong
- Risk assessment focused on known threats, missed novel vectors
- Compliance checkbox passed but operational security remained weak
- Low-probability high-impact scenario treated as negligible
Council notes
Attack grid ⓘSurvival rate shows how the recommendation holds under stress scenarios. Low scores indicate conditional vulnerability, not a flaw in the recommendation.
Scenario detail (8)
Assumptions
- The organization runs a microservices architecture with multiple internal services communicating synchronously over REST today
- There exist measurable performance differences between REST/JSON and gRPC/protobuf for the organization's actual payload sizes and call patterns
- The engineering team has capacity to maintain two communication paradigms simultaneously during a multi-month transition
- Service classification into A/B/C tiers can be done objectively based on measurable metrics rather than political negotiation
- The organization's infrastructure (load balancers, service mesh, API gateways, observability stack) can support gRPC — specifically HTTP/2 end-to-end
Fragility signals
- Hubris: ANNOTATE
Operational signals to watch
Flip conditions
Unresolved uncertainty
- The winning branch (b002) was critiqued for not directly inventorying what breaks — it focuses on migration strategy rather than a comprehensive breakage catalog. The killed b003 branch had significantly more specific technical failure modes (proto schema corruption, transcoding latency accumulation) that the winner lacks.
- No branch provided a complete 'what breaks' inventory covering all dimensions: load balancer reconfiguration, observability pipeline changes, testing tool replacement, CI/CD pipeline modifications, service mesh compatibility, and team skill gaps.
- The <50ms threshold for Class A services and the 6/12-month timelines are synthetic — no branch grounded these numbers in measured system data or named engineering heuristics.
- Verdict is largely model-reasoning only — the 3 evidence items (quality mean=1.00) all mapped to b003 which was killed. The surviving winner has no external evidence support.
- REST+HTTP/3 optimization (b006's point) was not seriously evaluated against gRPC for internal services — this remains a legitimate unexplored alternative that could change the recommendation if benchmarked.
Branch battle map
Battle timeline (4 rounds)
Minority report
What if the opposite were true? What *improves* if we optimize REST with HTTP/3, compression, and JSON Schema instead of chasing gRPC? Both branches fixate on gRPC's hype while ignoring REST's maturity in caching, idempotency, and ecosystem tooling.
Pre-mortem (3 scenarios)
Censor oversight
REOPEN SPAR
The winning decision (b003) provides a detailed migration plan but fails to directly address the original question 'what breaks'. It also assumes certain expertise and doesn't scope infrastructure coupling. Surviving branch b002 offers a more nuanced approach that was not selected despite higher confidence in some model outputs.
Structural issues
- SELECTION MISMATCH: b002 provides a reasonable classification framework and polyglot persistence approach, which is more nuanced than b003's blanket gRPC migration
- CONSULTING FOG: The winning decision describes a migration plan but doesn't directly address 'what breaks' when switching to gRPC