Snowflake vs self-hosted ClickHouse for a 5TB/day analytics pipeline. Team of 3 data engineers, $15K/month budget, need sub-second dashboard queries.

accepted_conditional · Pro · 550s · $0.58
5 branches explored · 2 survived · 3 rounds · integrity 75%
Confidence: 82% · Risk: unknown


Decision

  1. Deploy self-hosted ClickHouse on 3x AWS i3.2xlarge instances ($4,200/month compute) with S3 tiered storage for 90+ day data (~$3,000/month at steady state), totaling $7,200-$9,000/month. Use the ReplicatedMergeTree engine with ClickHouse Keeper (not ZooKeeper). This stays well within the $15K budget, while Snowflake structurally cannot: at 5TB/day, Snowflake storage alone reaches $15K/month within 4-5 months, and compute credits push the total to $20K-$35K/month.
  2. ClickHouse on NVMe delivers sub-100ms on typical dashboard aggregations over billion-row tables with proper MergeTree ORDER BY keys, exceeding the sub-second requirement by 10x. Critical failure mode: operational burnout. A 3-person team will spend 20-30% of time on ops. If one engineer leaves, the remaining two face unsustainable burden. Mitigation: document runbooks aggressively in the first 60 days and keep ClickHouse Cloud as an escape hatch. Provision with Ansible/Terraform, enforce 90-day tiered storage policy to keep NVMe under 80% capacity.
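The budget arithmetic in point 1 can be sketched as a simple cumulative-storage model. This is a rough sketch, not a quote: the $23/TB-month rate is the list price cited in the uncertainty notes below, and it assumes the 5TB/day figure approximates compressed stored size with no data expiry.

```python
# Rough Snowflake storage-cost trajectory vs. the fixed ClickHouse floor.
# Assumptions (from this report, not a vendor quote): $23/TB-month compressed
# storage at list rate, 5 TB/day treated as stored size, no data expiry.

SNOWFLAKE_STORAGE_PER_TB_MONTH = 23.0   # USD, current list rate
INGEST_TB_PER_DAY = 5.0
CLICKHOUSE_FLOOR = 7_200.0              # USD/month: 3x i3.2xlarge + S3 tier (low end)
BUDGET_CEILING = 15_000.0               # USD/month

def snowflake_storage_bill(months_elapsed: float) -> float:
    """Monthly storage bill after `months_elapsed` of continuous ingestion."""
    stored_tb = INGEST_TB_PER_DAY * 30 * months_elapsed
    return stored_tb * SNOWFLAKE_STORAGE_PER_TB_MONTH

# Storage alone crosses the $15K ceiling between month 4 and month 5,
# before counting any compute credits.
for month in range(1, 7):
    bill = snowflake_storage_bill(month)
    print(f"month {month}: storage ${bill:,.0f}/mo "
          f"({'over' if bill > BUDGET_CEILING else 'under'} ceiling)")
```

Under these assumptions the bill is $13,800/month at month 4 and $17,250/month at month 5, which is where the "within 4-5 months" claim comes from; the ClickHouse floor, by contrast, does not grow with retained history once the tiering policy expires hot data.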

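The 90-day tiering and 80% NVMe threshold in point 2 amount to a small placement policy. In production ClickHouse enforces this itself via storage policies and table TTL clauses; the sketch below only illustrates the decision logic, with both thresholds taken from the plan above.

```python
# Data-placement policy from the plan: parts older than 90 days live on S3;
# anything on NVMe past 80% disk usage should also spill to S3 early rather
# than fill local disks. Illustrative only -- ClickHouse storage policies and
# TTL moves implement this natively.

HOT_RETENTION_DAYS = 90
NVME_USAGE_CEILING = 0.80

def target_volume(part_age_days: int, nvme_usage: float) -> str:
    """Decide where a data part belongs under the report's tiering policy."""
    if part_age_days >= HOT_RETENTION_DAYS:
        return "s3_cold"
    if nvme_usage >= NVME_USAGE_CEILING:
        return "s3_cold"       # spill early instead of breaching the 80% ceiling
    return "nvme_hot"

print(target_volume(30, 0.55))   # nvme_hot
print(target_volume(120, 0.55))  # s3_cold
print(target_volume(30, 0.85))   # s3_cold
```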
Next actions

Write Terraform + Ansible provisioning for 3x i3.2xlarge ClickHouse cluster with ClickHouse Keeper, NVMe storage configuration, and S3 tiered storage bucket with lifecycle policy
infra · immediate
Design ReplicatedMergeTree table schemas with ORDER BY keys aligned to top 5 dashboard query patterns; benchmark sub-second performance with representative 1TB sample data
data · immediate
Create operational runbooks for ClickHouse upgrades, disk capacity alerts (80% NVMe threshold), replication lag monitoring, and Keeper failure recovery within the first 60 days
infra · before_launch
Set up Grafana + Prometheus dashboards tracking NVMe usage, replication lag, query P99 latency, and monthly infrastructure cost against $15K ceiling
infra · before_launch
Define ClickHouse Cloud escape hatch trigger: if ops time exceeds 40% of team capacity for 2 consecutive sprints OR team drops to 2 or fewer engineers, initiate migration evaluation
data · before_launch
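The escape-hatch trigger defined in the last action item is mechanical enough to encode directly. A sketch, using the 40% threshold and two-sprint window from the action list and the 2-or-fewer headcount threshold from the reversal conditions below:

```python
# Escape-hatch trigger from the plan: start a ClickHouse Cloud migration
# evaluation if ops time exceeds 40% of team capacity for 2 consecutive
# sprints, OR the team shrinks to 2 or fewer engineers.

OPS_THRESHOLD = 0.40
CONSECUTIVE_SPRINTS = 2
MIN_TEAM_SIZE = 3   # trigger fires once headcount drops below this

def should_evaluate_migration(ops_fraction_by_sprint: list[float],
                              team_size: int) -> bool:
    """True when either escape-hatch condition from the plan holds."""
    if team_size < MIN_TEAM_SIZE:
        return True
    recent = ops_fraction_by_sprint[-CONSECUTIVE_SPRINTS:]
    return (len(recent) == CONSECUTIVE_SPRINTS
            and all(f > OPS_THRESHOLD for f in recent))

print(should_evaluate_migration([0.25, 0.45, 0.50], team_size=3))  # True
print(should_evaluate_migration([0.45, 0.30], team_size=3))        # False
print(should_evaluate_migration([0.10], team_size=2))              # True
```

Wiring this to the Grafana/Prometheus cost-and-ops dashboards makes the trigger a reviewed metric rather than a judgment call made under pressure.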
This verdict stops being true when
Budget increases to $30K+/month or Snowflake offers a negotiated enterprise rate below $10/TB compressed storage → Snowflake becomes viable — its managed service model eliminates the operational burnout risk that is ClickHouse's primary failure mode for a 3-person team
Team drops to 2 or fewer engineers, or ops burden exceeds 40% of team capacity for 2+ consecutive sprints → Migrate to ClickHouse Cloud managed service as the pre-planned escape hatch, trading higher cost for reduced operational burden
Ingestion rate drops below 1TB/day due to successful data volume reduction or business scope change → Re-evaluate Snowflake — at lower volumes, storage costs stay within budget and the managed service benefit outweighs ClickHouse operational overhead

Council notes

Socrates
**Split: First reduce data volume before choosing platform.** Before committing to either Snowflake or ClickHouse, co...
Vulcan
1) Choose Snowflake if operational simplicity matters more than cost predictability; 2) Choose self-hosted ClickHouse...
Daedalus
**Recommend: Self-hosted ClickHouse on a 3-node cluster with ReplicatedMergeTree engine and ClickHouse Keeper.** **A...
Loki
Self-hosted ClickHouse on 3 nodes for 5TB/day begs the question: with only 3 data engineers, who manages the inevitab...

Assumptions

  • The $15K/month budget is a hard ceiling that cannot be negotiated upward
  • 5TB/day ingestion rate is sustained and will not decrease significantly
  • The 3-person data engineering team has or can acquire sufficient ClickHouse operational skills within 60 days
  • Dashboard query patterns are predictable enough to align with MergeTree ORDER BY keys for sub-second performance
  • AWS i3.2xlarge instances remain available and priced at approximately $1,400/month each

Operational signals to watch

reversal — Budget increases to $30K+/month or Snowflake offers a negotiated enterprise rate below $10/TB compressed storage
reversal — Team drops to 2 or fewer engineers, or ops burden exceeds 40% of team capacity for 2+ consecutive sprints
reversal — Ingestion rate drops below 1TB/day due to successful data volume reduction or business scope change

Unresolved uncertainty

  • Whether the 3-person team has sufficient ClickHouse operational expertise — if all three are Snowflake/Redshift-experienced with zero ClickHouse exposure, the 60-day ramp-up estimate may be optimistic
  • Schema evolution failure mode was cut off in b003's output — the full severity and mitigation for ALTER TABLE blocking on 10B+ row ReplicatedMergeTree tables was not fully articulated
  • The $23/TB Snowflake compressed storage pricing is the current list rate; negotiated enterprise pricing could extend Snowflake's budget viability, though likely not enough to overcome the structural gap at 5TB/day
  • Query pattern diversity is unknown — sub-100ms assumes dashboard queries align with MergeTree ORDER BY keys. Ad-hoc exploratory queries on non-indexed dimensions may exceed sub-second SLA
  • b004's killed argument about Keeper failure during peak ingest causing hours of recovery was not fully rebutted — this is a real operational risk for a 3-person team

Branch battle map

Battle timeline (3 rounds)
Round 1 — Initial positions · 2 branches
Branch b002 (Vulcan) eliminated — Branch b002 is structurally empty — it provides no reco...
Round 2 — Adversarial probes · 3 branches
Loki proposed branch b004
Branch b004 (Loki) eliminated — auto-pruned: unsupported low-confidence branch
Socrates proposed branch b005
Loki Self-hosted ClickHouse on 3 nodes for 5TB/day begs the question: with only 3 dat…
Socrates **Split: First reduce data volume before choosing platform.** Before committing …
Round 3 — Final convergence · 2 branches
Branch b005 (Socrates) eliminated — This branch is structurally unsound because it treats a p...