Claude Code for StarRocks — Workflow Guide
The Setup
You are running analytical queries with StarRocks, a high-performance OLAP database designed for real-time analytics on large datasets. StarRocks uses a columnar storage engine with vectorized execution, materialized views, and MySQL protocol compatibility. Claude Code can write SQL, but it generates PostgreSQL or MySQL queries without considering StarRocks’s table models and optimization features.
What Claude Code Gets Wrong By Default
-
Uses row-oriented table design. Claude creates tables with
CREATE TABLEusing PostgreSQL syntax. StarRocks has multiple table models — Duplicate Key, Aggregate Key, Unique Key, and Primary Key — each optimized for different query patterns. -
Ignores data distribution. Claude creates tables without distribution keys. StarRocks requires
DISTRIBUTED BY HASH(column)for data distribution across nodes — missing this causes performance issues and skewed data. -
Uses standard B-tree indexes. Claude creates
CREATE INDEXwith B-tree indexes. StarRocks uses bitmap indexes, bloom filter indexes, and column-level encoding — B-tree indexes are not the primary optimization strategy. -
Writes OLTP-style queries. Claude writes single-row lookups and transactional queries. StarRocks excels at analytical aggregations over large datasets —
GROUP BY,WINDOW, and aggregate functions on millions of rows.
The CLAUDE.md Configuration
# StarRocks Analytics Project
## Database
- Engine: StarRocks (real-time OLAP)
- Protocol: MySQL compatible
- Storage: columnar with vectorized execution
- Models: Duplicate/Aggregate/Unique/Primary Key
## StarRocks Rules
- Table models: choose based on query pattern
- Distribution: DISTRIBUTED BY HASH(key) BUCKETS n
- Partitioning: PARTITION BY RANGE for time-series
- Materialized views: CREATE MATERIALIZED VIEW for precompute
- Catalog: External catalogs for Hive, Iceberg, Delta Lake
- Loading: INSERT INTO, Stream Load, or Broker Load
## Conventions
- Duplicate Key: for raw event data, full scan queries
- Aggregate Key: for pre-aggregated metrics
- Primary Key: for real-time upserts
- Partition by date for time-series data
- Hash distribute by high-cardinality columns
- Use materialized views for common aggregations
- Connect via MySQL drivers (mysql -h host -P 9030)
Workflow Example
You want to create an analytics table for user event tracking with real-time dashboards. Prompt Claude Code:
“Create a StarRocks table for website events with timestamp, user_id, event_type, page_url, and properties. Partition by day, distribute by user_id, and create materialized views for hourly event counts and daily active users. Use the Duplicate Key model.”
Claude Code should create a DUPLICATE KEY(event_time, user_id) table with PARTITION BY RANGE(event_time) using daily partitions, DISTRIBUTED BY HASH(user_id) BUCKETS 16, a materialized view for hourly event counts using date_trunc('hour', event_time), and a materialized view for daily active users with COUNT(DISTINCT user_id).
Common Pitfalls
-
Wrong table model for the use case. Claude uses Duplicate Key for data that needs upserts. Duplicate Key stores all rows including duplicates — use Primary Key model for real-time upserts or Aggregate Key for automatic pre-aggregation.
-
Too many or too few buckets. Claude uses
BUCKETS 1orBUCKETS 1024without considering data size. As a rule of thumb, each bucket should hold 100MB-1GB of data. Too few buckets cause hotspots; too many cause overhead. -
Missing partition management. Claude creates range partitions but does not add future partitions. StarRocks needs partitions created in advance for incoming data. Use dynamic partitioning (
PROPERTIES("dynamic_partition.enable"="true")) to auto-create partitions.