Static K-means Clustering | InvestorUnknownStatic K-Means Clustering is a machine-learning-driven market regime classifier designed for traders who want a data-driven structure instead of subjective indicators or manually drawn zones.
This script performs offline (static) K-means training on your chosen historical window. Using four engineered features:
RSI (Momentum)
CCI (Price deviation / Mean reversion)
CMF (Money flow / Strength)
MACD Histogram (Trend acceleration)
It groups past market conditions into K distinct clusters (regimes). After training, every new bar is assigned to the nearest cluster via Euclidean distance in 4-dimensional standardized feature space.
This allows you to create models like:
Regime-based long/short filters
Volatility phase detectors
Trend vs. chop separation
Mean-reversion vs. breakout classification
Volume-enhanced money-flow regime shifts
Full machine-learning trading systems based solely on regimes
Note:
This script is not a universal ML strategy out of the box.
The user must engineer the feature set to match their trading style and target market.
K-means is a tool, not a ready made system, this script provides the framework.
Core Idea
K-means clustering takes raw, unlabeled market observations and attempts to discover structure by grouping similar bars together.
// STEP 1 — DATA POINTS ON A COORDINATE PLANE
// We start with raw, unlabeled data scattered in 2D space (x/y).
// At this point, nothing is grouped—these are just observations.
// K-means will try to discover structure by grouping nearby points.
//
// y ↑
// |
// 12 | •
// | •
// 10 | •
// | •
// 8 | • •
// |
// 6 | •
// |
// 4 | •
// |
// 2 |______________________________________________→ x
// 2 4 6 8 10 12 14
//
//
//
// STEP 2 — RANDOMLY PLACE INITIAL CENTROIDS
// The algorithm begins by placing K centroids at random positions.
// These centroids act as the temporary “representatives” of clusters.
// Their starting positions heavily influence the first assignment step.
//
// y ↑
// |
// 12 | •
// | •
// 10 | • C2 ×
// | •
// 8 | • •
// |
// 6 | C1 × •
// |
// 4 | •
// |
// 2 |______________________________________________→ x
// 2 4 6 8 10 12 14
//
//
//
// STEP 3 — ASSIGN POINTS TO NEAREST CENTROID
// Each point is compared to all centroids.
// Using simple Euclidean distance, each point joins the cluster
// of the centroid it is closest to.
// This creates a temporary grouping of the data.
//
// (Coloring concept shown using labels)
//
// - Points closer to C1 → Cluster 1
// - Points closer to C2 → Cluster 2
//
// y ↑
// |
// 12 | 2
// | 1
// 10 | 1 C2 ×
// | 2
// 8 | 1 2
// |
// 6 | C1 × 2
// |
// 4 | 1
// |
// 2 |______________________________________________→ x
// 2 4 6 8 10 12 14
//
// (1 = assigned to Cluster 1, 2 = assigned to Cluster 2)
// At this stage, clusters are formed purely by distance.
Your chosen historical window becomes the static training dataset , and after fitting, the centroids never change again.
This makes the model:
Predictable
Repeatable
Consistent across backtests
Fast for live use (no recalculation of centroids every bar)
Static Training Window
You select a period with:
Training Start
Training End
Only bars inside this range are used to fit the K-means model. This window defines:
the market regime examples
the statistical distributions (means/std) for each feature
how the centroids will be positioned post-trainin
Bars before training = fully transparent
Training bars = gray
Post-training bars = full colored regimes
Feature Engineering (4D Input Vector)
Every bar during training becomes a 4-dimensional point:
This combination balances: momentum, volatility, mean-reversion, trend acceleration giving the algorithm a richer "market fingerprint" per bar.
Standardization
To prevent any feature from dominating due to scale differences (e.g., CMF near zero vs CCI ±200), all features are standardized:
standardize(value, mean, std) =>
(value - mean) / std
Centroid Initialization
Centroids start at diverse coordinates using various curves:
linear
sinusoidal
sign-preserving quadratic
tanh compression
init_centroids() =>
// Spread centroids across using different shapes per feature
for c = 0 to k_clusters - 1
frac = k_clusters == 1 ? 0.0 : c / (k_clusters - 1.0) // 0 → 1
v = frac * 2 - 1 // -1 → +1
array.set(cent_rsi, c, v) // linear
array.set(cent_cci, c, math.sin(v)) // sinusoidal
array.set(cent_cmf, c, v * v * (v < 0 ? -1 : 1)) // quadratic sign-preserving
array.set(cent_mac, c, tanh(v)) // compressed
This makes initial cluster spread “random” even though true randomness is hardly achieved in pinescript.
K-Means Iterative Refinement
The algorithm repeats these steps:
(A) Assignment Step, Each bar is assigned to the nearest centroid via Euclidean distance in 4D:
distance = sqrt(dx² + dy² + dz² + dw²)
(B) Update Step, Centroids update to the mean of points assigned to them. This repeats iterations times (configurable).
LIVE REGIME CLASSIFICATION
After training, each new bar is:
Standardized using the training mean/std
Compared to all centroids
Assigned to the nearest cluster
Bar color updates based on cluster
No re-training occurs. This ensures:
No lookahead bias
Clean historical testing
Stable regimes over time
CLUSTER BEHAVIOR & TRADING LOGIC
Clusters (0, 1, 2, 3…) hold no inherent meaning. The user defines what each cluster does.
Example of custom actions:
Cluster 0 → Cash
Cluster 1 → Long
Cluster 2 → Short
Cluster 3+ → Cash (noise regime)
This flexibility means:
One trader might have cluster 0 as consolidation.
Another might repurpose it as a breakout-loading zone.
A third might ignore 3 clusters entirely.
Example on ETHUSD
Important Note:
Any change of parameters or chart timeframe or ticker can cause the “order” of clusters to change
The script does NOT assume any cluster equals any actionable bias, user decides.
PERFORMANCE METRICS & ROC TABLE
The indicator computes average 1-bar ROC for each cluster in:
Training set
Test (live) set
This helps measure:
Cluster profitability consistency
Regime forward predictability
Whether a regime is noise, trend, or reversion-biased
EQUITY SIMULATION & FEES
Designed for close-to-close realistic backtesting.
Position = cluster of previous bar
Fees applied only on regime switches. Meaning:
Staying long → no fee
Switching long→short → fee applied
Switching any→cash → fee applied
Fee input is percentage, but script already converts internally.
Disclaimers
⚠️ This indicator uses machine-learning but does not predict the future. It classifies similarity to past regimes, nothing more.
⚠️ Backtest results are not indicative of future performance.
⚠️ Clusters have no inherent “bullish” or “bearish” meaning. You must interpret them based on your testing and your own feature engineering.
Indicador Pine Script®






















