# Single-value RCR
When you have a 1-D dataset and want to recover the underlying central value (mu) and width (sigma) in the presence of outliers, single-value RCR is what you want.
## The four rejection techniques
```python
import rcrpy

# Pick one based on your data:
rcrpy.RejectionTech.LS_MODE_68    # Symmetric uncontaminated, one-sided contaminants
rcrpy.RejectionTech.LS_MODE_DL    # Mixed one-sided + two-sided contaminants
rcrpy.RejectionTech.SS_MEDIAN_DL  # Symmetric uncontaminated, two-sided contaminants
rcrpy.RejectionTech.ES_MODE_DL    # Mildly asymmetric / very small N
```
Section 3 of Maples et al. (2018) gives a decision tree for choosing among these techniques.
## Unweighted, iterative rejection
```python
import numpy as np
import rcrpy

# Some data: clean Gaussian + heavy one-sided contamination
rng = np.random.default_rng(42)
y = np.concatenate([
    rng.normal(0, 1, size=150),           # 150 uncontaminated points
    np.abs(rng.normal(0, 10, size=850)),  # 850 one-sided contaminants
])

r = rcrpy.RCR(rcrpy.RejectionTech.LS_MODE_68)
r.perform_rejection(y.tolist())

print(f"mu    = {r.result.mu:.3f}")
print(f"sigma = {r.result.sigma:.3f}")
print(f"stDev = {r.result.st_dev:.3f}")
print(f"kept  = {int(r.result.flags.sum())} / {len(y)}")
```
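To make "iterative" concrete, here is a minimal sketch of the classical Chauvenet criterion applied one point at a time. It uses the median as mu and the 68.27th percentile of |y - mu| as sigma. This is a simplified stand-in, not rcrpy's algorithm: RCR additionally corrects sigma for the bias that rejection itself introduces. The function names are hypothetical.

```python
import math

import numpy as np


def robust_width(x: np.ndarray, mu: float) -> float:
    """68.27th percentile of |x - mu|; equals sigma for a pure Gaussian."""
    return float(np.percentile(np.abs(x - mu), 68.27))


def chauvenet_iterative(y, max_iter: int = 10_000):
    """Classical Chauvenet rejection, at most one point per iteration.

    Returns (mu, sigma, flags), where flags[i] is True if y[i] was kept.
    Illustration only: not rcrpy's RCR, which also corrects sigma.
    """
    y = np.asarray(y, dtype=float)
    flags = np.ones(y.size, dtype=bool)
    for _ in range(max_iter):
        kept = y[flags]
        n = kept.size
        if n < 3:
            break
        mu = float(np.median(kept))
        sigma = robust_width(kept, mu)
        if sigma == 0.0:
            break
        # Locate the kept point farthest from mu.
        dev = np.abs(y - mu)
        dev[~flags] = -1.0  # ignore points that were already rejected
        worst = int(np.argmax(dev))
        # Two-sided Gaussian tail probability P(|Z| > dev / sigma).
        p = math.erfc(dev[worst] / (sigma * math.sqrt(2.0)))
        # Chauvenet: reject if fewer than half a point is expected that far out.
        if n * p < 0.5:
            flags[worst] = False
        else:
            break  # the most deviant point passes, so every other one does too
    kept = y[flags]
    mu = float(np.median(kept))
    return mu, robust_width(kept, mu), flags
```

Each iteration removes at most the single most deviant point and then re-estimates mu and sigma. One-at-a-time rejection therefore costs a full re-estimate per rejected point.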
## Bulk rejection (faster on large N)
The bulk variant rejects many points in each pass instead of one at a time. It is substantially faster and, for well-behaved data, reaches the same final result:
```python
# y as defined in the unweighted example above
r = rcrpy.RCR(rcrpy.RejectionTech.LS_MODE_68)
r.perform_bulk_rejection(y.tolist())
# Same r.result fields, plus rejectedY / originalY / stDevTotal, which the
# bulk path populates (the equivalent of the C++ setFinalVectors).
```
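The same classical-Chauvenet simplification can also be written in bulk form. Every point that fails the criterion in a pass is rejected at once, and mu and sigma are re-estimated only once per pass. This hypothetical sketch shows only the pass structure. It is not rcrpy's bulk RCR, which applies RCR's sigma corrections.

```python
import math

import numpy as np


def chauvenet_bulk(y, max_passes: int = 100):
    """Classical Chauvenet rejection, many points per pass.

    Each pass rejects every kept point that fails Chauvenet's criterion,
    then re-estimates mu (median) and sigma (68.27th percentile of |y - mu|).
    Returns (mu, sigma, flags), where flags[i] is True if y[i] was kept.
    """
    y = np.asarray(y, dtype=float)
    flags = np.ones(y.size, dtype=bool)
    for _ in range(max_passes):
        kept = y[flags]
        n = kept.size
        if n < 3:
            break
        mu = float(np.median(kept))
        sigma = float(np.percentile(np.abs(kept - mu), 68.27))
        if sigma == 0.0:
            break
        # Two-sided Gaussian tail probability for every point.
        z = np.abs(y - mu) / sigma
        p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
        fails = flags & (n * p < 0.5)
        if not fails.any():
            break  # converged: nothing left to reject in this pass
        flags &= ~fails
    kept = y[flags]
    mu = float(np.median(kept))
    return mu, float(np.percentile(np.abs(kept - mu), 68.27)), flags
```

With many contaminants, the one-at-a-time loop needs one re-estimate per rejected point. The bulk loop often needs only a few passes, which is where the speedup comes from.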
## Weighted rejection
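Pass per-point weights with the `w` keyword:

```python
# y, np, and rcrpy as in the unweighted example above
weights = np.ones(y.size)
some_indices = [0, 1, 2]     # hypothetical: whichever points you trust less
weights[some_indices] = 0.5  # downweight those points
r = rcrpy.RCR(rcrpy.RejectionTech.LS_MODE_68)
r.perform_rejection(y.tolist(), w=weights.tolist())
```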
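This page does not document how rcrpy uses `w` internally. The sketch below assumes the usual convention, where a weight of 0.5 makes a point count half as much in an estimator. It shows that convention in a weighted mean and a (lower) weighted median. Both helpers are hypothetical and are not part of rcrpy.

```python
import numpy as np


def weighted_mean(y, w) -> float:
    """sum(w * y) / sum(w)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def weighted_median(y, w) -> float:
    """Smallest y whose cumulative weight reaches half the total weight."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(y)
    cum = np.cumsum(w[order])
    idx = int(np.searchsorted(cum, 0.5 * cum[-1]))
    return float(y[order][idx])


# Down-weighting an outlier reduces its pull on the weighted mean.
print(weighted_mean([0, 0, 0, 10], [1, 1, 1, 1]))    # 2.5
print(weighted_mean([0, 0, 0, 10], [1, 1, 1, 0.5]))  # ~1.4286
```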
## Results
All of these calls (iterative or bulk, weighted or unweighted) populate `r.result`, an `RCRResults` dataclass with the following fields:
| Field | Description |
|---|---|
| `mu` | Recovered central value (mean / median / mode) |
| `sigma` | Recovered robust 68.3-percentile deviation |
| `st_dev` | Recovered standard deviation |
| … | Asymmetric width (when applicable) |
| … | Asymmetric robust width (`ES_MODE_DL`) |
| … | Combined width (populated by …) |
| `flags` | Bool array flagging kept points (`True` = kept) |
| … | `int` indices of kept points |
| … | Kept `y` / `w` |
| … | Populated by the bulk path |
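The `sigma` row refers to a 68.3-percentile deviation. This is a minimal sketch that assumes the usual definition: the 68.27th percentile of |y - mu|. Because 68.27% of a Gaussian lies within one standard deviation of its mean, this matches the standard deviation for clean Gaussian data. Unlike `np.std`, it barely moves when a small fraction of extreme outliers is added. The helper name is hypothetical.

```python
import numpy as np


def percentile_68_deviation(y, mu: float) -> float:
    """68.27th percentile of |y - mu|; equals sigma for Gaussian data."""
    dev = np.abs(np.asarray(y, dtype=float) - mu)
    return float(np.percentile(dev, 68.27))


rng = np.random.default_rng(1)
clean = rng.normal(0.0, 2.0, 100_000)
dirty = np.append(clean, np.full(1000, 1e6))  # ~1% extreme outliers
print(percentile_68_deviation(clean, 0.0), np.std(clean))
print(percentile_68_deviation(dirty, 0.0), np.std(dirty))
```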
## Performance
On larger datasets (the benchmark uses N = 1000), rcrpy's single-value RCR takes roughly 10× as long to run as the C++ implementation. Reproduce the benchmark with `../benchmarks/diagnostics.py`.
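A "10×" figure is a ratio of wall-clock runtimes. To measure such a ratio on your own data outside the provided script, a best-of-N timing helper built on the standard library's `timeit` is enough. The helper name is hypothetical, and the two callables are placeholders for whatever wraps each implementation.

```python
import timeit
from typing import Callable


def runtime_ratio(slow: Callable[[], object], fast: Callable[[], object],
                  number: int = 10, repeat: int = 5) -> float:
    """Ratio of best-of-`repeat` runtimes, slow / fast.

    A ratio of about 10 means `slow` takes roughly ten times as long as
    `fast`. Taking the minimum over repeats reduces noise from other load.
    """
    t_slow = min(timeit.repeat(slow, number=number, repeat=repeat))
    t_fast = min(timeit.repeat(fast, number=number, repeat=repeat))
    return t_slow / t_fast
```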