CIC rawdataproc check
The script reprocesses the raw electrometer data with the offline
airel.pyspect.rawdataproc pipeline and compares the result with the records
that the instrument data acquisition software (CIC-DAQ / Spectops) has already
produced. The comparison is done channel by channel against the .records
reference.
What the comparison does
python -m cic_rawdataproc_check.compare_dataproc does the following:
1. Reads the reference records (RECORDS_FILE, the 1-second averaged currents and variances from the instrument) and the raw per-sample parquet files (RAW_GLOB).
2. Calls assign_blocks. The raw data files contain only the measured currents and timestamps, not the operating mode, so it has to be recovered from the reference records. assign_blocks uses their time intervals to separate the raw records into blocks, i.e. periods of continuous measurement in one operating mode. A new block starts when the opmode changes or when there is a gap of more than 10 s between intervals; otherwise neighbouring intervals of the same opmode stay in one block. Every block gets a unique block_id, and all records in the same block share the same opmode (for example offset or ions). Samples that fall outside every interval are dropped.
3. Runs the pipeline (process) on the block-tagged raw data with fixed DataprocSettings (FIR kernel, offset opmode, 1-second averaging aligned to the clock). This recomputes the instrument's 1-second records offline.
4. Removes the last hour, because the two implementations handle the end of the measurement differently.
5. Joins the pipeline result with the reference by opmode and begin_time rounded to 1 second, and prints for each channel the difference statistics (mean, maximum absolute, median relative) of the current and of the variance.
6. Shows three kinds of plots: the pipeline output and the reference drawn together as time series, the histograms of their differences, and the scatter plots of pipeline against reference.
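The block assignment in step 2 can be sketched in plain Python. This is a minimal illustration, not the real assign_blocks: the Interval class and the assign_blocks_sketch function are hypothetical, the reference intervals are assumed to be non-overlapping and half-open [begin, end), and samples are plain (time, value) pairs in seconds.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for the time interval of one reference record."""
    begin: float  # seconds
    end: float  # seconds, exclusive
    opmode: str


def assign_blocks_sketch(intervals, samples, max_gap=10.0):
    """Tag (time, value) samples as (time, value, block_id, opmode).

    Assumes non-overlapping intervals. A new block starts when the opmode
    changes or when the gap to the previous interval exceeds max_gap
    seconds. Samples outside every interval are dropped.
    """
    intervals = sorted(intervals, key=lambda iv: iv.begin)

    # Give every interval the id of the block it belongs to.
    block_ids = []
    block_id = -1
    prev = None
    for iv in intervals:
        if prev is None or iv.opmode != prev.opmode or iv.begin - prev.end > max_gap:
            block_id += 1
        block_ids.append(block_id)
        prev = iv

    # Locate the interval containing each sample by binary search on begin times.
    begins = [iv.begin for iv in intervals]
    tagged = []
    for t, value in sorted(samples):
        i = bisect.bisect_right(begins, t) - 1
        if i >= 0 and t < intervals[i].end:
            tagged.append((t, value, block_ids[i], intervals[i].opmode))
    return tagged


intervals = [
    Interval(0.0, 1.0, "offset"),
    Interval(1.0, 2.0, "offset"),  # same opmode, no gap: same block
    Interval(2.0, 3.0, "ions"),  # opmode change: new block
    Interval(20.0, 21.0, "ions"),  # gap of 17 s: new block
]
samples = [(0.5, 1.0), (1.5, 2.0), (2.5, 3.0), (10.0, 9.9), (20.5, 4.0)]
tagged = assign_blocks_sketch(intervals, samples)
print([row[2] for row in tagged])  # [0, 0, 1, 2]; the sample at 10.0 s is dropped
```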
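Step 5 reduces each channel to a few difference statistics. The sketch below illustrates the idea with plain tuples for a single channel. difference_stats is a hypothetical helper, and taking "median relative" as the median of |pipeline - reference| / |reference| (skipping zero references) is an assumption, not necessarily the definition used by compare_dataproc.

```python
import statistics


def difference_stats(pipeline, reference):
    """Compare one channel; both inputs are iterables of (opmode, begin_time, value).

    Returns (mean difference, max absolute difference,
    median relative difference, number of matched rows).
    """
    # Key the reference by opmode and begin_time rounded to whole seconds.
    # Note: Python's round() sends exact halves to the nearest even number.
    ref = {(opmode, round(t)): v for opmode, t, v in reference}
    diffs, rel = [], []
    for opmode, t, v in pipeline:
        r = ref.get((opmode, round(t)))
        if r is None:
            continue  # inner join: rows without a partner are skipped
        diffs.append(v - r)
        if r != 0:
            rel.append(abs(v - r) / abs(r))  # assumed definition of "relative"
    if not diffs:
        raise ValueError("no matching rows")
    median_rel = statistics.median(rel) if rel else float("nan")
    return statistics.fmean(diffs), max(abs(d) for d in diffs), median_rel, len(diffs)


reference = [("offset", 0.0, 1.0), ("offset", 1.0, 2.0), ("ions", 2.0, 4.0)]
pipeline = [("offset", 0.2, 1.1), ("offset", 0.9, 1.8), ("ions", 2.1, 4.0), ("ions", 5.0, 9.0)]
print(difference_stats(pipeline, reference))
```

The last pipeline row has no reference partner and is ignored, in the same way an inner join would drop it.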
The rawdataproc pipeline
airel.pyspect.rawdataproc replicates the electrometer current processing
pipeline that runs inside the instrument data acquisition software (CIC-DAQ /
Spectops).
Block assignment with assign_blocks (step 2 above) is the first stage and
prepares the input. The process function then takes the block-tagged data and
does the rest:
- FIR filtering. Each block is convolved with the given kernel. Before that, samples that lie more than outlier_drop_threshold standard deviations from the block mean are masked. The convolution also keeps a weight for every sample, which tells how much valid input went into the output.
- Offset estimation. The blocks with offset opmode (zero signal) are collected. A weighted linear fit over the last two offset blocks gives the offset estimate (intercept and slope). The residual variance between the estimate and the signal is computed over up to three offset blocks and used as the noise level estimate.
- Offset correction. The linear offset estimate is subtracted from the filtered signal of each block.
- Averaging. The corrected per-sample values are averaged into fixed time windows (averaging_periods) and into whole blocks.
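The FIR step with outlier masking can be illustrated as a normalised convolution. Masked samples are left out of the sum, and each output is rescaled by the part of the kernel that actually met valid input; that part, relative to the full kernel sum, serves as the per-sample weight here. This is a sketch under assumptions: masked_fir is hypothetical, the kernel is assumed non-negative and centred on each sample, the output keeps the block length, and the real process function may define the weight differently.

```python
import statistics


def masked_fir(block, kernel, outlier_drop_threshold=3.0):
    """Return (filtered, weights) for one block of samples.

    Samples further than outlier_drop_threshold population standard
    deviations from the block mean are masked before filtering.
    """
    mean = statistics.fmean(block)
    std = statistics.pstdev(block)
    valid = [abs(x - mean) <= outlier_drop_threshold * std for x in block]

    total = sum(kernel)
    half = len(kernel) // 2
    filtered, weights = [], []
    for i in range(len(block)):
        acc = wsum = 0.0
        for k, c in enumerate(kernel):
            j = i + half - k  # convolution index, kernel centred on sample i
            if 0 <= j < len(block) and valid[j]:
                acc += c * block[j]
                wsum += c
        # Normalised convolution: divide by the kernel mass that met valid input.
        filtered.append(acc / wsum if wsum > 0 else float("nan"))
        weights.append(wsum / total)
    return filtered, weights


block = [1.0] * 20
block[10] = 100.0  # single spike, far outside 3 standard deviations
filtered, weights = masked_fir(block, [1.0, 1.0, 1.0])
# The spike is masked, so every filtered value stays at 1.0; the weight drops
# below 1 next to the masked sample and at both block edges.
print(filtered[10], weights[10], weights[5])
```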
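Offset estimation and correction reduce to a weighted least-squares line and a subtraction. The sketch below assumes each offset block is given as a (t, y, w) tuple of times, filtered values and FIR weights, in chronological order. All function names are hypothetical, and the residual variance is taken as a weighted mean of squared residuals, which is only one possible normalisation (the README notes that the variance formula differs between CICDAQ and SPECTOPS).

```python
def concat_blocks(blocks):
    """Join a list of (t, y, w) blocks into three flat lists."""
    t, y, w = [], [], []
    for bt, by, bw in blocks:
        t.extend(bt)
        y.extend(by)
        w.extend(bw)
    return t, y, w


def weighted_linear_fit(t, y, w):
    """Weighted least-squares fit of y = intercept + slope * t."""
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def estimate_offset(offset_blocks):
    """Return (intercept, slope, noise_variance) from chronological offset blocks."""
    # Linear fit over the last two offset blocks.
    intercept, slope = weighted_linear_fit(*concat_blocks(offset_blocks[-2:]))
    # Residual variance over up to three offset blocks (assumed normalisation).
    t, y, w = concat_blocks(offset_blocks[-3:])
    resid2 = sum(wi * (yi - (intercept + slope * ti)) ** 2 for wi, ti, yi in zip(w, t, y))
    return intercept, slope, resid2 / sum(w)


def correct_offset(t, y, intercept, slope):
    """Subtract the linear offset estimate from one filtered block."""
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]


def make_block(t0):
    """Synthetic offset block drifting along 0.5 + 0.01 * t."""
    t = [t0, t0 + 1.0, t0 + 2.0]
    return t, [0.5 + 0.01 * ti for ti in t], [1.0, 1.0, 1.0]


intercept, slope, noise = estimate_offset([make_block(0.0), make_block(10.0), make_block(20.0)])
# A signal of 5.0 on top of the drifting offset: both corrected values come out close to 5.0.
print(correct_offset([30.0, 31.0], [5.8, 5.81], intercept, slope))
```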
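The averaging step can be sketched as grouping samples by clock-aligned windows and by block. Here a window starts at floor(t / period) * period, so with t in seconds since the epoch and period = 1.0 every window begins on a whole second. Weighting each sample by its FIR weight is an assumption, the helpers are hypothetical (period stands in for one entry of averaging_periods), and the variance that the real records also carry is left out because its formula depends on the variant.

```python
import math
from collections import defaultdict


def average_windows(t, y, w, period=1.0):
    """Weighted mean of the samples per clock-aligned window: {window_start: mean}."""
    sums = defaultdict(float)
    wsums = defaultdict(float)
    for ti, yi, wi in zip(t, y, w):
        start = math.floor(ti / period) * period  # align the window to the clock
        sums[start] += wi * yi
        wsums[start] += wi
    return {s: sums[s] / wsums[s] for s in sorted(sums) if wsums[s] > 0}


def average_blocks(block_ids, y, w):
    """Weighted mean of the samples per block: {block_id: mean}."""
    sums = defaultdict(float)
    wsums = defaultdict(float)
    for b, yi, wi in zip(block_ids, y, w):
        sums[b] += wi * yi
        wsums[b] += wi
    return {b: sums[b] / wsums[b] for b in sorted(sums) if wsums[b] > 0}


print(average_windows([0.1, 0.4, 0.9, 1.2, 1.7], [1.0, 2.0, 3.0, 10.0, 20.0], [1.0] * 5))
# {0.0: 2.0, 1.0: 15.0}
```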
DataprocVariant selects which implementation is reproduced, CICDAQ or SPECTOPS.
The two differ in the mid-index calculation, the offset weight threshold and the
variance formula, which leads to minor differences in the results.
Data Files
Expected in the data/ directory:
- 202511*.parquet: raw measurement parquet files
- 20251101-1s.records: 1-second aggregated records file
Running with uv (recommended)
Install uv, then:
uv run python -m cic_rawdataproc_check.compare_dataproc
Manual Installation (without uv)
Requires Python >= 3.13.
It is recommended to create and activate a virtual environment first (for example with python -m venv .venv). This keeps the dependencies isolated so that they do not clutter the global Python installation. uv does this automatically; for a manual install you have to do it yourself.
pip install .
python -m cic_rawdataproc_check.compare_dataproc
pip install . installs all dependencies listed in pyproject.toml, among them
airel-pyspect, which is taken from https://git.dndlogdp.com/sander/pyspect.git
(dev branch). PyQt6 is optional; if it is not installed, matplotlib uses
Tkinter instead.