CIC rawdataproc check

The script reprocesses the raw electrometer data with the offline airel.pyspect.rawdataproc pipeline and compares the result with the records that the instrument data acquisition software (CIC-DAQ / Spectops) has already produced. The comparison is done channel by channel against the .records reference.

What the comparison does

python -m cic_rawdataproc_check.compare_dataproc does the following:

  1. Reads the reference records (RECORDS_FILE, the 1-second averaged currents and variances from the instrument) and the raw per-sample parquet files (RAW_GLOB).
  2. Calls assign_blocks. The raw data files contain only the measured currents and timestamps, not the operating mode, so it has to be recovered from the reference records. assign_blocks uses their time intervals to separate the raw records into blocks, i.e. periods of continuous measurement in one operating mode. A new block starts when the opmode changes or when there is a gap of more than 10 s between intervals; otherwise neighbouring intervals of the same opmode stay in one block. Every block gets a unique block_id, and all records in the same block share the same opmode (for example offset or ions). Samples that fall outside every interval are dropped.
  3. Runs the pipeline (process) on the block-tagged raw data with fixed DataprocSettings (FIR kernel, offset opmode, 1-second averaging aligned to the clock). This recomputes the instrument's 1-second records offline.
  4. Removes the last hour of data, because the two implementations handle the end of the measurement differently.
  5. Joins the pipeline result with the reference by opmode and begin_time rounded to 1 second, and prints for each channel the difference statistics (mean, maximum absolute, median relative) of the current and the variance.
  6. Shows three kinds of plots: the pipeline output and the reference drawn together as time series, the histograms of their differences, and the scatter plots of pipeline against reference.
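
The block assignment rule from step 2 can be sketched in plain Python. This is only an illustration of the rule as described above, not the actual assign_blocks implementation: the function names label_intervals and tag_samples, the constant GAP_LIMIT_S and the (begin, end, opmode) tuple layout are hypothetical, and the intervals are assumed to be sorted, non-overlapping and half-open.

```python
from bisect import bisect_right

GAP_LIMIT_S = 10.0  # hypothetical name; a longer gap starts a new block


def label_intervals(intervals):
    """Give every reference interval a block id.

    intervals: list of (begin, end, opmode) tuples sorted by begin time.
    A new block starts when the opmode changes or the gap to the previous
    interval exceeds GAP_LIMIT_S.
    """
    block_ids = []
    block_id = -1
    prev_end = None
    prev_opmode = None
    for begin, end, opmode in intervals:
        if prev_end is None or opmode != prev_opmode or begin - prev_end > GAP_LIMIT_S:
            block_id += 1
        block_ids.append(block_id)
        prev_end, prev_opmode = end, opmode
    return block_ids


def tag_samples(sample_times, intervals):
    """Attach (block_id, opmode) to each raw sample time.

    Samples that fall outside every interval are dropped.
    Assumes half-open intervals [begin, end) that do not overlap.
    """
    begins = [iv[0] for iv in intervals]
    block_ids = label_intervals(intervals)
    tagged = []
    for t in sample_times:
        i = bisect_right(begins, t) - 1  # last interval starting at or before t
        if i >= 0 and t < intervals[i][1]:
            tagged.append((t, block_ids[i], intervals[i][2]))
    return tagged


intervals = [(0, 1, "offset"), (1, 2, "offset"), (2, 3, "ions"), (20, 21, "ions")]
samples = [0.5, 1.5, 2.5, 10.0, 20.5]
tagged = tag_samples(samples, intervals)
```

In this example the two offset intervals share one block, the opmode change starts a second block, and the 17 s gap starts a third one even though the opmode stays ions. The sample at 10.0 s lies outside every interval and is dropped.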

The rawdataproc pipeline

airel.pyspect.rawdataproc replicates the electrometer current processing pipeline that runs inside the instrument data acquisition software (CIC-DAQ / Spectops).

Block assignment with assign_blocks (step 2 above) is the first stage and prepares the input. The process function then takes the block-tagged data and does the rest:

  1. FIR filtering. Each block is convolved with the given kernel. Before that, the samples that are further than outlier_drop_threshold standard deviations from the block mean are masked. The convolution also keeps a weight for every sample, which tells how much valid input went into the output.
  2. Offset estimation. The blocks with offset opmode (zero signal) are collected. A weighted linear fit over the last two offset blocks gives the offset estimate (intercept and slope). The residual variance between the estimate and the signal is taken over up to three offset blocks and used as the noise level estimate.
  3. Offset correction. The linear offset estimate is subtracted from the filtered signal of each block.
  4. Averaging. The corrected per-sample values are averaged into fixed time windows (averaging_periods) and into whole blocks.
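
Stages 1 and 2 can be illustrated with a small pure-Python sketch. It is not the airel.pyspect.rawdataproc code: the names masked_fir and weighted_linear_fit are hypothetical, and several details are assumptions made for the example (population standard deviation for the outlier test, output only where the kernel fits fully inside the block, weight defined as the sum of kernel coefficients that met valid samples, output normalised by that weight).

```python
from statistics import fmean, pstdev


def masked_fir(samples, kernel, outlier_drop_threshold):
    """Convolve one block with a FIR kernel, skipping masked outliers.

    Returns (filtered, weights). weights[i] is the sum of the kernel
    coefficients that met a valid sample, so 1.0 means "no input masked"
    for a kernel that sums to 1.
    """
    mean = fmean(samples)
    std = pstdev(samples)  # assumption: population standard deviation
    valid = [std == 0 or abs(x - mean) <= outlier_drop_threshold * std
             for x in samples]
    k = len(kernel)
    filtered, weights = [], []
    for i in range(len(samples) - k + 1):
        acc = 0.0
        w = 0.0
        for j in range(k):
            idx = i + k - 1 - j  # kernel is flipped: true convolution
            if valid[idx]:
                acc += kernel[j] * samples[idx]
                w += kernel[j]
        filtered.append(acc / w if w > 0 else float("nan"))
        weights.append(w)
    return filtered, weights


def weighted_linear_fit(t, y, w):
    """Weighted least-squares line y = intercept + slope * t."""
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_tt = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    s_ty = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    slope = s_ty / s_tt
    return y_mean - slope * t_mean, slope


# One block with a single spike; the spike is masked before filtering.
block = [1.0, 1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0, 1.0, 1.0]
filtered, weights = masked_fir(block, [0.25, 0.5, 0.25], outlier_drop_threshold=2.0)

# Offset drift sampled at four times, fitted with the filter weights.
intercept, slope = weighted_linear_fit(
    [0.0, 1.0, 2.0, 3.0], [2.0, 2.5, 3.0, 3.5], [1.0, 0.5, 0.75, 1.0]
)
```

The outputs next to the spike get a reduced weight (0.75, 0.5, 0.75), which is what lets the later stages trust them less. Offset correction (stage 3) would then subtract intercept + slope * t from the filtered signal of each block.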

DataprocVariant selects which implementation is reproduced, CICDAQ or SPECTOPS. The two differ in the mid-index calculation, the offset weight threshold and the variance formula, which leads to minor differences in the results.

Data Files

Expected in the data/ directory:

  • 202511*.parquet - Raw measurement parquet files
  • 20251101-1s.records - 1-second aggregated records file

Install uv, then:

uv run python -m cic_rawdataproc_check.compare_dataproc

Manual Installation (without uv)

Requires Python >= 3.13.

It is recommended to create and activate a virtual environment first. This way the dependencies stay isolated and do not clutter the global Python installation. With uv this is done automatically; for a manual install you have to do it yourself.

pip install .
python -m cic_rawdataproc_check.compare_dataproc

pip install . installs all dependencies listed in pyproject.toml, among them airel-pyspect, which is taken from https://git.dndlogdp.com/sander/pyspect.git (dev branch). PyQt6 is optional; if it is not installed, matplotlib uses Tkinter instead.