IonogramFilter (pynasonde.vipir.riq.parsers.filter)¶

pynasonde.vipir.riq.parsers.filter: coherent post-extraction echo filter for VIPIR RIQ soundings.

Overview¶

IonogramFilter takes one or more pynasonde.vipir.riq.echo.EchoExtractor objects (one per sounding) and applies a cascade of up to six filter stages to the extracted echo cloud, rejecting RFI, non-planar returns, multi-hop echoes, isolated noise, and trace outliers. When multiple soundings are supplied the filter also enforces temporal coherence: a (frequency, height) cell must be populated in at least temporal_min_soundings of the supplied soundings to be retained.

Processing pipeline¶

EchoExtractor(s)  ─┐
                   │
                   ▼
         IonogramFilter.filter()
                   │
          ┌────────┴────────────────────────────┐
          │  Stage 1: RFI blanking              │  per-frequency height IQR
          │  Stage 2: EP filter                 │  wavefront planarity
          │  Stage 3: Multi-hop removal         │  2F / 3F ground reflections
          │  Stage 4: DBSCAN noise rejection    │  (f, h, V*, A, EP) cluster
          │  Stage 5: RANSAC trace fitting      │  polynomial h*(f) outlier removal
          │  Stage 6: Temporal coherence        │  multi-scan cell occupancy
          └────────────────────────────────────┘
                   │
                   ▼
             pd.DataFrame  (filtered echoes, ``sounding_index`` column added)

Classes¶

`pynasonde.vipir.riq.parsers.filter.IonogramFilter` ¶

Multi-stage coherent filter for VIPIR ionospheric echo clouds.

Parameters¶

rfi_enabled : bool

Toggle Stage 1 (RFI blanking). Default True.

rfi_height_iqr_km : float

A frequency step is declared RFI/noise when the inter-quartile range of echo heights at that frequency exceeds this value. Ionospheric echoes cluster near E/F-layer heights (height IQR < 150 km); RFI scatters echoes across all gates (height IQR > 300–800 km). Default 300.0.

rfi_min_echoes : int

Minimum number of echoes at a frequency before any per-frequency Stage 1 test is applied. Default 3.

ep_filter_enabled : bool

Toggle Stage 2 (EP / planar-wavefront filter). Default True.

ep_max_deg : float

Echoes with residual_deg > ep_max_deg are rejected (non-planar wavefront / multipath). The test is not applied when ep_filter_enabled=False, and echoes whose residual_deg is NaN (single-receiver echoes) are never rejected by it. Set this conservatively high (e.g. 90°): oblique real echoes can have EP 50–80°, so let DBSCAN (Stage 4) handle subtler cases. Default 90.0.

multihop_enabled : bool

Toggle Stage 3 (multi-hop removal). Default True.

multihop_orders : tuple of int

Harmonic orders to check. (2, 3) checks for 2F and 3F echoes. Default (2, 3).

multihop_height_tol_km : float

An echo at height h is considered an n-th order multi-hop if |h - n × h_1F| < multihop_height_tol_km. Default 50.0.

multihop_snr_margin_db : float

Additionally, the candidate multi-hop echo must be weaker than the 1F echo by at least this many dB. Default 6.0.

dbscan_enabled : bool

Toggle Stage 4 (DBSCAN clustering). Default True.

dbscan_eps : float

DBSCAN neighbourhood radius in normalised feature space. Default 1.0.

dbscan_min_samples : int

Minimum cluster size for DBSCAN. Default 5.

dbscan_features : tuple of str

Feature columns used for DBSCAN. Columns absent from the DataFrame or entirely NaN are silently skipped. Default ("frequency_khz", "height_km", "velocity_mps", "amplitude_db", "residual_deg").

dbscan_feature_scales : dict, optional

Per-feature normalisation scale (σ). Keys are column names, values are positive floats. If a key is missing or None, the standard deviation of that feature in the current batch is used.

ransac_enabled : bool

Toggle Stage 5 (RANSAC trace fitting). Default True.

ransac_residual_km : float

Maximum height residual |h - h*(f)| for an echo to be considered an inlier of the fitted trace. Generous values (100 km) tolerate spread-F; tighter values (50 km) enforce a clean single-layer trace. Default 100.0.

ransac_min_samples : int

Number of randomly sampled echoes used to estimate the trace model in each RANSAC iteration. Must be ≥ ransac_poly_degree + 1. Default 10.

ransac_n_iter : int

Number of RANSAC iterations per sounding. More iterations improve robustness at the cost of compute time. Default 200.

ransac_poly_degree : int

Degree of the polynomial h*(f) used as the trace model. Degree 3 captures the curvature of the F-layer trace; higher degrees risk overfitting on sparse soundings. Default 3.

ransac_min_inlier_fraction : float

Minimum fraction of active echoes that must be inliers for the fitted model to be accepted. If no iteration reaches this threshold the stage is skipped. Default 0.3.

temporal_enabled : bool

Toggle Stage 6 (temporal coherence). Automatically disabled when only one sounding is provided. Default True.

temporal_min_soundings : int

An echo is retained only if its (frequency, height) cell, as defined by the bin widths below, is occupied in at least this many of the provided soundings. Default 3.

temporal_freq_bin_khz : float

Frequency bin width for the temporal coherence grid (kHz). Default 50.0.

temporal_height_bin_km : float

Height bin width for the temporal coherence grid (km). Default 50.0.

`stats: Dict[str, dict]` `property` ¶

Rejection statistics from the most recent filter() call.

`__init__(rfi_enabled=True, rfi_height_iqr_km=300.0, rfi_min_echoes=3, ep_filter_enabled=True, ep_max_deg=90.0, multihop_enabled=True, multihop_orders=(2, 3), multihop_height_tol_km=50.0, multihop_snr_margin_db=6.0, dbscan_enabled=True, dbscan_eps=1.0, dbscan_min_samples=5, dbscan_features=('frequency_khz', 'height_km', 'velocity_mps', 'amplitude_db', 'residual_deg'), dbscan_feature_scales=None, ransac_enabled=True, ransac_residual_km=100.0, ransac_min_samples=10, ransac_n_iter=200, ransac_poly_degree=3, ransac_min_inlier_fraction=0.3, temporal_enabled=True, temporal_min_soundings=3, temporal_freq_bin_khz=50.0, temporal_height_bin_km=50.0)` ¶

`filter(sources)` ¶

Run all enabled filter stages and return the surviving echoes.

Parameters¶

sources : EchoExtractor | pd.DataFrame | list thereof

One or more sounding sources. Each element may be a pynasonde.vipir.riq.echo.EchoExtractor, a pd.DataFrame of echoes, or a list of pynasonde.vipir.riq.echo.Echo objects.

Returns¶

pd.DataFrame

Filtered echo DataFrame. A sounding_index column (int) is always present; for a single sounding it is all zeros. A filter_mask boolean column marks the echoes that survived all enabled stages (always True in the returned frame, kept for traceability when the caller retains the original).

`summary()` ¶

Return a human-readable rejection summary string.

Constructor parameters¶

Stage 1 — RFI blanking¶

Parameter	Type	Default	Description
`rfi_enabled`	`bool`	`True`	Enable RFI frequency blanking
`rfi_height_iqr_km`	`float`	`300.0`	Flag a frequency if its echo height IQR exceeds this value (km)
`rfi_min_echoes`	`int`	`3`	Minimum echoes at a frequency before the height-spread test applies

Detection is based on height spread, not echo count. Count-based detection fails when EchoExtractor caps the number of echoes per pulset (e.g. max_echoes_per_pulset=5), because both ionospheric and RFI frequencies then return the same number of echoes.

RFI illuminates random gates across all heights → height IQR ≈ 300–800 km. Ionospheric echoes cluster near E/F-layer heights → height IQR < 150 km.

Stage 2 — EP (wavefront residual) filter¶

Parameter	Type	Default	Description
`ep_filter_enabled`	`bool`	`True`	Enable EP filter
`ep_max_deg`	`float`	`90.0`	Maximum allowed planar-wavefront residual (degrees)

The EP parameter is the RMS residual of the least-squares fit of inter-antenna phase differences to a planar wavefront model. A large EP indicates a non-planar (multipath, distorted, or RFI-contaminated) wavefront. Use a conservative threshold (90°) — oblique real echoes routinely reach 50–80° at low SNR; let Stage 4 DBSCAN handle subtler cases.
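The threshold rule can be sketched in a few lines of pandas. This is an illustrative helper, not the library's implementation: the function name ep_keep_mask is hypothetical, and the only assumptions taken from this page are the residual_deg column, the ep_max_deg threshold, and the rule that NaN residuals are never rejected.

```python
import numpy as np
import pandas as pd


def ep_keep_mask(echoes: pd.DataFrame, ep_max_deg: float = 90.0) -> pd.Series:
    """Return True for echoes that pass the EP test (hypothetical helper)."""
    residual = echoes["residual_deg"]
    # NaN residuals (single-receiver echoes) cannot be tested, so they pass.
    return residual.isna() | (residual <= ep_max_deg)


# Four toy echoes: two planar, one clearly non-planar, one without an EP value.
echoes = pd.DataFrame({"residual_deg": [12.0, 75.0, 120.0, np.nan]})
mask = ep_keep_mask(echoes)
kept = echoes[mask]
```

Only the 120° echo is dropped here; the 75° echo survives, which matches the advice to keep the threshold conservative.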

Stage 3 — Multi-hop removal¶

Parameter	Type	Default	Description
`multihop_enabled`	`bool`	`True`	Enable multi-hop (2F, 3F) removal
`multihop_orders`	`tuple[int, ...]`	`(2, 3)`	Hop orders to test
`multihop_height_tol_km`	`float`	`50.0`	Height tolerance for Nh* matching (km)
`multihop_snr_margin_db`	`float`	`6.0`	Minimum amplitude deficit of multi-hop vs 1F echo (dB)

At each frequency, the 1F reference is the strongest echo in the lower half of the height distribution (echoes at or below the median height). This is more robust than taking the minimum-height echo, which could be a stray noise point. Echoes near N × h*(1F) (within multihop_height_tol_km) that are also at least multihop_snr_margin_db weaker than the 1F echo are labelled as N-hop artefacts and removed.

Stage 4 — DBSCAN clustering¶

Parameter	Type	Default	Description
`dbscan_enabled`	`bool`	`True`	Enable DBSCAN noise rejection
`dbscan_eps`	`float`	`1.0`	DBSCAN neighbourhood radius in normalised feature space
`dbscan_min_samples`	`int`	`5`	Minimum cluster size
`dbscan_features`	`tuple[str, ...]`	see below	DataFrame columns to use as DBSCAN features

Default features:

("frequency_khz", "height_km", "velocity_mps", "amplitude_db", "residual_deg")

Each feature is normalised by its inter-quartile range before DBSCAN so that all dimensions have comparable weight. Echoes assigned cluster label −1 (noise) are rejected.

Stage 5 — RANSAC trace fitting¶

Parameter	Type	Default	Description
`ransac_enabled`	`bool`	`True`	Enable RANSAC polynomial trace fitting
`ransac_residual_km`	`float`	`100.0`	Maximum height residual for an echo to be an inlier (km)
`ransac_min_samples`	`int`	`10`	Echoes randomly sampled per RANSAC iteration
`ransac_n_iter`	`int`	`200`	Number of RANSAC iterations per sounding
`ransac_poly_degree`	`int`	`3`	Polynomial degree for the h*(f) trace model
`ransac_min_inlier_fraction`	`float`	`0.3`	Minimum inlier fraction for a model to be accepted

Fits a degree-ransac_poly_degree polynomial h*(f) to the (frequency, height) echo cloud using Random Sample Consensus. Echoes further than ransac_residual_km from the best-fit curve are rejected as outliers. Run independently per sounding index so that each sounding's ionospheric trace is fitted separately. If no iteration achieves ransac_min_inlier_fraction of the active echoes the stage is skipped for that sounding.
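A minimal NumPy sketch of the procedure described above may help when choosing parameters. It is not the library's code: the function name ransac_poly_trace, the seed argument, and the frequency centring/scaling (added only to keep the polynomial fit well conditioned) are assumptions made for this illustration, and the synthetic trace is noise-free.

```python
import numpy as np


def ransac_poly_trace(freq, height, degree=3, n_iter=200, min_samples=10,
                      residual_km=100.0, min_inlier_fraction=0.3, seed=0):
    """Return a boolean inlier mask, or None when no model is accepted."""
    rng = np.random.default_rng(seed)
    freq = np.asarray(freq, dtype=float)
    height = np.asarray(height, dtype=float)
    n = freq.size
    if n < min_samples:
        return None
    # Centre and scale frequency (assumption: for numerical conditioning only).
    x = (freq - freq.mean()) / (freq.std() or 1.0)
    best_mask, best_count = None, 0
    for _ in range(n_iter):
        # Fit the polynomial to a random subset of echoes.
        sample = rng.choice(n, size=min_samples, replace=False)
        coeffs = np.polyfit(x[sample], height[sample], degree)
        # Count echoes within residual_km of the candidate trace.
        inliers = np.abs(height - np.polyval(coeffs, x)) <= residual_km
        if inliers.sum() > best_count:
            best_mask, best_count = inliers, int(inliers.sum())
    # Reject the result if too few echoes support the best model.
    if best_count < min_inlier_fraction * n:
        return None
    return best_mask


# Synthetic sounding: 60 echoes on a smooth trace plus 10 high-altitude outliers.
f_trace = np.linspace(2000.0, 8000.0, 60)
h_trace = 250.0 + 1e-5 * (f_trace - 2000.0) ** 2
f_out = np.linspace(2500.0, 7500.0, 10)
h_out = np.full(10, 950.0)
mask = ransac_poly_trace(np.concatenate([f_trace, f_out]),
                         np.concatenate([h_trace, h_out]),
                         residual_km=50.0)
```

Returning None mirrors the documented behaviour of skipping the stage when no iteration reaches ransac_min_inlier_fraction; a caller would then keep all echoes of that sounding.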

Stage 6 — Temporal coherence (multi-sounding only)¶

Parameter	Type	Default	Description
`temporal_enabled`	`bool`	`True`	Enable temporal coherence filter
`temporal_min_soundings`	`int`	`3`	Minimum soundings a cell must appear in
`temporal_freq_bin_khz`	`float`	`50.0`	Frequency bin width for cell definition (kHz)
`temporal_height_bin_km`	`float`	`50.0`	Height bin width for cell definition (km)

This stage is silently skipped when only one sounding is supplied.

Quick start¶

Single sounding¶

from pynasonde.vipir.riq.echo import EchoExtractor
from pynasonde.vipir.riq.parsers.filter import IonogramFilter
from pynasonde.vipir.riq.parsers.read_riq import VIPIR_VERSION_MAP, RiqDataset

riq = RiqDataset.create_from_file(
    "WI937_2022233235902.RIQ",
    unicode="latin-1",
    vipir_config=VIPIR_VERSION_MAP.configs[1],
)
ext = EchoExtractor(
    sct=riq.sct, pulsets=riq.pulsets,
    snr_threshold_db=3.0, min_height_km=60.0, max_height_km=1000.0,
).extract()

filt = IonogramFilter(
    ep_max_deg=45.0,
    dbscan_eps=1.0,
    dbscan_min_samples=5,
    temporal_enabled=False,      # only one sounding
)

df_clean = filt.filter(ext)     # accepts single extractor or list
print(filt.summary())

Multiple soundings (temporal coherence)¶

extractors = []
for fname in riq_file_list:
    riq = RiqDataset.create_from_file(fname, ...)
    ext = EchoExtractor(...).extract()
    extractors.append(ext)

filt = IonogramFilter(
    temporal_enabled=True,
    temporal_min_soundings=3,
    temporal_freq_bin_khz=50.0,
    temporal_height_bin_km=50.0,
)

df_clean = filt.filter(extractors)          # list of extractors
# df_clean has column "sounding_index" = 0, 1, 2, ...

Output DataFrame columns¶

The returned DataFrame contains all columns from pynasonde.vipir.riq.echo.EchoExtractor.to_dataframe() plus:

Column	Type	Description
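`sounding_index`	`int`	Index into the input extractor list (0 for single sounding)
`filter_mask`	`bool`	Marks echoes that survived all enabled stages (always True in the returned frame)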

Statistics¶

After calling filter(), the stats attribute is populated:

{
    "rfi":       {"input": N, "rejected": N_rfi},
    "ep":        {"input": N, "rejected": N_ep},
    "multihop":  {"input": N, "rejected": N_mh},
    "dbscan":    {"input": N, "rejected": N_db},
    "temporal":  {"input": N, "rejected": N_t},   # absent for single sounding
    "summary":   {"total_input": N, "total_kept": N_k},
}

Human-readable via summary():

print(filt.summary())
# Stage         Input  Rejected  Kept   Retention
# ─────────────────────────────────────────────────
# RFI            3206        12  3194     99.6 %
# EP             3194       281  2913     91.2 %
# Multi-hop      2913        87  2826     97.0 %
# DBSCAN         2826       179  2647     93.7 %
# ─────────────────────────────────────────────────
# Total          3206       559  2647     82.6 %
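The retention figures in that table follow directly from the input and rejected counts. The sketch below recomputes them from a dictionary shaped like the stats structure shown above, using the same example numbers; the helper name retention_by_stage is hypothetical and only the keys documented on this page are assumed.

```python
# Example statistics, copied from the summary table above.
stats = {
    "rfi": {"input": 3206, "rejected": 12},
    "ep": {"input": 3194, "rejected": 281},
    "multihop": {"input": 2913, "rejected": 87},
    "dbscan": {"input": 2826, "rejected": 179},
    "summary": {"total_input": 3206, "total_kept": 2647},
}


def retention_by_stage(stats):
    """Percentage of echoes kept by each stage (hypothetical helper)."""
    out = {}
    for stage, counts in stats.items():
        if stage == "summary":
            continue  # the summary entry uses different keys
        kept = counts["input"] - counts["rejected"]
        out[stage] = round(100.0 * kept / counts["input"], 1) if counts["input"] else float("nan")
    return out


retention = retention_by_stage(stats)
overall = round(
    100.0 * stats["summary"]["total_kept"] / stats["summary"]["total_input"], 1
)
```

Each stage's input equals the previous stage's kept count, so a sudden drop in one stage's retention points at the parameter group to revisit.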

Algorithm notes¶

RFI height-spread test¶

h_iqr(f) = IQR of height_km at frequency f
flag f  if  h_iqr(f) > rfi_height_iqr_km  AND  count(f) >= rfi_min_echoes

Detection is based on the height spread of echoes at each frequency, not echo count. Count-based detection is unreliable when max_echoes_per_pulset caps the per-frequency echo count. RFI illuminates random range gates → height IQR ≈ 300–800 km; ionospheric echoes cluster near E/F-layer heights → height IQR < 150 km.
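As a rough illustration of the test above, the following pandas sketch flags frequencies by height IQR. The helper name rfi_flagged_frequencies and the toy data are assumptions for this example; only the frequency_khz and height_km columns and the two thresholds come from this page.

```python
import pandas as pd


def rfi_flagged_frequencies(echoes, rfi_height_iqr_km=300.0, rfi_min_echoes=3):
    """Return the frequencies whose echo heights are too widely spread."""
    grouped = echoes.groupby("frequency_khz")["height_km"]
    # Inter-quartile range of echo height at each frequency.
    iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
    count = grouped.count()
    flagged = (iqr > rfi_height_iqr_km) & (count >= rfi_min_echoes)
    return flagged[flagged].index.to_numpy()


echoes = pd.DataFrame({
    "frequency_khz": [3000.0] * 5 + [4500.0] * 5,
    "height_km": [248.0, 250.0, 252.0, 255.0, 260.0,   # clustered: ionospheric
                  90.0, 310.0, 520.0, 760.0, 980.0],   # scattered: RFI-like
})
flagged = rfi_flagged_frequencies(echoes)
# Blank every echo at a flagged frequency.
clean = echoes[~echoes["frequency_khz"].isin(flagged)]
```

In this toy data the height IQR is 5 km at 3000 kHz and 450 km at 4500 kHz, so only the second frequency is blanked.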

Multi-hop geometry¶

A 2F echo appears at approximately twice the virtual height of the 1F echo at the same frequency, because the signal travels the ground-to-ionosphere path twice with an intermediate ground bounce:

h*(2F) ≈ 2 × h*(1F)
A(2F)  ≈ A(1F) − 10 to 20 dB

The filter identifies the strongest echo in the lower half of the height distribution at each frequency as the 1F reference, then flags any echo at N × h*(1F) ± multihop_height_tol_km that is also weaker by at least multihop_snr_margin_db.
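For the echoes at a single frequency, the rule can be sketched as follows. This is a simplified illustration under stated assumptions: the function name multihop_mask is hypothetical, ties for the strongest echo are broken by array order, and the real stage may differ in such details.

```python
import numpy as np


def multihop_mask(height_km, amplitude_db, orders=(2, 3),
                  height_tol_km=50.0, snr_margin_db=6.0):
    """Flag multi-hop candidates among echoes that share one frequency."""
    h = np.asarray(height_km, dtype=float)
    a = np.asarray(amplitude_db, dtype=float)
    flagged = np.zeros(h.size, dtype=bool)
    if h.size == 0:
        return flagged
    # 1F reference: the strongest echo at or below the median height.
    lower = np.flatnonzero(h <= np.median(h))
    ref = lower[np.argmax(a[lower])]
    for n in orders:
        # Candidate n-hop echoes sit near n times the reference height...
        near = np.abs(h - n * h[ref]) < height_tol_km
        # ...and must be weaker than the reference by the required margin.
        weaker = a <= a[ref] - snr_margin_db
        flagged |= near & weaker
    return flagged


heights = [250.0, 255.0, 505.0, 760.0, 400.0]
amplitudes = [40.0, 30.0, 28.0, 22.0, 35.0]
flags = multihop_mask(heights, amplitudes)
```

Here the 250 km echo is the 1F reference, so the echoes at 505 km and 760 km are flagged as 2F and 3F candidates, while the 400 km echo is kept because it is not near any multiple of the reference height.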

DBSCAN feature scaling¶

Each feature column is independently normalised:

x_norm = (x - median(x)) / IQR(x)

This makes the dbscan_eps parameter approximately equivalent to "number of IQR units" of separation, giving it a physical interpretation independent of the units of each parameter.
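The scaling formula is easy to reproduce with NumPy. The sketch below is illustrative only: robust_scale is a hypothetical name, and falling back to centring alone when the IQR is zero or undefined is an assumption of this example, not documented behaviour.

```python
import numpy as np


def robust_scale(values):
    """Scale one feature column as (x - median) / IQR, ignoring NaN."""
    x = np.asarray(values, dtype=float)
    q25, q50, q75 = np.nanpercentile(x, [25, 50, 75])
    iqr = q75 - q25
    if not np.isfinite(iqr) or iqr == 0.0:
        return x - q50  # assumption: centre only when the spread is degenerate
    return (x - q50) / iqr


freq_norm = robust_scale([2000.0, 3000.0, 4000.0, 5000.0, 6000.0])
height_norm = robust_scale([200.0, 250.0, 300.0, 350.0, 400.0])

# Distance between the first two echoes in the scaled (f, h) plane.
features = np.column_stack([freq_norm, height_norm])
separation = float(np.linalg.norm(features[0] - features[1]))
```

Both columns map to [-1, -0.5, 0, 0.5, 1] despite their different units, and the first two echoes end up about 0.71 IQR units apart, which is inside a neighbourhood radius of dbscan_eps=1.0.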

Temporal coherence cell occupancy¶

cell(i) = (freq_bin, height_bin)  for echo i
occupancy(cell) = number of soundings containing ≥ 1 echo in cell
keep echo i  iff  occupancy(cell(i)) >= temporal_min_soundings
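The occupancy rule can be sketched with a pandas group-by. The helper name temporal_keep_mask is hypothetical, and placing bin edges at integer multiples of the bin width (via floor division) is an assumption of this example; the page does not state how the grid is anchored.

```python
import numpy as np
import pandas as pd


def temporal_keep_mask(echoes, min_soundings=3,
                       freq_bin_khz=50.0, height_bin_km=50.0):
    """Return True for echoes whose cell is occupied in enough soundings."""
    # Assign each echo to a (frequency, height) cell.
    fbin = np.floor(echoes["frequency_khz"] / freq_bin_khz).astype(int)
    hbin = np.floor(echoes["height_km"] / height_bin_km).astype(int)
    # Occupancy = number of distinct soundings contributing to the cell.
    occupancy = echoes["sounding_index"].groupby([fbin, hbin]).transform("nunique")
    return occupancy >= min_soundings


echoes = pd.DataFrame({
    "sounding_index": [0, 1, 2, 1, 0, 0],
    "frequency_khz": [3010.0, 3020.0, 3040.0, 5010.0, 4010.0, 4020.0],
    "height_km": [251.0, 260.0, 270.0, 410.0, 300.0, 310.0],
})
keep = temporal_keep_mask(echoes)
```

The last two echoes share a cell but both come from sounding 0, so the cell's occupancy is 1 and they are rejected: the count is over soundings, not echoes.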

References¶

Zabotin N. A. et al. (2006). NeXtYZ: Three-dimensional electron density inversion for dynasonde ionograms. Radio Science 41(6). https://doi.org/10.1029/2005RS003352
Ester M. et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 Proceedings.

Related¶

Echo Extractor API
Filter examples:
examples/vipir/ionogram_filter_wi937.py
examples/vipir/ionogram_filter_pl407.py
examples/vipir/ionogram_filter_multi.py
