IonogramFilter (pynasonde.vipir.riq.parsers.filter)

pynasonde.vipir.riq.parsers.filter: coherent post-extraction echo filter for VIPIR RIQ soundings.

Overview

IonogramFilter takes one or more pynasonde.vipir.riq.echo.EchoExtractor objects (one per sounding) and applies a six-stage cascade of filters to the extracted echo cloud, rejecting RFI, non-planar returns, multi-hop echoes, and isolated noise. When multiple soundings are supplied, the filter also enforces temporal coherence: a (frequency, height) cell must be populated in at least temporal_min_soundings of the supplied soundings to be retained.

Processing pipeline

EchoExtractor(s)  ─┐
         IonogramFilter.filter()
          ┌────────┴────────────────────────────┐
          │  Stage 1: RFI blanking              │  per-frequency height IQR
          │  Stage 2: EP filter                 │  wavefront planarity
          │  Stage 3: Multi-hop removal         │  2F / 3F ground reflections
          │  Stage 4: DBSCAN noise rejection    │  (f, h, V*, A, EP) cluster
          │  Stage 5: RANSAC trace fitting      │  polynomial h*(f) outlier removal
          │  Stage 6: Temporal coherence        │  multi-scan cell occupancy
          └────────────────────────────────────┘
             pd.DataFrame  (filtered echoes, sounding_index column added)

Classes

pynasonde.vipir.riq.parsers.filter.IonogramFilter

Multi-stage coherent filter for VIPIR ionospheric echo clouds.

Parameters

rfi_enabled : bool

Toggle Stage 1 (RFI blanking). Default True.

rfi_height_iqr_km : float

A frequency step is declared RFI/noise when the inter-quartile range of echo heights at that frequency exceeds this value. Ionospheric echoes cluster near E/F-layer heights (height IQR < 150 km); RFI scatters echoes across all gates (height IQR > 300–800 km). Default 300.0.

rfi_min_echoes : int

Minimum number of echoes at a frequency before any per-frequency Stage 1 test is applied. Default 3.

ep_filter_enabled : bool

Toggle Stage 2 (EP / planar-wavefront filter). Default True.

ep_max_deg : float

Echoes with residual_deg > ep_max_deg are rejected (non-planar wavefront / multipath). Ignored when ep_filter_enabled=False or when residual_deg is NaN (single-receiver echoes). Set conservatively high (e.g. 90°): oblique real echoes can have EP 50–80°; let DBSCAN (Stage 4) handle subtler cases. Default 90.0.

multihop_enabled : bool

Toggle Stage 3 (multi-hop removal). Default True.

multihop_orders : tuple of int

Harmonic orders to check. (2, 3) checks for 2F and 3F echoes. Default (2, 3).

multihop_height_tol_km : float

An echo at height h is considered an n-th order multi-hop if |h - n × h_1F| < multihop_height_tol_km. Default 50.0.

multihop_snr_margin_db : float

Additionally, the candidate multi-hop echo must be weaker than the 1F echo by at least this many dB. Default 6.0.

dbscan_enabled : bool

Toggle Stage 4 (DBSCAN clustering). Default True.

dbscan_eps : float

DBSCAN neighbourhood radius in normalised feature space. Default 1.0.

dbscan_min_samples : int

Minimum cluster size for DBSCAN. Default 5.

dbscan_features : tuple of str

Feature columns used for DBSCAN. Columns absent from the DataFrame or entirely NaN are silently skipped. Default ("frequency_khz", "height_km", "velocity_mps", "amplitude_db", "residual_deg").

dbscan_feature_scales : dict, optional

Per-feature normalisation scale (σ). Keys are column names, values are positive floats. If a key is missing or None, the standard deviation of that feature in the current batch is used.

ransac_enabled : bool

Toggle Stage 5 (RANSAC trace fitting). Default True.

ransac_residual_km : float

Maximum height residual |h - h*(f)| for an echo to be considered an inlier of the fitted trace. Generous values (100 km) tolerate spread-F; tighter values (50 km) enforce a clean single-layer trace. Default 100.0.

ransac_min_samples : int

Number of randomly sampled echoes used to estimate the trace model in each RANSAC iteration. Must be ≥ ransac_poly_degree + 1. Default 10.

ransac_n_iter : int

Number of RANSAC iterations per sounding. More iterations improve robustness at the cost of compute time. Default 200.

ransac_poly_degree : int

Degree of the polynomial h*(f) used as the trace model. Degree 3 captures the curvature of the F-layer trace; higher degrees risk overfitting on sparse soundings. Default 3.

ransac_min_inlier_fraction : float

Minimum fraction of active echoes that must be inliers for the fitted model to be accepted. If no iteration reaches this threshold the stage is skipped. Default 0.3.

temporal_enabled : bool

Toggle Stage 6 (temporal coherence). Automatically disabled when only one sounding is provided. Default True.

temporal_min_soundings : int

An echo is retained only if a matching echo (within the tolerance windows below) exists in at least this many of the provided soundings. Default 3.

temporal_freq_bin_khz : float

Frequency bin width for the temporal coherence grid (kHz). Default 50.

temporal_height_bin_km : float

Height bin width for the temporal coherence grid (km). Default 50.

stats: Dict[str, dict] property

Rejection statistics from the most recent filter() call.

__init__(rfi_enabled=True, rfi_height_iqr_km=300.0, rfi_min_echoes=3, ep_filter_enabled=True, ep_max_deg=90.0, multihop_enabled=True, multihop_orders=(2, 3), multihop_height_tol_km=50.0, multihop_snr_margin_db=6.0, dbscan_enabled=True, dbscan_eps=1.0, dbscan_min_samples=5, dbscan_features=('frequency_khz', 'height_km', 'velocity_mps', 'amplitude_db', 'residual_deg'), dbscan_feature_scales=None, ransac_enabled=True, ransac_residual_km=100.0, ransac_min_samples=10, ransac_n_iter=200, ransac_poly_degree=3, ransac_min_inlier_fraction=0.3, temporal_enabled=True, temporal_min_soundings=3, temporal_freq_bin_khz=50.0, temporal_height_bin_km=50.0)

filter(sources)

Run all enabled filter stages and return the surviving echoes.

Parameters
sources : EchoExtractor | pd.DataFrame | list thereof

One or more sounding sources. Each element may be a pynasonde.vipir.riq.echo.EchoExtractor, a pd.DataFrame of echoes, or a list of pynasonde.vipir.riq.echo.Echo objects.

Returns

pd.DataFrame

Filtered echo DataFrame. A sounding_index column (int) is always present; for a single sounding it is all zeros. A filter_mask boolean column marks the echoes that survived all enabled stages (always True in the returned frame; kept for traceability when the caller retains the original).

summary()

Return a human-readable rejection summary string.


Constructor parameters

Stage 1 — RFI blanking

| Parameter | Type | Default | Description |
|---|---|---|---|
| rfi_enabled | bool | True | Enable RFI frequency blanking |
| rfi_height_iqr_km | float | 300.0 | Flag a frequency if its echo height IQR exceeds this value (km) |
| rfi_min_echoes | int | 3 | Minimum echoes at a frequency before the height-spread test applies |

Detection is based on height spread, not echo count. Count-based detection fails when EchoExtractor caps the number of echoes per pulset (e.g. max_echoes_per_pulset=5), because both ionospheric and RFI frequencies then return the same number of echoes.

RFI illuminates random gates across all heights → height IQR ≈ 300–800 km. Ionospheric echoes cluster near E/F-layer heights → height IQR < 150 km.

Stage 2 — EP (wavefront residual) filter

| Parameter | Type | Default | Description |
|---|---|---|---|
| ep_filter_enabled | bool | True | Enable EP filter |
| ep_max_deg | float | 90.0 | Maximum allowed planar-wavefront residual (degrees) |

The EP parameter is the RMS residual of the least-squares fit of inter-antenna phase differences to a planar wavefront model. A large EP indicates a non-planar (multipath, distorted, or RFI-contaminated) wavefront. Use a conservative threshold (90°) — oblique real echoes routinely reach 50–80° at low SNR; let Stage 4 DBSCAN handle subtler cases.
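The threshold rule can be illustrated with a minimal sketch. It assumes the residuals are available as a plain array of residual_deg values; the helper name ep_keep_mask is hypothetical and not part of pynasonde. NaN residuals are kept, following the ep_max_deg description above.

```python
import numpy as np

def ep_keep_mask(residual_deg, ep_max_deg=90.0):
    """Return True for echoes that pass the EP test (hypothetical helper)."""
    residual = np.asarray(residual_deg, dtype=float)
    # NaN marks single-receiver echoes with no planarity estimate; keep them.
    return np.isnan(residual) | (residual <= ep_max_deg)

# Two planar echoes, one non-planar echo, one single-receiver echo.
mask = ep_keep_mask([12.0, 75.0, 95.0, np.nan])
print(mask)
```

Only the 95° echo is rejected here; the 75° echo survives because it lies below the default threshold.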

Stage 3 — Multi-hop removal

| Parameter | Type | Default | Description |
|---|---|---|---|
| multihop_enabled | bool | True | Enable multi-hop (2F, 3F) removal |
| multihop_orders | tuple[int, ...] | (2, 3) | Hop orders to test |
| multihop_height_tol_km | float | 50.0 | Height tolerance for matching N × h*(1F) (km) |
| multihop_snr_margin_db | float | 6.0 | Minimum amplitude deficit of multi-hop vs 1F echo (dB) |

At each frequency, the 1F reference is the strongest echo in the lower half of the height distribution (echoes at or below the median height). This is more robust than taking the minimum-height echo, which could be a stray noise point. Echoes near N × h*(1F) (within multihop_height_tol_km) that are also at least multihop_snr_margin_db weaker than the 1F echo are labelled as N-hop artefacts and removed.

Stage 4 — DBSCAN clustering

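| Parameter | Type | Default | Description |
|---|---|---|---|
| dbscan_enabled | bool | True | Enable DBSCAN noise rejection |
| dbscan_eps | float | 1.0 | DBSCAN neighbourhood radius in normalised feature space |
| dbscan_min_samples | int | 5 | Minimum cluster size |
| dbscan_features | tuple[str, ...] | see below | DataFrame columns to use as DBSCAN features |
| dbscan_feature_scales | dict or None | None | Per-feature normalisation scale; keys are column names, values are positive floats |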

Default features:

("frequency_khz", "height_km", "velocity_mps", "amplitude_db", "residual_deg")

Each feature is normalised by its inter-quartile range before DBSCAN so that all dimensions have comparable weight. Echoes assigned cluster label −1 (noise) are rejected.

Stage 5 — RANSAC trace fitting

| Parameter | Type | Default | Description |
|---|---|---|---|
| ransac_enabled | bool | True | Enable RANSAC polynomial trace fitting |
| ransac_residual_km | float | 100.0 | Maximum height residual for an echo to be an inlier (km) |
| ransac_min_samples | int | 10 | Echoes randomly sampled per RANSAC iteration |
| ransac_n_iter | int | 200 | Number of RANSAC iterations per sounding |
| ransac_poly_degree | int | 3 | Polynomial degree for the h*(f) trace model |
| ransac_min_inlier_fraction | float | 0.3 | Minimum inlier fraction for a model to be accepted |

Fits a degree-ransac_poly_degree polynomial h*(f) to the (frequency, height) echo cloud using Random Sample Consensus. Echoes further than ransac_residual_km from the best-fit curve are rejected as outliers. Run independently per sounding index so that each sounding's ionospheric trace is fitted separately. If no iteration achieves ransac_min_inlier_fraction of the active echoes the stage is skipped for that sounding.
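The following is a minimal sketch of RANSAC polynomial fitting under the parameters listed above, not the library's implementation. The function name ransac_trace, the frequency rescaling, the fixed seed, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def ransac_trace(freq, height, degree=3, residual_km=100.0, min_samples=10,
                 n_iter=200, min_inlier_fraction=0.3, seed=0):
    """Return a boolean inlier mask, or None when no model is accepted."""
    freq = np.asarray(freq, dtype=float)
    height = np.asarray(height, dtype=float)
    if freq.size < min_samples:
        return None
    # Centre and scale frequency so the polynomial fit stays well conditioned.
    x = (freq - freq.mean()) / (freq.std() or 1.0)
    rng = np.random.default_rng(seed)
    best_mask, best_count = None, 0
    for _ in range(n_iter):
        # Fit a candidate trace to a random subset of echoes.
        idx = rng.choice(freq.size, size=min_samples, replace=False)
        coeffs = np.polyfit(x[idx], height[idx], degree)
        # Score the candidate by how many echoes lie close to it.
        inliers = np.abs(height - np.polyval(coeffs, x)) <= residual_km
        if inliers.sum() > best_count:
            best_mask, best_count = inliers, int(inliers.sum())
    if best_count < min_inlier_fraction * freq.size:
        return None  # stage skipped: no model reached the inlier threshold
    return best_mask

# Synthetic sounding: a smooth cubic trace plus ten stray echoes at 900 km.
f_trace = np.linspace(2000.0, 8000.0, 60)
h_trace = 250.0 + 1e-9 * (f_trace - 2000.0) ** 3
f_noise = np.linspace(3000.0, 7000.0, 10)
h_noise = np.full(10, 900.0)
mask = ransac_trace(np.concatenate([f_trace, f_noise]),
                    np.concatenate([h_trace, h_noise]))
print(mask[:60].all(), mask[60:].any())
```

In a multi-sounding run, a function like this would be called once per sounding_index group so that each trace is fitted separately.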

Stage 6 — Temporal coherence (multi-sounding only)

| Parameter | Type | Default | Description |
|---|---|---|---|
| temporal_enabled | bool | True | Enable temporal coherence filter |
| temporal_min_soundings | int | 3 | Minimum soundings a cell must appear in |
| temporal_freq_bin_khz | float | 50.0 | Frequency bin width for cell definition (kHz) |
| temporal_height_bin_km | float | 50.0 | Height bin width for cell definition (km) |

This stage is silently skipped when only one sounding is supplied.


Quick start

Single sounding

from pynasonde.vipir.riq.echo import EchoExtractor
from pynasonde.vipir.riq.parsers.filter import IonogramFilter
from pynasonde.vipir.riq.parsers.read_riq import VIPIR_VERSION_MAP, RiqDataset

riq = RiqDataset.create_from_file(
    "WI937_2022233235902.RIQ",
    unicode="latin-1",
    vipir_config=VIPIR_VERSION_MAP.configs[1],
)
ext = EchoExtractor(
    sct=riq.sct, pulsets=riq.pulsets,
    snr_threshold_db=3.0, min_height_km=60.0, max_height_km=1000.0,
).extract()

filt = IonogramFilter(
    ep_max_deg=45.0,             # stricter than the 90.0 default
    dbscan_eps=1.0,
    dbscan_min_samples=5,
    temporal_enabled=False,      # only one sounding
)

df_clean = filt.filter(ext)     # accepts single extractor or list
print(filt.summary())

Multiple soundings (temporal coherence)

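# riq_file_list: list of RIQ file paths, one per sounding
extractors = []
for fname in riq_file_list:
    # Pass the same keyword arguments as in the single-sounding example
    riq = RiqDataset.create_from_file(fname, ...)
    ext = EchoExtractor(...).extract()
    extractors.append(ext)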

filt = IonogramFilter(
    temporal_enabled=True,
    temporal_min_soundings=3,
    temporal_freq_bin_khz=50.0,
    temporal_height_bin_km=50.0,
)

df_clean = filt.filter(extractors)          # list of extractors
# df_clean has column "sounding_index" = 0, 1, 2, ...

Output DataFrame columns

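The returned DataFrame contains all columns from pynasonde.vipir.riq.echo.EchoExtractor.to_dataframe() plus:

| Column | Type | Description |
|---|---|---|
| sounding_index | int | Index into the input extractor list (0 for single sounding) |
| filter_mask | bool | True for echoes that survived all enabled stages (always True in the returned frame) |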

Statistics

After calling filter(), the stats attribute is populated:

{
    "rfi":       {"input": N, "rejected": N_rfi},
    "ep":        {"input": N, "rejected": N_ep},
    "multihop":  {"input": N, "rejected": N_mh},
    "dbscan":    {"input": N, "rejected": N_db},
    "temporal":  {"input": N, "rejected": N_t},   # absent for single sounding
    "summary":   {"total_input": N, "total_kept": N_k},
}

The same statistics are available in human-readable form via summary():

print(filt.summary())
# Stage         Input  Rejected  Kept   Retention
# ─────────────────────────────────────────────────
# RFI            3206        12  3194     99.6 %
# EP             3194       281  2913     91.2 %
# Multi-hop      2913        87  2826     97.0 %
# DBSCAN         2826       179  2647     93.7 %
# ─────────────────────────────────────────────────
# Total          3206       559  2647     82.6 %
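
The retention percentages in that table follow directly from the input and rejected counts. As a small sketch, the dictionary below is a hand-written stand-in with the layout of stats and the counts from the example table; retention_percent is a hypothetical helper, not a pynasonde function.

```python
# Stand-in for filt.stats, using the counts from the example summary table.
stats = {
    "rfi":      {"input": 3206, "rejected": 12},
    "ep":       {"input": 3194, "rejected": 281},
    "multihop": {"input": 2913, "rejected": 87},
    "dbscan":   {"input": 2826, "rejected": 179},
    "summary":  {"total_input": 3206, "total_kept": 2647},
}

def retention_percent(stage):
    """Percentage of a stage's input echoes that survived it."""
    kept = stage["input"] - stage["rejected"]
    return 100.0 * kept / stage["input"] if stage["input"] else float("nan")

per_stage = {name: round(retention_percent(s), 1)
             for name, s in stats.items() if name != "summary"}
overall = round(100.0 * stats["summary"]["total_kept"]
                / stats["summary"]["total_input"], 1)
print(per_stage, overall)
```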

Algorithm notes

RFI height-spread test

h_iqr(f) = IQR of height_km at frequency f
flag f  if  h_iqr(f) > rfi_height_iqr_km  AND  count(f) >= rfi_min_echoes

Detection is based on the height spread of echoes at each frequency, not echo count. Count-based detection is unreliable when max_echoes_per_pulset caps the per-frequency echo count. RFI illuminates random range gates → height IQR ≈ 300–800 km; ionospheric echoes cluster near E/F-layer heights → height IQR < 150 km.
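
A minimal sketch of the height-spread test, assuming echoes are given as parallel arrays of frequency and height. The function name rfi_frequencies and the sample values are illustrative only.

```python
import numpy as np

def rfi_frequencies(freq_khz, height_km, iqr_km=300.0, min_echoes=3):
    """Return the frequencies whose echo-height IQR exceeds iqr_km."""
    freq = np.asarray(freq_khz, dtype=float)
    height = np.asarray(height_km, dtype=float)
    flagged = []
    for f in np.unique(freq):
        h = height[freq == f]
        if h.size < min_echoes:
            continue  # too few echoes for a meaningful spread estimate
        q25, q75 = np.percentile(h, [25, 75])
        if q75 - q25 > iqr_km:
            flagged.append(float(f))
    return flagged

freq = [3000] * 5 + [5000] * 5
height = [240, 250, 255, 260, 270,      # tight ionospheric cluster
          80, 300, 520, 760, 950]       # echoes scattered across all gates
print(rfi_frequencies(freq, height))
```

Both frequencies carry five echoes, so a count-based test could not separate them; the height IQR (10 km versus 460 km) does.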

Multi-hop geometry

A 2F echo appears at approximately twice the virtual height of the 1F echo at the same frequency because the signal makes an extra ground bounce and a second ionospheric reflection:

h*(2F) ≈ 2 × h*(1F)
A(2F)  ≈ A(1F) − 10 to 20 dB

The filter identifies the strongest echo in the lower half of the height distribution at each frequency as the 1F reference, then flags any echo at N × h*(1F) ± multihop_height_tol_km that is also weaker by at least multihop_snr_margin_db.
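
The rule can be sketched for the echoes of a single frequency as follows. This is an illustration under the stated parameters, not the library's code; multihop_mask and the sample heights and amplitudes are hypothetical.

```python
import numpy as np

def multihop_mask(height_km, amplitude_db, orders=(2, 3),
                  height_tol_km=50.0, snr_margin_db=6.0):
    """Flag multi-hop candidates among the echoes of ONE frequency."""
    h = np.asarray(height_km, dtype=float)
    a = np.asarray(amplitude_db, dtype=float)
    # 1F reference: strongest echo at or below the median height.
    lower = np.flatnonzero(h <= np.median(h))
    ref = lower[np.argmax(a[lower])]
    flagged = np.zeros(h.size, dtype=bool)
    for n in orders:
        near_harmonic = np.abs(h - n * h[ref]) < height_tol_km
        weaker = a <= a[ref] - snr_margin_db
        flagged |= near_harmonic & weaker
    return flagged

heights = [250.0, 260.0, 505.0, 740.0, 620.0]
amps = [40.0, 35.0, 28.0, 20.0, 38.0]
print(multihop_mask(heights, amps))
```

Here the 250 km echo is the 1F reference, and the echoes at 505 km and 740 km are flagged as 2F and 3F candidates. The 620 km echo is kept because it matches no harmonic height and is not weak enough.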

DBSCAN feature scaling

Each feature column is independently normalised:

x_norm = (x - median(x)) / IQR(x)

This makes the dbscan_eps parameter approximately equivalent to "number of IQR units" of separation, giving it a physical interpretation independent of the units of each parameter.
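
A minimal sketch of this normalisation, assuming the selected feature columns have been stacked into a 2-D array. The function name robust_scale and the handling of constant columns are assumptions. The scaled matrix would then be passed to a DBSCAN implementation such as scikit-learn's sklearn.cluster.DBSCAN, which is not shown here.

```python
import numpy as np

def robust_scale(features):
    """Scale each column to (x - median) / IQR, ignoring NaNs."""
    x = np.asarray(features, dtype=float)
    median = np.nanmedian(x, axis=0)
    q25, q75 = np.nanpercentile(x, [25, 75], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0  # assumption: leave constant columns unscaled
    return (x - median) / iqr

# Columns: frequency_khz, height_km
cloud = np.array([[3000.0, 240.0],
                  [3100.0, 250.0],
                  [3200.0, 260.0],
                  [3300.0, 270.0],
                  [3400.0, 280.0]])
scaled = robust_scale(cloud)
print(scaled)
```

After scaling, both columns take the same values even though one is in kHz and the other in km, which is what lets a single dbscan_eps apply to every dimension.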

Temporal coherence cell occupancy

cell(i) = (freq_bin, height_bin)  for echo i
occupancy(cell) = number of soundings containing ≥ 1 echo in cell
keep echo i  iff  occupancy(cell(i)) >= temporal_min_soundings
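
The occupancy rule can be sketched as follows. The source does not state how bin edges are aligned, so this example assumes floor binning starting at zero; temporal_keep_mask is a hypothetical name.

```python
import numpy as np

def temporal_keep_mask(freq_khz, height_km, sounding_index,
                       min_soundings=3, freq_bin_khz=50.0, height_bin_km=50.0):
    """Keep echoes whose (freq, height) cell is occupied in enough soundings."""
    # Assumption: bins start at zero and use floor division.
    f_bin = np.floor(np.asarray(freq_khz, dtype=float) / freq_bin_khz).astype(int)
    h_bin = np.floor(np.asarray(height_km, dtype=float) / height_bin_km).astype(int)
    cells = list(zip(f_bin.tolist(), h_bin.tolist()))
    # Collect the distinct soundings that populate each cell.
    soundings_per_cell = {}
    for cell, s in zip(cells, sounding_index):
        soundings_per_cell.setdefault(cell, set()).add(int(s))
    return np.array([len(soundings_per_cell[c]) >= min_soundings for c in cells])

freq = [3010, 3020, 3040, 3030, 6010]
height = [255, 260, 270, 265, 410]
sounding = [0, 1, 2, 2, 1]
keep = temporal_keep_mask(freq, height, sounding)
print(keep)
```

The first four echoes share one cell that is populated in soundings 0, 1 and 2, so they are kept. The last echo sits alone in its cell and is rejected. Two echoes from the same sounding count once towards occupancy.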

References

  • Zabotin N. A. et al. (2006). NeXtYZ: Three-dimensional electron density inversion for dynasonde ionograms. Radio Science 41(6). https://doi.org/10.1029/2005RS003352

  • Ester M. et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 Proceedings.