IonogramFilter (pynasonde.vipir.riq.parsers.filter)
pynasonde.vipir.riq.parsers.filter — Coherent post-extraction echo filter
for VIPIR RIQ soundings.
Overview
IonogramFilter takes one or more
`pynasonde.vipir.riq.echo.EchoExtractor` objects (one per sounding)
and applies a six-stage cascade of filters to the extracted echo cloud,
rejecting RFI, non-planar returns, multi-hop echoes, isolated noise, and
echoes that lie far from the fitted h*(f) trace.
When multiple soundings are supplied, the filter also enforces temporal
coherence: a (frequency, height) cell must be populated in at least
temporal_min_soundings of the supplied soundings for its echoes to be retained.
Processing pipeline

```text
EchoExtractor(s)─┐
                 │
                 ▼
      IonogramFilter.filter()
                 │
┌────────────────┴────────────────┐
│ Stage 1: RFI blanking           │  per-frequency height IQR
│ Stage 2: EP filter              │  wavefront planarity
│ Stage 3: Multi-hop removal      │  2F / 3F ground reflections
│ Stage 4: DBSCAN noise rejection │  (f, h, V*, A, EP) cluster
│ Stage 5: RANSAC trace fitting   │  polynomial h*(f) outlier removal
│ Stage 6: Temporal coherence     │  multi-scan cell occupancy
└────────────────┬────────────────┘
                 │
                 ▼
pd.DataFrame (filtered echoes, sounding_index column added)
```
Classes
pynasonde.vipir.riq.parsers.filter.IonogramFilter
Multi-stage coherent filter for VIPIR ionospheric echo clouds.
Parameters
rfi_enabled : bool
Toggle Stage 1 (RFI blanking). Default True.
rfi_height_iqr_km : float
A frequency step is declared RFI/noise when the inter-quartile range
of echo heights at that frequency exceeds this value. Ionospheric
echoes cluster near E/F-layer heights (height IQR < 150 km); RFI
scatters echoes across all gates (height IQR ≈ 300–800 km).
Default 300.0.
rfi_min_echoes : int
Minimum number of echoes at a frequency before any per-frequency
Stage 1 test is applied. Default 3.
ep_filter_enabled : bool
Toggle Stage 2 (EP / planar-wavefront filter). Default True.
ep_max_deg : float
Echoes with residual_deg > ep_max_deg are rejected (non-planar
wavefront / multipath). The test is not applied when
ep_filter_enabled=False, and echoes whose residual_deg is NaN
(single-receiver echoes) are kept. Set conservatively high
(e.g. 90°): oblique real echoes can have an EP of 50–80°, so leave
subtler cases to DBSCAN (Stage 4).
Default 90.0.
multihop_enabled : bool
Toggle Stage 3 (multi-hop removal). Default True.
multihop_orders : tuple of int
Hop orders to check. (2, 3) checks for 2F and 3F echoes.
Default (2, 3).
multihop_height_tol_km : float
An echo at height h is considered an n-th order multi-hop if
|h - n × h_1F| < multihop_height_tol_km. Default 50.0.
multihop_snr_margin_db : float
Additionally, the candidate multi-hop echo must be weaker than the
1F echo by at least this many dB. Default 6.0.
dbscan_enabled : bool
Toggle Stage 4 (DBSCAN clustering). Default True.
dbscan_eps : float
DBSCAN neighbourhood radius in normalised feature space. Default 1.0.
dbscan_min_samples : int
Minimum cluster size for DBSCAN. Default 5.
dbscan_features : tuple of str
Feature columns used for DBSCAN. Columns absent from the DataFrame
or entirely NaN are silently skipped. Default
("frequency_khz", "height_km", "velocity_mps", "amplitude_db", "residual_deg").
dbscan_feature_scales : dict, optional
Per-feature normalisation scale (σ). Keys are column names, values
are positive floats. If a key is missing or None, the standard
deviation of that feature in the current batch is used.
ransac_enabled : bool
Toggle Stage 5 (RANSAC trace fitting). Default True.
ransac_residual_km : float
Maximum height residual |h - h*(f)| for an echo to be considered an
inlier of the fitted trace. Generous values (100 km) tolerate
spread-F; tighter values (50 km) enforce a clean single-layer trace.
Default 100.0.
ransac_min_samples : int
Number of randomly sampled echoes used to estimate the trace model in
each RANSAC iteration. Must be ≥ ransac_poly_degree + 1.
Default 10.
ransac_n_iter : int
Number of RANSAC iterations per sounding. More iterations improve
robustness at the cost of compute time. Default 200.
ransac_poly_degree : int
Degree of the polynomial h*(f) used as the trace model. Degree 3
captures the curvature of the F-layer trace; higher degrees risk
overfitting on sparse soundings. Default 3.
ransac_min_inlier_fraction : float
Minimum fraction of active echoes that must be inliers for the
fitted model to be accepted. If no iteration reaches this threshold
the stage is skipped. Default 0.3.
temporal_enabled : bool
Toggle Stage 6 (temporal coherence). Automatically disabled when
only one sounding is provided. Default True.
temporal_min_soundings : int
An echo is retained only if its (frequency, height) cell, defined by the
two bin widths below, is occupied in at least this many of the provided
soundings. Default 3.
temporal_freq_bin_khz : float
Frequency bin width for the temporal coherence grid (kHz). Default 50.0.
temporal_height_bin_km : float
Height bin width for the temporal coherence grid (km). Default 50.0.
stats : Dict[str, dict] (property)
Rejection statistics from the most recent `filter()` call.
__init__(rfi_enabled=True, rfi_height_iqr_km=300.0, rfi_min_echoes=3, ep_filter_enabled=True, ep_max_deg=90.0, multihop_enabled=True, multihop_orders=(2, 3), multihop_height_tol_km=50.0, multihop_snr_margin_db=6.0, dbscan_enabled=True, dbscan_eps=1.0, dbscan_min_samples=5, dbscan_features=('frequency_khz', 'height_km', 'velocity_mps', 'amplitude_db', 'residual_deg'), dbscan_feature_scales=None, ransac_enabled=True, ransac_residual_km=100.0, ransac_min_samples=10, ransac_n_iter=200, ransac_poly_degree=3, ransac_min_inlier_fraction=0.3, temporal_enabled=True, temporal_min_soundings=3, temporal_freq_bin_khz=50.0, temporal_height_bin_km=50.0)
filter(sources)
Run all enabled filter stages and return the surviving echoes.
Parameters
sources : EchoExtractor | pd.DataFrame | list thereof
One or more sounding sources. Each element may be a
`pynasonde.vipir.riq.echo.EchoExtractor`, a
`pd.DataFrame` of echoes, or a list of
`pynasonde.vipir.riq.echo.Echo` objects.
Returns
pd.DataFrame
Filtered echo DataFrame. A sounding_index column (int) is
always present; for a single sounding it is all zeros. A
filter_mask boolean column marks the echoes that survived
all enabled stages (always True in the returned frame,
kept for traceability when the caller retains the original).
summary()
Return a human-readable rejection summary string.
Constructor parameters
Stage 1: RFI blanking

| Parameter | Type | Default | Description |
|---|---|---|---|
| `rfi_enabled` | bool | True | Enable RFI frequency blanking |
| `rfi_height_iqr_km` | float | 300.0 | Flag a frequency if its echo height IQR exceeds this value (km) |
| `rfi_min_echoes` | int | 3 | Minimum echoes at a frequency before the height-spread test applies |
Detection is based on height spread, not echo count. Count-based
detection fails when EchoExtractor caps the number of echoes per pulset
(e.g. max_echoes_per_pulset=5), because both ionospheric and RFI
frequencies then return the same number of echoes.
RFI illuminates random gates across all heights → height IQR ≈ 300–800 km. Ionospheric echoes cluster near E/F-layer heights → height IQR < 150 km.
Stage 2: EP (wavefront residual) filter

| Parameter | Type | Default | Description |
|---|---|---|---|
| `ep_filter_enabled` | bool | True | Enable EP filter |
| `ep_max_deg` | float | 90.0 | Maximum allowed planar-wavefront residual (degrees) |
The EP parameter is the RMS residual of the least-squares fit of inter-antenna phase differences to a planar wavefront model. A large EP indicates a non-planar (multipath, distorted, or RFI-contaminated) wavefront. Use a conservative threshold (90°) — oblique real echoes routinely reach 50–80° at low SNR; let Stage 4 DBSCAN handle subtler cases.
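To make the EP definition concrete, the sketch below fits per-antenna phases to a plane, φ_i ≈ k_x·x_i + k_y·y_i + c, by least squares and returns the RMS residual in degrees, then applies the Stage 2 keep rule (NaN passes, otherwise EP ≤ ep_max_deg). The helper names, the square four-antenna layout, and the assumption that phases are already unwrapped are all hypothetical; this is an illustration of the definition, not code taken from EchoExtractor.

```python
import numpy as np


def planar_residual_deg(xy_m: np.ndarray, phase_deg: np.ndarray) -> float:
    """RMS residual (degrees) of a least-squares plane fit to antenna phases.

    Hypothetical helper: assumes phase_deg is already unwrapped and that a
    planar wavefront gives phases that are linear in antenna position.
    """
    # Design matrix for phase = kx * x + ky * y + c
    a = np.column_stack([xy_m[:, 0], xy_m[:, 1], np.ones(len(xy_m))])
    coeffs, *_ = np.linalg.lstsq(a, phase_deg, rcond=None)
    resid = phase_deg - a @ coeffs
    return float(np.sqrt(np.mean(resid**2)))


def ep_keep_mask(residual_deg, ep_max_deg: float = 90.0) -> np.ndarray:
    """Stage 2 rule: keep NaN residuals (single receiver) and EP <= ep_max_deg."""
    r = np.asarray(residual_deg, dtype=float)
    return np.isnan(r) | (r <= ep_max_deg)


# Hypothetical square array of four antennas, 30 m on a side (x, y in metres)
xy = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0], [30.0, 30.0]])
planar = 2.0 * xy[:, 0] - 1.0 * xy[:, 1] + 10.0       # perfectly planar phases
distorted = planar + np.array([0.0, 0.0, 0.0, 40.0])  # one antenna off by 40 deg

print(planar_residual_deg(xy, planar))     # ~0
print(planar_residual_deg(xy, distorted))  # ~10
print(ep_keep_mask([10.0, 95.0, np.nan]))  # [ True False  True]
```

With four antennas and three plane parameters only one residual degree of freedom remains, so the 40° error on a single antenna shows up as a ±10° residual spread over all four.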
Stage 3: Multi-hop removal

| Parameter | Type | Default | Description |
|---|---|---|---|
| `multihop_enabled` | bool | True | Enable multi-hop (2F, 3F) removal |
| `multihop_orders` | tuple[int, ...] | (2, 3) | Hop orders to test |
| `multihop_height_tol_km` | float | 50.0 | Height tolerance for N × h* matching (km) |
| `multihop_snr_margin_db` | float | 6.0 | Minimum amplitude deficit of multi-hop vs 1F echo (dB) |
At each frequency, the 1F reference is the strongest echo in the lower half
of the height distribution (echoes at or below the median height). This is more robust than
taking the minimum-height echo, which could be a stray noise point. Echoes near N × h*(1F)
(within multihop_height_tol_km) that are also at least multihop_snr_margin_db weaker than
the 1F echo are labelled as N-hop artefacts and removed.
Stage 4: DBSCAN clustering

| Parameter | Type | Default | Description |
|---|---|---|---|
| `dbscan_enabled` | bool | True | Enable DBSCAN noise rejection |
| `dbscan_eps` | float | 1.0 | DBSCAN neighbourhood radius in normalised feature space |
| `dbscan_min_samples` | int | 5 | Minimum cluster size |
| `dbscan_features` | tuple[str, ...] | see below | DataFrame columns to use as DBSCAN features |
| `dbscan_feature_scales` | dict or None | None | Optional per-feature normalisation scales, keyed by column name |
Default features: `frequency_khz`, `height_km`, `velocity_mps`, `amplitude_db`, `residual_deg`.
Each feature is normalised by its inter-quartile range before DBSCAN so that
all dimensions have comparable weight. Echoes assigned cluster label −1
(noise) are rejected.
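The sketch below mirrors that description with plain NumPy instead of a clustering library: features are divided by their IQR, and an echo survives if it is a DBSCAN core point (at least min_samples neighbours within eps, counting itself) or lies within eps of a core point; everything else is what DBSCAN would label −1. `dbscan_keep_mask` is a hypothetical helper that assumes NaN-free features and skips cluster labelling, since only the noise/non-noise split matters here. Its O(N²) distance matrix only suits modest echo counts.

```python
import numpy as np


def iqr_scale(features: np.ndarray) -> np.ndarray:
    """Divide each feature column by its inter-quartile range."""
    q75, q25 = np.percentile(features, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0  # leave constant columns unscaled
    return features / iqr


def dbscan_keep_mask(features, eps: float = 1.0, min_samples: int = 5) -> np.ndarray:
    """True for echoes DBSCAN would place in a cluster, False for noise (-1)."""
    x = iqr_scale(np.asarray(features, dtype=float))
    # Pairwise Euclidean distances in the normalised feature space
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    neighbours = d <= eps
    # Core points have >= min_samples neighbours, the point itself included
    core = neighbours.sum(axis=1) >= min_samples
    # Core points and border points (within eps of a core point) survive
    return (neighbours & core[None, :]).any(axis=1)


# Toy two-feature cloud: a 20-echo trace plus one isolated echo
t = np.linspace(0.0, 1.0, 20)
cloud = np.vstack([np.column_stack([t, t]), [[50.0, 50.0]]])
mask = dbscan_keep_mask(cloud, eps=1.0, min_samples=5)
print(mask.sum(), mask[-1])  # 20 False
```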
Stage 5: RANSAC trace fitting

| Parameter | Type | Default | Description |
|---|---|---|---|
| `ransac_enabled` | bool | True | Enable RANSAC polynomial trace fitting |
| `ransac_residual_km` | float | 100.0 | Maximum height residual for an echo to be an inlier (km) |
| `ransac_min_samples` | int | 10 | Echoes randomly sampled per RANSAC iteration |
| `ransac_n_iter` | int | 200 | Number of RANSAC iterations per sounding |
| `ransac_poly_degree` | int | 3 | Polynomial degree for the h*(f) trace model |
| `ransac_min_inlier_fraction` | float | 0.3 | Minimum inlier fraction for a model to be accepted |
Fits a degree-ransac_poly_degree polynomial h*(f) to the (frequency, height) echo cloud using
Random Sample Consensus. Echoes further than ransac_residual_km from the best-fit curve are
rejected as outliers. Run independently per sounding index so that each sounding's ionospheric
trace is fitted separately. If no iteration achieves ransac_min_inlier_fraction of the active
echoes the stage is skipped for that sounding.
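A minimal RANSAC loop for one sounding, using NumPy's `polyfit`/`polyval` for the h*(f) model, might look like the following. The helper name, the `seed` argument, and the choice to return the consensus set of the best model without a final refit on its inliers are assumptions of this sketch, not documented behaviour of IonogramFilter.

```python
import numpy as np


def ransac_trace_mask(freq_khz, height_km, residual_km=100.0, min_samples=10,
                      n_iter=200, poly_degree=3, min_inlier_fraction=0.3,
                      seed=0) -> np.ndarray:
    """Inlier mask for a RANSAC polynomial fit h*(f) to one sounding."""
    f_mhz = np.asarray(freq_khz, dtype=float) / 1000.0  # MHz keeps the fit well conditioned
    h = np.asarray(height_km, dtype=float)
    n = h.size
    if n < min_samples:
        return np.ones(n, dtype=bool)  # too few echoes to fit: keep everything
    rng = np.random.default_rng(seed)
    best = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        # Fit the trace model to a random subset of echoes
        idx = rng.choice(n, size=min_samples, replace=False)
        coeffs = np.polyfit(f_mhz[idx], h[idx], poly_degree)
        # Consensus set: echoes within residual_km of the fitted curve
        inliers = np.abs(h - np.polyval(coeffs, f_mhz)) <= residual_km
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < min_inlier_fraction * n:
        return np.ones(n, dtype=bool)  # no acceptable model: skip the stage
    return best


# Synthetic sounding: a smooth trace plus ten spurious echoes at 900 km
f_trace = np.linspace(2000.0, 8000.0, 60)
h_trace = 200.0 + 20.0 * (f_trace / 1000.0 - 2.0)
f_noise = np.linspace(2500.0, 7500.0, 10)
f_all = np.concatenate([f_trace, f_noise])
h_all = np.concatenate([h_trace, np.full(10, 900.0)])
mask = ransac_trace_mask(f_all, h_all)
print(mask[:60].all(), mask[60:].any())  # True False
```

Calling the helper once per `sounding_index` group reproduces the "fitted separately per sounding" behaviour described above.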
Stage 6: Temporal coherence (multi-sounding only)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `temporal_enabled` | bool | True | Enable temporal coherence filter |
| `temporal_min_soundings` | int | 3 | Minimum soundings a cell must appear in |
| `temporal_freq_bin_khz` | float | 50.0 | Frequency bin width for cell definition (kHz) |
| `temporal_height_bin_km` | float | 50.0 | Height bin width for cell definition (km) |
This stage is silently skipped when only one sounding is supplied.
Quick start
Single sounding

```python
from pynasonde.vipir.riq.echo import EchoExtractor
from pynasonde.vipir.riq.parsers.filter import IonogramFilter
from pynasonde.vipir.riq.parsers.read_riq import VIPIR_VERSION_MAP, RiqDataset

# Load one RIQ sounding
riq = RiqDataset.create_from_file(
    "WI937_2022233235902.RIQ",
    unicode="latin-1",
    vipir_config=VIPIR_VERSION_MAP.configs[1],
)

# Extract echoes above a 3 dB SNR threshold between 60 and 1000 km
ext = EchoExtractor(
    sct=riq.sct, pulsets=riq.pulsets,
    snr_threshold_db=3.0, min_height_km=60.0, max_height_km=1000.0,
).extract()

filt = IonogramFilter(
    ep_max_deg=45.0,  # stricter than the 90° default
    dbscan_eps=1.0,
    dbscan_min_samples=5,
    temporal_enabled=False,  # only one sounding
)
df_clean = filt.filter(ext)  # accepts a single extractor or a list
print(filt.summary())
```
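Multiple soundings (temporal coherence)

```python
from pynasonde.vipir.riq.echo import EchoExtractor
from pynasonde.vipir.riq.parsers.filter import IonogramFilter
from pynasonde.vipir.riq.parsers.read_riq import RiqDataset

# riq_file_list: list of .RIQ file paths, one per sounding
extractors = []
for fname in riq_file_list:
    riq = RiqDataset.create_from_file(fname, ...)  # same arguments as above
    ext = EchoExtractor(...).extract()  # same arguments as above
    extractors.append(ext)

filt = IonogramFilter(
    temporal_enabled=True,
    temporal_min_soundings=3,
    temporal_freq_bin_khz=50.0,
    temporal_height_bin_km=50.0,
)
df_clean = filt.filter(extractors)  # list of extractors
# df_clean has column "sounding_index" = 0, 1, 2, ...
```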
Output DataFrame columns
The returned DataFrame contains all columns from
`pynasonde.vipir.riq.echo.EchoExtractor.to_dataframe()` plus:

| Column | Type | Description |
|---|---|---|
| `sounding_index` | int | Index into the input source list (0 for a single sounding) |
| `filter_mask` | bool | Marks echoes that survived all enabled stages (always True in the returned frame) |
Statistics
After calling `filter()`, the `stats` property is populated:

```python
{
    "rfi":      {"input": N, "rejected": N_rfi},
    "ep":       {"input": N, "rejected": N_ep},
    "multihop": {"input": N, "rejected": N_mh},
    "dbscan":   {"input": N, "rejected": N_db},
    "temporal": {"input": N, "rejected": N_t},  # absent for single sounding
    "summary":  {"total_input": N, "total_kept": N_k},
}
```
Human-readable output via `summary()`:

```python
print(filt.summary())
# Stage        Input  Rejected   Kept  Retention
# ──────────────────────────────────────────────
# RFI           3206        12   3194     99.6 %
# EP            3194       281   2913     91.2 %
# Multi-hop     2913        87   2826     97.0 %
# DBSCAN        2826       179   2647     93.7 %
# ──────────────────────────────────────────────
# Total         3206       559   2647     82.6 %
```
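The summary columns follow directly from the stats dictionary: Kept = Input − Rejected and Retention = Kept / Input. The hypothetical helper below, which assumes the dictionary layout shown above, reproduces the numbers in the example table from the raw counts.

```python
def retention_rows(stats: dict) -> list:
    """(stage, input, rejected, kept, retention %) for each stage, then the total."""
    rows = []
    for stage, s in stats.items():
        if stage == "summary":
            continue
        kept = s["input"] - s["rejected"]
        pct = round(100.0 * kept / s["input"], 1) if s["input"] else 0.0
        rows.append((stage, s["input"], s["rejected"], kept, pct))
    total_in = stats["summary"]["total_input"]
    total_kept = stats["summary"]["total_kept"]
    total_pct = round(100.0 * total_kept / total_in, 1) if total_in else 0.0
    rows.append(("total", total_in, total_in - total_kept, total_kept, total_pct))
    return rows


# Counts taken from the example summary table
example = {
    "rfi": {"input": 3206, "rejected": 12},
    "ep": {"input": 3194, "rejected": 281},
    "multihop": {"input": 2913, "rejected": 87},
    "dbscan": {"input": 2826, "rejected": 179},
    "summary": {"total_input": 3206, "total_kept": 2647},
}
for row in retention_rows(example):
    print(row)
```

Because each stage's input is the previous stage's kept count, the overall retention is the product of the per-stage retentions.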
Algorithm notes
RFI height-spread test

```text
h_iqr(f) = IQR of height_km at frequency f
flag f if h_iqr(f) > rfi_height_iqr_km AND count(f) >= rfi_min_echoes
```
Detection is based on the height spread of echoes at each frequency, not echo count.
Count-based detection is unreliable when max_echoes_per_pulset caps the per-frequency
echo count. RFI illuminates random range gates → height IQR ≈ 300–800 km; ionospheric
echoes cluster near E/F-layer heights → height IQR < 150 km.
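The height-spread test can be sketched with NumPy as below. The helper names are hypothetical, and the sketch assumes that all echoes from one frequency step share an identical `frequency_khz` value.

```python
import numpy as np


def rfi_flagged_frequencies(freq_khz, height_km, height_iqr_km=300.0, min_echoes=3):
    """Frequencies whose echo-height IQR exceeds height_iqr_km."""
    f = np.asarray(freq_khz, dtype=float)
    h = np.asarray(height_km, dtype=float)
    flagged = []
    for fk in np.unique(f):
        hs = h[f == fk]
        if hs.size < min_echoes:
            continue  # too few echoes for a meaningful spread estimate
        q75, q25 = np.percentile(hs, [75, 25])
        if q75 - q25 > height_iqr_km:
            flagged.append(float(fk))
    return flagged


def rfi_keep_mask(freq_khz, height_km, **kwargs) -> np.ndarray:
    """False for every echo at a flagged (blanked) frequency."""
    bad = rfi_flagged_frequencies(freq_khz, height_km, **kwargs)
    return ~np.isin(np.asarray(freq_khz, dtype=float), bad)


freq = [3000] * 5 + [4000] * 5 + [5000] * 2
height = [200, 210, 220, 230, 240,   # compact F-layer trace (IQR 20 km)
          100, 300, 500, 700, 900,   # spread across all gates (IQR 400 km)
          100, 900]                  # only two echoes: test not applied
print(rfi_flagged_frequencies(freq, height))  # [4000.0]
```

The 5000 kHz step has a huge spread but is kept, because it holds fewer than `rfi_min_echoes` echoes.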
Multi-hop geometry
A 2F echo appears at twice the virtual height of the 1F echo at the same frequency because it undergoes an extra ground bounce; in general, the N-hop echo satisfies

$$h^*_{N\mathrm{F}}(f) = N\,h^*_{1\mathrm{F}}(f)$$
The filter identifies the strongest echo in the lower half of the height
distribution at each frequency as the 1F reference, then flags any echo at
N × h*(1F) ± multihop_height_tol_km that is also weaker by at least
multihop_snr_margin_db.
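The per-frequency rule can be sketched as follows. This hypothetical helper is written from the description above (1F reference = strongest echo at or below the median height; reject echoes within `multihop_height_tol_km` of N × h*(1F) that are at least `multihop_snr_margin_db` weaker); it is not the library's implementation.

```python
import numpy as np


def multihop_reject_mask(freq_khz, height_km, amplitude_db, orders=(2, 3),
                         height_tol_km=50.0, snr_margin_db=6.0) -> np.ndarray:
    """True for echoes identified as N-hop copies of the 1F echo."""
    f = np.asarray(freq_khz, dtype=float)
    h = np.asarray(height_km, dtype=float)
    a = np.asarray(amplitude_db, dtype=float)
    reject = np.zeros(f.size, dtype=bool)
    for fk in np.unique(f):
        idx = np.flatnonzero(f == fk)
        # 1F reference: strongest echo in the lower half of the height distribution
        lower = idx[h[idx] <= np.median(h[idx])]
        ref = lower[np.argmax(a[lower])]
        for n in orders:
            near = np.abs(h[idx] - n * h[ref]) < height_tol_km
            weaker = a[idx] <= a[ref] - snr_margin_db
            reject[idx[near & weaker]] = True
    return reject


freq = [5000] * 4 + [6000] * 2
height = [250, 260, 500, 755, 300, 600]
amp = [40, 35, 30, 28, 30, 29]  # the 600 km echo at 6000 kHz is only 1 dB weaker
print(multihop_reject_mask(freq, height, amp))
# [False False  True  True False False]
```

The 600 km echo at 6000 kHz sits exactly at 2 × h*(1F) but survives because it fails the amplitude-margin test.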
DBSCAN feature scaling
Each feature column $x_j$ is independently normalised by its inter-quartile range:

$$\tilde{x}_j = \frac{x_j}{\mathrm{IQR}(x_j)}$$
This makes the dbscan_eps parameter approximately equivalent to
"number of IQR units" of separation, giving it a physical interpretation
independent of the units of each parameter.
Temporal coherence cell occupancy

```text
cell(i)         = (freq_bin, height_bin) for echo i
occupancy(cell) = number of soundings containing ≥ 1 echo in cell
keep echo i  iff  occupancy(cell(i)) >= temporal_min_soundings
```
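The occupancy rule can be sketched in a few lines. The helper name is hypothetical, and anchoring the bins at zero via floor(value / width) is an assumption; the library may define its bin edges differently.

```python
import numpy as np


def temporal_keep_mask(sounding_index, freq_khz, height_km, min_soundings=3,
                       freq_bin_khz=50.0, height_bin_km=50.0) -> np.ndarray:
    """Keep echoes whose (freq_bin, height_bin) cell occurs in >= min_soundings soundings."""
    fb = np.floor(np.asarray(freq_khz, dtype=float) / freq_bin_khz).astype(int)
    hb = np.floor(np.asarray(height_km, dtype=float) / height_bin_km).astype(int)
    cells = list(zip(fb.tolist(), hb.tolist()))
    # occupancy(cell) = number of distinct soundings with >= 1 echo in the cell
    soundings_in_cell = {}
    for cell, s in zip(cells, np.asarray(sounding_index, dtype=int).tolist()):
        soundings_in_cell.setdefault(cell, set()).add(s)
    return np.array([len(soundings_in_cell[c]) >= min_soundings for c in cells],
                    dtype=bool)


sounding = [0, 0, 1, 1, 2]
freq = [5010, 7000, 5020, 5040, 5030]
height = [260, 400, 270, 255, 290]
print(temporal_keep_mask(sounding, freq, height))  # [ True False  True  True  True]
```

Sounding 1 contributes two echoes to the same cell, but occupancy counts distinct soundings, so that cell still reaches exactly three.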
References
- Zabotin N. A. et al. (2006). NeXtYZ: Three-dimensional electron density inversion for dynasonde ionograms. Radio Science 41(6). https://doi.org/10.1029/2005RS003352
- Ester M. et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of KDD-96.
Related
- Echo Extractor API
- Filter examples:
  - examples/vipir/ionogram_filter_wi937.py
  - examples/vipir/ionogram_filter_pl407.py
  - examples/vipir/ionogram_filter_multi.py