What 10,000 synthetic spectrograms taught us about input design

A clean-room study on representation choices for condition monitoring — window length, mel bins, and the cost of getting them wrong.

Before you tune a model, you choose an input representation, and that choice usually matters more than the architecture. This study uses fully synthetic data to measure how spectrogram parameters shape what an acoustic condition-monitoring model can and cannot learn.

Why synthetic

Synthetic generation lets you control the one thing field data never gives you: ground truth over the full parameter sweep. We generated roughly 10,000 spectrograms spanning fault types, severities, and noise levels, then varied the representation choices as the independent variable.
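To make the idea of controlled ground truth concrete, here is a minimal sketch of a labeled clip generator. Everything in it is an illustrative assumption rather than the study's actual signal model: the function name synth_clip, the 120 Hz hum, the two fault models, and the small grid of severities and noise levels.

```python
import numpy as np

def synth_clip(fault, severity, snr_db, sr=16000, dur=1.0, seed=0):
    """Return (signal, label) for one synthetic clip.

    fault: "none", "tonal" (extra spectral line) or "impact" (periodic clicks).
    All signal models here are hypothetical stand-ins for real fault physics.
    """
    rng = np.random.default_rng(seed)
    n = int(sr * dur)
    t = np.arange(n) / sr
    x = np.sin(2 * np.pi * 120.0 * t)  # baseline machine hum (assumed)

    if fault == "tonal":
        # Stationary fault: a steady extra tone whose amplitude is the severity
        x += severity * np.sin(2 * np.pi * 1850.0 * t)
    elif fault == "impact":
        # Transient fault: 5 ms decaying 3 kHz bursts, 20 per second
        burst_len = int(0.005 * sr)
        tb = np.arange(burst_len) / sr
        burst = np.exp(-tb / 0.001) * np.sin(2 * np.pi * 3000.0 * tb)
        for start in range(0, n - burst_len, sr // 20):
            x[start:start + burst_len] += severity * burst

    # Scale white noise so the clip hits the requested signal-to-noise ratio
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    x = x + rng.normal(0.0, np.sqrt(noise_power), n)

    # The label is the ground truth that field data cannot provide
    label = {
        "fault": fault,
        "severity": 0.0 if fault == "none" else severity,
        "snr_db": snr_db,
    }
    return x, label

# A tiny sweep over fault type, severity, and noise level
grid = [(f, s, n) for f in ("none", "tonal", "impact")
        for s in (0.1, 0.5, 1.0) for n in (0, 10, 20)]
clips = [synth_clip(f, s, n, seed=i) for i, (f, s, n) in enumerate(grid)]
```

Because every clip carries its own label and seed, the same set of clips can be rendered under each spectrogram configuration, so any difference in results comes from the representation alone.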

The three knobs

Window length: the time/frequency trade-off. Too short and you blur the spectral signature; too long and transients smear.
Mel bins: resolution where it counts. More is not free; extra bins split the same energy across narrower bands and inflate the input.
Overlap: cheap robustness, up to a point of diminishing returns.
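The arithmetic behind these three knobs can be sketched in a few lines. The helper names (stft_geometry, mel_edges) are hypothetical, the mel conversion uses the common 2595 * log10(1 + f/700) formula, and the frequency figure is the raw FFT bin spacing, which ignores the extra broadening a window function adds.

```python
import math

def stft_geometry(sr, win_len, overlap, dur_s, n_mels):
    """Rough size and resolution of a mel spectrogram (no padding assumed)."""
    hop = max(1, round(win_len * (1.0 - overlap)))
    n_frames = 1 + max(0, (int(sr * dur_s) - win_len) // hop)
    return {
        "freq_res_hz": sr / win_len,           # FFT bin spacing
        "time_res_ms": 1000.0 * win_len / sr,  # span of one window
        "hop": hop,                            # samples between frames
        "n_frames": n_frames,
        "input_size": n_mels * n_frames,       # values fed to the model
    }

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_edges(n_mels, fmin, fmax):
    """Edge frequencies in Hz for n_mels triangular bands (n_mels + 2 points)."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    return [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1))
            for i in range(n_mels + 2)]

geometry = stft_geometry(sr=16000, win_len=1024, overlap=0.5, dur_s=1.0, n_mels=64)
edges = mel_edges(n_mels=64, fmin=0.0, fmax=8000.0)
print(geometry)
print("lowest band spacing (Hz):", edges[1] - edges[0])
```

Comparing the spacing of the lowest mel edges with freq_res_hz is one way to see when extra bins stop adding information: once mel bands are narrower than the FFT bin spacing, neighbouring bands are built from the same few FFT bins.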

What the sweep showed

The headline: there is a broad plateau of good configurations and a few sharp cliffs. Most of the gains came from avoiding the cliffs, not from finding a single magic setting.

Window length had the steepest cliff. Once the window grew past a threshold, transient faults became unrecoverable regardless of downstream capacity.
Mel bins showed diminishing returns quickly: beyond a modest count, extra bins mostly added cost.
Overlap helped robustness to windowing phase (where a frame boundary happens to fall relative to an event) but saturated early.
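The window-length effect can be reproduced in miniature. The sketch below is a toy illustration, not the study's experiment: it assumes a single 2 ms burst in white noise and uses a hypothetical contrast measure (peak frame power over median frame power). Under those assumptions the contrast should shrink as the window grows, because the burst energy stays fixed while the background energy in each frame grows with window length.

```python
import numpy as np

def frame_power(x, win_len, overlap=0.5):
    """Total spectrogram power per frame, using a Hann window and no padding."""
    hop = max(1, int(win_len * (1.0 - overlap)))
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spec.sum(axis=1)

def transient_contrast(win_len, sr=16000, seed=0):
    """Peak-to-median frame power for one short burst buried in noise."""
    rng = np.random.default_rng(seed)
    x = 0.1 * rng.standard_normal(2 * sr)   # 2 s of background noise (assumed)
    burst_len = int(0.002 * sr)             # 2 ms transient (assumed)
    tb = np.arange(burst_len) / sr
    x[sr:sr + burst_len] += np.sin(2 * np.pi * 3000.0 * tb)
    p = frame_power(x, win_len)
    return float(p.max() / np.median(p))

for win_len in (256, 1024, 4096):
    print(win_len, round(transient_contrast(win_len), 2))
```

A downstream model sees only the frames, so once the burst barely lifts a frame above the background, no amount of model capacity brings it back.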

Takeaway

Spend your first tuning budget on the input representation, and verify it on data where you control ground truth. A model cannot recover information the spectrogram threw away, and the cheapest place to learn that is on synthetic signals, before any field deployment.