What 10,000 synthetic spectrograms taught us about input design
A clean-room study on representation choices for condition monitoring — window length, mel bins, and the cost of getting them wrong.
Before you tune a model, you choose an input representation — and that choice usually matters more than the architecture. This is a clean-room study, on fully synthetic data, of how spectrogram parameters shape what an acoustic condition-monitoring model can and cannot learn.
Why synthetic
Synthetic generation lets you control the one thing field data never gives you: ground truth over the full parameter sweep. We generated ~10,000 spectrograms spanning fault types, severities, and noise levels, then treated the representation choices as the independent variable.
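To make "ground truth over the full parameter sweep" concrete, here is a minimal sketch of a labeled clip generator. It is not the generator used in the study. The fault names ("tonal", "impact"), the frequencies, the sample rate, and the grid values are all hypothetical, chosen only to show how every clip can carry its own exact label.

```python
import math
import random

def make_clip(fault, severity, snr_db, sr=8000, dur=1.0, seed=0):
    """Return (samples, label) for one synthetic clip with known ground truth."""
    rng = random.Random(seed)
    n = int(sr * dur)
    # Healthy baseline: a steady 120 Hz hum (hypothetical choice).
    x = [math.sin(2 * math.pi * 120 * t / sr) for t in range(n)]

    if fault == "tonal":
        # Narrowband fault: an added 1.5 kHz tone scaled by severity.
        for t in range(n):
            x[t] += severity * math.sin(2 * math.pi * 1500 * t / sr)
    elif fault == "impact":
        # Transient fault: a ~2 ms decaying 3 kHz ring, 10 times per second.
        period, burst = sr // 10, sr // 500
        for start in range(0, n, period):
            for k in range(min(burst, n - start)):
                ring = math.exp(-4 * k / burst) * math.sin(2 * math.pi * 3000 * k / sr)
                x[start + k] += severity * ring
    elif fault != "none":
        raise ValueError(f"unknown fault type: {fault}")

    # Add white Gaussian noise at the requested signal-to-noise ratio.
    power = sum(v * v for v in x) / n
    noise_std = math.sqrt(power / 10 ** (snr_db / 10))
    x = [v + rng.gauss(0.0, noise_std) for v in x]

    label = {"fault": fault, "severity": severity, "snr_db": snr_db}
    return x, label

# Sweep a small ground-truth grid; every clip carries its exact label.
grid = [(f, s, n) for f in ("none", "tonal", "impact")
        for s in (0.1, 0.5, 1.0) for n in (0, 10, 20)]
clips = [make_clip(f, s, n, seed=i) for i, (f, s, n) in enumerate(grid)]
```

Because the label is produced alongside the signal, any representation can later be scored against what was actually injected rather than against an annotator's guess.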
The three knobs
- Window length: the time/frequency trade-off. Too short and you blur the spectral signature; too long and transients smear.
- Mel bins: frequency resolution where it counts. More is not free: every extra bin enlarges the input, and narrower bands average over fewer FFT bins, so each one is a noisier estimate.
- Overlap: cheap robustness to where frame boundaries fall relative to an event, up to a point of diminishing returns.
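The three knobs translate directly into numbers you can inspect before training anything. The helper below is an illustrative sketch, not part of the study: `stft_geometry` and its field names are hypothetical, and it assumes the FFT size equals the window length (no zero padding), so that bin spacing is simply the sample rate divided by the window length.

```python
def stft_geometry(sr, win_ms, overlap, n_mels):
    """Derive basic spectrogram geometry from the three knobs.

    sr      : sample rate in Hz
    win_ms  : window length in milliseconds
    overlap : fraction of each window shared with the next (0 <= overlap < 1)
    n_mels  : number of mel bins per frame
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    win = round(sr * win_ms / 1000)           # window length in samples
    hop = max(1, round(win * (1 - overlap)))  # step between frame starts
    return {
        "win_samples": win,
        "hop_samples": hop,
        # FFT bin spacing: a rough proxy for frequency resolution.
        "freq_bin_hz": sr / win,
        # An event shorter than this is averaged with its surroundings.
        "time_span_ms": win_ms,
        "frames_per_s": sr / hop,
        # Input size the model sees per second of audio.
        "cells_per_s": n_mels * sr / hop,
    }

for win_ms in (8, 32, 128):
    print(win_ms, stft_geometry(16000, win_ms, overlap=0.5, n_mels=64))
```

At 16 kHz, an 8 ms window gives 125 Hz bin spacing, while a 128 ms window gives about 7.8 Hz but averages over a span sixteen times longer. That is the trade-off in the first bullet expressed as arithmetic.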
What the sweep showed
The headline: there is a broad plateau of good configurations and a few sharp cliffs. Most of the gains came from avoiding the cliffs, not from finding a single magic setting.
- Window length had the steepest cliff. Once windows grew past a threshold length, transient faults became unrecoverable regardless of downstream capacity.
- Mel bins showed diminishing returns quickly — beyond a modest count, extra bins mostly added cost.
- Overlap improved robustness to windowing phase (where frame boundaries fall relative to an event), but the benefit saturated early.
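A toy calculation shows why long windows can erase a transient before the model ever sees it. This is an illustration under stated assumptions, not a measurement from the sweep: the signal (white noise plus one 2 ms click), the window lengths, and the contrast metric (peak frame energy over median frame energy) are all hypothetical. Frames are rectangular and non-overlapping, so each frame's energy is proportional to the total power in that frame's spectrogram column.

```python
import random
import statistics

def frame_energies(x, win):
    """Energy of each non-overlapping, rectangular frame of length `win`."""
    return [sum(v * v for v in x[i:i + win])
            for i in range(0, len(x) - win + 1, win)]

def transient_contrast(x, win):
    """Peak frame energy divided by the median frame energy."""
    e = frame_energies(x, win)
    return max(e) / statistics.median(e)

# Hypothetical test signal: 1 s of white noise at 8 kHz with one 2 ms click.
rng = random.Random(0)
sr = 8000
x = [rng.gauss(0.0, 0.1) for _ in range(sr)]
for k in range(16):
    x[4000 + k] += 1.0 if k % 2 == 0 else -1.0

# Same signal, three window lengths (8 ms, 64 ms, 256 ms).
contrasts = {win: transient_contrast(x, win) for win in (64, 512, 2048)}
for win, c in contrasts.items():
    print(f"win={win:5d} samples  contrast={c:.1f}")
```

The click's energy is fixed, but the noise energy in a frame grows with its length, so the click's frame stands out less and less as the window gets longer. Once the contrast approaches 1, no downstream capacity can separate the event from the background.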
Takeaway
Spend your first tuning budget on the input representation, and verify it on data where you control ground truth. A model cannot recover information the spectrogram threw away — and the cheapest way to learn that is on synthetic signals, before any field deployment.
Clean-room write-up — public / synthetic data only.