Last updated Tuesday, December 4, 2001

On Pitch Extraction

This is the content of a message sent to the SigList by Eric Keller, the author of Signalyze. It remains about as good a guide to pitch extraction as we can get.

Message dated April 24, 1996


Dear SigListers,

I have regularly been contacted by Signalyze users about difficulties in pitch extraction. Generally I've found some logical settings that facilitate extraction, or some cogent reason why an extraction is simply impossible. Some of these conditions are described in the Manual.

Here is a sequence of examinations that I apply to signals where extraction is difficult with the usual settings:

The first thing I do is to check for the presence of a strong fundamental cycle. Some voices simply don't have one, or have one that is very irregular. To perform this check, zoom into a vocalic portion of the signal until you see the detailed waveform. Measure a few major cycle durations. You should find similar durations between major peaks or between major bottoms. If you don't, there may be a genuine reason why pitch extraction is simply impossible: neuropathology, excessively noisy signal, overmodulation on input, incompatible frequency ranges, highly irregular voices, telephone speech with suppression of frequencies below 1 kHz, etc.
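The manual check described above can be approximated in a few lines of Python. This is only a rough sketch, not anything Signalyze does internally: the function names, the 50% amplitude cutoff for a "major" peak, and the 15% tolerance are all assumptions chosen for illustration. The 600 Hz ceiling used to keep peaks apart comes from the working range quoted in the Manual.

```python
import math


def major_peaks(samples, sample_rate_hz, max_f0_hz=600.0):
    """Indices of major positive peaks, at most one per shortest plausible cycle."""
    cutoff = 0.5 * max(samples)                   # assumed: "major" = above 50% of maximum
    min_gap = int(sample_rate_hz / max_f0_hz)     # shortest cycle we expect, in samples
    peaks = []
    for i in range(1, len(samples) - 1):
        is_local_max = samples[i - 1] < samples[i] >= samples[i + 1]
        if not (is_local_max and samples[i] > cutoff):
            continue
        if peaks and i - peaks[-1] < min_gap:
            # Two candidates closer than one cycle: keep the taller one.
            if samples[i] > samples[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def cycle_durations_ms(samples, sample_rate_hz):
    """Durations in milliseconds between successive major peaks."""
    peaks = major_peaks(samples, sample_rate_hz)
    return [(b - a) * 1000.0 / sample_rate_hz for a, b in zip(peaks, peaks[1:])]


def looks_periodic(durations_ms, tolerance=0.15):
    """True if every duration lies within `tolerance` of the median duration."""
    if len(durations_ms) < 3:
        return False
    median = sorted(durations_ms)[len(durations_ms) // 2]
    return all(abs(d - median) <= tolerance * median for d in durations_ms)


# Synthetic stand-in for a vocalic portion: a 100 Hz sine sampled at 10 kHz.
rate = 10000
signal = [math.sin(2 * math.pi * 100 * n / rate) for n in range(500)]
durations = cycle_durations_ms(signal, rate)
regular = looks_periodic(durations)
```

Real speech has several local maxima per cycle, so this simple picker may still report extra peaks; in that case the durations will look irregular even for a healthy voice, and inspecting the waveform by eye remains the more reliable check.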

Now you are sure that your signal really does have a fundamental frequency. But is it in the right range? As the Manual says, the pitch extraction routines in Signalyze work best with frequencies between 80 and 600 Hz. If your signal is non-human (music, animal sounds), the basic pitch may be much higher than that. Under these circumstances, you may have to artificially reduce the declared sampling rate to acceptable levels by editing the sampling rate shown in the General Control Panel, above all the signals (shift-click on the number with the "Hz"). Perform the converse operation if you need to work with frequencies lower than 80 Hz. Once you've done this adjustment, peak-to-peak/bottom-to-bottom readings should give you values between 80 and 600 Hz. Keep in mind that every frequency you read afterwards is scaled by the same factor as the sampling rate.
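The arithmetic behind redeclaring the sampling rate is a simple proportion, sketched here in Python. The function names and the example values (a 2000 Hz tone recorded at 44100 Hz) are illustrative assumptions; only the 80 to 600 Hz range comes from the text above.

```python
def apparent_frequency_hz(true_f0_hz, true_rate_hz, declared_rate_hz):
    """Frequency the analysis will see once the sampling rate is redeclared."""
    return true_f0_hz * declared_rate_hz / true_rate_hz


def true_frequency_hz(reading_hz, true_rate_hz, declared_rate_hz):
    """Convert a reading taken under the declared rate back to the real frequency."""
    return reading_hz * true_rate_hz / declared_rate_hz


def in_working_range(freq_hz, low_hz=80.0, high_hz=600.0):
    """True if the frequency falls in the range quoted from the Manual."""
    return low_hz <= freq_hz <= high_hz


# Hypothetical example: a 2000 Hz tone recorded at 44100 Hz,
# with the rate redeclared as one quarter of the real value.
seen = apparent_frequency_hz(2000.0, 44100.0, 11025.0)
restored = true_frequency_hz(seen, 44100.0, 11025.0)
```

In this example the tone is analyzed as if it were at 500 Hz, which is inside the working range, and multiplying the reading by 4 recovers the real value.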

Now you have a reasonable input signal. The next thing you have to make sure of is that the extraction routine can actually operate on your signal. For technical reasons, the FFT-comb routine runs best if your sampling frequency is around 10 kHz. The temporal structure analysis also tends to "get confused less often" if you use lower sampling frequencies. If your original sampling frequency is much greater than 10 kHz (20 kHz, 44 kHz), you should use the Complex Transformation to reduce the sampling frequency to around 10 kHz.
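To get a feel for what reducing the sampling frequency to around 10 kHz involves, here is a minimal Python sketch. It does not reproduce the Complex Transformation in Signalyze; the block averaging used here is a crude stand-in for proper low-pass filtering before decimation, and both function names are assumptions.

```python
def decimation_factor(original_rate_hz, target_rate_hz=10000.0):
    """Integer factor nearest to original/target, never less than 1."""
    return max(1, round(original_rate_hz / target_rate_hz))


def decimate(samples, factor):
    """Replace each full block of `factor` samples by its mean.

    Averaging acts as a rough low-pass filter; a trailing partial block is dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


# A 44100 Hz recording: factor 4 gives a new rate of 11025 Hz.
factor = decimation_factor(44100)
new_rate_hz = 44100 / factor
shorter = decimate([1, 3, 5, 7, 9], 2)
```

An integer factor rarely lands on exactly 10 kHz, but the text only asks for a rate "around" that value, so 11025 Hz from a 44100 Hz original should be close enough.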

Now you presumably have an input signal with fundamental cycles that can actually be analyzed. At this point, you should play around with the pitch extraction settings. For most cases, the FFT-comb routine is the method of choice (some exceptions are listed in the Manual). Try some or all of the following:

  • Try an extraction every 5 ms, or even greater extraction densities (e.g., every 2 ms).
  • Apply an output filter (e.g., averaging over 8, 12 or 16 extractions).
  • Experiment with the threshold setting. The right setting has been found if the voiced portions of speech give extractions, and other portions show up as white space. If you get lines tearing off upwards or downwards at the end of your extractions, reduce the threshold.
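The interplay between extraction density and the output filter can be illustrated with a short Python sketch. It assumes a pitch track stored as a list with one value per extraction and `None` for unvoiced portions (the "white space"); that representation, the function names, and the choice of a trailing mean are assumptions for illustration, not a description of the Signalyze output filter.

```python
def smooth_track(track_hz, window=8):
    """Trailing mean over up to `window` extractions within one voiced stretch.

    Unvoiced extractions (None) stay None and reset the averaging, so values
    from separate voiced stretches are never mixed.
    """
    smoothed = []
    run = []  # voiced values since the last unvoiced extraction
    for value in track_hz:
        if value is None:
            run = []
            smoothed.append(None)
        else:
            run.append(value)
            recent = run[-window:]
            smoothed.append(sum(recent) / len(recent))
    return smoothed


def window_span_ms(step_ms, window):
    """Stretch of signal covered by one averaging window."""
    return step_ms * window


track = [100.0, 110.0, None, 200.0, 220.0]
smoothed = smooth_track(track, window=2)
span = window_span_ms(5, 8)
```

Averaging over 8 extractions taken every 5 ms spans 40 ms of signal, while the same filter at 2 ms spacing spans only 16 ms, so a denser extraction allows a longer filter without blurring fast pitch movements as much.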

Also, remember to use "sampled dots" in Display Setup for your pitch extraction, particularly when reading in pitch extractions you've saved as signals.

The FFT-comb routine depends for its operation on the presence of a regular harmonic structure. To examine this, compute a narrow-band spectrum, and verify (a) that there are several clearly defined peaks in the spectrum, and (b) that the peaks are situated at approximately integer multiples of the fundamental frequency (e.g., if Fo = 120 Hz, then harmonics should be found at about 240, 360, 480 Hz, etc.). Not all (speech) sounds have a clear harmonic structure. I've observed strongly deviant harmonic structures in dysarthric speech, and at least certain types of bird song show no harmonics at all, only a fundamental frequency. In those cases, the FFT-comb will necessarily fail, while the other two routines may work quite nicely.
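Once you have read a few peak frequencies off the narrow-band spectrum, the two checks above can be done mechanically, as in this Python sketch. The function names, the tolerance of 5% of Fo, and the requirement of at least three distinct harmonics (one reading of "several") are assumptions for illustration.

```python
def harmonic_numbers(peak_freqs_hz, f0_hz, tolerance=0.05):
    """Nearest harmonic number for each peak, or None if the peak is off the grid.

    A peak counts as harmonic n when it lies within tolerance * f0 of n * f0.
    """
    result = []
    for freq in peak_freqs_hz:
        n = round(freq / f0_hz)
        deviation = abs(freq - n * f0_hz)
        result.append(n if n >= 1 and deviation <= tolerance * f0_hz else None)
    return result


def has_harmonic_structure(peak_freqs_hz, f0_hz, min_harmonics=3, tolerance=0.05):
    """True if enough distinct harmonics of f0 appear among the peaks."""
    numbers = harmonic_numbers(peak_freqs_hz, f0_hz, tolerance)
    matched = {n for n in numbers if n is not None}
    return len(matched) >= min_harmonics


# Peaks read from a spectrum with Fo = 120 Hz, slightly off the exact multiples.
regular_voice = has_harmonic_structure([120.0, 241.0, 358.0, 480.0], 120.0)
# A lone fundamental with no harmonics, as described for some bird song.
lone_fundamental = has_harmonic_structure([3000.0], 3000.0)
```

The first call reports a harmonic structure and the second does not, which matches the cases where the FFT-comb routine can and cannot be expected to work.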

Furthermore, the FFT-comb routine is optimized for the Fo mid-frequency band (120-300 Hz). If you experience difficulties in extracting very high or very low frequencies, try changing the declared sampling rate: lower it for high frequencies, raise it for low ones. In the case of a 550 Hz signal, for example, better results might be obtained by redeclaring the sampling rate as half the real rate (e.g., 5 kHz instead of 10 kHz), so that the signal is analyzed as if it were at 275 Hz.


If the above does not help you get a good pitch trace from your signal, please contact Signalyze support and we'll see if we can come up with some more hints.
