NIST Speaker Recognition Evaluations


Looking at the 2003 NIST SR evaluations: we’re too late to enter this year, but there is some useful data available. For example, the automatically generated word transcripts for some of their training data might be interesting to look at.

In their Rich Transcription track there is an interesting task that might be nice to attempt: metadata extraction. In the last evaluation it was just “Who Spoke When” annotation, but they intend to add more target metadata in future rounds. This is quite close to some of our goals for the Meeting Room Project.

Quote from the NIST TREC-9 SDR page:

The results of the TREC-9 2000 SDR evaluation presented at TREC on November 14, 2000 showed that retrieval performance for sites on their own recognizer transcripts was virtually the same as their performance on the human reference transcripts. Therefore, retrieval of excerpts from broadcast news using automatic speech recognition for transcription was deemed to be a solved problem - even with word error rates of 30%.
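
To get a feel for what a 30% word error rate means in practice, here is a minimal sketch of the usual WER calculation: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are made up for illustration; this is not NIST's scoring tool and does none of the text normalisation a real scorer would apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only: it splits on whitespace
    and compares words exactly, with no case or punctuation normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: 10 reference words, 2 substitutions and 1 deletion.
reference = "the president spoke to reporters at the white house today"
hypothesis = "the president spoke to reporter at a white house"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.0%}")
```

In this toy example three of the ten reference words are wrong, yet most of the content words a retrieval system would match on are still there, which fits the observation in the quote.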


And there’s more… Transcripts of meeting room data are also available, and they appear to be manual. The meeting content doesn’t seem too exciting 🙂 but it gives some idea of dialogue structure etc. in this kind of data.