All evaluation activities will be conducted using a NIST-maintained web platform shared with OpenSAT. Each participant will need to create an account on this web platform. The account allows them to register for the evaluation, sign the data license agreement, and upload submissions.
After registering and agreeing to the NIST FSC-P3 Terms and Conditions, participants will be able to participate in the FSC P3. This page contains step-by-step instructions for creating the evaluation account, joining a site and team, selecting tasks, and signing the relevant agreements.
This registers your submission with the scoring server. Next, you need to upload the archive containing your system output.
At this point your archive will be uploaded to the NIST server and the following will occur:
System output for each track should be submitted as a .zip that expands into a single directory of txt files containing one txt file for each recording.
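The packaging step described above can be sketched as follows. This is a minimal illustration, assuming the per-recording txt files already sit in one local directory; the function name `package_submission` and the directory and archive names in the usage note are hypothetical, and the toolkit's example submission packet remains the reference for the expected layout.

```python
import zipfile
from pathlib import Path


def package_submission(output_dir: Path, archive_path: Path) -> list[str]:
    """Zip every .txt file in output_dir under a single top-level directory.

    Returns the archive member names that were written.
    """
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for txt_file in sorted(output_dir.glob("*.txt")):
            # Store each file as <directory name>/<file name> so that the
            # archive expands into exactly one directory of txt files.
            arcname = f"{output_dir.name}/{txt_file.name}"
            archive.write(txt_file, arcname)
            names.append(arcname)
    return names
```

For example, `package_submission(Path("sad_output"), Path("submission.zip"))` would produce an archive that unpacks into a single `sad_output/` directory. Files without the `.txt` extension are skipped.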
Systems should output their SAD results as text (txt) files. In this NIST-defined file format, each text file contains one turn per line, and each line contains nine tab-delimited fields:
Test | Test Definition File Name (Value: X) |
TestSet ID | contents of the id attribute of the TestSet tag (Value: X) |
Test ID | contents of the id attribute of the TEST tag (Value: X) |
Task | SAD <== a literal text string, without quotations (Value: SAD) |
File ID | contents of the id attribute of the File tag (Value: X) |
Interval start | an offset, in seconds from the start of the audio file for the start of the speech/non-speech interval (Value: floating number) |
Interval end | an offset, in seconds from the start of the audio file for the end of the speech/non-speech interval (Value: floating number) |
Type | In system output: speech/non-speech without quotation marks (Value: speech/nonspeech). In the reference: S/NS for speech/non-speech |
Confidence Score | (Optional) A value in the range 0 through 1.0, with higher values indicating greater confidence about the presence/absence of speech |
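As an illustration of the nine-field layout above, here is a minimal sketch of a line formatter. It assumes the literal value X for the four fields listed with "Value: X", the labels speech and nonspeech from the Type row, and three and two decimal places for times and confidence; the function name `format_sad_line` and the decimal precision are illustrative choices, not part of the specification. How to represent an omitted confidence score is not stated, so this sketch always writes one.

```python
def format_sad_line(start: float, end: float, label: str, confidence: float) -> str:
    """Build one tab-delimited SAD output line with nine fields."""
    if label not in ("speech", "nonspeech"):
        raise ValueError("label must be 'speech' or 'nonspeech'")
    if end < start:
        raise ValueError("interval end must not precede interval start")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the range 0 through 1.0")
    fields = [
        "X",                   # Test
        "X",                   # TestSet ID
        "X",                   # Test ID
        "SAD",                 # Task (literal string)
        "X",                   # File ID
        f"{start:.3f}",        # Interval start, seconds (precision is an assumption)
        f"{end:.3f}",          # Interval end, seconds
        label,                 # Type
        f"{confidence:.2f}",   # Confidence Score
    ]
    return "\t".join(fields)
```

Calling `format_sad_line(0.0, 1.5, "speech", 0.9)` yields one line; writing one such line per interval, in time order, gives the txt file for a recording.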
Use the appropriate script to generate DCF scores for the FSC P3 Challenge SAD task:
bash scoreFS03_SAD.sh <ref_path> <hyp_path> <out_path>
An example submission packet can be found in the toolkit (link provided here).
System output for each track should be submitted as a .zip that expands into a single directory of txt files containing one txt file with all results as shown in the example in the submission packet.
The SID output file should be a text file containing one test segment per line, each line containing six space-delimited fields (the test file name followed by five ranked predictions):
Test | Test Definition File Name |
Prediction 1 | Top System SpeakerID Prediction |
Prediction 2 | 2nd Most Likely System SpeakerID Prediction |
Prediction 3 | 3rd Most Likely System SpeakerID Prediction |
Prediction 4 | 4th Most Likely System SpeakerID Prediction |
Prediction 5 | 5th Most Likely System SpeakerID Prediction |
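A minimal sketch of building one SID output line from the fields above: the test file name followed by the five ranked predictions listed, separated by single spaces. The function name `format_sid_line` and the identifiers in the usage note are hypothetical; the example in the submission packet is the authoritative reference for naming.

```python
def format_sid_line(test_file: str, ranked_speakers: list[str]) -> str:
    """Build one space-delimited SID line: test file name, then five predictions."""
    if len(ranked_speakers) != 5:
        raise ValueError("exactly five ranked speaker predictions are expected")
    fields = [test_file, *ranked_speakers]
    for field in fields:
        # A space inside a field would break the space-delimited layout.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"field must be non-empty without whitespace: {field!r}")
    return " ".join(fields)
```

For instance, `format_sid_line("seg_0001", ["spk_a", "spk_b", "spk_c", "spk_d", "spk_e"])` places the most likely speaker first and the fifth most likely last.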
Use the appropriate script to generate Top-3 Accuracy scores for the FSC P3 Challenge SID task:
bash scoreFS03_SID.sh <ref_path> <hyp_path> <out_path>
An example submission packet can be found in the toolkit (link provided here).
System output for each track should be submitted as a .zip that expands into a single directory of Rich Transcription Time Marked (RTTM) files containing one RTTM file for each recording.
RTTM is a NIST-defined file format. The RTTM files are text files containing one turn per line, each line containing nine space-delimited fields:
Type | segment type; should always be “SPEAKER” |
File ID | file name; basename of the recording minus extension (e.g., “FS P01 eval 023”) |
Channel ID | channel (1-indexed) that turn is on; should always be “1” |
Turn Onset | onset of turn in seconds from beginning of recording |
Turn Duration | duration of turn in seconds |
Orthography Field | should always be “<NA>” |
Speaker Type | should always be “<NA>” |
Speaker Name | name of speaker of turn; should be unique within scope of each file |
Confidence Score | (Optional) system confidence (probability) that information is correct; should always be <NA> |
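The nine fields above map directly onto one line per speaker turn. The following is a minimal sketch under the constraints listed (Type fixed to SPEAKER, Channel ID fixed to 1, and the Orthography, Speaker Type, and Confidence fields fixed to <NA>); the function name `format_rttm_line` and the three-decimal time precision are assumptions made for illustration.

```python
def format_rttm_line(file_id: str, onset: float, duration: float, speaker: str) -> str:
    """Build one space-delimited RTTM line with the nine fields listed above."""
    if onset < 0 or duration <= 0:
        raise ValueError("onset must be non-negative and duration positive")
    if any(ch.isspace() for ch in file_id + speaker):
        raise ValueError("file ID and speaker name must not contain whitespace")
    fields = [
        "SPEAKER",           # Type
        file_id,             # File ID (recording basename without extension)
        "1",                 # Channel ID
        f"{onset:.3f}",      # Turn Onset, seconds (precision is an assumption)
        f"{duration:.3f}",   # Turn Duration, seconds
        "<NA>",              # Orthography Field
        "<NA>",              # Speaker Type
        speaker,             # Speaker Name, unique within the file
        "<NA>",              # Confidence Score
    ]
    return " ".join(fields)
```

Note that the format stores a turn duration rather than an end time, so a system that tracks start and end offsets needs to write `end - start` in the fifth field.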
Use the appropriate script to generate DER scores for the FSC P3 Challenge SD task:
bash scoreFS03_SD.sh <ref_path> <hyp_path> <out_path>
An example submission packet can be found in the toolkit (link provided here).
System output for each track should be submitted as a .zip that expands into a single directory of JSON format files containing one JSON file for each recording.
The transcriptions are provided in JSON format for each file, using the following tokens:
Speaker ID | Token: “speakerID” |
Transcription | Token: “words” |
Conversational Label | Token: “conv” |
Start Time | Token: “startTime” |
End Time | Token: “endTime” |
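A minimal sketch of serializing segments with the five tokens above. The page lists only the token names, so the overall structure (a JSON list with one object per segment), the value types (strings for text fields, numbers in seconds for times), and the helper names `make_segment` and `dump_transcript` are all assumptions; the example submission packet should be checked for the exact layout.

```python
import json


def make_segment(speaker_id: str, words: str, conv: str,
                 start_time: float, end_time: float) -> dict:
    """Build one segment object keyed by the five tokens listed above."""
    if end_time < start_time:
        raise ValueError("endTime must not precede startTime")
    return {
        "speakerID": speaker_id,
        "words": words,
        "conv": conv,
        "startTime": start_time,   # assumed to be seconds as a number
        "endTime": end_time,
    }


def dump_transcript(segments: list[dict]) -> str:
    """Serialize the segments of one recording as a JSON list (assumed layout)."""
    return json.dumps(segments, indent=2, ensure_ascii=False)
```

Writing the string returned by `dump_transcript` to one file per recording matches the one-JSON-file-per-recording requirement stated above.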
Use the appropriate script to generate WER scores for the FSC P3 Challenge ASR task:
bash scoreFS03_ASR.sh <ref_path> <hyp_path> <out_path>
An example submission packet can be found in the toolkit (link provided here).
System output for each track should be submitted as a .zip that expands into a single directory of JSON format files containing one JSON file for each recording.
The transcriptions are provided in JSON format for each file, using the following tokens:
Speaker ID | Token: “speakerID” |
Transcription | Token: “words” |
Conversational Label | Token: “conv” |
Start Time | Token: “startTime” |
End Time | Token: “endTime” |
Use the appropriate script to generate Top-3 Accuracy scores for the FSC P3 Challenge conversational analysis (Conv) task:
bash scoreFS03_Conv.sh <ref_path> <hyp_path> <out_path>
All Challenge participants are required to submit one or more conference papers describing their systems (and reporting performance on the Dev and Eval sets) to the “FEARLESS STEPS CHALLENGE PHASE-3” Special Sessions section at ISCA INTERSPEECH-2021.