Fearless Steps Challenge:
Phase III

Overview

All evaluation activities will be conducted using a NIST-maintained web platform shared with OpenSAT. Each participant will need to create an account on this platform to register. The account allows them to perform various activities such as registering for the evaluation, signing the data license agreement, and uploading submissions.

After registering and agreeing to the NIST FSC-P3 Terms and Conditions, participants will be able to participate in the FSC P3. This page contains step-by-step instructions for creating the evaluation account, joining a site and team, selecting tasks, and signing the relevant agreements.

Submitting an archive via the NIST dashboard

To submit an archive for scoring, navigate to the OpenSAT Series FSC P3 page on the OpenSAT site and follow the steps below:

  • To submit output of a system for scoring, log into your participant account and select Dashboard from the top right of the page
  • Navigate to the Submission Management panel and click on the task that you wish to submit to. This will open the Submissions page
  • Click Add new system
  • Select the system type
  • Select primary if you wish for the scoring results to be displayed on the leaderboard
  • Select contrastive otherwise
  • Enter a name for your system
  • Click Submit

This registers your submission with the scoring server. Next, you need to upload the archive containing your system output.

  • Locate your submission on the Submissions page. Entries are displayed in ascending order of submission date, so it will be at the very bottom.
  • Click Upload next to your submission.
  • Select the output you want to upload.
  • Click Submit.

At this point your archive will be uploaded to the NIST server and the following will occur:

  • A unique submission ID will be generated; this will be used to track your submission
  • Your submission will be validated
  • If the submission passes validation, it will be scored

  • When the server finishes scoring your submission, it will display the status DONE. To access the scoring results, click on this status.
  • If for any reason scoring failed, it will display a status beginning with FAIL. Clicking on this status will open the error log from the scoring script, which can be used to debug your submission.

Submission for each Task

Speech Activity Detection

System output for each track should be submitted as a .zip that expands into a single directory of text (.txt) files, one file per recording.
Systems should output their SAD results as text (.txt) files in a NIST-defined file format: the text files contain one speech/non-speech interval per line, each line containing nine tab-delimited fields:

Test: test definition file name (Value: X)
TestSet ID: contents of the id attribute of the TestSet tag (Value: X)
Test ID: contents of the id attribute of the TEST tag (Value: X)
Task: the literal string SAD, without quotation marks (Value: SAD)
File ID: contents of the id attribute of the File tag (Value: X)
Interval start: an offset, in seconds from the start of the audio file, for the start of the speech/non-speech interval (Value: floating-point number)
Interval end: an offset, in seconds from the start of the audio file, for the end of the speech/non-speech interval (Value: floating-point number)
Type: in system output, speech/non-speech without quotation marks (Value: speech/nonspeech); in the reference, S/NS for speech/non-speech
Confidence Score: (Optional) a value in the range 0 through 1.0, with higher values indicating greater confidence about the presence/absence of speech
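
For illustration, here is a minimal Python sketch of writing such nine-field lines. The test definition name, IDs, file names, and token spellings below are hypothetical placeholders inferred from the field descriptions above, not values from the actual FSC P3 test definition files:

# Appends one nine-field, tab-delimited SAD record per call.
# All identifiers below are hypothetical examples.
def write_sad_line(out, test, testset_id, test_id, file_id,
                   start, end, seg_type, confidence=""):
    fields = [test, testset_id, test_id, "SAD", file_id,
              f"{start:.2f}", f"{end:.2f}", seg_type, str(confidence)]
    out.write("\t".join(fields) + "\n")

# "speech"/"non-speech" token spelling is assumed from the Type description.
with open("example_recording.txt", "w") as out:
    write_sad_line(out, "fs03_sad_eval.xml", "ts01", "t01",
                   "example_recording", 0.00, 4.25, "speech", 0.93)
    write_sad_line(out, "fs03_sad_eval.xml", "ts01", "t01",
                   "example_recording", 4.25, 7.10, "non-speech", 0.88)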

Use the appropriate script to generate DCF scores for the FSC P3 Challenge SAD task

USAGE:

bash scoreFS03_SAD.sh <ref_path> <hyp_path> <out_path>

  • ref_path: Reference (Ground Truth) Directory Path
  • hyp_path: Hypothesis (System Output) Directory Path
  • out_path: File Path to write DCF Scores

An example submission packet can be found in the toolkit (link provided here)
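
Since each task's archive must expand into a single directory of output files, packaging can be done in a couple of lines of Python; the directory name below is a hypothetical placeholder for your local output directory:

import shutil

# Creates FS03_SAD_system1.zip, which expands into the single directory
# FS03_SAD_system1/ holding one output file per recording.
# "FS03_SAD_system1" is a hypothetical name, not a required convention.
shutil.make_archive("FS03_SAD_system1", "zip",
                    root_dir=".", base_dir="FS03_SAD_system1")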


Speaker Identification

System output for each track should be submitted as a .zip that expands into a single directory containing one txt file with all results, as shown in the example in the submission packet.
The SID output file should be a text file containing one test segment per line, each line containing six space-delimited fields: the test-segment name followed by the five most likely speaker-ID predictions

Test: test definition file name
Prediction 1: top system speaker-ID prediction
Prediction 2: 2nd most likely system speaker-ID prediction
Prediction 3: 3rd most likely system speaker-ID prediction
Prediction 4: 4th most likely system speaker-ID prediction
Prediction 5: 5th most likely system speaker-ID prediction
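
As a minimal Python sketch, each result line can be emitted as below; the segment name, speaker labels, and output file name are hypothetical placeholders (consult the example packet for the real naming conventions):

# Appends one result line per test segment: the segment name plus the five
# most likely speaker-ID predictions, space-delimited. Names are hypothetical.
segment = "fs03_sid_eval_0001"
top5 = ["SPEAKER_A", "SPEAKER_B", "SPEAKER_C", "SPEAKER_D", "SPEAKER_E"]

with open("FS03_SID_results.txt", "a") as out:  # single file with all results
    out.write(" ".join([segment] + top5) + "\n")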

Use the appropriate script to generate Top-3 accuracy scores for the FSC P3 Challenge SID task

USAGE:

bash scoreFS03_SID.sh <ref_path> <hyp_path> <out_path>

  • ref_path: Reference (Ground Truth) Directory Path
  • hyp_path: Hypothesis (System Output) Directory Path
  • out_path: File Path to write Top-3 Accuracy Score

An example submission packet can be found in the toolkit (link provided here)


Speaker Diarization

System output for each track should be submitted as a .zip that expands into a single directory of Rich Transcription Time Marked (RTTM) files, one RTTM file for each recording.
RTTM is a NIST-defined file format: the RTTM files are text files containing one turn per line, each line containing nine space-delimited fields:

Type: segment type; should always be “SPEAKER”
File ID: file name; basename of the recording minus extension (e.g., “FS_P01_eval_023”)
Channel ID: channel (1-indexed) that the turn is on; should always be “1”
Turn Onset: onset of the turn, in seconds from the beginning of the recording
Turn Duration: duration of the turn in seconds
Orthography Field: should always be “<NA>”
Speaker Type: should always be “<NA>”
Speaker Name: name of the speaker of the turn; should be unique within the scope of each file
Confidence Score: (Optional) system confidence (probability) that the information is correct; should always be “<NA>”
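
A minimal Python sketch of emitting RTTM turns follows; the recording and speaker names are hypothetical placeholders:

# Appends one SPEAKER record (nine space-delimited fields); the orthography,
# speaker-type, and confidence fields are fixed to <NA> per the format above.
def write_rttm_turn(out, file_id, onset, duration, speaker):
    out.write(f"SPEAKER {file_id} 1 {onset:.2f} {duration:.2f} "
              f"<NA> <NA> {speaker} <NA>\n")

with open("FS_P01_eval_023.rttm", "w") as out:  # hypothetical recording name
    write_rttm_turn(out, "FS_P01_eval_023", 14.30, 2.75, "speaker_01")
    write_rttm_turn(out, "FS_P01_eval_023", 17.05, 1.40, "speaker_02")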

Use the appropriate script to generate DER scores for the FSC P3 Challenge SD task

USAGE:

bash scoreFS03_SD.sh <ref_path> <hyp_path> <out_path>

  • ref_path: Reference (Ground Truth) Directory Path
  • hyp_path: Hypothesis (System Output) Directory Path
  • out_path: File Path to write DER Scores

An example submission packet can be found in the toolkit (link provided here)


Automatic Speech Recognition

System output for each track should be submitted as a .zip that expands into a single directory of JSON files, one JSON file for each recording.
The transcriptions are provided in JSON format, one .json file per recording. The JSON file includes the following pieces of information for each utterance:

Speaker ID Token: “speakerID”
Transcription Token: “words”
Conversational Label Token: “conv”
Start Time Token: “startTime”
End Time Token: “endTime”
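
As a rough sketch, one recording's output might be written as below. The top-level structure (a list of utterance objects) and all values are assumptions for illustration; consult the toolkit's example packet for the authoritative layout:

import json

# One utterance entry per spoken segment; every value is a hypothetical
# placeholder, and the list-of-objects layout is an assumption.
utterances = [
    {
        "speakerID": "speaker_01",
        "words": "go for landing",
        "conv": "X",
        "startTime": "14.30",
        "endTime": "16.05",
    },
]

with open("example_recording.json", "w") as out:
    json.dump(utterances, out, indent=2)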

Use the appropriate script to generate WER scores for the FSC P3 Challenge ASR task

USAGE:

bash scoreFS03_ASR.sh <ref_path> <hyp_path> <out_path>

  • ref_path: Reference (Ground Truth) Directory Path
  • hyp_path: Hypothesis (System Output) Directory Path
  • out_path: File Path to write Overall WER Score

An example submission packet can be found in the toolkit (link provided here)


Conversational Analysis

System output for each track should be submitted as a .zip that expands into a single directory of JSON files, one JSON file for each recording.
The transcriptions are provided in JSON format, one .json file per recording. The JSON file includes the following pieces of information for each utterance:

Speaker ID Token: “speakerID”
Transcription Token: “words”
Conversational Label Token: “conv”
Start Time Token: “startTime”
End Time Token: “endTime”
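
Before packaging, it can help to verify that every utterance entry carries all five tokens. A minimal sketch, assuming the same list-of-objects layout sketched in the ASR section and a hypothetical file name:

import json

REQUIRED = {"speakerID", "words", "conv", "startTime", "endTime"}

# Reports any utterance entry that is missing one of the expected tokens.
with open("example_recording.json") as f:
    for i, utt in enumerate(json.load(f)):
        missing = REQUIRED - set(utt)
        if missing:
            print(f"utterance {i} is missing tokens: {sorted(missing)}")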

Use the appropriate script to generate Top-3 accuracy for the FSC P3 Challenge Conversational Analysis task

USAGE:

bash scoreFS03_Conv.sh <ref_path> <hyp_path> <out_path>

  • ref_path: Reference (Ground Truth) Directory Path
  • hyp_path: Hypothesis (System Output) Directory Path
  • out_path: File Path to write Top-3 Accuracy

Evaluation Rules

  1. Site registration will be required in order to participate
  2. Researchers who register but do not submit a system to the Challenge are considered withdrawn from the Challenge
  3. Researchers may use any audio and transcriptions to build their systems with the exception of data mentioned in the Evaluation plan
  4. Only the audio for the blind eval set (20 hours) will be released. Researchers are expected to run their systems on the blind eval set.
  5. Investigation of the evaluation data prior to submission of all system outputs is not allowed. Human probing is prohibited.

All Challenge participants are required to submit a conference paper describing their systems (and reporting performance on the Dev and Eval sets) to the “FEARLESS STEPS CHALLENGE PHASE-3” special session at ISCA INTERSPEECH-2021.

Evaluation Protocol

  • The entire Fearless Steps Corpus (consisting of over 11,000 hours of audio from the Apollo-11 mission), including the 100 hours of Challenge data, is publicly available and requires no additional license to use.
  • There is no cost to participate in the Fearless Steps evaluation. Development data and evaluation data will be freely made available to registered participants.
  • At least one participant from each team must register for the Fearless Steps Challenge 2021.
  • System output submissions will be sent to the official Fearless Steps correspondence email-id.
  • Participants can submit at most 2 system submissions per day.
  • Results of submitted systems will be mailed to the registered email-id within a week of the submission.
  • Participants are required to agree to process the data in accordance with the rules above.