System Description Document
This document is an example system description document. It also describes the contents
of the Submission Packet for the Fearless Steps Challenge Phase-02.
Contents of the Submission Packet:
- System Description Document: A document describing the system should be included as
a .pdf file in the base submission packet directory (same location as in the example
submission packet provided). More details on the description are provided in the FS01
Eval Plan, Appendix D. Participants can also send their Interspeech paper drafts as a
System Description Document (the FS Team will not share this document with anyone at
any time). If participants send multiple submissions for the same task based on the
system described in the paper, the same document can be used in all of those
submission packets.
- Dev and Eval: Participants are required to send their system output results on both the
Dev and Eval sets for a given task. The Dev folder should only include system output
results for a given challenge task on the Dev set provided for that task. The Eval folder
should only include system output results for a given challenge task on the Eval set
provided for that task. For tasks using audio streams (SAD, SD_track1, SD_track2, and
ASR_track1), each system output file should have the same filename (with a different
extension) as its corresponding audio stream file. For tasks using audio segments
(SID, ASR_track2), a single output file per set is expected, named as listed below. For
correct formatting examples, please refer to the ./egs/ directory of the FS02 Scoring
Toolkit. Details regarding the system output files are also provided by running the
shell scripts in ./scripts/ with no input arguments.
The following files are expected in each folder (per task):
- Speech Activity Detection (SAD): All the files should have the .txt extension
and must follow the format provided in the FS01 Eval Plan Appendix A.3.
i. Dev: 30 files, titled FS02_dev_001.txt … to … FS02_dev_030.txt
ii. Eval: 40 files, titled FS02_eval_001.txt … to … FS02_eval_040.txt
- Speaker Diarization (SD_track1, SD_track2): Both tracks will have the same
submission format. All the files should have the .rttm extension and must follow
the format provided in the FS01 Eval Plan Appendix A.1.
i. Dev: 30 files, titled FS02_dev_001.rttm … to … FS02_dev_030.rttm
ii. Eval: 40 files, titled FS02_eval_001.rttm … to … FS02_eval_040.rttm
- Speaker Identification (SID): The 2 files should have the .txt extension and
must follow the format provided in the FS01 Eval Plan Appendix A.4.
i. Dev: 1 file, titled FS02_SID_uttID2spkID_Dev.txt which contains 6373 SID
utterance IDs with their 5 system predictions.
ii. Eval: 1 file, titled FS02_SID_uttID2spkID_Eval.txt which contains 8466
SID utterance IDs with their 5 system predictions.
- Automatic Speech Recognition Track-1 (ASR_track1): All the files should
have the .json extension and must follow the format provided in the FS01 Eval
Plan Appendix A.2. All .json files must have the respective fields in every
object/struct: “words”, “startTime”, and “endTime”.
i. Dev: 30 files, titled FS02_dev_001.json … to … FS02_dev_030.json
ii. Eval: 40 files, titled FS02_eval_001.json … to … FS02_eval_040.json
- Automatic Speech Recognition Track-2 (ASR_track2): The 2 files should have no
file extension and must follow the format provided by Kaldi, described in the “text”
section of the Kaldi Data Prep page.
i. Dev: 1 file, titled FS02_ASR_track2_transcriptions_Dev which contains
9203 utterance IDs with their corresponding transcriptions.
ii. Eval: 1 file, titled FS02_ASR_track2_transcriptions_Eval which contains
13714 utterance IDs with their corresponding transcriptions.
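The file lists above can be checked mechanically before a packet is sent. The following Python sketch is not part of the FS02 Scoring Toolkit; it is a minimal illustration that builds the expected filenames for a task and compares them against a folder's contents. The function names (expected_files, check_folder) are hypothetical, and the sketch assumes that each folder holds the output of one task only and that the segment-based tasks (SID, ASR_track2) submit one file per set, as the single filenames listed for them suggest.

```python
from pathlib import Path

# Output extension for each task that produces one file per audio stream.
STREAM_TASK_EXT = {
    "SAD": ".txt",
    "SD_track1": ".rttm",
    "SD_track2": ".rttm",
    "ASR_track1": ".json",
}

# Number of audio streams in each set (30 Dev, 40 Eval).
STREAM_COUNTS = {"Dev": 30, "Eval": 40}

# Segment-based tasks: assumed to submit a single file per set.
SEGMENT_TASK_FILES = {
    "SID": "FS02_SID_uttID2spkID_{split}.txt",
    "ASR_track2": "FS02_ASR_track2_transcriptions_{split}",
}


def expected_files(task, split):
    """Return the filenames expected for `task` in the `split` ("Dev" or "Eval") folder."""
    if task in STREAM_TASK_EXT:
        ext = STREAM_TASK_EXT[task]
        prefix = f"FS02_{split.lower()}_"
        return [f"{prefix}{i:03d}{ext}" for i in range(1, STREAM_COUNTS[split] + 1)]
    return [SEGMENT_TASK_FILES[task].format(split=split)]


def check_folder(folder, task, split):
    """Compare the files in `folder` to the expected names.

    Returns two sorted lists: (missing, unexpected).
    """
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    expected = set(expected_files(task, split))
    return sorted(expected - present), sorted(present - expected)
```

For example, check_folder("Dev", "SAD", "Dev") would return two empty lists when the Dev folder holds exactly FS02_dev_001.txt through FS02_dev_030.txt. The check covers filenames only; file contents still need to follow the formats in the Eval Plan appendices.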
Additional details regarding the Evaluation and Submission Rules are provided to participants in
a separate document. Please feel free to reach out to FearlessSteps@utdallas.edu for any
queries or clarifications.