The figure shows the stages of the mission.
Stages 1, 5, and 6 were high-impact, mission-critical events, making them ideal for the development of the 110-hour Challenge Corpus from Apollo 11. In addition, 10 hours of previously unreleased Apollo-8 mission audio and 5 hours of Apollo-13 mission audio will be made available to participants. With the quality of speech data varying between 0 and 20 dB SNR, the channel variability and complex interactions across all five channels of interest are largely encapsulated in the Challenge Corpus. The multichannel data (from Apollo-11) are chosen from the major events given below:
These landmark events have been found to possess rich information from the speech and language perspective. Five of the 29 channels were selected because they had the most activity over the chosen events.
The personnel operating these five channels (channel owners/primary speakers) were in command of the most critical aspects of the mission, with additional backroom staff looping in for interactions with the primary owners of these channels.
For Fearless Steps Challenge Phase-04, CRSS-UTDallas is collaborating with the National Institute of Standards and Technology (NIST) and the Linguistic Data Consortium (LDC) to host the Challenge using NIST web resources and LDC data resources. Challenge submissions and scoring for all participants will be handled through the NIST web portal. Researchers will be able to use the training and development transcripts for tasks other than their focus task and leverage these transcripts to improve their systems. Additionally, researchers may use any publicly available data to train (and develop/advance) their systems.
The distributions and statistics in the sections below are for the Apollo-11 data only.
The distribution of total speech content in each channel for every event is given in the table above. The total speech content in the challenge corpus amounts to approximately 36 hours.
To ensure an equitable distribution of data into the training, development, and evaluation sets for the challenge tasks, we categorized the data based on noise levels, amount of speech content, and amount of silence. Due to long silence durations and the varying importance of mission phases, the speech activity density of the corpus varies throughout the mission.
The table above provides a general analysis of the Challenge data. Even though researchers are not provided with the channel information for the Train, Test, and Dev sets, they may be able to make inferences by computing SNRs for each file.
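As an illustration of the kind of inference a participant could make, the sketch below estimates a rough per-file SNR from frame energies. The file name and the percentile-based noise-floor heuristic are assumptions for illustration only, not the official challenge tooling or scoring method.

```python
# Rough per-file SNR estimate from frame energies (illustrative heuristic only;
# the file name and the percentile-based noise-floor assumption are hypothetical).
import numpy as np
import soundfile as sf

def estimate_snr_db(wav_path, frame_len=0.025, hop=0.010):
    audio, sr = sf.read(wav_path)
    if audio.ndim > 1:                      # mix down to mono if needed
        audio = audio.mean(axis=1)
    frame = int(frame_len * sr)
    step = int(hop * sr)
    energies = np.array([
        np.mean(audio[i:i + frame] ** 2)
        for i in range(0, len(audio) - frame, step)
    ])
    energies = energies[energies > 0]
    noise_power = np.percentile(energies, 10)    # assume quietest frames ~ noise floor
    speech_power = np.percentile(energies, 90)   # assume loudest frames ~ speech
    return 10 * np.log10(speech_power / noise_power)

# Files with similar SNR estimates may plausibly originate from the same channel:
# print(estimate_snr_db("FSC_P4_dev_001.wav"))  # hypothetical file name
```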
Figure 2: Probability Distributions of decision parameters for Train, Dev, and Eval sets.
All decision parameters for the 100 hours of audio streams are calculated individually. These parameters are then normalized to generate degradation scores across the 100 hours. The scores are time-aligned across the 5 channels and averaged to provide a single degradation score per 30-minute time chunk. The figure above shows the probability distributions of the decision parameters for the Train, Dev, and Eval sets. It shows that, even when the overall degradation across multiple channels is large, the variance in channel characteristics leads to Train, Dev, and Eval distributions with similar means but differing shapes.
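A minimal sketch of this aggregation step is given below, assuming the per-channel decision parameters have already been computed and stored as one value per 30-minute chunk. The parameter names, min-max normalization choice, and array layout are assumptions for illustration, not the exact procedure used to build the sets.

```python
# Sketch: normalize decision parameters per channel, then average the resulting
# degradation scores across the 5 channels for each 30-minute chunk.
# Parameter names and array shapes are assumptions for illustration.
import numpy as np

def normalize(x):
    """Min-max normalize a parameter to [0, 1] so parameters are comparable."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def degradation_scores(params_per_channel):
    """params_per_channel: list of dicts, one per channel, each mapping a
    parameter name (e.g. 'snr', 'speech_density') to an array with one
    value per 30-minute chunk."""
    per_channel = []
    for params in params_per_channel:
        normalized = [normalize(v) for v in params.values()]
        per_channel.append(np.mean(normalized, axis=0))   # one score per chunk
    # Time-aligned average across the channels -> a single score per chunk.
    return np.mean(per_channel, axis=0)
```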
Table 3: Duration statistics of audio segments for the ASR track 2 task.
Table 3 presents the general duration statistics of the audio segments provided for the ASR track 2 task, showing shorter utterance durations and larger variation in speaker durations.
Table 4: General statistics for the SID task.
Similar statistics are shown for the SID task in Table 4. The mean, median, minimum, and maximum values for cumulative speaker durations and individual speaker utterances are all expressed in seconds.
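Statistics of this kind can be reproduced from the released segment lists; the sketch below shows one way to compute them, where the (speaker, duration) tuples stand in for the real segment metadata and are purely illustrative.

```python
# Sketch: mean / median / min / max of per-utterance and cumulative per-speaker
# durations. The (speaker_id, duration_seconds) tuples below are toy data only.
from collections import defaultdict
from statistics import mean, median

segments = [("CAPCOM", 3.2), ("CAPCOM", 1.1), ("FD", 7.5), ("EECOM", 0.8)]

utt_durations = [dur for _, dur in segments]
speaker_totals = defaultdict(float)
for spk, dur in segments:
    speaker_totals[spk] += dur
cum_durations = list(speaker_totals.values())

for name, vals in [("utterance", utt_durations),
                   ("cumulative speaker", cum_durations)]:
    print(f"{name}: mean={mean(vals):.2f}s median={median(vals):.2f}s "
          f"min={min(vals):.2f}s max={max(vals):.2f}s")
```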
Table 5: Baseline Results for Development and Evaluation Sets.
Table 5 shows the baseline results of the Fearless Steps Challenge: Phase 02 for the Development and Evaluation sets.
Table 6: Comparison of the best systems developed for all FS-1 and FS-2 challenge tasks.
Table 6 shows a comparison of the best systems developed for all FS-1 and FS-2 challenge tasks, illustrating the relative improvement of the top-ranked system per task in FS-2 over FS-1.