The Fearless Steps Initiative by UTDallas-CRSS led to the digitization, recovery, and diarization of 19,000 hours of original analog audio data, as well as the development of algorithms to extract meaningful information from this multichannel naturalistic data resource. As an initial step to motivate a streamlined and collaborative effort from the speech and language community, UTDallas-CRSS is hosting a series of progressively complex tasks to promote advanced research on naturalistic “Big Data” corpora. This began with ISCA INTERSPEECH-2019: "The FEARLESS STEPS Challenge: Massive Naturalistic Audio (FS-#1)". The first edition of the challenge encouraged the development of core unsupervised/semi-supervised speech and language systems for single-channel data with low resource availability, serving as the “First Step” towards extracting high-level information from such massive unlabeled corpora.
As a natural progression following the successful inaugural challenge FS#1, the FEARLESS STEPS Challenge Phase-#2 (FS#2) focuses on the development of single-channel supervised learning strategies. FS#2 provides 80 hours of ground-truth data through Training and Development sets, with an additional 20 hours of blind-set Evaluation data. Based on feedback from the Fearless Steps participants, additional Tracks for streamlined speech recognition and speaker diarization have been included in FS#2. The results for this Challenge will be presented at the ISCA INTERSPEECH-2020 Special Session. We encourage participants to explore any research tasks of interest with the Fearless Steps Corpus; suggested Task Domains are listed below. Research participants can also use the FS#2 corpus to explore additional problems dealing with naturalistic data, which we welcome as part of the special session.
|Registration Period|February 11, 2020 to March 15, 2020|
|Challenge Start Date|February 5, 2020|
|Scoring Toolkits, Evaluation Rules and Baseline Results Release|April 12, 2020|
|System Submission Opens*|May 1, 2020|
|System Submission Deadline|May 13, 2020|
|Final Results Released for All Tasks|May 14, 2020 (see Leaderboard)|
|INTERSPEECH-2020 FEARLESS STEPS Special Session Submission Deadline|May 8, 2020 (see IS-2020 dates)|
|Paper Revision Deadline|May 15, 2020 (see IS-2020 dates)|
*Further details regarding the system submission will be disclosed on the website and through email correspondence on April 6, 2020.
The entire Fearless Steps Corpus, consisting of over 19,000 hours of audio from the Apollo-11 Mission, is publicly available under the 'NASA Media Usage Guidelines'. For access to the complete 19,000-hour corpus, please use the website for the Fearless Steps Challenge Phase-01 or contact us directly at FearlessSteps@utdallas.edu.
For further questions or inquiries, please do not hesitate to contact us.
For information on the Fearless Steps Challenge Phase-01, please click here.
The Fearless Steps corpus is derived from a five-year NSF CISE-funded project awarded to CRSS at the University of Texas at Dallas. UTDallas-CRSS established the hardware/software solutions to digitize and diarize 19,000 hours of NASA Apollo data. All core Apollo data released as part of this challenge has been approved for public release by NASA Export Control. The full audio corpus is also available through UTDallas-CRSS. Any reference to or listing of organizations other than UTD is for information only; it does not imply recommendation or endorsement by UTDallas-CRSS, nor does it imply that the products mentioned are necessarily the best available for that purpose.
All the conversations between Astronauts and Mission Control Personnel during the Apollo-11 Mission were recorded by NASA. The tireless efforts of CRSS-UTD transcribers and researchers shaped this enormous amount of data into a well-defined corpus that addresses various speech and language tasks for naturalistic audio. A portion of this corpus is now publicly available to the speech community through this Challenge under a Creative Commons license.
Note: The Creative Commons License is restricted to the efforts made by CRSS-UTD, which involve 100 hours of Challenge Corpus (audio) data sampled at 8 kHz, along with its separately generated metadata. The license also covers all the scripts used in the preparation of the corpus, the systems built to support the tasks in this Challenge, and the webpages developed to host the Challenge.
NASA content - images, audio, video, and computer files used in the rendition of 3-dimensional models, such as texture maps and polygon data in any format - generally are not copyrighted. You may use this material for educational or informational purposes, including photo collections, textbooks, public exhibits, computer graphical simulations and Internet Web pages. This general permission extends to personal Web pages.
News outlets, schools, and textbook authors may use NASA content without needing explicit permission. NASA content used in a factual manner that does not imply endorsement may be used without needing explicit permission. NASA should be acknowledged as the source of the material. NASA occasionally uses copyrighted material by permission on its website. Those images will be marked copyright with the name of the copyright holder. NASA's use does not convey any rights to others to use the same material. Those wishing to use copyrighted material must contact the copyright holder directly.
FEARLESS STEPS CHALLENGE by Aditya Joglekar, John H.L. Hansen is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://www.nasa.gov/mission_pages/apollo/apollo-11.html
Permissions beyond the scope of this license may be available at https://www.nasa.gov/multimedia/guidelines/index.html
For Additional Information regarding Commercial and Non-Commercial Use:
Please visit: https://www.nasa.gov/multimedia/guidelines/index.html
This project was supported in part by AFRL under contract FA8750-15-1-0205, NSF-CISE Project 1219130, and partially by the University of Texas at Dallas from the Distinguished University Chair in Telecommunications Engineering held by J.H.L. Hansen. We would also like to thank Tatiana Korelsky and the National Science Foundation (NSF) for their support on this scientific and historical project. Special thanks to Katelyn Foxworth for leading the ground-truth development efforts for the FS-02 Challenge Corpus.