Spanish to Mexican Sign Language glosses corpus for natural language processing tasks
Abstract
This work shares a dataset of Spanish (SPA) to Mexican Sign Language (MSL) gloss sentence pairs (glosses are a written transcription of MSL) for downstream natural language processing tasks. The methodology used to prepare the dataset involved building an SPA-to-MSL corpus with a specific representation of the SPA language for MSL interpretation. The proposed corpus serves as a reference dataset for evaluating different neural machine translation (NMT) system variants. With the support of MSL grammar books and advice from MSL interpreters, this study developed a dataset of 3,000 SPA-to-MSL sentence pairs.
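As a rough illustration of what working with sentence pairs like these might look like, the sketch below parses SPA/MSL-gloss pairs from tab-separated text. The file layout (one pair per line, Spanish sentence, a tab, then the gloss sequence) and the two example pairs are hypothetical; they are not taken from the corpus, whose actual distribution format may differ.

```python
import csv
import io
from typing import List, Tuple


def load_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse tab-separated SPA / MSL-gloss pairs, skipping blank or malformed rows.

    Assumed (hypothetical) layout: "<Spanish sentence>\t<MSL gloss sequence>" per line.
    """
    pairs = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # blank lines and rows without exactly two columns are skipped
        spa, msl = (field.strip() for field in row)
        if spa and msl:
            pairs.append((spa, msl))
    return pairs


# Made-up pairs in a typical uppercase gloss style, for illustration only.
sample = (
    "Mañana voy a la escuela.\tMAÑANA YO IR ESCUELA\n"
    "¿Cómo te llamas?\tTÚ NOMBRE QUÉ\n"
)
pairs = load_pairs(sample)
for spa, msl in pairs:
    print(f"{spa!r} -> {msl.split()}")
```

Splitting the gloss side on whitespace treats each gloss as one token, which is a common simplification when glosses are written as space-separated uppercase words.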
The distribution of the 3,000 sentences in the corpus follows the linguistic composition of the SPA language. To test the usability of the corpus as a data source for NMT, two neural transformer models were evaluated on it. The first NMT model uses a Helsinki-NLP SPA-to-SPA transformer developed by the Language Technologies Research Group at the University of Helsinki. The second NMT model uses a pre-trained SPA-to-SPA neural transformer based on the BARTO approach.
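Because the corpus is built to follow a particular distribution of sentence types, it may help to preserve that distribution when splitting the pairs for training and evaluating the NMT models. The sketch below is a generic stratified split under stated assumptions: each record carries a hypothetical `sentence_type` label, and the labels, 80/20 ratio, and seed are illustrative choices, not the authors' setup.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (spa_sentence, msl_gloss, sentence_type).
# The sentence-type labels are illustrative assumptions, not the corpus's own categories.
Record = Tuple[str, str, str]


def stratified_split(records: List[Record], test_ratio: float = 0.2,
                     seed: int = 13) -> Tuple[List[Record], List[Record]]:
    """Split records so each sentence type keeps roughly the same share in train and test."""
    by_type: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_type[rec[2]].append(rec)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[Record] = []
    test: List[Record] = []
    for group in by_type.values():
        shuffled = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        # round() can give 0 test items for very small groups; acceptable for a sketch.
        n_test = round(len(shuffled) * test_ratio)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Toy, made-up records standing in for the real corpus.
records = (
    [(f"oración {i}", f"GLOSA {i}", "declarative") for i in range(60)]
    + [(f"pregunta {i}", f"PREGUNTA {i}", "interrogative") for i in range(25)]
    + [(f"orden {i}", f"ORDEN {i}", "imperative") for i in range(15)]
)
train, test = stratified_split(records)
print(len(train), len(test))
```

Splitting per group, rather than shuffling the whole list once, keeps each sentence type's share roughly equal in both splits even when some types are rare.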
Both evaluations used a transfer learning strategy, which has been shown to be effective for modeling low-resource languages. The NMT evaluation produced BLEU scores of 91.13 and 94.23, which are in line with state-of-the-art NMT results for other languages.
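For readers unfamiliar with the metric, BLEU combines clipped n-gram precisions (usually n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty. The sketch below is a minimal, unsmoothed, single-reference corpus BLEU with whitespace tokenization; the text does not state which BLEU implementation produced the reported scores, and tools such as sacreBLEU apply their own tokenization and options, so this sketch will not necessarily reproduce those numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (one reference per hypothesis, uniform weights, no smoothing), 0-100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # any zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only applied when the hypotheses are shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


# Made-up gloss outputs, for illustration only.
refs = ["MAÑANA YO IR ESCUELA", "TÚ NOMBRE QUÉ"]
hyps = ["MAÑANA YO IR ESCUELA HOY", "TÚ NOMBRE QUÉ"]
print(round(corpus_bleu(hyps, refs), 2))
```

Statistics are summed over the whole corpus before the precisions are combined, which is how corpus BLEU differs from averaging sentence-level scores.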
Moreover, an evaluation by a professional MSL interpreter found that 94% of the SPA sentences were effectively translated into MSL structures.