SPANISH TO MEXICAN SIGN LANGUAGE GLOSSES CORPUS FOR NATURAL LANGUAGE PROCESSING TASKS

Abstract
This work shares a dataset of Spanish (SPA) to Mexican Sign Language (MSL) gloss sentence pairs (glosses are written transcriptions of MSL) for use in downstream tasks. The methodology used to prepare the dataset involved building a SPA-to-MSL corpus with a specific representation of the SPA language for MSL interpretation. The proposed corpus is intended as a reference dataset for evaluating different neural machine translation (NMT) system variants. With the support of MSL grammar books and advice from MSL interpreters, this study developed a SPA-to-MSL dataset of 3,000 sentence pairs.
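The corpus is a set of aligned pairs, each made of a SPA sentence and its MSL gloss. The abstract does not specify a distribution format, so the sketch below assumes a hypothetical tab-separated layout with one pair per line (Spanish first, gloss second). `load_pairs` is an illustrative helper, not part of the published dataset.

```python
import csv
from typing import Iterable, List, Tuple


def load_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (SPA sentence, MSL gloss) pairs from tab-separated lines.

    Assumes a hypothetical two-column layout: Spanish text, then the gloss.
    Blank rows, rows without exactly two columns, and rows with an empty
    field are skipped.
    """
    pairs = []
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # blank or malformed row
        spa, msl = (field.strip() for field in row)
        if spa and msl:
            pairs.append((spa, msl))
    return pairs
```

Because `csv.reader` accepts any iterable of strings, a file opened with `encoding="utf-8", newline=""` can be passed directly instead of an in-memory list.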

The distribution of the 3,000 sentences in the corpus follows the linguistic composition of the SPA language. To test the usability of the corpus as a data source for NMT, two neural transformers were evaluated. The first NMT model uses a Helsinki-NLP SPA-to-SPA transformer developed by the Language Technologies Research Group at the University of Helsinki. The second NMT model builds on BARTO, a pre-trained SPA-to-SPA neural transformer.

Both evaluations used a transfer learning strategy, which has been shown to be effective for modeling low-resource languages. The NMT evaluation yielded BLEU scores of 91.13 and 94.23, in line with state-of-the-art NMT results reported for other language pairs.
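BLEU, the metric behind these scores, measures n-gram overlap between system output and reference glosses. The abstract does not say which BLEU implementation, tokenization, or smoothing the authors used, so the following is only a minimal sketch of standard corpus-level BLEU (n-grams up to 4, uniform weights, brevity penalty), assuming whitespace tokenization, one reference per sentence, and no smoothing. Its scores will not necessarily reproduce the reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions: whitespace tokenization and no smoothing. Clipped n-gram
    matches are summed over the whole corpus before precisions are combined.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

Summing matches over the corpus (rather than averaging sentence scores) follows the original BLEU definition; short sentences with no 4-gram match would otherwise drive many sentence-level scores to zero.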

Moreover, an evaluation by a professional MSL interpreter found that 94% of the SPA sentences were effectively translated into MSL structures.

Report this page