Pseudo-Dynamic Preservation and Elucidation of Neural Processing of Endangered Languages Based on Natural Discourse Corpora with Physiological Indices

Funding source

Japan Society for the Promotion of Science (JSPS)

Principal investigator

小泉 政利

Host institution

東北大学 (Tohoku University)

Project number

24H00085

Fiscal year of award

2024

Award date

Not disclosed

Research period

Unknown / Unknown

Project level

National

Funding amount

47,580,000.00 JPY

Discipline

Literature, linguistics, and related fields

Discipline code

Not disclosed

Grant category

Grant-in-Aid for Scientific Research (A)

Keywords

Austronesian languages; Mayan languages; Japonic (Japanese-Ryukyuan) languages; endangered languages; brain-function measurement

Co-investigators

伊藤彰則;那須川訓也;大塚祐子;小野創;大滝宏一;里麻奈美;木山幸子;安永大地;山田真寛;大関洋平;新国佳祐;矢野雅貴;宮川創;遊佐麻友子

Participating institutions

Tohoku Gakuin University; Sophia University; Tsuda University; Chukyo University; Okinawa International University; Kanazawa University; National Institute for Japanese Language and Linguistics (NINJAL, National Institutes for the Humanities); The University of Tokyo; Niigata Seiryo University; Tokyo Metropolitan University; University of Tsukuba; Hirosaki Gakuin University

Project proposal abstract (Outline of Research at the Start): The majority of languages in existence today are endangered, and the preservation and revitalization of languages and cultures is an urgent task. To address this problem, this project proposes and implements a novel method, the "pseudo-dynamic preservation of endangered languages," which combines a "natural discourse corpus with physiological indices" with an "AI dialogue system." Using the same corpus together with behavioral, eye-tracking, and brain-function measurement experiments, the project also works to elucidate how the languages of minority communities are processed in the brain.

Publications
  • 1. Evaluation of Different Training Strategies and Recognizers in Low Resource Speech Recognition Using Wav2vec2.0

    • Keywords:
    • Decoding;Learning algorithms;Learning systems;Self-supervised learning;Signal encoding;Speech coding;Speech communication;Supervised learning;Automatic speech recognition;Character error rates;Learning frameworks;Learning strategy;Low resource languages;Low-resource speech recognition;Minority languages;Training strategy;Transformer;Wav2vec
    • Koshikawa, Takaki;Ito, Akinori;Nose, Takashi
    • 《17th International Conference on Machine Learning and Computing, ICMLC 2025》
    • 2025
    • February 14, 2025 - February 17, 2025
    • Guangzhou, China
    • Conference

    Automatic Speech Recognition (ASR) is crucial for preserving minority languages, promoting inclusivity, and supporting education. The wav2vec2.0 model, pre-trained through self-supervised learning, is effective for low-resource speech recognition. Thus, this study investigates different learning strategies, recognizers, and frameworks to improve ASR performance for low-resource languages. First, we compared five learning strategies for low-resource language speech recognition using the wav2vec2.0 model. The Freeze-Transformer strategy, which fixes the CNN and low-layer Transformer blocks, achieved the lowest Character Error Rate (CER). Next, we evaluated five types of recognizers, including fully connected layers, MLP, RNN, LSTM, and GRU. The bi-GRU recognizer performed the best, achieving the lowest CER. Finally, we tested an Encoder-Decoder model with wav2vec2.0 as the encoder and a Transformer decoder as the decoder. The results showed that recognition performance did not improve with this model, even with a large amount of training data. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

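    (A minimal, hypothetical code sketch of this freeze-and-head fine-tuning setup appears after the publication list below.)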
  • 2. Improving Speaker Consistency in Speech-to-Speech Translation Using Speaker Retention Unit-to-Mel Techniques

    • Keywords:
    • Semantics;Speech enhancement;Translation (languages);End to end;French-english;Semantic content;Semantics Information;Speaker specific informations;Speech-to-speech translation;Synthesized speech;Voice quality;Waveforms
    • Zhou, Rui;Ito, Akinori;Nose, Takashi
    • 《2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024》
    • 2024
    • December 3, 2024 - December 6, 2024
    • Macau, China
    • Conference

    We propose a Speaker-Consistent Speech-to-Speech Translation (SC-S2ST) system that effectively retains speaker-specific information. The paradigm of Speech-to-Unit Translation (S2UT) followed by a unit-to-waveform vocoder has become mainstream for end-to-end S2ST systems; however, because the discrete units mainly carry semantic content, this approach often results in synthesized speech that lacks speaker-specific characteristics such as accent and individual voice quality. Existing S2UT systems with style transfer face the issue of high inference latency. To address this limitation, we introduce a Speaker-Retention Unit-to-Mel (SR-UTM) framework designed to capture and preserve speaker-specific information. We conducted experiments on the CVSS-C and CVSS-T corpora for Spanish-English and French-English translation tasks. Our approach achieved BLEU scores of 16.10 and 21.68, comparable to those of the baseline S2UT system, while excelling at preserving speaker similarity. The speaker similarity experiments showed that our method effectively retains speaker-specific information without significantly increasing inference time. These results confirm that our approach successfully achieves speaker-consistent speech-to-speech translation. © 2024 IEEE.

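    (A minimal, hypothetical structural sketch of such a speaker-retention unit-to-mel stage appears after the publication list below.)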
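Code sketches. To make the first paper's best-performing configuration concrete, the sketch below freezes the wav2vec2.0 CNN feature encoder and its lower Transformer blocks, then trains a bi-GRU recognizer with a CTC output layer on top. This is an illustrative reconstruction, not the authors' code: the checkpoint name (facebook/wav2vec2-large-xlsr-53), the number of frozen layers, and the head sizes are assumptions.

```python
# Hedged sketch of the "Freeze-Transformer" strategy with a bi-GRU + CTC recognizer.
# Assumptions (not from the paper): checkpoint, n_frozen_layers, head dimensions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class FrozenWav2Vec2BiGRUCTC(nn.Module):
    def __init__(self, vocab_size: int, n_frozen_layers: int = 6):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
        # Freeze the convolutional feature encoder entirely.
        self.encoder.feature_extractor._freeze_parameters()
        # Freeze the lower Transformer blocks; upper blocks remain trainable.
        for layer in self.encoder.encoder.layers[:n_frozen_layers]:
            for p in layer.parameters():
                p.requires_grad = False
        hidden = self.encoder.config.hidden_size          # 1024 for the large model
        # Bi-directional GRU recognizer (the best-performing head in the paper).
        self.gru = nn.GRU(hidden, hidden // 2, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(hidden, vocab_size)         # CTC character logits

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(waveform).last_hidden_state  # (B, T, hidden)
        out, _ = self.gru(feats)
        return self.head(out).log_softmax(dim=-1)         # CTC log-probabilities

# Toy usage with dummy data: two 1-second 16 kHz waveforms, 20 character targets each.
model = FrozenWav2Vec2BiGRUCTC(vocab_size=64)
log_probs = model(torch.randn(2, 16000))
targets = torch.randint(1, 64, (2, 20))
loss = nn.CTCLoss(blank=0)(log_probs.transpose(0, 1),    # CTC expects (T, B, C)
                           targets,
                           torch.full((2,), log_probs.size(1)),
                           torch.full((2,), 20))
```

Freezing the feature encoder and lower blocks keeps the general acoustic representations intact and leaves only the upper layers and the small recognizer head to adapt, which is why this strategy suits low-resource settings.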
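For the second paper, the sketch below shows the general shape of a speaker-retention unit-to-mel stage: translated discrete units carry the semantic content, while a speaker embedding extracted from the source speech conditions the mel-spectrogram decoder so the synthesized voice keeps the source speaker's characteristics. The module names, dimensions, and the plain Transformer stack are assumptions for illustration; the paper's actual SR-UTM architecture is not reproduced here.

```python
# Hedged structural sketch of a speaker-retention unit-to-mel (SR-UTM) stage.
# All names and dimensions are illustrative assumptions, not the authors' design.
import torch
import torch.nn as nn

class SpeakerRetentionUnitToMel(nn.Module):
    def __init__(self, n_units: int = 1000, d_model: int = 256,
                 d_speaker: int = 192, n_mels: int = 80):
        super().__init__()
        self.unit_emb = nn.Embedding(n_units, d_model)   # discrete S2UT outputs
        self.spk_proj = nn.Linear(d_speaker, d_model)    # speaker embedding from source speech
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.to_mel = nn.Linear(d_model, n_mels)         # mel frames for a vocoder

    def forward(self, units: torch.Tensor, spk: torch.Tensor) -> torch.Tensor:
        # Broadcast the speaker embedding over every unit frame, then decode to mel.
        x = self.unit_emb(units) + self.spk_proj(spk).unsqueeze(1)
        return self.to_mel(self.backbone(x))             # (B, T, n_mels)

# Toy usage: two utterances of 50 translated units, one speaker vector each.
model = SpeakerRetentionUnitToMel()
mel = model(torch.randint(0, 1000, (2, 50)), torch.randn(2, 192))
print(mel.shape)  # torch.Size([2, 50, 80])
```

Conditioning on a single speaker vector in the unit-to-mel stage, rather than running a separate style-transfer pass, is what keeps inference latency low in this kind of design, consistent with the abstract's claim.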