Deepfake Detection for a Trustworthy Speech Communication

Funding Source

Japan Society for the Promotion of Science (JSPS)

Principal Investigator

MAWALIM Candy Olivia

Funded Institution

Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学)

Project Number

25K21245

Fiscal Year of Approval

2025

Approval Date

Not disclosed

Project Level

National

Research Period

Unknown / Unknown

Funding Amount

4,810,000 JPY

Discipline

Human interface and interaction-related (ヒューマンインタフェースおよびインタラクション関連)

Discipline Code

Not disclosed

Grant Category

Grant-in-Aid for Early-Career Scientists (若手研究)

Keywords

deepfake; spoof attacks; speaker verification; multilingual

Participants

Not disclosed

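Participating Institutions

Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology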

Project Proposal Abstract: Outline of Research at the Start: Advancements in deep learning have enabled the creation of highly realistic synthetic audio (deepfakes), posing threats to voice privacy and security. This research aims to address the limitations of existing deepfake detection research by analyzing the physiological and acoustic characteristics of the speech production mechanism that set human speech apart from deepfakes. Robust deepfake detection methods will be developed that are capable of handling diverse linguistic data, providing clear explanations for detection outcomes, and adapting to evolving deepfake attacks.

  • 1. Privacy-aware speaker trait and multimodal features relationship analysis in job interviews.

    • Keywords:
    • Human-computer interaction; Privacy protection; Speaker traits; Voice anonymization
    • Mawalim, Candy Olivia; Leong, Chee Wee; Okada, Shogo
    • Scientific Reports
    • 2026
    • Journal

    As the use of speech data for applications like emotion detection and health profiling grows, so do the privacy risks associated with voice recordings that can reveal sensitive speaker traits. This study investigates voice anonymization methods designed to protect speaker identity while maintaining essential speech characteristics for accurate trait inference, specifically within the context of job interviews. Our experiments show that while anonymization alters several acoustic parameters, the anonymized speech from signal processing-based methods remains suitable for overall trait assessment, with performance comparable to original speech. The phase vocoder-based method, in particular, offers modest privacy gains with an acceptable trade-off in utility, especially in scenarios with minimal attack vectors. In contrast, a neural audio codec-based method altered prosodic features critical for speaker trait estimation, slightly reducing performance in this specific task. Despite this, when carefully configured, this method provides greater privacy and generally preserves utility for speech recognition and quality assessment, even under semi-informed attack scenarios. © 2026. The Author(s).

    ...