Artificial Synthesis of Training Data That Accounts for Singular (Rare) Data

Funding source

Japan Society for the Promotion of Science (JSPS)

Principal investigator

Yasuhiko Morimoto

Host institution

Hiroshima University

Year awarded

2025

Award date

Not disclosed

Project number

25K15130

Project level

National

Research period

Unknown / Unknown

Award amount

JPY 4,550,000

Discipline

Database-related fields

Discipline code

Not disclosed

Grant category

Grant-in-Aid for Scientific Research (C)

Keywords

Generative AI; tabular data generation; outlier generation; singular (rare) value generation; privacy protection

Participants

Not disclosed

Participating institution

Hiroshima University, Graduate School of Advanced Science and Engineering

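Project proposal summary (Outline of Research at the Start): This study investigates methods for artificially generating tabular data for use as training data. Methods based on generative adversarial networks (GANs) are effective for data generation, but privacy risks have been pointed out for rare data such as outliers and singular values, which are the focus of this study. Differential privacy (DP) is a promising way to avoid this risk, but protecting rare data requires adding large amounts of noise, which severely reduces the value of the data. This study therefore develops a new data generation technique that preserves both the value of rare data and its privacy.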
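The summary's point about differential privacy eroding the value of rare data can be made concrete with the Laplace mechanism on a counting query. The sketch below is a generic illustration, not the project's proposed technique: it assumes a count query with sensitivity 1 and a privacy budget of ε = 1 (both chosen only for illustration), and compares how much the same noise distorts a rare category (count 5) versus a common one (count 5,000).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query: sensitivity is 1, so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(true_count: int, epsilon: float, trials: int = 10_000, seed: int = 0) -> float:
    """Average |noisy - true| / true over many independent noisy releases of the same count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(dp_count(true_count, epsilon, rng) - true_count) / true_count
    return total / trials


if __name__ == "__main__":
    # Illustrative counts: a rare category versus a common one, both released with epsilon = 1.
    for count in (5, 5_000):
        err = mean_relative_error(count, epsilon=1.0)
        print(f"count={count:>5}  mean relative error = {err:.4f}")
```

Because the expected absolute value of Laplace noise equals its scale 1/ε, the relative error is roughly 1/(ε · count): negligible for the common category but around 20% for the rare one at ε = 1. Reducing that error by raising ε weakens the privacy guarantee, which is the utility/privacy tension the project sets out to resolve.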


1.Privacy-Aware Table Data Generation by Adversarial Gradient Boosting Decision Tree

Keywords:
adversarial learning; decision trees; tree ensembles; privacy evaluation; k-anonymity; model

Jiang, Shuai; Iwata, Naoto; Kamei, Sayaka; Alam, Kazi Md. Rokibul; Morimoto, Yasuhiko
Mathematics
2025
Vol. 13
No. 15
Journal article

Privacy preservation poses significant challenges in third-party data sharing, particularly when handling table data containing personal information such as demographic and behavioral records. Synthetic table data generation has emerged as a promising solution to enable data analysis while mitigating privacy risks. While Generative Adversarial Networks (GANs) are widely used for this purpose, they exhibit limitations in modeling table data due to challenges in handling mixed data types (numerical/categorical), non-Gaussian distributions, and imbalanced variables. To address these limitations, this study proposes a novel adversarial learning framework integrating gradient boosting trees for synthesizing table data, called Adversarial Gradient Boosting Decision Tree (AGBDT). Experimental evaluations on several datasets demonstrate that our method outperforms representative baseline models in terms of statistical similarity and machine learning utility. Furthermore, we introduce a privacy-aware adaptation of the framework by incorporating k-anonymization constraints, effectively reducing overfitting to source data while maintaining practical usability. The results confirm that our approach balances data utility and privacy preservation.
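The abstract does not say how the k-anonymization constraints are built into AGBDT. As a minimal sketch of the property itself, the following checks whether a table satisfies k-anonymity: every combination of quasi-identifier values must be shared by at least k rows. The column names (`age_band`, `zip3`, `diagnosis`) and the records are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, List, Mapping, Sequence, Tuple

Row = Mapping[str, Hashable]

# Hypothetical synthetic records whose quasi-identifiers are already generalized.
SAMPLE_RECORDS: List[Row] = [
    {"age_band": "30-39", "zip3": "730", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "730", "diagnosis": "cold"},
    {"age_band": "40-49", "zip3": "739", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "739", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "739", "diagnosis": "cold"},
]


def equivalence_classes(rows: Sequence[Row], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity_violations(rows: Sequence[Row], quasi_identifiers: Sequence[str], k: int) -> List[Tuple]:
    """Return the quasi-identifier combinations shared by fewer than k rows."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return [key for key, size in classes.items() if size < k]


def is_k_anonymous(rows: Sequence[Row], quasi_identifiers: Sequence[str], k: int) -> bool:
    """A table is k-anonymous when no equivalence class is smaller than k."""
    return not k_anonymity_violations(rows, quasi_identifiers, k)


if __name__ == "__main__":
    qi = ["age_band", "zip3"]
    print(is_k_anonymous(SAMPLE_RECORDS, qi, k=2))
    print(k_anonymity_violations(SAMPLE_RECORDS, qi, k=3))
```

Here the two equivalence classes have sizes 2 and 3, so the table is 2-anonymous but not 3-anonymous; the class `("30-39", "730")` is the one a generator would need to merge or generalize further. Note that k-anonymity constrains only re-identification through quasi-identifiers; it says nothing about the sensitive column (`diagnosis`) within a class.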


2.InstGAN: Instant Actor-Critic-Driven GAN for De Novo Molecule Generation and Property Optimization

Keywords:
Drug discovery; Drug products; Generative adversarial networks; Molecules; Monte Carlo methods; Actor critic; Adversarial networks; Generative model; Inherent instability; Molecular representations; Networks learning; Properties optimizations; Reinforcement learning algorithms; Tree-search

Tang, Huidong; Li, Chen; Kamei, Sayaka; Yamanishi, Yoshihiro; Morimoto, Yasuhiko
34th International Joint Conference on Artificial Intelligence (IJCAI 2025)
2025
August 16, 2025 - August 22, 2025
Montreal, QC, Canada
Conference paper

Deep generative models, such as generative adversarial networks (GANs), have been employed for de novo molecular generation in drug discovery. Most prior studies have utilized reinforcement learning (RL) algorithms, particularly Monte Carlo tree search (MCTS), to handle the discrete nature of molecular representations in GANs. However, due to the inherent instability in training GANs and RL models, along with the high computational cost associated with MCTS sampling, MCTS RL-based GANs struggle to scale to large chemical databases. To tackle these challenges, this study introduces a novel GAN based on actor-critic RL with instant and global rewards, called InstGAN, to generate molecules at the token level with multi-property optimization. Furthermore, maximized information entropy is leveraged to alleviate mode collapse. The experimental results demonstrate that InstGAN outperforms other baselines, achieves performance comparable to state-of-the-art models, and efficiently generates molecules with multi-property optimization. The code is available at: https://github.com/tang777777/InstGAN.
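The abstract names two ingredients, actor-critic RL with per-token ("instant") rewards and entropy maximization against mode collapse, without giving formulas. The sketch below shows the generic form of both for a single token step: a one-step temporal-difference advantage computed from a critic's value estimates, and a policy-gradient actor loss with an entropy bonus. This is standard A2C-style bookkeeping, not InstGAN's actual objective (its reward design lives in the linked repository); the discount `gamma`, the `entropy_coef`, and the 4-token vocabulary are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the token vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.99, done: bool = False) -> float:
    """One-step TD advantage: r + gamma * V(s') - V(s), with V(s') dropped at the final token."""
    target = reward + (0.0 if done else gamma * value_next)
    return target - value_s


def actor_loss(logits: Sequence[float], action: int, advantage: float,
               entropy_coef: float = 0.01) -> float:
    """Policy-gradient loss for one sampled token, minus an entropy bonus.

    Minimizing this raises the log-probability of tokens with positive advantage,
    while the entropy term rewards keeping the token distribution spread out,
    which is the usual way entropy maximization counteracts collapse onto a few outputs.
    """
    probs = softmax(logits)
    return -advantage * math.log(probs[action]) - entropy_coef * entropy(probs)


if __name__ == "__main__":
    # Hypothetical step: the critic valued the prefix at 0.5, and the sequence ended with reward 1.0.
    adv = td_advantage(reward=1.0, value_s=0.5, value_next=0.0, done=True)
    logits = [0.0, 0.0, 0.0, 0.0]  # uniform policy over a 4-token vocabulary
    print(adv, actor_loss(logits, action=2, advantage=adv))
```

In an actual training loop the advantage would be treated as a constant when differentiating the actor loss, and the critic would be fit separately to the TD target; both steps need an autodiff framework and are omitted here.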

