Collaborative Research: CIF: Small: Inverse Reinforcement Learning with Heterogeneous Data: Estimation Algorithms with Finite Time and Sample Guarantees
Funding source
Principal investigator
Awardee institution
Fiscal year
Award date
Award number
Award level
Project period
Award amount
Discipline
Discipline code
Funding program
Keywords
Participants
Participating institutions
Personnel information
Institution information
Managing directorate
Program officer
1. Downlink MIMO Channel Estimation From Bits: Recoverability and Algorithm
- Keywords:
- Downlink; Channel estimation; Maximum likelihood estimation; Channel models; Dictionaries; Antenna arrays; Vectors; US Government; Science - general; Training; compression; quantization; limited feedback; recoverability; MASSIVE MIMO; LIMITED FEEDBACK; HARMONIC RETRIEVAL; QUANTIZATION; TENSORS; SYSTEMS; OFDM
- Shrestha, Rajesh;Shao, Mingjie;Hong, Mingyi;Ma, Wing-Kin;Fu, Xiao
- IEEE TRANSACTIONS ON SIGNAL PROCESSING
- 2025
- Volume 73
- Issue
- Journal
In frequency division duplex (FDD) massive MIMO systems, a major challenge lies in acquiring the downlink channel state information (CSI) at the base station (BS) from limited feedback sent by the user equipment (UE). To tackle this fundamental task, our contribution is twofold. First, a simple feedback framework is proposed, where a compression and Gaussian dithering-based quantization strategy is adopted at the UE side, and a maximum likelihood estimator (MLE) is then formulated at the BS side. Recoverability of the MIMO channel under the widely used double directional model is established. Specifically, analyses are presented for two compression schemes, showing that one is more overhead-economical while the other is computationally lighter at the UE side. Second, to realize the MLE, an alternating direction method of multipliers (ADMM) algorithm is proposed. The algorithm is carefully designed to integrate a sophisticated harmonic retrieval (HR) solver as a subroutine, which turns out to be the key to effectively tackling this hard MLE problem. Extensive numerical experiments are conducted to validate the efficacy of our approach.
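The compress-dither-quantize feedback pipeline described in the abstract can be illustrated with a toy numerical sketch. Everything below is a hypothetical simplification (real-valued channel, illustrative dimensions, a generic Gaussian compression matrix), not the paper's actual model or ADMM solver; only the likelihood of the quantized bits is evaluated, not optimized.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative constants (not from the paper): BS antennas, compressed
# measurements fed back by the UE, and noise standard deviation.
N, M, sigma = 64, 128, 0.1

# Toy multipath channel: a few paths, each a gain times a steering pattern.
angles = rng.uniform(-np.pi / 2, np.pi / 2, size=3)
gains = rng.standard_normal(3)
steer = np.cos(np.pi * np.outer(np.arange(N), np.sin(angles)))
h = steer @ gains                               # real channel vector, shape (N,)

# UE side: linear compression, then Gaussian-dithered 1-bit quantization.
A = rng.standard_normal((M, N)) / np.sqrt(N)    # compression matrix
dither = rng.standard_normal(M)                 # dither sequence, known at the BS
noise = sigma * rng.standard_normal(M)
bits = np.sign(A @ h + noise + dither)          # 1-bit feedback sent to the BS

# BS side: negative log-likelihood of the observed bits given a channel
# hypothesis. The paper's MLE minimizes such an objective via ADMM with a
# harmonic retrieval subroutine; here we only evaluate it at two points.
def neg_log_likelihood(h_est):
    z = bits * (A @ h_est + dither) / sigma
    return -np.sum(norm.logcdf(z))
```

As a sanity check, the likelihood objective should prefer the true channel over a blind all-zeros guess, which is what makes the MLE formulation meaningful despite the severe 1-bit quantization.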
2. Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
- Keywords:
- Li, Jiaxiang;Zeng, Siliang;Wai, Hoi-To;Li, Chenliang;Garcia, Alfredo;Hong, Mingyi
- 38th Conference on Neural Information Processing Systems, NeurIPS 2024
- 2024
- December 9, 2024 - December 15, 2024
- Vancouver, BC, Canada
- Conference
Aligning with human preferences and values is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning (SFT), where the model is fine-tuned by learning from human demonstration data; 2) preference learning, where preference data is used to learn a reward model, which is in turn used by a reinforcement learning (RL) step to fine-tune the model. Such a reward model serves as a proxy for human preference, and it is critical for guiding the RL step towards improving the model quality. In this work, we argue that the SFT stage significantly benefits from learning a reward model as well. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model. This approach leads to new SFT algorithms that are not only efficient to implement but also robust to the presence of low-quality supervised learning data. Moreover, we discover a connection between the proposed IRL-based approach and a recent line of work called Self-Play Fine-tuning (SPIN, Chen et al. [2024]). Theoretically, we show that the proposed algorithms converge to the stationary solutions of the IRL problem. Empirically, we align 1B and 7B models using the proposed methods and evaluate them on a reward model benchmark and the HuggingFace Open LLM Leaderboard. The proposed methods show significant performance improvement over existing SFT approaches. Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process. Our code is available at https://github.com/JasonJiaxiangLi/Reward_learning_SFT. © 2024 Neural information processing systems foundation. All rights reserved.
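The self-play connection mentioned in the abstract can be sketched numerically: the policy's implicit reward is taken as the log-ratio against a frozen reference (the SPIN/DPO-style construction), and each step pushes the demonstration above a self-generated sample. This is a toy hypothetical illustration with a categorical "policy" standing in for an LLM; all sizes, rates, and names are made up, and it is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: K candidate responses, uniform frozen reference.
K, beta, lr = 8, 1.0, 0.5
theta = np.zeros(K)            # policy logits; pi = softmax(theta)
theta_ref = np.zeros(K)        # frozen reference policy logits
y_demo = 3                     # index of the human demonstration

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

pi_ref = softmax(theta_ref)
for _ in range(200):
    pi = softmax(theta)
    y_model = rng.choice(K, p=pi)                 # self-generated sample
    r = beta * (np.log(pi) - np.log(pi_ref))      # implicit reward r(y)
    margin = r[y_demo] - r[y_model]               # demo-vs-model margin
    w = 1.0 / (1.0 + np.exp(margin))              # = 1 - sigmoid(margin)
    # Gradient ascent on log sigmoid(margin): the softmax normalizers cancel,
    # leaving a push toward the demo and away from the model's own sample,
    # weighted by how often the model still "beats" the demonstration.
    g = np.zeros(K)
    g[y_demo] += 1.0
    g[y_model] -= 1.0
    theta += lr * w * beta * g
```

After training, the policy concentrates on the demonstrated response, which mirrors the intuition that reward learning from demonstrations can drive the SFT stage itself rather than only the later RL stage.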
...
