Development of a Session Tokenizer for User Behavior Analysis
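Project source
Principal investigator
Funded institution
Project number
Year of approval
Approval date
Project level
Research period
Funding amount
Discipline
Discipline code
Fund category
Keywords
Participants
Participating institutions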
1. Function-based labels for complementary recommendation: Definition, annotation, and LLM-as-a-Judge
- Keywords:
- Behavioral research;Classification (of information);Learning systems;Machine learning;Purchasing;Annotation;Binary classification;Complementary recommendation;Complementary relationship;Decision process;Human perception;Language model;Large language model;Machine learning methods;Users' experiences
- Yamasaki, Chihiro;Sugahara, Kai;Nagi, Yuma;Okamoto, Kazushi
- Pattern Recognition Letters
- 2026
- Volume 200
- Issue
- Journal
Complementary recommendations enhance the user experience by suggesting items that are frequently purchased together while serving different functions from the query item. Inferring or evaluating whether two items have a complementary relationship requires complementary relationship labels; however, defining these labels is challenging because of the inherent ambiguity of such relationships. Complementary labels based on user historical behavior logs attempt to capture these relationships, but often produce inconsistent and unreliable results. Recent efforts have introduced large language models (LLMs) to infer these relationships. However, these approaches provide a binary classification without a nuanced understanding of complementary relationships. In this study, we address these challenges by introducing Function-Based Labels (FBLs), a novel definition of complementary relationships independent of user purchase logs and the opaque decision processes of LLMs. We constructed a human-annotated FBLs dataset comprising 2759 item pairs and demonstrated that it covered possible item relationships and minimized ambiguity. We then evaluated whether machine learning methods using annotated FBLs could accurately infer labels for unseen item pairs, and whether LLM-generated complementary labels align with human perception. Among machine learning methods, ModernBERT achieved the highest performance with a Macro-F1 of 0.911, demonstrating accuracy and robustness even under limited supervision. For LLMs, GPT-4o-mini achieved high consistency (0.989) and classification accuracy (0.849) under the detailed FBL definition, while requiring only 1/842 the cost and 1/75 the time of human annotation. Overall, our study presents FBLs as a clear definition of complementary relationships, enabling more accurate inferences and automated labeling of complementary recommendations. © 2025 Elsevier B.V.
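The abstract above reports a Macro-F1 score for the classifiers. As a reminder of what that metric measures, the following is a minimal sketch in plain Python; it is not the authors' evaluation code, and the function name `macro_f1` and the labels "A", "B", and "C" are hypothetical placeholders, since the abstract does not list the actual FBL classes.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; labels are arbitrary hashable values.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder labels, not the FBL classes from the paper.
    truth = ["A", "A", "B", "B", "C"]
    preds = ["A", "B", "B", "B", "C"]
    print(round(macro_f1(truth, preds), 3))
```

Because each class contributes equally to the average, a rare class that is classified poorly lowers Macro-F1 as much as a frequent one would, which is why the metric is commonly reported for imbalanced label sets.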
2. A Universal Framework for Offline Serendipity Evaluation in Recommender Systems via Large Language Models
- Keywords:
- Knowledge management;Evaluation;Evaluation framework;Ground truth;Language model;Large language model;Offline;Performance;Serendipity;Unobservable;Users' satisfactions
- Tokutake, Yu;Okamoto, Kazushi;Harada, Kei;Shibata, Atsushi;Karube, Koki
- 34th ACM International Conference on Information and Knowledge Management, CIKM 2025
- 2025
- November 10, 2025 - November 14, 2025
- Seoul, Republic of Korea
- Conference
Serendipity in recommender systems (RSs) has attracted increasing attention as a concept that enhances user satisfaction by presenting unexpected and useful items. However, evaluating serendipitous performance remains challenging because its ground truth is generally unobservable. The existing offline metrics often depend on ambiguous definitions or are tailored to specific datasets and RSs, thereby limiting their generalizability. To address this issue, we propose a universally applicable evaluation framework that leverages large language models (LLMs) known for their extensive knowledge and reasoning capabilities, as evaluators. First, to improve the evaluation performance of the proposed framework, we assessed the serendipity prediction accuracy of LLMs using four different prompt strategies on a dataset containing user-annotated serendipitous ground truth and found that the chain-of-thought prompt achieved the highest accuracy. Next, we re-evaluated the serendipitous performance of both serendipity-oriented and general RSs using the proposed framework on three commonly used real-world datasets, without the ground truth. The results indicated that there was no serendipity-oriented RS that consistently outperformed across all datasets, and even a general RS sometimes achieved higher performance than the serendipity-oriented RS. © 2025 Copyright held by the owner/author(s).
