
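Advancing Software Engineering Tasks by Constructing a Dataset of Functionally Equivalent Methods (機能等価メソッドデータセットの構築によるソフトウェア工学タスクの高度化)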

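Funding source

Japan Society for the Promotion of Science (JSPS)

Principal investigator

Yoshiki Higo (肥後芳樹)

Funded institution

Osaka University

Fiscal year of approval

2024

Approval date

Not disclosed

Project number

24H00692

Project level

National

Research period

Unknown / Unknown

Funding amount

46,540,000 JPY

Discipline

Information science, computer engineering, and related fields

Discipline code

Not disclosed

Grant category

Grant-in-Aid for Scientific Research (A) (基盤研究(A))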

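Keywords

Source code analysis; functionally equivalent methods; large language models; code clones; Java; Python; LLM

Participants

Katsuhisa Maruyama (丸山勝久); Shinpei Hayashi (林晋平); Shinsuke Matsumoto (松本真佑); Olivier Nourry (ヌリオリビエ)

Participating institutions

Osaka University, Graduate School of Information Science and Technology; Ritsumeikan University, College of Information Science and Engineering; Institute of Science Tokyo, School of Computing; Osaka University, Institute for Advanced Co-Creation Studies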

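Project abstract: The goal for the first year was to construct the Java dataset, but beyond the original plan we were also able to improve LLM-based code clone detection accuracy using the constructed dataset.

In the last fiscal year, with research on "functionally equivalent methods" in software engineering at its core, we obtained two results: (1) the construction of a large-scale dataset and (2) the advancement of clone detection techniques that use LLMs. First, from more than 314M lines of OSS code (IJADataset), we extracted Java methods that are functionally equivalent but structurally different by means of automated test generation (EvoSuite) and mutual execution and, after manual verification, released the Functionally Equivalent Method Pair Dataset (FEMPDataset), which contains 1,342 pairs. Evaluating NIL, InferCode, and ASTNN on this dataset quantitatively showed the limits of existing techniques: the token-sequence-based technique missed many clones, whereas the AST-based and deep-learning-based techniques produced many false positives.

Next, using FEMPDataset as training data, we fine-tuned GPT-3.5 turbo, Llama2-Chat-7B, and Code-Llama-7B-Instruct and improved their ability to detect Type-4 clones. In particular, the Code-Llama model improved substantially in both precision and recall, and the fine-tuned GPT-3.5 achieved higher accuracy than GPT-4-turbo. This demonstrated that combining dataset curation with model optimization makes large language models effective even for detecting clones with large differences, which had previously been difficult. These results contribute to the development of code clone research and automated program analysis by providing a new data resource and establishing a way to apply LLMs; KAKENHI support was the foundation for both.

Last fiscal year the research progressed further than originally planned. Going forward, we will continue from last year and work further on improving the accuracy of code clone detection techniques using the constructed dataset. We will also use the dataset to support refactoring. Furthermore, drawing on what we learned while constructing the Java dataset, we will construct a Python dataset.

Outline of Research at the Start: This research constructs a dataset of functionally equivalent methods. Candidate functionally equivalent methods are checked manually to confirm that they are truly functionally equivalent. Once the dataset is constructed, it is used to evaluate software engineering techniques. For example, functionally equivalent methods can be used to evaluate code clone detection tools. Methods that implement the same functionality should ideally be detected as code clones, so measuring how many functionally equivalent methods are detected as code clones makes it possible to evaluate the performance of a code clone detection tool. In addition, by using the constructed dataset to fine-tune large language models, we aim to advance software engineering tasks.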
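The mutual-execution step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the project's actual pipeline, which targets Java methods and generates tests with EvoSuite. Here the test suites are simply input/output pairs recorded from each method's own behaviour, and all names (`passes`, `candidate_pairs`, the toy methods) are hypothetical. Two methods become a candidate pair only if each one passes the other's test suite. As the abstract notes, such candidates still require manual verification.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Method = Callable[[int], int]
TestSuite = List[Tuple[int, int]]  # (input, expected output) pairs


def passes(method: Method, tests: TestSuite) -> bool:
    """Return True if the method produces the expected output for every test."""
    for arg, expected in tests:
        try:
            if method(arg) != expected:
                return False
        except Exception:
            # A crash on any test input counts as a failure.
            return False
    return True


def candidate_pairs(methods: Dict[str, Method],
                    suites: Dict[str, TestSuite]) -> List[Tuple[str, str]]:
    """Pair up methods that pass each other's test suites (mutual execution)."""
    pairs = []
    for a, b in combinations(sorted(methods), 2):
        if passes(methods[a], suites[b]) and passes(methods[b], suites[a]):
            pairs.append((a, b))
    return pairs


# Toy methods (hypothetical): two structurally different implementations of
# the same function, plus one unrelated method.
def sum_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    return n * (n + 1) // 2


def square(n: int) -> int:
    return n * n


methods = {"sum_loop": sum_loop, "sum_formula": sum_formula, "square": square}

# Assumption: recording each method's own outputs stands in for
# automatically generated regression tests.
suites = {name: [(x, fn(x)) for x in (0, 1, 5, 10)]
          for name, fn in methods.items()}

print(candidate_pairs(methods, suites))
```

Passing each other's tests shows only that two methods agree on the sampled inputs, which is why the candidates are treated as a shortlist for manual inspection rather than as confirmed equivalents.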


1.A Large-Scale Investigation Into the Loss of Pull Request Data on GitHub

Keywords:
Software development management; Source coding; Application programming interfaces; Testing; Codes; Soft sensors; Reviews; Maintenance; Java; Information science; Empirical software engineering; pull requests; social coding; GitHub; software mining

Tang, Bowen; Maruyama, Katsuhisa
IEEE Access
2026
Vol. 14
Journal article

Analyzing pull requests (PRs) on GitHub provides valuable insights that can improve software development and maintenance. Therefore, researchers must collect PRs for empirical studies when testing hypotheses and creating practical tools based on these insights. Unfortunately, using GitHub as a data source for PRs carries the risk of data loss, owing to its flexible resource management. Existing studies have indicated that data losses can occur in PRs; however, the types and impacts of these losses remain unclear. This study shares findings from our investigation, which analyzed 84,828 PRs from 30 GitHub repositories and 2,345,724 actions recorded within the PRs. It clarified how different types of data loss affected PRs and highlighted variations in the percentage of PRs affected by loss. The results showed that 54.79% of the PRs experienced some data loss. Source code loss was common, whereas the loss of user information and commits was less frequent. Most user information loss resulted from missing committers. Compared to PRs that were rejected, merged PRs were more likely to have source code losses. The source code loss rate was much lower in testing-related PRs than in those unrelated to testing. PRs that lacked files written in a programming language were more prone to commit loss. These findings help researchers better understand data loss in PRs and develop effective strategies to prevent it.

...

2.An empirical study on the impact of change granularity in refactoring detection

Keywords:
Error detection; Open systems; Coarse-grained; Commit history; Commit message; Empirical studies; Impact of changes; Open science; Refactoring detection; Refactorings; Software Evolution

Chen, Lei; Hayashi, Shinpei
Journal of Systems and Software
2026
Vol. 231
Journal article

Detecting refactorings in commit history is essential for improving the comprehension of code changes during code reviews and for providing valuable information to empirical studies on software evolution. Techniques have been proposed to accurately detect refactorings at the granularity of a single commit. However, refactorings can be carried out over multiple commits because of their complexity or other practical development problems, so detection at the granularity of a single commit alone is not enough. We observe that some refactorings can be detected only at a coarser granularity, i.e., in changes conducted over multiple commits, while others can be detected at the granularity of a single commit but not at a coarser one. We call these two types coarse-grained refactorings (CGRs) and ephemeral refactorings (EPRs), respectively. We investigated the features and causes of CGRs and EPRs through an empirical study of 32 open-source Java projects and found that both commonly occur during development. In addition, we found that refactoring types related to splitting or merging classes and packages, as well as those involving modifications to the inheritance structure, tend to be CGRs, whereas types targeting small objects such as variables and attributes, and refactorings with context-sensitive detection criteria, tend to be EPRs. The causes of CGRs and EPRs are analyzed and categorized, and the relationship between CGRs and their commit messages is also assessed. We found that about 20% of commit messages explicitly suggest the existence of CGRs. We suggest that CGRs and EPRs be valued in refactoring research and that detectors be extended to identify CGRs. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board. © 2025 The Authors

...

3.How Natural Language Proficiency Shapes Generative AI Code for Software Engineering Tasks

Keywords:
Codes; Natural language processing; Software engineering; Software reliability; Software development management; Python; Software measurement; Large language models

Rojpaisarnkit, Ruksit; Fan, Youmei; Matsumoto, Kenichi; Kula, Raula Gaikovina
IEEE Software
2026
Vol. 43
Issue 1
Journal article

Much research has focused on prompt structure, but natural language proficiency is an underexplored factor that can influence the quality of generated code. This article investigates whether English language proficiency affects the proficiency and correctness of code generated by large language models.

...

4.MORCoRA: Multi-Objective Refactoring Recommendation Considering Review Availability

Keywords:
Search-based software engineering; multi-objective search; refactoring; review availability; nondominated sorting approach; genetic algorithm; model

Chen, Lei; Hayashi, Shinpei
International Journal of Software Engineering and Knowledge Engineering
2024
Journal article

Background: Search-based refactoring involves searching for a sequence of refactorings to achieve specific objectives. Although a typical objective is improving code quality, a different perspective is also required: the searched sequence must undergo review before being applied and may not be applied if the review fails or is postponed because no suitable reviewers are available. Aim: Therefore, it is essential to ensure that the searched sequence of refactorings can be reviewed promptly by reviewers who meet two criteria: (1) having enough expertise and (2) being free of heavy workload. The two criteria are regarded as the review availability of the refactoring sequence. Method: We propose MORCoRA, a multi-objective search-based technique that searches for refactoring sequences that improve code quality, preserve semantics, and have high review availability, together with suitable reviewers for them. Results: We evaluate MORCoRA on six open-source repositories. The quantitative analysis reveals that MORCoRA can effectively recommend refactoring sequences that fit the requirements. The qualitative analysis demonstrates that the refactorings recommended by MORCoRA can enhance code quality and effectively address code smells. Furthermore, the recommended reviewers for those refactorings possess high expertise and are available to review. Conclusions: We recommend that refactoring recommenders consider both the impact on quality improvement and the developer resources required for review when recommending refactorings.

...
