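Advancing Software Engineering Tasks through the Construction of a Functionally Equivalent Method Dataset (機能等価メソッドデータセットの構築によるソフトウェア工学タスクの高度化)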

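Funding source

Japan Society for the Promotion of Science (JSPS)

Principal investigator

Yoshiki Higo (肥後芳樹)

Host institution

Osaka University

Fiscal year of approval

2024

Approval date

Not disclosed

Project number

24H00692

Research period

Unknown / Unknown

Project level

National

Funding amount

46,540,000 JPY

Discipline

Information science, computer engineering, and related fields

Discipline code

Not disclosed

Grant category

Grant-in-Aid for Scientific Research (A) (基盤研究(A))

Keywords

Source code analysis; Functionally equivalent methods; Large language models; Code clones; Java; Python; LLM

Participants

Katsuhisa Maruyama (丸山勝久); Shinpei Hayashi (林晋平); Shinsuke Matsumoto (松本真佑); Olivier Nourry (ヌリオリビエ)

Participating institutions

Osaka University, Graduate School of Information Science and Technology; Ritsumeikan University, College of Information Science and Engineering; Institute of Science Tokyo, School of Computing; Osaka University, Institute for Advanced Co-Creation Studies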

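Project abstract: Last fiscal year, with research on "functionally equivalent methods" in software engineering at its core, the project produced two results: (1) construction of a large-scale dataset and (2) advancement of LLM-based clone detection. First, from an OSS corpus of more than 314M lines (IJADataset), Java methods that are functionally equivalent but structurally different were extracted by automated test generation (EvoSuite) and mutual execution of the generated tests. After manual verification, 1,342 pairs were released as the Functionally Equivalent Method Pair Dataset (FEMPDataset). Evaluating NIL, InferCode, and ASTNN on this dataset quantitatively showed the limits of existing techniques: the token-sequence-based technique missed many clones, whereas the AST-based and deep-learning techniques produced many false positives. Next, GPT-3.5 turbo, Llama2-Chat-7B, and Code-Llama-7B-Instruct were fine-tuned with FEMPDataset as training data, which improved their Type-4 clone detection ability. The Code-Llama model in particular improved substantially in both precision and recall, and the fine-tuned GPT-3.5 achieved higher accuracy than GPT-4-turbo. This demonstrated that combining dataset curation with model optimization makes large language models effective even for detecting clones with large differences, which had previously been difficult. These results contribute to code clone research and automated program analysis by providing a new data resource and establishing a way to apply LLMs, and KAKENHI funding underpinned both.

Last fiscal year the research progressed beyond the original schedule. Going forward, the project will continue from last year and further deepen the work on improving the accuracy of code clone detection techniques using the constructed dataset. It will also provide refactoring support using the dataset. In addition, drawing on the knowledge gained from building the Java dataset, a Python dataset will be built.

Reason: The first year was intended for constructing the Java dataset, but in addition to the original plan it was also possible to improve the accuracy of LLM-based code clone detection using the constructed dataset.

Outline of Research at the Start: This research constructs a dataset of functionally equivalent methods. Candidate functionally equivalent methods are checked manually to confirm that they are truly functionally equivalent. Once the dataset is built, it is used to evaluate software engineering techniques. For example, functionally equivalent methods can be used to evaluate code clone detection tools. Methods that implement the same functionality should ideally be detected as code clones, so investigating how many functionally equivalent methods are detected as clones makes it possible to evaluate a tool's performance. Furthermore, by using the constructed dataset to fine-tune large language models, the project aims to advance software engineering tasks.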
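
The extraction and evaluation steps described in the abstract can be illustrated with a small sketch. It is not the project's pipeline: EvoSuite is not invoked, methods are stood in for by plain integer functions, and the names `generate_tests`, `candidate_pairs`, `precision_recall`, and `METHODS` are all hypothetical. It only shows the idea of "mutual execution" (each method must pass the tests generated from the other) and how precision and recall of a clone detector could be computed against a set of known equivalent pairs.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Method = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)
Pair = Tuple[str, str]


def generate_tests(method: Method, inputs: List[int]) -> List[TestCase]:
    """Stand-in for a test generator: record the method's own outputs as oracles."""
    return [(x, method(x)) for x in inputs]


def passes(method: Method, tests: List[TestCase]) -> bool:
    """True if the method reproduces every expected output."""
    return all(method(x) == expected for x, expected in tests)


def candidate_pairs(methods: Dict[str, Method], inputs: List[int]) -> List[Pair]:
    """Pairs whose members pass each other's generated tests (mutual execution)."""
    tests = {name: generate_tests(m, inputs) for name, m in methods.items()}
    pairs = []
    for a, b in combinations(sorted(methods), 2):
        if passes(methods[a], tests[b]) and passes(methods[b], tests[a]):
            pairs.append((a, b))
    return pairs


def precision_recall(detected: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of detected clone pairs against known equivalent pairs."""
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy methods: two ways of doubling, plus an unrelated method.
METHODS: Dict[str, Method] = {
    "double_by_add": lambda x: x + x,
    "double_by_mul": lambda x: 2 * x,
    "square": lambda x: x * x,
}

if __name__ == "__main__":
    print(candidate_pairs(METHODS, [0, 1, 2, 3]))
    print(candidate_pairs(METHODS, [0, 2]))
```

With the inputs `[0, 2]`, `square` also passes the doubling tests, because a finite test suite is only partial evidence of equivalence. This is one way to see why the abstract's manual verification step is needed after automatic candidate extraction.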

1. Coverage Isn’t Enough: SBFL-Driven Insights into Manually Created vs. Automatically Generated Tests

Keywords:
Automatic test pattern generation;Software design;Software testing;Well testing;Automated test-case generations;Automatically generated;Code coverage;Fault localization;Mutation testing;Spectra's;Spectrum-based fault localization;Test case;Testing method;Testing phase

Shimizu, Sasara;Higo, Yoshiki
26th International Conference on Product-Focused Software Process Improvement, PROFES 2025
2026
December 1, 2025 - December 3, 2025
Salerno, Italy
Conference

The testing phase is an essential part of software development, but manually creating test cases can be time-consuming. Consequently, there is a growing need for more efficient testing methods. To reduce the burden on developers, various automated test generation tools have been developed, and several studies have been conducted to evaluate the effectiveness of the tests they produce. However, most of these studies focus primarily on coverage metrics, and only a few examine how well the tests support fault localization—particularly using artificial faults introduced through mutation testing. In this study, we compare the SBFL (Spectrum-Based Fault Localization) score and code coverage of automatically generated tests with those of manually created tests. The SBFL score indicates how accurately faults can be localized using SBFL techniques. By employing SBFL score as an evaluation metric—an approach rarely used in prior studies on test generation—we aim to provide new insights into the respective strengths and weaknesses of manually created and automatically generated tests. Our experimental results show that automatically generated tests achieve higher branch coverage than manually created tests, but their SBFL score is lower, especially for code with deeply nested structures. These findings offer guidance on how to effectively combine automatically generated and manually created testing approaches. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
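
As a rough illustration of how spectrum-based fault localization ranks code, the sketch below computes Ochiai suspiciousness from per-test line coverage. Ochiai is one widely used SBFL formula; the abstract does not say which formula or which definition of "SBFL score" the study uses, so this is an assumption, and the function names and the coverage data are hypothetical.

```python
import math
from typing import Dict, List, Set


def ochiai(coverage: Dict[str, Set[int]], failed: Set[str]) -> Dict[int, float]:
    """Ochiai suspiciousness per line: ef / sqrt(total_failed * (ef + ep)).

    ef / ep = number of failing / passing tests that execute the line.
    """
    total_failed = sum(1 for test in coverage if test in failed)
    lines: Set[int] = set().union(*coverage.values())
    scores: Dict[int, float] = {}
    for line in lines:
        ef = sum(1 for t, cov in coverage.items() if line in cov and t in failed)
        ep = sum(1 for t, cov in coverage.items() if line in cov and t not in failed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    return scores


def rank(scores: Dict[int, float]) -> List[int]:
    """Lines ordered from most to least suspicious (ties broken by line number)."""
    return sorted(scores, key=lambda line: (-scores[line], line))


# Hypothetical spectrum: three tests, one of which (t3) fails.
COVERAGE = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {1, 3}}
FAILED = {"t3"}

if __name__ == "__main__":
    print(rank(ochiai(COVERAGE, FAILED)))
```

A test suite can reach high coverage and still rank the faulty line poorly if passing and failing tests execute nearly the same lines, which is consistent with the abstract's point that coverage alone does not capture fault-localization ability.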

2. How Much Can a Behavior-Preserving Changeset Be Decomposed into Refactoring Operations?

Keywords:
Behavior preservation;Refactorings

Someya, Kota;Chen, Lei;Decker, Michael J.;Hayashi, Shinpei
41st IEEE International Conference on Software Maintenance and Evolution, ICSME 2025
2025
September 7, 2025 - September 12, 2025
Auckland, New Zealand
Conference

Developers sometimes mix behavior-preserving modifications, such as refactorings, with behavior-altering modifications, such as feature additions. Several approaches have been proposed to support understanding such modifications by separating them into those two parts. Such refactoring-aware approaches are expected to be particularly effective when the behavior-preserving parts can be decomposed into a sequence of more primitive behavior-preserving operations, such as refactorings, but this has not been explored. In this paper, as an initial validation, we quantify how much of the behavior-preserving modifications can be decomposed into refactoring operations using a dataset of functionally-equivalent method pairs. As a result, when using an existing refactoring detector, only 33.9% of the changes could be identified as refactoring operations. In contrast, when including 67 newly defined functionally-equivalent operations, the coverage increased by over 128%. Further investigation into the remaining unexplained differences was conducted, suggesting improvement opportunities. © 2025 IEEE.

3. Social Media Reactions to Open Source Promotions: AI-Powered GitHub Projects on Hacker News

Keywords:
Artificial intelligence;Open systems;Social networking (online);Social sciences computing;Software design;Github project;Hacker news;LLM;News sources;Open source software projects;Open-source;Open-source softwares;Social media;Social media platforms;Spread of informations

Meakpaiboonwattana, Prachnachai;Tarntong, Warittha;Mekratanavorakul, Thai;Ragkhitwetsagul, Chaiyong;Sangaroonsilp, Pattaraporn;Kula, Raula Gaikovina;Choetkiertikul, Morakot;Matsumoto, Kenichi;Sunetnanta, Thanwadee
41st IEEE International Conference on Software Maintenance and Evolution, ICSME 2025
2025
September 7, 2025 - September 12, 2025
Auckland, New Zealand
Conference

Social media platforms have become more influential than traditional news sources, shaping public discourse and accelerating the spread of information. With the rapid advancement of artificial intelligence (AI), open-source software (OSS) projects can leverage these platforms to gain visibility and attract contributors. In this study, we investigate the relationship between Hacker News, a social news site focused on computer science and entrepreneurship, and the extent to which it influences developer activity on the promoted GitHub AI projects. We analyzed 2,195 Hacker News (HN) stories and their corresponding comments over a two-year period. Our findings reveal that at least 19% of AI developers promoted their GitHub projects on Hacker News, often receiving positive engagement from the community. By tracking activity on the associated 1,814 GitHub repositories after they were shared on Hacker News, we observed a significant increase in forks, stars, and contributors. These results suggest that Hacker News serves as a viable platform for AI-powered OSS projects, with the potential to gain attention, foster community engagement, and accelerate software development. © 2025 IEEE.

4. A Dataset of Software Bill of Materials for Evaluating SBOM Consumption Tools

Keywords:
Open source software;Open systems;Tools;Bill of materials;Evaluating software;Generation tools;Material consumption;Real-world;Software bill of material;Software dependencies;Software-component;SPDX;Tool support

Kishimoto, Rio;Kanda, Tetsuya;Manabe, Yuki;Inoue, Katsuro;Qiu, Shi;Higo, Yoshiki
22nd IEEE/ACM International Conference on Mining Software Repositories, MSR 2025
2025
April 27, 2025 - April 29, 2025
Ottawa, ON, Canada
Conference

A Software Bill of Materials (SBOM) is becoming an essential tool for effective software dependency management. An SBOM is a list of components used in software, including details such as component names, versions, and licenses. Using SBOMs, developers can quickly identify software components and assess whether their software depends on vulnerable libraries. Numerous tools support software dependency management through SBOMs, which can be broadly categorized into two types: tools that generate SBOMs and tools that utilize SBOMs. A substantial collection of accurate SBOMs is required to evaluate tools that utilize SBOMs. However, there is no publicly available dataset specifically designed for this purpose, and research on SBOM consumption tools remains limited. In this paper, we present a dataset of SBOMs to address this gap. The dataset we constructed comprises 46 SBOMs generated from real-world Java projects, with plans to expand it to include a broader range of projects across various programming languages. Accurate and well-structured SBOMs enable researchers to evaluate the functionality of SBOM consumption tools and identify potential issues. We collected 3,271 Java projects from GitHub and generated SBOMs for 798 of them using Maven with an open-source SBOM generation tool. These SBOMs were refined through both automatic and manual corrections to ensure accuracy, currently resulting in 46 SBOMs that comply with the SPDX Lite profile, which defines minimal requirements tailored to practical workflows in industries. This process also revealed issues with the SBOM generation tools themselves. The dataset is publicly available on Zenodo (DOI: 10.5281/zenodo.14233414). © 2025 IEEE.
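
To make the idea of an SBOM consumption tool concrete, the sketch below treats an SBOM as a list of components with a name, version, and license (the details the abstract lists) and checks it against a set of known-vulnerable versions. The record layout, the function names, and all component names are hypothetical; this is not the SPDX schema or the dataset's format.

```python
from typing import Dict, List, Set, Tuple

Component = Dict[str, str]  # hypothetical record with keys: name, version, license
REQUIRED_FIELDS = ("name", "version", "license")


def incomplete_components(sbom: List[Component]) -> List[Component]:
    """Components lacking a non-empty value for any required field."""
    return [c for c in sbom if any(not c.get(field) for field in REQUIRED_FIELDS)]


def vulnerable_components(
    sbom: List[Component], advisories: Set[Tuple[str, str]]
) -> List[Component]:
    """Components whose (name, version) appears in a set of known-vulnerable versions."""
    return [c for c in sbom if (c.get("name"), c.get("version")) in advisories]


# Made-up SBOM content and advisory data, for illustration only.
SBOM: List[Component] = [
    {"name": "example-json", "version": "1.2.0", "license": "Apache-2.0"},
    {"name": "example-http", "version": "0.9.1", "license": "MIT"},
    {"name": "example-log", "version": "2.0.0", "license": ""},
]
ADVISORIES = {("example-http", "0.9.1")}

if __name__ == "__main__":
    print([c["name"] for c in vulnerable_components(SBOM, ADVISORIES)])
    print([c["name"] for c in incomplete_components(SBOM)])
```

A consumption tool can only be as reliable as the SBOM it reads, which is why the abstract stresses accurate, manually corrected SBOMs as evaluation input.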

5. Revisiting Method-Level Change Prediction: A Comparative Evaluation at Different Granularities

Keywords:
Computer software;Maintainability;Change prediction;Class level;Comparative evaluations;Comparison methods;Different granularities;Level change;Machine-learning;Maintenance efforts;Performance;Prediction techniques

Sugimori, Hiroto;Hayashi, Shinpei
32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
2025
March 4, 2025 - March 7, 2025
Montreal, QC, Canada
Conference

To improve the efficiency of software maintenance, change prediction techniques have been proposed to predict frequently changing modules. Whereas existing techniques focus primarily on class-level prediction, method-level prediction allows for more direct identification of change locations. Method-level prediction can be useful, but it may also negatively affect prediction performance, leading to a trade-off. This makes it unclear which level of granularity users should select for their predictions. In this paper, we evaluated the performance of method-level change prediction compared with that of class-level prediction from three perspectives: direct comparison, method-level comparison, and maintenance effort-aware comparison. The results from 15 open source projects show that, although method-level prediction exhibited lower performance than class-level prediction in the direct comparison, method-level prediction outperformed class-level prediction when both were evaluated at method-level, leading to a median difference of 0.26 in accuracy. Furthermore, effort-aware comparison shows that method-level prediction performed significantly better when the acceptable maintenance effort is little. © 2025 IEEE.

6. Toward Automated Test Generation for Dockerfiles Based on Analysis of Docker Image Layers

Keywords:
Automatic test pattern generation;Codes (symbols);Image processing;Software testing;Automated test generations;Docker;Dockerfile;General programming;Generation techniques;Image layers;Layer;Source codes;Text file;Virtualizations

Goto, Yuki;Matsumoto, Shinsuke;Kusumoto, Shinji
29th International Conference on Evaluation and Assessment of Software Engineering, EASE 2025
2025
June 17, 2025 - June 20, 2025
Istanbul, Turkey
Conference

Docker has gained attention as a lightweight container-based virtualization platform. The process for building a Docker image is defined in a text file called a Dockerfile. A Dockerfile can be considered as a kind of source code that contains instructions on how to build a Docker image. Its behavior should be verified through testing, as is done for source code in a general programming language. For source code in languages such as Java, search-based test generation techniques have been proposed. However, existing automated test generation techniques cannot be applied to Dockerfiles. Since a Dockerfile does not contain branches, the coverage metric, typically used as an objective function in existing methods, becomes meaningless. In this study, we propose an automated test generation method for Dockerfiles based on processing results rather than processing steps. The proposed method determines which files should be tested and generates the corresponding tests based on an analysis of Dockerfile instructions and Docker image layers. The experimental results show that the proposed method can reproduce over 80% of the tests created by developers. © 2025 Copyright held by the owner/author(s).

7. Exploring an Inclusion Relation on Test Cases to Identify Unit and Integration Tests

Keywords:
Integration;Debugging efforts;Inclusion relation;Integration test;Line coverage;Measurement methods;Software testings;Test case;Testing efficiency;Testing process;Unit tests

Okamoto, Ryu;Matsumoto, Shinsuke;Kusumoto, Shinji
25th International Conference on Product-Focused Software Process Improvement, PROFES 2024
2025
December 2, 2024 - December 4, 2024
Tartu, Estonia
Conference

In software testing, among the various types of tests, two commonly conducted ones are unit and integration tests. Unit tests verify individual functionalities, and integration tests verify the combination of multiple functionalities. If we can identify unit/integration tests and measure them as ordinal values, such as the degree of integration-ness, we can utilize them to improve testing efficiency. However, the definitions of unit/integration are ambiguous, making it difficult to distinguish between them. To the best of our knowledge, there is currently no method for detecting this distinction. In this study, aiming to support the testing process, we will consider a measurement method for unit/integration tests. The key idea is to utilize an inclusion relation, which naturally exists among test cases. As an application of the inclusion relation, we propose a method for ordering failed tests to streamline debugging. We conducted a mutation analysis to evaluate how much our proposal reduces debugging effort compared to a naive method. The results showed that our proposal was effective in 29.7% of cases and confirmed an average reduction of 20.7% in debugging effort. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
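
One possible reading of the inclusion relation can be sketched with coverage sets. The assumptions here are mine, not the paper's: test A "includes" test B when the lines B executes are a proper subset of the lines A executes, a test's degree of integration-ness is the number of tests it includes, and failed tests are examined from the least to the most integrated. All names and coverage data are hypothetical.

```python
from typing import Dict, List, Set


def includes(outer: Set[int], inner: Set[int]) -> bool:
    """True if `inner` covers a proper subset of the lines covered by `outer`."""
    return inner < outer


def integration_degree(coverage: Dict[str, Set[int]]) -> Dict[str, int]:
    """For each test, count how many other tests it includes."""
    return {
        name: sum(
            1
            for other, other_cov in coverage.items()
            if other != name and includes(cov, other_cov)
        )
        for name, cov in coverage.items()
    }


def order_failed(coverage: Dict[str, Set[int]], failed: List[str]) -> List[str]:
    """Order failed tests so the most unit-like (lowest degree) come first."""
    degree = integration_degree(coverage)
    return sorted(failed, key=lambda test: (degree[test], test))


# Hypothetical line coverage: test_run exercises everything the other two do.
COVERAGE = {
    "test_parse": {1, 2},
    "test_eval": {3, 4},
    "test_run": {1, 2, 3, 4, 5},
}

if __name__ == "__main__":
    print(integration_degree(COVERAGE))
    print(order_failed(COVERAGE, ["test_run", "test_parse"]))
```

Starting with the narrowest failing test keeps the amount of code to inspect small, which is the intuition behind using such an ordering to reduce debugging effort.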

8. ChangePrism: Visualizing the Essence of Code Changes

Keywords:
Codes (symbols);Computer programming languages;Computer software maintenance;Code changes;Code comprehension;Code evolution;Commit;Component extraction;Exact change;Novel visualizations;Software maintenance and evolution;Textual difference;Two-component

Chen, Lei;Lanza, Michele;Hayashi, Shinpei
2025 IEEE Working Conference on Software Visualization, VISSOFT 2025
2025
September 7, 2025 - September 8, 2025
Auckland, New Zealand
Conference

Understanding the changes made by developers when they submit a pull request and/or perform a commit on a repository is a crucial activity in software maintenance and evolution. The common way to review changes relies on examining code diffs, where textual differences between two file versions are highlighted in red and green to indicate additions and deletions of lines. This can be cumbersome for developers, making it difficult to obtain a comprehensive overview of all changes in a commit. Moreover, certain types of code changes can be particularly significant and may warrant differentiation from standard modifications to enhance code comprehension. We present a novel visualization approach supported by a tool named ChangePrism, which provides a way to better understand code changes. The tool comprises two components: extraction, which retrieves code changes and relevant information from the git history, and visualization, which offers both general and detailed views of code changes in commits. The general view provides an overview of different types of code changes across commits, while the detailed view displays the exact changes in the source code for each commit. Video demonstration: https://youtu.be/jMoGLfM3KIM © 2025 IEEE.

9. The Effects of Semantic Information on LLM-Based Program Repair

Keywords:
Automated program repair;ChatGPT;Language model;Large language model;Model-based OPC;Performance;Prompt engineering;Semantics Information;Source codes

Hori, Shota;Matsumoto, Shinsuke;Higo, Yoshiki;Kusumoto, Shinji;Yasuda, Kazuya;Ito, Shinji;Huyen, Phan Thi Thanh
25th International Conference on Product-Focused Software Process Improvement, PROFES 2024
2025
December 2, 2024 - December 4, 2024
Tartu, Estonia
Conference

Large Language Model-based Automated Program Repair (LLM-APR) has recently received significant attention as a debugging assistance. Our objective is to improve the performance of LLM-APR. In this study, we focus on semantic information contained in the source code. Semantic information refers to elements used by the programmer to understand the source code, which does not contribute to compilation or execution. We picked out specification, method names and variable names as semantic information. In the investigation, we prepared eight prompts, each consisting of all combinations of three types of semantic information. The experimental results showed that all semantic information improves the performance of LLM-APR, and variable names are particularly significant. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
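
The eight prompts follow from switching each of the three kinds of semantic information on or off (2^3 = 8). A minimal sketch of enumerating those conditions is shown below; the identifiers `SEMANTIC_KINDS`, `prompt_variants`, and `label` are hypothetical, and the actual prompt wording used in the study is not reproduced.

```python
from itertools import product
from typing import Dict, List

SEMANTIC_KINDS = ("specification", "method_names", "variable_names")


def prompt_variants() -> List[Dict[str, bool]]:
    """Every on/off combination of the three kinds of semantic information."""
    return [
        dict(zip(SEMANTIC_KINDS, flags))
        for flags in product((False, True), repeat=len(SEMANTIC_KINDS))
    ]


def label(variant: Dict[str, bool]) -> str:
    """Short name for a variant, e.g. 'specification+variable_names' or 'none'."""
    return "+".join(kind for kind, enabled in variant.items() if enabled) or "none"


if __name__ == "__main__":
    for variant in prompt_variants():
        print(label(variant))
```

Enumerating the full factorial design makes it possible to compare any two conditions that differ in exactly one kind of information, which is how the contribution of variable names can be isolated.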
