SHF: Medium: Compositional Semantics-Guided Synthesis
1. Synthesizing Abstract Transformers for Reduced-Product Domains
- Keywords:
- Digital subscriber lines; Abstract interpretations; Blow-up; Domain-specific languages; Feature sets; JavaScript programs; Program synthesis; Search spaces; Synthesis techniques; Task-needs; Two-product
- Kalita, Pankaj Kumar; Reps, Thomas; Roy, Subhajit
- 《31st International Static Analysis Symposium, SAS 2024》
- 2025
- October 20, 2024 - October 22, 2024
- Pasadena, CA, United States
- Conference
Recently, we showed how to apply program-synthesis techniques to create abstract transformers in a user-provided domain-specific language (DSL) L (i.e., "L-transformers"). This algorithm does not scale when applied to reduced-product domains: synthesizing transformers for all of the component domains simultaneously blows up the search space. Because reduced-product domains can significantly improve the precision of abstract interpretation, in this paper, we propose an algorithm to synthesize reduced L-transformers ⟨f1♯R,f2♯R,⋯,fn♯R⟩ for a product domain A1×A2×⋯×An, using multiple DSLs: L=⟨L1,L2,…,Ln⟩. Synthesis of reduced-product transformers is quite challenging: first, the synthesis task has to tackle a larger "feature set" because each component transformer now has access to the abstract inputs from all component domains in the product. Second, to ensure that the product transformer is maximally precise, the synthesis task needs to arrange for the component transformers to cooperate with each other. We implemented our algorithm in a tool, Amurth2, and used it to synthesize abstract transformers for two product domains—SAFE and JSAI—available within the SAFEstr framework for JavaScript program analysis. For four of the six operations supported by SAFEstr, Amurth2 synthesizes more precise abstract transformers than the manually written ones available in SAFEstr. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
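The reduction step that makes product domains more precise than their components can be sketched in miniature. The example below is an illustrative toy (an Interval × Parity product, not one of the paper's domains): the "reduced" transformer for the interval component may consult the parity component's abstract input, tightening bounds that a standalone interval transformer could not.

```python
# Toy sketch of a reduced-product domain: Interval x Parity.
# A reduce step tightens each component using the other's information.
from dataclasses import dataclass

@dataclass(frozen=True)
class IntervalParity:
    lo: int
    hi: int
    parity: str  # "even", "odd", or "top"

def reduce_product(a: IntervalParity) -> IntervalParity:
    """Tighten interval bounds so they agree with the parity component."""
    lo, hi = a.lo, a.hi
    if a.parity == "even":
        if lo % 2 != 0:
            lo += 1
        if hi % 2 != 0:
            hi -= 1
    elif a.parity == "odd":
        if lo % 2 == 0:
            lo += 1
        if hi % 2 == 0:
            hi -= 1
    return IntervalParity(lo, hi, a.parity)

# [1, 8] with parity "even" tightens to [2, 8]: neither component
# alone could derive this, which is the precision the paper targets.
print(reduce_product(IntervalParity(1, 8, "even")))
```

This also hints at why the synthesis search space blows up: a reduced component transformer ranges over expressions mentioning every component's abstract input, not just its own.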
2. The SemGuS Toolkit
- Keywords:
- Open-source; Synthesis problems
- Johnson, Keith J. C.; Reynolds, Andrew; Reps, Thomas; D'Antoni, Loris
- 《36th International Conference on Computer Aided Verification, CAV 2024》
- 2024
- July 24, 2024 - July 27, 2024
- Montreal, QC, Canada
- Conference
Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that provides a parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solvers, and presents an evaluation of the baseline enumerative solvers. © The Author(s) 2024.
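The baseline enumerative solvers the abstract mentions work, at heart, by enumerating grammar terms in size order and checking each against the specification. The toy below is an assumed sketch (the real toolkit consumes the SMT-LIB-flavored SemGuS format with user-defined semantics; here the grammar is hard-coded to `x`, `1`, and `+`, and the spec is a set of I/O examples):

```python
# Toy bottom-up enumerative synthesizer: enumerate terms of a fixed
# grammar (x | 1 | (+ t t)) in size order, return the first term
# consistent with all input/output examples.
def synthesize(spec_examples, max_size=5):
    leaves = [("x", lambda x: x), ("1", lambda x: 1)]
    terms = {1: leaves}
    for size in range(1, max_size + 1):
        terms.setdefault(size, [])
        # Build size-n terms from smaller ones: size = 1 (for '+') + |l| + |r|.
        for ls in range(1, size - 1):
            rs = size - 1 - ls
            for (le, lf) in terms.get(ls, []):
                for (rexpr, rf) in terms.get(rs, []):
                    terms[size].append(
                        (f"(+ {le} {rexpr})",
                         lambda x, lf=lf, rf=rf: lf(x) + rf(x))
                    )
        for expr, sem in terms[size]:
            if all(sem(i) == o for i, o in spec_examples):
                return expr
    return None

# Synthesize a term equivalent to x + 2 from two examples.
print(synthesize([(0, 2), (3, 5)]))
```

Enumerating smallest-first doubles as a simple search-pruning strategy and guarantees a minimal solution under this size measure, which is why enumeration is a natural baseline for benchmark comparisons.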
3. Grammar-Aligned Decoding
- Keywords:
- Park, Kanghee; Wang, Jiayu; Berg-Kirkpatrick, Taylor; Polikarpova, Nadia; D'Antoni, Loris
- 《38th Conference on Neural Information Processing Systems, NeurIPS 2024》
- 2024
- December 9, 2024 - December 15, 2024
- Vancouver, BC, Canada
- Conference
Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper we demonstrate that GCD techniques (and constrained decoding techniques in general) can distort the LLM's distribution, leading to outputs that are grammatical but whose likelihoods are not proportional to those given by the LLM, and which are therefore ultimately low-quality. We call the problem of aligning sampling with a grammar constraint grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows how ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints. © 2024 Neural information processing systems foundation. All rights reserved.
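The distortion the abstract identifies can be reproduced numerically with a toy two-token "LM" (all numbers below are assumptions for illustration, not from the paper). Conditioning the LM's distribution on grammaticality (the GAD target) gives one distribution; greedy per-step masking (GCD) gives a visibly different one:

```python
# Toy LM over two-token strings from vocabulary {a, b}.
p_first = {"a": 0.9, "b": 0.1}
p_second = {"a": {"a": 0.9, "b": 0.1},   # P(second | first = a)
            "b": {"a": 0.5, "b": 0.5}}   # P(second | first = b)

grammatical = {"ab", "ba"}  # the grammar constraint

def lm_prob(s):
    return p_first[s[0]] * p_second[s[0]][s[1]]

# GAD target: the LM's distribution conditioned on grammaticality.
z = sum(lm_prob(s) for s in grammatical)
gad = {s: lm_prob(s) / z for s in grammatical}

# GCD: at each step, mask tokens that cannot reach a grammatical string.
# Both first tokens remain viable, so step 1 is unchanged; step 2 is then
# forced (probability 1), discarding the LM's step-2 preferences entirely.
gcd = {"ab": p_first["a"], "ba": p_first["b"]}

print(gad)  # "ab" gets ~0.643 under the true conditional distribution
print(gcd)  # but 0.9 under greedy GCD -- distorted toward "ab"
```

The gap exists because GCD commits to the first token before knowing how much probability mass each prefix's grammatical completions carry; ASAp's overapproximation of "expected futures" is aimed at exactly that missing quantity.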
4. The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions
- Keywords:
- Statistical tests; Uncertainty analysis; Data bias; Data impact; Dataset multiplicity; Model multiplicity; Model robustness; Procedural fairness; Test time; Time predictions; Training dataset; Uncertainty
- Meyer, Anna P.; Albarghouthi, Aws; D'Antoni, Loris
- 《6th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023》
- 2023
- June 12, 2023 - June 15, 2023
- Chicago, IL, United States
- Conference
We introduce dataset multiplicity, a way to study how inaccuracies, uncertainty, and social bias in training datasets impact test-time predictions. The dataset multiplicity framework asks a counterfactual question of what the set of resultant models (and associated test-time predictions) would be if we could somehow access all hypothetical, unbiased versions of the dataset. We discuss how to use this framework to encapsulate various sources of uncertainty in datasets' factualness, including systemic social bias, data collection practices, and noisy labels or features. We show how to exactly analyze the impacts of dataset multiplicity for a specific model architecture and type of uncertainty: linear models with label errors. Our empirical analysis shows that real-world datasets, under reasonable assumptions, contain many test samples whose predictions are affected by dataset multiplicity. Furthermore, the choice of domain-specific dataset multiplicity definition determines what samples are affected, and whether different demographic groups are disparately impacted. Finally, we discuss implications of dataset multiplicity for machine learning practice and research, including considerations for when model outcomes should not be trusted. © 2023 ACM.
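The counterfactual question dataset multiplicity asks can be made concrete with a brute-force sketch (illustrative only; the paper gives an exact analysis for linear models with label errors, not this enumeration): if up to k training labels may be wrong, does a given test prediction stay the same across every plausible relabeling?

```python
# Brute-force dataset-multiplicity check for a 1-D least-squares model
# under up to k label flips.
from itertools import combinations

def fit_1d(xs, ys):
    """Closed-form least-squares slope and intercept for 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

def prediction_set(xs, ys, x_test, k):
    """All sign-predictions at x_test over datasets within k label flips."""
    preds = set()
    for r in range(k + 1):
        for flip in combinations(range(len(ys)), r):
            ys2 = [-y if i in flip else y for i, y in enumerate(ys)]
            w, b = fit_1d(xs, ys2)
            preds.add(1 if w * x_test + b >= 0 else -1)
    return preds

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]
# With no flips the prediction at x = 0.5 is +1; allowing a single
# label error, some plausible dataset flips the prediction, so it
# should not be trusted under this uncertainty model.
print(prediction_set(xs, ys, 0.5, 0))
print(prediction_set(xs, ys, 0.5, 1))
```

A prediction set of size one means the prediction is robust to the modeled data unreliability; a larger set is exactly the "affected test sample" the paper's empirical analysis counts.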
5. Modular System Synthesis
- Keywords:
- Computer aided design; Key Issues; Large programs; Modular system; Modular system design; Program synthesis; Search spaces; Sub-problems; Synthesis tool; Synthesised; System synthesis
- Park, Kanghee; Johnson, Keith J.C.; D'Antoni, Loris; Reps, Thomas
- 《23rd International Conference on Formal Methods in Computer-Aided Design, FMCAD 2023》
- 2023
- October 24, 2023 - October 27, 2023
- Ames, IA, United States
- Conference
This paper describes a way to improve the scalability of program synthesis by exploiting modularity: larger programs are synthesized from smaller programs. The key issue is to make each 'larger-created-from-smaller' synthesis sub-problem be of a similar nature, so that the kind of synthesis sub-problem that needs to be solved, and the size of each search space, has roughly the same character at each level. This work holds promise for creating program-synthesis tools that have far greater capabilities than currently available tools, and opens new avenues for synthesis research: how synthesis tools should support modular system design, and how synthesis applications can best exploit such capabilities. © 2023 FMCAD Association and individual authors.
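The 'larger-created-from-smaller' idea can be sketched concretely (an assumed toy, not the paper's tool): once a component is synthesized against its own small specification, it joins the DSL as a primitive, so the next synthesis sub-problem searches a space of the same small shape rather than a deeper one.

```python
# Toy modular synthesis: breadth-first search over compositions of
# unary primitives, with synthesized components promoted to primitives.
def synthesize(primitives, examples, depth=3):
    frontier = [("id", lambda x: x)]
    for _ in range(depth):
        nxt = []
        for name, f in frontier:
            for pname, p in primitives.items():
                g = lambda x, p=p, f=f: p(f(x))
                gname = f"{pname} . {name}"
                if all(g(i) == o for i, o in examples):
                    return gname, g
                nxt.append((gname, g))
        frontier = nxt
    return None

# Level 1: synthesize 'double' over the base DSL.
base = {"inc": lambda x: x + 1, "add_self": lambda x: x + x}
_, double = synthesize(base, [(1, 2), (3, 6)])

# Level 2: the synthesized component joins the DSL, so 'quadruple' is
# found by a sub-problem of the same character as level 1.
name, quad = synthesize({"double": double, "inc": base["inc"]},
                        [(1, 4), (3, 12)])
print(name)
```

Each level searches only shallow compositions over its current primitives, which is the uniformity of sub-problems the paper argues is the key to scalability.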
