SHF:Medium:Compositional Semantics-Guided Synthesis

项目来源

美国国家科学基金(NSF)

项目主持人

Loris D'Antoni

项目受资助机构

UNIVERSITY OF CALIFORNIA SAN DIEGO

项目编号

2506134

财政年度

2025,2022

立项时间

未公开

项目级别

国家级

研究期限

未知 / 未知

受资助金额

1619494.00美元

学科

未公开

学科代码

未公开

基金类别

Standard Grant

关键词

Software&Hardware Foundation ; MEDIUM PROJECT ; Formal Methods and Verification

参与者

Thomas Reps

参与机构

UNIVERSITY OF CALIFORNIA,SAN DIEGO

项目标书摘要:The field of program synthesis aims to create tools that can automatically create a program from a specification of desired behavior.Synthesis holds the promise of easing the burden on programmers(e.g.,by finding solutions to tricky special cases automatically),and allowing non-programmers to create programs merely by indicating the outcome that they want the program to produce.Unfortunately,current synthesis tools do not scale up to large-scale programming problems,a situation that threatens to doom this promising technology to being a niche field.This project's novelties are ways to exploit compositionality in program synthesis in a way that allows one to create bigger programs out of smaller ones.The project builds on a recent framework called SemGuS(Semantics-Guided Synthesis),which offers a foothold on the expressibility and scalability problems of program synthesis.In principle,the framework can support the synthesis of software in layers,where implementation choices in one layer are hidden from other layers(and thus consistent with modular software design with information hiding).The goal of the project is to capitalize on the opportunity that SemGuS offers for extending synthesis to much larger systems than was possible heretofore.The work will lead to more scalable and general synthesis algorithms,with potential benefits to synthesis applications that are already in widespread use.Further development of the SemGuS framework has the potential to make synthesis more usable and programmable,and thereby allow users to carry out synthesis tasks without prior knowledge of existing synthesis tools.Most importantly,compositional synthesis will allow synthesis to scale to larger applications with more practical relevance.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

人员信息

Loris DAntoni(Principal Investigator):ldantoni@ucsd.edu;

机构信息

【University of California-San Diego(Performance Institution)】StreetAddress:9500 GILMAN DR,LA JOLLA,California,United States/ZipCode:920930021;【UNIVERSITY OF CALIFORNIA,SAN DIEGO】StreetAddress:9500 GILMAN DR,LA JOLLA,California,United States/PhoneNumber:8585344896/ZipCode:920930021;

项目主管部门

Directorate for Computer and Information Science and Engineering(CSE)-Division of Computing and Communication Foundations(CCF)

项目官员

Anindya Banerjee(Email:abanerje@nsf.gov;Phone:7032927885)

  • 排序方式:
  • 1
  • /
  • 1.Synthesizing Abstract Transformers forReduced-Product Domains

    • 关键词:
    • Digital subscriber lines;Abstract interpretations;Blow-up;Domains specific languages;Features sets;JavaScript programs;Program synthesis;Search spaces;Synthesis techniques;Task-needs;Two-product
    • Kalita, Pankaj Kumar;Reps, Thomas;Roy, Subhajit
    • 《31st International Static Analysis Symposium, SAS 2024》
    • 2025年
    • October 20, 2024 - October 22, 2024
    • Pasadena, CA, United states
    • 会议

    Recently, we showed how to apply program-synthesis techniques to create abstract transformers in a user-provided domain-specific language (DSL) L (i.e., "L-transformers"). This algorithm does not scale when applied to reduced-product domains: synthesizing transformers for all of the component domains simultaneously blows up the search-space. Because reduced-product domains can significantly improve the precision of abstract interpretation, in this paper, we propose an algorithm to synthesize reduced L-transformers ⟨f1♯R,f2♯R,⋯,fn♯R⟩ for a product domain A1×A2×⋯×An, using multiple DSLs: L=⟨L1,L2,…,Ln⟩. Synthesis of reduced-product transformers is quite challenging: first, the synthesis task has to tackle an larger "feature set" as each component transformer now has access to the abstract inputs from all component domains in the product. Second, to ensure that the product transformer is maximally precise, the synthesis task needs to arrange for the component transformers to cooperate with each other. We implemented our algorithm in a tool, Amurth2, and used it to synthesize abstract transformers for two product domains—SAFE and JSAI—available within the SAFEstr framework for JavaScript program analysis. For four of the six operations supported by SAFEstr, Amurth2 synthesizes more precise abstract transformers than the manually written ones available in SAFEstr. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

    ...
  • 2.The SemGuS Toolkit

    • 关键词:
    • ;Open-source;Synthesis problems
    • Johnson, Keith J. C.;Reynolds, Andrew;Reps, Thomas;D’Antoni, Loris
    • 《36th International Conference on Computer Aided Verification, CAV 2024》
    • 2024年
    • July 24, 2024 - July 27, 2024
    • Montreal, QC, Canada
    • 会议

    Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that providesa parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solvers, and presents an evaluation of the baseline enumerative solvers. © The Author(s) 2024.

    ...
  • 3.Grammar-Aligned Decoding

    • 关键词:
    • ;
    • Park, Kanghee;Wang, Jiayu;Berg-Kirkpatrick, Taylor;Polikarpova, Nadia;D'Antoni, Loris
    • 《38th Conference on Neural Information Processing Systems, NeurIPS 2024》
    • 2024年
    • December 9, 2024 - December 15, 2024
    • Vancouver, BC, Canada
    • 会议

    Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper we demonstrate that GCD techniques (and in general constrained decoding techniques) can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM, and so ultimately are low-quality. We call the problem of aligning sampling with a grammar constraint, grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows how ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints. © 2024 Neural information processing systems foundation. All rights reserved.

    ...
  • 4.The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

    • 关键词:
    • Statistical tests;Uncertainty analysis;Data bias;Data impact;Dataset multiplicity;Model multiplicity;Model robustness;Procedural fairness;Test time;Time predictions;Training dataset;Uncertainty
    • Meyer, Anna P.;Albarghouthi, Aws;D'antoni, Loris
    • 《6th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023》
    • 2023年
    • June 12, 2023 - June 15, 2023
    • Chicago, IL, United states
    • 会议

    We introduce dataset multiplicity, a way to study how inaccuracies, uncertainty, and social bias in training datasets impact test-time predictions. The dataset multiplicity framework asks a counterfactual question of what the set of resultant models (and associated test-time predictions) would be if we could somehow access all hypothetical, unbiased versions of the dataset. We discuss how to use this framework to encapsulate various sources of uncertainty in datasets' factualness, including systemic social bias, data collection practices, and noisy labels or features. We show how to exactly analyze the impacts of dataset multiplicity for a specific model architecture and type of uncertainty: linear models with label errors. Our empirical analysis shows that real-world datasets, under reasonable assumptions, contain many test samples whose predictions are affected by dataset multiplicity. Furthermore, the choice of domain-specific dataset multiplicity definition determines what samples are affected, and whether different demographic groups are disparately impacted. Finally, we discuss implications of dataset multiplicity for machine learning practice and research, including considerations for when model outcomes should not be trusted. © 2023 ACM.

    ...
  • 5.Modular System Synthesis

    • 关键词:
    • Computer aided design;Key Issues;Large programs;Modular system;Modular system design;Program synthesis;Search spaces;Sub-problems;Synthesis tool;Synthesised;System synthesis
    • Park, Kanghee;Johnson, Keith J.C.;D'Antoni, Loris;Reps, Thomas
    • 《23rd International Conference on Formal Methods in Computer-Aided Design, FMCAD 2023》
    • 2023年
    • October 24, 2023 - October 27, 2023
    • Ames, IA, United states
    • 会议

    This paper describes a way to improve the scalability of program synthesis by exploiting modutarity: larger programs are synthesized from smaller programs. The key issue is to make each 'larger-created-from-smaller' synthesis sub-problem be of a similar nature, so that the kind of synthesis sub-problem that needs to be solved-and the size of each search space-has roughly the same character at each level. This work holds promise for creating program-synthesis tools that have far greater capabilities than currently available tools, and opens new avenues for synthesis research: how synthesis tools should support modular system design, and how synthesis applications can best exploit such capabilities. © 2023 FMCAD Association and individual authors.

    ...
  • 排序方式:
  • 1
  • /