SHF: Medium: Compositional Semantics-Guided Synthesis

Funding Source

U.S. National Science Foundation (NSF)

Principal Investigator

Loris D'Antoni

Awardee Institution

UNIVERSITY OF CALIFORNIA SAN DIEGO

Fiscal Year

2025, 2022

Start Date

Not disclosed

Award Number

2506134

Project Level

National

Research Period

Unknown / Unknown

Award Amount

USD 1,619,494.00

Discipline

Not disclosed

Discipline Code

Not disclosed

Grant Type

Standard Grant

Keywords

Software & Hardware Foundation; MEDIUM PROJECT; Formal Methods and Verification

Participants

Thomas Reps

Participating Institution

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Project Abstract: The field of program synthesis aims to create tools that can automatically create a program from a specification of desired behavior. Synthesis holds the promise of easing the burden on programmers (e.g., by finding solutions to tricky special cases automatically), and allowing non-programmers to create programs merely by indicating the outcome that they want the program to produce. Unfortunately, current synthesis tools do not scale up to large-scale programming problems, a situation that threatens to doom this promising technology to being a niche field. This project's novelties are ways to exploit compositionality in program synthesis in a way that allows one to create bigger programs out of smaller ones. The project builds on a recent framework called SemGuS (Semantics-Guided Synthesis), which offers a foothold on the expressibility and scalability problems of program synthesis. In principle, the framework can support the synthesis of software in layers, where implementation choices in one layer are hidden from other layers (and thus consistent with modular software design with information hiding). The goal of the project is to capitalize on the opportunity that SemGuS offers for extending synthesis to much larger systems than was possible heretofore. The work will lead to more scalable and general synthesis algorithms, with potential benefits to synthesis applications that are already in widespread use. Further development of the SemGuS framework has the potential to make synthesis more usable and programmable, and thereby allow users to carry out synthesis tasks without prior knowledge of existing synthesis tools. Most importantly, compositional synthesis will allow synthesis to scale to larger applications with more practical relevance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Personnel

Loris D'Antoni (Principal Investigator): ldantoni@ucsd.edu

Institution Information

[University of California-San Diego (Performance Institution)] StreetAddress: 9500 GILMAN DR, LA JOLLA, California, United States / ZipCode: 920930021; [UNIVERSITY OF CALIFORNIA, SAN DIEGO] StreetAddress: 9500 GILMAN DR, LA JOLLA, California, United States / PhoneNumber: 858-534-4896 / ZipCode: 920930021

Managing Directorate

Directorate for Computer and Information Science and Engineering (CSE) - Division of Computing and Communication Foundations (CCF)

Program Officer

Anindya Banerjee (Email: abanerje@nsf.gov; Phone: 703-292-7885)

  • 1. Nice to Meet You: Synthesizing Practical MLIR Abstract Transformers

    • Keywords:
    • Abstracting; Program diagnostics; Static analysis; Surface mount technology; Abstract domains; Abstract interpretations; Abstract semantics; Intermediate representations; LLVM; Novel techniques; Optimisations; Program synthesis; Simple++; Synthesised
    • Peng, Xuanyu; Kennedy, Dominic; Fan, Yuyou; Greenman, Ben; Regehr, John; D'Antoni, Loris
    • Proceedings of the ACM on Programming Languages
    • 2026
    • Vol. 10
    • Journal

    Static analyses play a fundamental role during compilation: they discover facts that are true in all executions of the code being compiled, and then these facts are used to justify optimizations and diagnostics. Each static analysis is based on a collection of abstract transformers that provide abstract semantics for the concrete instructions that make up a program. It can be challenging to implement abstract transformers that are sound, precise, and efficient—and in fact both LLVM and GCC have suffered from miscompilations caused by unsound abstract transformers. Moreover, even after more than 20 years of development, LLVM lacks abstract transformers for hundreds of instructions in its intermediate representation (IR). We developed NiceToMeetYou: a program synthesis framework for abstract transformers that are aimed at the kinds of non-relational integer abstract domains that are heavily used by today’s production compilers. It exploits a simple but novel technique for breaking the synthesis problem into parts: each of our transformers is the meet of a collection of simpler, sound transformers that are synthesized such that each new piece fills a gap in the precision of the final transformer. Our design point is bulk automation: no sketches are required. Transformers are verified by lowering to a previously-created SMT dialect of MLIR. Each of our synthesized transformers is provably sound and some (17%) are more precise than those provided by LLVM. © 2026 Owner/Author.
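The "meet of simpler, sound transformers" recipe can be illustrated with a small interval-domain sketch. This is ours, for intuition only, not the NiceToMeetYou implementation; the `x % 10` operation and the "decade" check are invented for the example. Each piece is sound on its own, and intersecting their outputs recovers precision that neither piece has alone.

```python
# Illustrative sketch: an interval abstract transformer for `x % 10`
# (non-negative x) built as the meet of two simpler sound transformers.

TOP = (float("-inf"), float("inf"))

def meet(a, b):
    # Interval intersection: the meet in the interval lattice.
    return (max(a[0], b[0]), min(a[1], b[1]))

def t_range(_iv):
    # Sound but coarse: x % 10 always lies in [0, 9].
    return (0, 9)

def t_aligned(iv):
    # Precise only when the input interval fits in one "decade";
    # otherwise it gives no information (top).
    lo, hi = iv
    if lo // 10 == hi // 10:
        return (lo % 10, hi % 10)
    return TOP

def mod10_transformer(iv):
    # The composed transformer: the meet of the simple pieces.
    return meet(t_range(iv), t_aligned(iv))
```

For input [23, 27] the aligned piece yields [3, 7]; for input [8, 31] it yields top, and the range piece still bounds the result to [0, 9].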

  • 2. Semantics of Sets of Programs

    • Keywords:
    • Computer software reusability; Context sensitive grammars; Formal logic; Logic programming; Program debugging; Reusability; 'current; Compositional semantics; Infinite set of program; Program behavior; Program synthesis; Property; Simple++; Unrealizability logic; Verification framework; Verification techniques
    • Kim, Jinwoo; Nagy, Shaan; Reps, Thomas; D'Antoni, Loris
    • Proceedings of the ACM on Programming Languages
    • 2025
    • Vol. 9
    • Issue OOPSLA1
    • Journal

    Applications like program synthesis sometimes require proving that a property holds for all of the infinitely many programs described by a grammar, i.e., an inductively defined set of programs. Current verification frameworks overapproximate programs' behavior when sets of programs contain loops, including two Hoare-style logics that fail to be relatively complete when loops are allowed. In this work, we prove that compositionally verifying simple properties for infinite sets of programs requires tracking distinct program behaviors over unboundedly many executions. Tracking this information is both necessary and sufficient for verification. We prove this fact in a general, reusable theory of denotational semantics that can model the expressivity and compositionality of verification techniques over infinite sets of programs. We construct the minimal compositional semantics that captures simple properties of sets of programs and use it to derive the first sound and relatively complete Hoare-style logic for infinite sets of programs. Thus, our methods can be used to design minimally complex, compositional verification techniques for sets of programs. © 2025 Owner/Author.

  • 3. Synthesizing Abstract Transformers for Reduced-Product Domains

    • Keywords:
    • Digital subscriber lines; Abstract interpretations; Blow-up; Domains specific languages; Features sets; JavaScript programs; Program synthesis; Search spaces; Synthesis techniques; Task-needs; Two-product
    • Kalita, Pankaj Kumar; Reps, Thomas; Roy, Subhajit
    • 31st International Static Analysis Symposium, SAS 2024
    • 2025
    • October 20, 2024 - October 22, 2024
    • Pasadena, CA, United States
    • Conference

    Recently, we showed how to apply program-synthesis techniques to create abstract transformers in a user-provided domain-specific language (DSL) L (i.e., "L-transformers"). This algorithm does not scale when applied to reduced-product domains: synthesizing transformers for all of the component domains simultaneously blows up the search-space. Because reduced-product domains can significantly improve the precision of abstract interpretation, in this paper, we propose an algorithm to synthesize reduced L-transformers ⟨f1♯R, f2♯R, …, fn♯R⟩ for a product domain A1 × A2 × ⋯ × An, using multiple DSLs: L = ⟨L1, L2, …, Ln⟩. Synthesis of reduced-product transformers is quite challenging: first, the synthesis task has to tackle a larger "feature set" as each component transformer now has access to the abstract inputs from all component domains in the product. Second, to ensure that the product transformer is maximally precise, the synthesis task needs to arrange for the component transformers to cooperate with each other. We implemented our algorithm in a tool, Amurth2, and used it to synthesize abstract transformers for two product domains—SAFE and JSAI—available within the SAFEstr framework for JavaScript program analysis. For four of the six operations supported by SAFEstr, Amurth2 synthesizes more precise abstract transformers than the manually written ones available in SAFEstr. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
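As a toy analogue of component transformers cooperating in a product domain (ours, for intuition; Amurth2 synthesizes such transformers in user-provided DSLs rather than hand-writing them), consider an interval × parity product, where a reduction step lets each component tighten the other:

```python
# Sketch of a reduced product of intervals and parities ("even", "odd",
# "top"). The reduction uses the parity fact to shrink the interval.

def reduce_product(iv, parity):
    # E.g. ([1, 4], even) reduces to ([2, 4], even): the endpoints must
    # themselves satisfy the parity constraint.
    lo, hi = iv
    if parity in ("even", "odd"):
        want = 0 if parity == "even" else 1
        if lo % 2 != want:
            lo += 1
        if hi % 2 != want:
            hi -= 1
    return (lo, hi), parity

def add1(iv, parity):
    # Product transformer for `x + 1`: each component steps on its own,
    # then the reduction makes them cooperate.
    lo, hi = iv
    flipped = {"even": "odd", "odd": "even", "top": "top"}[parity]
    return reduce_product((lo + 1, hi + 1), flipped)
```

Running the component transformers independently and only then reducing is the simplest design; the paper's point is that synthesizing maximally precise component transformers requires them to consult each other's abstract inputs directly.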

  • 4. Weighted Context-Free-Language Ordered Binary Decision Diagrams

    • Keywords:
    • Benchmarking; Boolean functions; Context free languages; Context sensitive languages; Quantum electronics; Best-case double-exponential compression; Context-free languages; Decision diagram; Double exponential; Matched path; Ordered binary decision diagrams; Procedure call; Quantum circuit; Quantum simulations; Weighted decision diagrams
    • Sistla, Meghana; Chaudhuri, Swarat; Reps, Thomas
    • Proceedings of the ACM on Programming Languages
    • 2024
    • Vol. 8
    • Issue OOPSLA2
    • Journal

    This paper presents a new data structure, called Weighted Context-Free-Language Ordered BDDs (WCFLOBDDs), which are hierarchically structured decision diagrams, akin to Weighted BDDs (WBDDs) enhanced with a procedure-call mechanism. For some functions, WCFLOBDDs are exponentially more succinct than WBDDs. They are potentially beneficial for representing functions of type B^n → D, when a function’s image V ⊆ D has many different values. We apply WCFLOBDDs in quantum-circuit simulation, and find that they perform better than WBDDs on certain benchmarks. With a 15-minute timeout, the number of qubits that can be handled by WCFLOBDDs is 1-64× that of WBDDs (and 1-128× that of CFLOBDDs, which are an unweighted version of WCFLOBDDs). These results support the conclusion that for this application—from the standpoint of problem size, measured as the number of qubits—WCFLOBDDs provide the best of both worlds: performance roughly matches whichever of WBDDs and CFLOBDDs is better. (From the standpoint of running time, the results are more nuanced.) © 2024 Copyright held by the owner/author(s).

  • 5. Automating Pruning in Top-Down Enumeration for Program Synthesis Problems with Monotonic Semantics

    • Keywords:
    • C (programming language); Context free grammars; Context sensitive grammars; Program debugging; Abstract domains; Abstract interpretations; Abstract semantics; Condition; Domain specific; Flow analysis; Grammar flow analyze; Program synthesis; Synthesis problems; Topdown
    • Johnson, Keith J.C.; Krishnan, Rahul; Reps, Thomas; D'Antoni, Loris
    • Proceedings of the ACM on Programming Languages
    • 2024
    • Vol. 8
    • Issue OOPSLA2
    • Journal

    In top-down enumeration for program synthesis, abstraction-based pruning uses an abstract domain to approximate the set of possible values that a partial program, when completed, can output on a given input. If the set does not contain the desired output, the partial program and all its possible completions can be pruned. In its general form, abstraction-based pruning requires manually designed, domain-specific abstract domains and semantics, and thus has only been used in domain-specific synthesizers. This paper provides sufficient conditions under which a form of abstraction-based pruning can be automated for arbitrary synthesis problems in the general-purpose Semantics-Guided Synthesis (SemGuS) framework without requiring manually-defined abstract domains. We show that if the semantics of the language for which we are synthesizing programs exhibits some monotonicity properties, one can obtain an abstract interval-based semantics for free from the concrete semantics of the programming language, and use such semantics to effectively prune the search space. We also identify a condition that ensures such abstract semantics can be used to compute a precise abstraction of the set of values that a program derivable from a given hole in a partial program can produce. These precise abstractions make abstraction-based pruning more effective. We implement our approach in a tool, Moito, which can tackle synthesis problems defined in the SemGuS framework. Moito can automate interval-based pruning without any a priori knowledge of the problem domain, and solve synthesis problems that previously required domain-specific, abstraction-based synthesizers, e.g., synthesis of regular expressions, CSV file schema, and imperative programs from examples. © 2024 Owner/Author.
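The interval-pruning idea can be sketched concretely on a toy grammar of ours (not Moito's actual SemGuS encoding): because `+` is monotone, evaluating a partial program with the hole mapped to the interval of all its possible completions over-approximates every completion's output, and any example that falls outside that interval prunes the whole subtree at once.

```python
# Toy monotone expression language with a hole `??` that can be completed
# by any constant in [0, 9].
HOLE = ("hole",)

def abstract_eval(expr, x):
    # Interval over-approximating the outputs of all completions of `expr`.
    if expr == HOLE:
        return (0, 9)                # any constant completion of the hole
    op = expr[0]
    if op == "var":
        return (x, x)                # the input variable, known exactly
    if op == "const":
        return (expr[1], expr[1])
    if op == "add":
        (a, b), (c, d) = (abstract_eval(e, x) for e in expr[1:])
        return (a + c, b + d)        # + is monotone: endpoints suffice
    raise ValueError(f"unknown operator {op!r}")

def can_prune(partial, x, desired):
    # Prune when no completion can possibly produce the desired output.
    lo, hi = abstract_eval(partial, x)
    return not (lo <= desired <= hi)
```

For `x + ??` with x = 5, the abstract value is [5, 14]: a desired output of 20 prunes every completion without enumerating any of them, while a desired output of 9 keeps the subtree alive.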

  • 6. Synthesizing Formal Semantics from Executable Interpreters

    • Keywords:
    • Benchmarking; Context sensitive grammars; Problem oriented languages; Program compilers; Program debugging; Program translators; Semantics; Syntactics; Error prone tasks; Executables; Expert users; Formal Semantics; Horn clause; Program synthesis; Program Verification; SemGuS; SyGuS; Synthesis problems
    • Liu, Jiangyi; Murphy, Charlie; Grover, Anvay; Johnson, Keith J.C.; Reps, Thomas; D'Antoni, Loris
    • Proceedings of the ACM on Programming Languages
    • 2024
    • Vol. 8
    • Issue OOPSLA2
    • Journal

    Program verification and synthesis frameworks that allow one to customize the language in which one is interested typically require the user to provide a formally defined semantics for the language. Because writing a formal semantics can be a daunting and error-prone task, this requirement stands in the way of such frameworks being adopted by non-expert users. We present an algorithm that can automatically synthesize inductively defined syntax-directed semantics when given (i) a grammar describing the syntax of a language and (ii) an executable (closed-box) interpreter for computing the semantics of programs in the language of the grammar. Our algorithm synthesizes the semantics in the form of Constrained-Horn Clauses (CHCs), a natural, extensible, and formal logical framework for specifying inductively defined relations that has recently received widespread adoption in program verification and synthesis. The key innovation of our synthesis algorithm is a Counterexample-Guided Synthesis (CEGIS) approach that breaks the hard problem of synthesizing a set of constrained Horn clauses into small, tractable expression-synthesis problems that can be dispatched to existing SyGuS synthesizers. Our tool Synantic synthesized inductively-defined formal semantics from 14 interpreters for languages used in program-synthesis applications. When synthesizing formal semantics for one of our benchmarks, Synantic unveiled an inconsistency in the semantics computed by the interpreter for a language of regular expressions; fixing the inconsistency resulted in a more efficient semantics and, for some cases, in a 1.2x speedup for a synthesizer solving synthesis problems over such a language. © 2024 Owner/Author.
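The CEGIS decomposition described in the abstract can be sketched at toy scale (ours; Synantic's candidates are CHCs produced by SyGuS solvers, not a fixed list, and the interpreter here is an invented example): synthesize any candidate consistent with the current examples, ask a verifier for a counterexample input against the closed-box interpreter, and iterate.

```python
# Miniature CEGIS loop against a closed-box interpreter.
import itertools

def interpreter(x, y):
    # The closed box: we can only observe its input/output behavior.
    return (x + y) * 2

CANDIDATES = [                       # hypothetical semantics candidates
    ("x+y", lambda x, y: x + y),
    ("2x+y", lambda x, y: 2 * x + y),
    ("2(x+y)", lambda x, y: 2 * (x + y)),
]

def consistent(f, examples):
    return all(f(x, y) == interpreter(x, y) for x, y in examples)

def cegis():
    examples = [(0, 0)]                      # seed input
    while True:
        # Synthesize: pick any candidate consistent with the examples.
        name, f = next((n, g) for n, g in CANDIDATES
                       if consistent(g, examples))
        # Verify: search a small input box for a counterexample.
        cex = next(((x, y)
                    for x, y in itertools.product(range(-3, 4), repeat=2)
                    if f(x, y) != interpreter(x, y)), None)
        if cex is None:
            return name                      # matches the box everywhere
        examples.append(cex)                 # refine and iterate
```

The seed example (0, 0) fails to distinguish the candidates, so the loop is driven entirely by counterexamples until the semantics that agrees with the interpreter remains.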

  • 7. Newtonian Program Analysis of Probabilistic Programs

    • Keywords:
    • Data flow analysis; Encoding (symbols); Linearization; Newton-Raphson method; Trees (mathematics); Algebraic program analyse; Control-flow; Equation systems; Interprocedural program analysis; Newton's methods; Newtonians; Non-iterative; Probabilistic programs; Program analysis; Semi-ring
    • Wang, Di; Reps, Thomas
    • Proceedings of the ACM on Programming Languages
    • 2024
    • Vol. 8
    • Issue OOPSLA1
    • Journal

    Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic programs with unstructured control-flow, nondeterminism, and general recursion. NPA-PMA is based on Newtonian Program Analysis (NPA), a generalization of Newton's method to solve equation systems over semirings. The key challenge for developing NPA-PMA is to handle multiple kinds of confluences in both the algebraic structures that specify analyses and the equation systems that encode control flow: semirings support a single confluence operation, whereas NPA-PMA involves three confluence operations (conditional, probabilistic, and nondeterministic). Our work introduces ω-continuous pre-Markov algebras (ωPMAs) to factor out common parts of different analyses; adopts regular infinite-tree expressions to encode probabilistic programs with unstructured control-flow; and presents a linearization method that makes Newton's method applicable to the setting of regular-infinite-tree equations over ωPMAs. NPA-PMA allows analyses to supply a non-iterative strategy to solve linearized equations. Our experimental evaluation demonstrates that (i) NPA-PMA holds considerable promise for outperforming Kleene iteration, and (ii) provides great generality for designing program analyses. © 2024 Owner/Author.
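The contrast between Kleene iteration and Newton's method can be seen on the simplest recursive equation over the reals (a standard example, not NPA-PMA itself): the least solution of X = p + q·X² is the termination probability of a procedure that halts with probability p and calls itself twice with probability q.

```python
# Kleene (fixed-point) iteration vs. Newton's method for X = p + q*X^2.

def kleene(p, q, steps):
    x = 0.0
    for _ in range(steps):
        x = p + q * x * x          # one Kleene iteration
    return x

def newton(p, q, steps):
    x = 0.0
    for _ in range(steps):
        f = p + q * x * x - x      # residual of the equation at x
        df = 2 * q * x - 1         # derivative: the linearized equation
        x = x - f / df             # solve the linearization, repeat
    return x
```

At p = q = 1/2 the least solution is 1; Newton's iterates are exactly 1 − 2⁻ᵏ, while Kleene iteration closes the gap only at a rate on the order of 1/k, which is the speedup that NPA-style analyses generalize to semirings.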

  • 8. The SemGuS Toolkit

    • Keywords:
    • Open-source; Synthesis problems
    • Johnson, Keith J. C.; Reynolds, Andrew; Reps, Thomas; D'Antoni, Loris
    • 36th International Conference on Computer Aided Verification, CAV 2024
    • 2024
    • July 24, 2024 - July 27, 2024
    • Montreal, QC, Canada
    • Conference

    Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that provides a parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solvers, and presents an evaluation of the baseline enumerative solvers. © The Author(s) 2024.

  • 9. Grammar-Aligned Decoding

    • Keywords: (none listed)
    • Park, Kanghee; Wang, Jiayu; Berg-Kirkpatrick, Taylor; Polikarpova, Nadia; D'Antoni, Loris
    • 38th Conference on Neural Information Processing Systems, NeurIPS 2024
    • 2024
    • December 9, 2024 - December 15, 2024
    • Vancouver, BC, Canada
    • Conference

    Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate this problem by greedily restricting what tokens an LLM can output at each step to guarantee that the output matches a given constraint. Specifically, in grammar-constrained decoding (GCD), the LLM's output must follow a given grammar. In this paper we demonstrate that GCD techniques (and in general constrained decoding techniques) can distort the LLM's distribution, leading to outputs that are grammatical but appear with likelihoods that are not proportional to the ones given by the LLM, and so ultimately are low-quality. We call the problem of aligning sampling with a grammar constraint, grammar-aligned decoding (GAD), and propose adaptive sampling with approximate expected futures (ASAp), a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint. Our algorithm uses prior sample outputs to soundly overapproximate the future grammaticality of different output prefixes. Our evaluation on code generation and structured NLP tasks shows how ASAp often produces outputs with higher likelihood (according to the LLM's distribution) than existing GCD techniques, while still enforcing the desired grammatical constraints. © 2024 Neural information processing systems foundation. All rights reserved.
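The distortion the paper identifies is easy to reproduce on a two-token toy model (our numbers, not the paper's benchmarks): greedy per-step masking renormalizes locally, so a prefix whose continuations are mostly ungrammatical keeps its full prefix probability instead of being down-weighted.

```python
# Two-token "language model" over {a, b}; the grammar forbids "ab".
P_FIRST = {"a": 0.9, "b": 0.1}
P_SECOND = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.5, "b": 0.5}}
GRAMMATICAL = {"aa", "ba", "bb"}

def lm_prob(s):
    return P_FIRST[s[0]] * P_SECOND[s[0]][s[1]]

# GAD's target: the LM's distribution conditioned on grammaticality.
Z = sum(lm_prob(s) for s in GRAMMATICAL)
GAD = {s: lm_prob(s) / Z for s in GRAMMATICAL}

def gcd_prob(s):
    # Greedy grammar-constrained decoding: mask ungrammatical second
    # tokens and renormalize locally. (Both first tokens admit some
    # grammatical continuation, so the first step is unmasked.)
    ok = [t for t in "ab" if s[0] + t in GRAMMATICAL]
    local_z = sum(P_SECOND[s[0]][t] for t in ok)
    return P_FIRST[s[0]] * P_SECOND[s[0]][s[1]] / local_z
```

GCD assigns "aa" probability 0.9, while the LM's distribution conditioned on grammaticality gives it only 0.09/0.19 ≈ 0.47; closing exactly this gap, by over-approximating the future grammaticality of each prefix, is what GAD/ASAp is about.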

  • 10. The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

    • Keywords:
    • Statistical tests; Uncertainty analysis; Data bias; Data impact; Dataset multiplicity; Model multiplicity; Model robustness; Procedural fairness; Test time; Time predictions; Training dataset; Uncertainty
    • Meyer, Anna P.; Albarghouthi, Aws; D'Antoni, Loris
    • 6th ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023
    • 2023
    • June 12, 2023 - June 15, 2023
    • Chicago, IL, United States
    • Conference

    We introduce dataset multiplicity, a way to study how inaccuracies, uncertainty, and social bias in training datasets impact test-time predictions. The dataset multiplicity framework asks a counterfactual question of what the set of resultant models (and associated test-time predictions) would be if we could somehow access all hypothetical, unbiased versions of the dataset. We discuss how to use this framework to encapsulate various sources of uncertainty in datasets' factualness, including systemic social bias, data collection practices, and noisy labels or features. We show how to exactly analyze the impacts of dataset multiplicity for a specific model architecture and type of uncertainty: linear models with label errors. Our empirical analysis shows that real-world datasets, under reasonable assumptions, contain many test samples whose predictions are affected by dataset multiplicity. Furthermore, the choice of domain-specific dataset multiplicity definition determines what samples are affected, and whether different demographic groups are disparately impacted. Finally, we discuss implications of dataset multiplicity for machine learning practice and research, including considerations for when model outcomes should not be trusted. © 2023 ACM.
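A miniature version of the counterfactual question (ours; the paper analyzes linear models with label errors exactly, whereas this sketch brute-forces a 1-D threshold stump) is to enumerate every dataset reachable by flipping at most k labels and collect the predictions the resulting models make on a test point:

```python
# Brute-force dataset multiplicity for a 1-D threshold stump under a
# budget of at most k label flips.
from itertools import combinations

def fit_threshold(data):
    # Stump: predict 1 iff x >= midpoint of the two class means.
    ones = [x for x, y in data if y == 1]
    zeros = [x for x, y in data if y == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2

def multiplicity(data, x_test, k=1):
    # Predictions reachable by flipping at most k training labels.
    preds = set()
    for r in range(k + 1):
        for idx in combinations(range(len(data)), r):
            flipped = [(x, 1 - y) if i in idx else (x, y)
                       for i, (x, y) in enumerate(data)]
            if {y for _, y in flipped} == {0, 1}:  # need both classes
                preds.add(int(x_test >= fit_threshold(flipped)))
    return preds
```

A test point far from the decision boundary gets the same prediction under every plausible dataset version (multiplicity {1}), while one near the boundary can receive either label ({0, 1}), and per the paper its prediction should not be trusted.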
