Inferring gene regulatory circuitry from functional genomics data

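Funding source

U.S. Department of Health and Human Services (HHS)

Principal investigator

SEN, SHURJO KUMAR

Funded institution

COLUMBIA UNIV NEW YORK MORNINGSIDE

Fiscal year

2020

Award date

Not disclosed

Project number

5R01HG003008-15

Project level

National

Project period

Unknown / Unknown

Award amount

USD 560,259.00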

Discipline

Biotechnology; Genetics; Human Genome

Discipline code

Not disclosed

Funding category

Non-SBIR/STTR RPGs

Keywords

Not disclosed

Participants

BUSSEMAKER, HARMEN J

Participating institution

NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Project proposal abstract: PROJECT SUMMARY It has long been known that methylation of genomic DNA correlates with gene expression. However, the structural mechanisms that underlie these observations remain obscure. In this project, we will pursue several innovative strategies for studying how methylation affects transcription factor (TF) binding. First, we will use Methyl-SELEX-seq, a novel experimental method developed in the previous cycle of this grant that uses barcoded mixtures of methylated and unmethylated DNA ligands, to create detailed maps of the effect of methylation on binding affinity for a broad panel of human transcription factors from various structural families. Second, we will perform detailed computational analyses and follow-up experiments to test the hypothesis that methylation causes local changes in DNA shape, which in turn modify TF binding affinity. We have shown that adding a methyl group in the major groove changes the geometry of the minor groove and enhances the electrostatic interaction between negative charges in the DNA minor groove and positively charged amino acids in the TF. We will extend these analyses to other DNA modifications, as well as a wider range of DNA shape parameters and associated flexibility parameters. By building interpretable TF-DNA recognition models that integrate base, shape, and flexibility features using a powerful new machine learning framework developed in the previous funding cycle, we will make specific predictions regarding sequence and methylation readout mechanisms, and validate these using SELEX experiments with mutated TFs. To assess to what extent our quantitative models for binding to naked DNA built from SELEX data are predictive of binding to genomic DNA in the context of the living cell, we will perform detailed parallel analyses of SELEX and ChIP-seq data for Hox proteins and other TFs. Finally, to study the relationship between DNA binding and gene expression control in human cell lines, we will exploit Survey of Regulatory Elements (SuRE-seq), a novel massively parallel reporter assay that provides unique information about the autonomous transcriptional activity for each of >10^8 overlapping genomic fragments.

  • 1. Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning

    • Keywords:
    • CRYSTAL-STRUCTURE; RECOGNITION CODE; PROTEINS; AFFINITY; EXPRESSION; COMPLEX; ENERGY; SEQ
    • Liu, Shaoxun; Gomez-Alcala, Pilar; Leemans, Christ; Glassford, William J.; Melo, Lucas A. N.; Lu, Xiang-Jun; Mann, Richard S.; Bussemaker, Harmen J.
    • NUCLEIC ACIDS RESEARCH
    • 2025
    • Volume 53
    • Issue 16
    • Journal

    Sequence-specific interactions of transcription factors (TFs) with genomic DNA underlie many cellular processes. High-throughput in vitro binding assays coupled with machine learning have made it possible to accurately define such molecular recognition in a biophysically interpretable way for hundreds of TFs across many structural families, providing new avenues for predicting how the sequence preference of a TF is impacted by disease-associated mutations in its DNA binding domain. We developed a method based on a reference-free tetrahedral representation of variation in base preference within a given structural family that can be used to accurately predict the effect of mutations in the protein sequence of the TF. Using the basic helix-loop-helix (bHLH) and homeodomain (HD) families as test cases, our results demonstrate the feasibility of accurately predicting the shifts (ΔΔΔG/RT) in binding free energy associated with TF mutants by leveraging high-quality DNA binding models for sets of homologous wild-type TFs.

    ...
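To make the free-energy notation in this abstract concrete, the following is a minimal sketch, not the authors' method. It relies on the standard thermodynamic relation ΔG = −RT ln K_a, so that a sequence's affinity relative to a reference is exp(−ΔΔG/RT). Reading ΔΔΔG/RT as the mutant-minus-wild-type difference in ΔΔG/RT is an assumption, and the sequences and numbers are hypothetical.

```python
import math


def relative_affinity(ddg_over_rt: float) -> float:
    """Affinity relative to a reference: K_a / K_a,ref = exp(-ΔΔG/RT)."""
    return math.exp(-ddg_over_rt)


def mutant_shift(ddg_wt: dict, ddg_mut: dict) -> dict:
    """Per-sequence shift ΔΔΔG/RT = ΔΔG_mut/RT - ΔΔG_wt/RT (assumed definition)."""
    return {seq: ddg_mut[seq] - ddg_wt[seq] for seq in ddg_wt}


# Hypothetical ΔΔG/RT values for two DNA sequences (illustrative only).
wild_type = {"CACGTG": 0.0, "CAGCTG": 1.5}
mutant = {"CACGTG": 0.0, "CAGCTG": 0.5}

shift = mutant_shift(wild_type, mutant)
# A negative shift means the mutant binds that sequence relatively more tightly.
fold_change = {seq: relative_affinity(s) for seq, s in shift.items()}
```

In this toy case the mutant lowers ΔΔG/RT for the second sequence by one unit, which corresponds to an e-fold gain in relative affinity for that sequence.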
  • 2. Transcription factor paralogs orchestrate alternative gene regulatory networks by context-dependent cooperation with multiple cofactors

    • Keywords:
    • DNA-BINDING SPECIFICITY; HOX GENES; DROSOPHILA; EXTRADENTICLE; EVOLUTION; RECOGNITION; ULTRABITHORAX; INTEGRATION; HOMOTHORAX; EXPRESSION
    • Feng, Siqian; Rastogi, Chaitanya; Loker, Ryan; Glassford, William J.; Rube, H. Tomas; Bussemaker, Harmen J.; Mann, Richard S.
    • NATURE COMMUNICATIONS
    • 2022
    • Volume 13
    • Issue 1
    • Journal

    In eukaryotes, members of transcription factor families often exhibit similar DNA binding properties in vitro, yet orchestrate paralog-specific gene regulatory networks in vivo. The serially homologous first (T1) and third (T3) thoracic legs of Drosophila, which are specified by the Hox proteins Scr and Ubx, respectively, offer a unique opportunity to address this paradox in vivo. Genome-wide analyses using epitope-tagged alleles of both Hox loci in the T1 and T3 leg imaginal discs, the precursors to the adult legs and ventral body regions, show that ~8% of Hox binding is paralog-specific. Binding specificity is mediated by interactions with distinct cofactors in different domains: the Hox cofactor Exd acts in the proximal domain and is necessary for Scr to bind many of its paralog-specific targets, while in the distal leg domain, the homeodomain protein Distal-less (Dll) enhances Scr binding to a different subset of loci. These findings reveal how Hox paralogs, and perhaps paralogs of other transcription factor families, orchestrate alternative downstream gene regulatory networks with the help of multiple, context-specific cofactors.

    ...
  • 3. Transcription factor regulation of eQTL activity across individuals and tissues

    • Keywords:
    • NF-KAPPA-B; GENETIC-VARIATION; FACTOR-BINDING; EXPRESSION; DISEASE; VARIANTS; LINKS; RISK
    • Flynn, Elise D.; Tsu, Athena L.; Kasela, Silva; Kim-Hellmuth, Sarah; Aguet, Francois; Ardlie, Kristin; Bussemaker, Harmen; Mohammadi, Pejman; Lappalainen, Tuuli
    • PLOS GENETICS
    • 2022
    • Volume 18
    • Issue 1
    • Journal

    Author summary: Gene expression is regulated by local genomic sequence and can be affected by genetic variants. In the human population, tens of thousands of cis-regulatory variants have been discovered that are associated with altered gene expression across tissues, cell types, or environmental conditions. Understanding the molecular mechanisms of how these small changes in the genome sequence affect genome function would offer insight to the genetic regulatory code and how gene expression is controlled across tissues and environments. Current research efforts suggest that many regulatory variants' effects on gene expression are mediated by them altering the binding of transcription factors, which are proteins that bind to DNA to regulate gene expression. Here, we exploit the natural variation of TF activity among 49 tissues and between 838 individuals to elucidate which TFs regulate which regulatory variants. We find 10,098 TF-eQTL interactions across 2,136 genes that are supported by at least two lines of evidence. We validate these interactions using functional genomic and experimental approaches, and we find indication that they may pinpoint mechanisms of environment-specific genetic regulatory effects and genetic variants associated to diseases and traits.

    Tens of thousands of genetic variants associated with gene expression (cis-eQTLs) have been discovered in the human population. These eQTLs are active in various tissues and contexts, but the molecular mechanisms of eQTL variability are poorly understood, hindering our understanding of genetic regulation across biological contexts. Since many eQTLs are believed to act by altering transcription factor (TF) binding affinity, we hypothesized that analyzing eQTL effect size as a function of TF level may allow discovery of mechanisms of eQTL variability. Using GTEx Consortium eQTL data from 49 tissues, we analyzed the interaction between eQTL effect size and TF level across tissues and across individuals within specific tissues and generated a list of 10,098 TF-eQTL interactions across 2,136 genes that are supported by at least two lines of evidence. These TF-eQTLs were enriched for various TF binding measures, supporting with orthogonal evidence that these eQTLs are regulated by the implicated TFs. We also found that our TF-eQTLs tend to overlap genes with gene-by-environment regulatory effects and to colocalize with GWAS loci, implying that our approach can help to elucidate mechanisms of context-specificity and trait associations. Finally, we highlight an interesting example of IKZF1 TF regulation of an APBB1IP gene eQTL that colocalizes with a GWAS signal for blood cell traits. Together, our findings provide candidate TF mechanisms for a large number of eQTLs and offer a generalizable approach for researchers to discover TF regulators of genetic variant effects in additional QTL datasets.

    ...
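The idea of treating eQTL effect size as a function of TF level can be illustrated with a deliberately simplified sketch. This is not the procedure used in the paper, whose statistical details are not given in the abstract. It assumes that the eQTL effect size is the least-squares slope of expression on genotype dosage, and that comparing this slope between samples with low and high TF level is a rough proxy for a TF-by-genotype interaction. All function names and data are hypothetical.

```python
from statistics import mean, median


def slope(x, y):
    """Least-squares slope of y on x (x must not be constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def eqtl_effect_by_tf_level(genotype, tf_level, expression):
    """Return (effect in low-TF samples, effect in high-TF samples).

    Samples are split at the median TF level; the eQTL effect in each half
    is the slope of expression on genotype dosage (0, 1 or 2).
    """
    cut = median(tf_level)
    low = [i for i, t in enumerate(tf_level) if t <= cut]
    high = [i for i, t in enumerate(tf_level) if t > cut]

    def effect(idx):
        return slope([genotype[i] for i in idx], [expression[i] for i in idx])

    return effect(low), effect(high)


# Toy data: expression depends on genotype only in proportion to TF level.
genotype = [0, 1, 2, 0, 1, 2]
tf_level = [0.1, 0.1, 0.1, 2.0, 2.0, 2.0]
expression = [g * t for g, t in zip(genotype, tf_level)]

low_effect, high_effect = eqtl_effect_by_tf_level(genotype, tf_level, expression)
```

In the toy data the genotype slope is much larger in the high-TF samples than in the low-TF samples, which is the kind of pattern a TF-eQTL interaction would produce. A real analysis would fit an interaction term with covariates and multiple-testing control rather than a median split.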
  • 4. New restraints and validation approaches for nucleic acid structures in PDB-REDO

    • Keywords:
    • nucleic acid restraints; Watson-Crick base pairs; validation; PDB-REDO; x3DNA-DSSR; CONFORMATION-DEPENDENT RESTRAINTS; CRYSTAL-STRUCTURE; REFINEMENT; POLYNUCLEOTIDES; VISUALIZATION; PARAMETERS; REFMAC5; TOOLS; PAIR
    • de Vries, Ida; Kwakman, Tim; Lu, Xiang-Jun; Hekkelman, Maarten L.; Deshpande, Mandar; Velankar, Sameer; Perrakis, Anastassis; Joosten, Robbie P.
    • ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY
    • 2021
    • Volume 77
    • Issue Pt 9
    • Journal

    The quality of macromolecular structure models crucially depends on refinement and validation targets, which optimally describe the expected chemistry. Commonly used software for these two procedures has been designed and developed in a protein-centric manner, resulting in relatively few established features for the refinement and validation of nucleic acid-containing structure models. Here, new nucleic acid-specific approaches implemented in PDB-REDO are described, including a new restraint model using noncovalent geometries (base-pair hydrogen bonding and base-pair stacking) as refinement targets. New validation routines are also presented, including a metric for Watson-Crick base-pair geometry normality (Z(bpG)). Applying the PDB-REDO pipeline with the new restraint model to the whole Protein Data Bank (PDB) demonstrates an overall positive effect on the quality of nucleic acid-containing structure models. Finally, we discuss examples of improvements in the geometry of specific nucleic acid structures in the PDB. The new PDB-REDO models and pipeline are available at https://pdb-redo.eu/.

    ...
  • 5. Low percolation conductive graphite flakes-filled poly(urethane-imide) composites with high thermal stability via imidization self-foaming structure

    • Keywords:
    • Graphite; Solvents; Tensile strength; Percolation (fluids); Esters; Mesh generation; Percolation (computer storage); Thermodynamic stability; Electrical conductivity; Elongation at break; Foaming structure; High thermal stability; Percolation thresholds; Poly(urethane-imide)s; Thermal imidization; Zero temperature coefficients
    • Xia, W.; Feng, Y.-H.; Zou, J.; Huang, J.; Guo, M.-M.; Zhang, P.
    • Materials Today Chemistry
    • 2021
    • Volume 21
    • Journal

    In this study, high-performance conductive graphite flake-filled poly(urethane-imide) composites (PUI/GFs) were constructed by a thermal imidization self-foaming reaction. It was found that the foaming action could promote the redistribution of GFs during the curing process and the formation of stable linear conductive pathways. The percolation threshold of the PUI/GFs composites was lowered from 1.26 wt% (2000 mesh GFs) or 0.86 wt% (1000 mesh GFs) to 0.79 wt% (500 mesh GFs), which are relatively low percolation thresholds for polymer/GFs composites so far. When the content of 500 mesh GFs was 4.0 wt%, the electrical conductivity of the composite was as high as 3.96 × 10^-1 S/m. Also, a poly(urethane-imide) (PUI) matrix with excellent thermal stability (Td10%: 334.97 °C) and mechanical properties (elongation at break: 324.52%, tensile strength: 15.88 MPa) was obtained by introducing a rigid aromatic heterocycle into the polyurethane (PU) hard segments. Moreover, a zero temperature coefficient of resistivity was observed for the composites in the temperature range from 30 °C to 200 °C. Consequently, PUI/GFs composites may provide a novel strategy for conductive materials whose electrical conductivity has high thermal stability. © 2021 Elsevier Ltd

    ...
  • 6. Landscape of DNA binding signatures of myocyte enhancer factor-2B reveals a unique interplay of base and shape readout

    • Keywords:
    • TRANSCRIPTION FACTOR-BINDING; NUCLEIC-ACIDS; FORCE-FIELD; RECOGNITION; MEF2; SPECIFICITY; DOMAIN; RECRUITMENT; ACTIVATION; MUTATIONS
    • Machado, Ana Carolina Dantas; Cooper, Brendon H.; Lei, Xiao; Di Felice, Rosa; Chen, Lin; Rohs, Remo
    • NUCLEIC ACIDS RESEARCH
    • 2020
    • Volume 48
    • Issue 15
    • Journal

    Myocyte enhancer factor-2B (MEF2B) has the unique capability of binding to its DNA target sites with a degenerate motif, while still functioning as a gene-specific transcriptional regulator. Identifying its DNA targets is crucial given regulatory roles exerted by members of the MEF2 family and MEF2B's involvement in B-cell lymphoma. Analyzing structural data and SELEX-seq experimental results, we deduced the DNA sequence and shape determinants of MEF2B target sites on a high-throughput basis in vitro for wild-type and mutant proteins. Quantitative modeling of MEF2B binding affinities and computational simulations exposed the DNA readout mechanisms of MEF2B. The resulting binding signature of MEF2B revealed distinct intricacies of DNA recognition compared to other transcription factors. MEF2B uses base readout at its half-sites combined with shape readout at the center of its degenerate motif, where A-tract polarity dictates nuances of binding. The predominant role of shape readout at the center of the core motif, with most contacts formed in the minor groove, differs from previously observed protein-DNA readout modes. MEF2B, therefore, represents a unique protein for studies of the role of DNA shape in achieving binding specificity. MEF2B-DNA recognition mechanisms are likely representative for other members of the MEF2 family.

    ...
  • 7. Systematic in vitro profiling of off-target affinity, cleavage and efficiency for CRISPR enzymes

    • Keywords:
    • RNA-GUIDED ENDONUCLEASE; DNA-BINDING SPECIFICITY; GENOME-WIDE ANALYSIS; CAS9; SEQ; ACTIVATION; NUCLEASES; CPF1; INTERROGATION; GENES
    • Zhang, Liyang; Rube, H. Tomas; Vakulskas, Christopher A.; Behlke, Mark A.; Bussemaker, Harmen J.; Pufall, Miles A.
    • NUCLEIC ACIDS RESEARCH
    • 2020
    • Volume 48
    • Issue 9
    • Journal

    CRISPR RNA-guided endonucleases (RGEs) cut or direct activities to specific genomic loci, yet each has off-target activities that are often unpredictable. We developed a pair of simple in vitro assays to systematically measure the DNA-binding specificity (Spec-seq), catalytic activity specificity (SEAM-seq) and cleavage efficiency of RGEs. By separately quantifying binding and cleavage specificity, Spec/SEAM-seq provides detailed mechanistic insight into off-target activity. Feature-based models generated from Spec/SEAM-seq data for SpCas9 were consistent with previous reports of its in vitro and in vivo specificity, validating the approach. Spec/SEAM-seq is also useful for profiling less-well characterized RGEs. Application to an engineered SpCas9, HiFi-SpCas9, indicated that its enhanced target discrimination can be attributed to cleavage rather than binding specificity. The ortholog ScCas9, on the other hand, derives specificity from binding to an extended PAM. The decreased off-target activity of AsCas12a (Cpf1) appears to be primarily driven by DNA-binding specificity. Finally, we performed the first characterization of CasX specificity, revealing an all-or-nothing mechanism where mismatches can be bound, but not cleaved. Together, these applications establish Spec/SEAM-seq as an accessible method to rapidly and reliably evaluate the specificity of RGEs, Cas::gRNA pairs, and gain insight into the mechanism and thermodynamics of target discrimination.

    ...
  • 8. Context-Dependent Gene Regulation by Homeodomain Transcription Factor Complexes Revealed by Shape-Readout Deficient Proteins

    • Keywords:
    • DNA SHAPE; BINDING SITES; HOX; DROSOPHILA; FEATURES; GENOME; PBX1; SPECIFICITY; RECOGNITION; ASSOCIATION
    • Kribelbauer, Judith F.; Loker, Ryan E.; Feng, Siqian; Rastogi, Chaitanya; Abe, Namiko; Rube, H. Tomas; Bussemaker, Harmen J.; Mann, Richard S.
    • MOLECULAR CELL
    • 2020
    • Volume 78
    • Issue 1
    • Journal

    Eukaryotic transcription factors (TFs) form complexes with various partner proteins to recognize their genomic target sites. Yet, how the DNA sequence determines which TF complex forms at any given site is poorly understood. Here, we demonstrate that high-throughput in vitro DNA binding assays coupled with unbiased computational analysis provide unprecedented insight into how different DNA sequences select distinct compositions and configurations of homeodomain TF complexes. Using inferred knowledge about minor groove width readout, we design targeted protein mutations that destabilize homeodomain binding both in vitro and in vivo in a complex-specific manner. By performing parallel systematic evolution of ligands by exponential enrichment sequencing (SELEX-seq), chromatin immunoprecipitation sequencing (ChIP-seq), RNA sequencing (RNA-seq), and Hi-C assays, we not only classify the majority of in vivo binding events in terms of complex composition but also infer complex-specific functions by perturbing the gene regulatory network controlled by a single complex.

    ...
  • 9. DNAproDB: an expanded database and web-based tool for structural analysis of DNA-protein complexes

    • Keywords:
    • SPECIFICITY; RECOGNITION
    • Sagendorf, Jared M.; Markarian, Nicholas; Berman, Helen M.; Rohs, Remo
    • NUCLEIC ACIDS RESEARCH
    • 2020
    • Volume 48
    • Issue D1
    • Journal

    DNAproDB (https://dnaprodb.usc.edu) is a web-based database and structural analysis tool that offers a combination of data visualization, data processing and search functionality that improves the speed and ease with which researchers can analyze, access and visualize structural data of DNA-protein complexes. In this paper, we report significant improvements made to DNAproDB since its initial release. DNAproDB now supports any DNA secondary structure from typical B-form DNA to single-stranded DNA to G-quadruplexes. We have updated the structure of our data files to support complex DNA conformations, multiple DNA-protein complexes within a DNAproDB entry and model indexing for analysis of ensemble data. Support for chemically modified residues and nucleotides has been significantly improved along with the addition of new structural features, improved structural moiety assignment and use of more sequence-based annotations. We have redesigned our report pages and search forms to support these enhancements, and the DNAproDB website has been improved to be more responsive and user-friendly. DNAproDB is now integrated with the Nucleic Acid Database, and we have increased our coverage of available Protein Data Bank entries. Our database now contains 95% of all available DNA-protein complexes, making our tools for analysis of these structures accessible to a broad community.

    ...
  • 10. TFBSshape: an expanded motif database for DNA shape features of transcription factor binding sites

    • Keywords:
    • MONTE-CARLO SIMULATIONS; CPG METHYLATION; NUCLEIC-ACIDS; SPECIFICITY; PREDICTION; RECOGNITION; PROTEINS; IDENTIFY; ORIGINS
    • Chiu, Tsu-Pei; Xin, Beibei; Markarian, Nicholas; Wang, Yingfei; Rohs, Remo
    • NUCLEIC ACIDS RESEARCH
    • 2020
    • Volume 48
    • Issue D1
    • Journal

    TFBSshape (https://tfbsshape.usc.edu) is a motif database for analyzing structural profiles of transcription factor binding sites (TFBSs). The main rationale for this database is to be able to derive mechanistic insights in protein-DNA readout modes from sequencing data without available structures. We extended the quantity and dimensionality of TFBSshape, from mostly in vitro to in vivo binding and from unmethylated to methylated DNA. This new release of TFBSshape improves its functionality and launches a responsive and user-friendly web interface for easy access to the data. The current expansion includes new entries from the most recent collections of transcription factors (TFs) from the JASPAR and UniPROBE databases, methylated TFBSs derived from in vitro high-throughput EpiSELEX-seq binding assays and in vivo methylated TFBSs from the MeDReaders database. TFBSshape content has increased to 2428 structural profiles for 1900 TFs from 39 different species. The structural profiles for each TFBS entry now include 13 shape features and minor groove electrostatic potential for standard DNA and four shape features for methylated DNA. We improved the flexibility and accuracy for the shape-based alignment of TFBSs and designed new tools to compare methylated and unmethylated structural profiles of TFs and methods to derive DNA shape-preserving nucleotide mutations in TFBSs.

    ...