CAREER: Mining biological functions from single cell multi-omics data

Funding agency

U.S. National Science Foundation (NSF)

Principal investigator

Chi Zhang

Awardee institution

OREGON HEALTH & SCIENCE UNIVERSITY

Fiscal year

2025, 2020

Start date

Not disclosed

Award number

2528521

Project period

Unknown / Unknown

Project level

National

Award amount

USD 1,220,051.00

Discipline

Not disclosed

Discipline code

Not disclosed

Award type

Continuing grant

Keywords

Innovation: Bioinformatics; CAREER-Faculty Erly Career Dev

Participants

Not disclosed

Participating institution

OREGON HEALTH & SCIENCE UNIVERSITY

Project abstract: Biological functional activities include intracellular functions, such as transcriptional regulation, metabolism, and signal transduction, and intercellular activities, such as cell-cell interactions. With the advent of single-cell multi-omics (scMulti-seq) biotechnology, researchers can study the biological functions of a complex biological system at cellular resolution. The integrative analysis of scMulti-seq data and multiple study objects produces a wealth of information that enables the characterization of species- or tissue-specific biological functions, and at the same time poses a great challenge: how to identify and extract biologically meaningful data patterns. Although substantial effort has been made to interpret data patterns in single-cell multi-omics data, most existing methods rely on unsupervised learning in a completely data-driven manner, without considering the rich existing knowledge. In addition, different types of biological functions take different mathematical representation forms in scMulti-seq data. This calls for systems biology models and machine learning concepts that target true biological functions in scMulti-seq data. The first challenge in studying biological functions from scMulti-seq data is to derive the data patterns that correspond to true biological functions and to develop proper computational models for specific biological mechanisms and pathways. The second challenge lies in the difficulty of representing and sharing knowledge across studies of different species, tissue types, and experimental conditions. There remains an urgent need to integrate knowledge derived from disparate data sources to optimize biological functional modeling, so that the learned knowledge can be used to study other biological systems or data types and to promote the generation of new hypotheses.

The PI's long-term career goal is to develop mathematical formulations and computational methods to model biological functions from multi-omics data. This project will develop new mathematical models and an advanced computational framework to optimize the mining of biological functions by integrating scMulti-seq data with context-specific and general knowledge derived from independent data sets or experiments. The PI's research team will achieve these goals through three objectives. First, a novel subspace representation model will be developed to identify transcriptional regulation and functional gene modules; the method will be empowered by a novel local low-rank matrix detection method to detect gene co-regulation modules and by a meta-learning framework to optimize the interpretation of results. Second, the team will develop a new graph neural network architecture to estimate cell-wise functional activities for flux-carrying networks, and a graph data clustering method to identify cell groups with varied functional states and distinct pathways. Third, a knowledge graph will be constructed to represent the biological functions derived from scMulti-seq data, enabling the integration of independent knowledge derived from literature data and the development of new biological hypotheses. The project is expected to deliver novel computational tools that can effectively explore biological functions across a wide range of heterogeneous datasets. It could provide new capabilities for the functional interpretation of individual data sets, by maximizing the use of existing scMulti-seq and literature data, and for reasoning about new biological hypotheses and mechanisms.

Educationally, the scientific discoveries, including the developed methods and biological knowledge, will be seamlessly integrated into an online educational knowledge base for large-scale public engagement, and will also lead to new project-based interdisciplinary training for high school, undergraduate, and graduate students. The results of this project can be found at: https://zcslab.github.io/.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Personnel

Chi Zhang (Principal Investigator): zhangchi@ohsu.edu

Institutions

Oregon Health & Science University (Performance Institution): 3181 SW SAM JACKSON PARK RD, PORTLAND, Oregon, United States; ZIP code 97239-3011
OREGON HEALTH & SCIENCE UNIVERSITY: 3181 SW SAM JACKSON PARK RD, PORTLAND, Oregon, United States; phone 503-494-7784; ZIP code 97239-3011

Managing division

Directorate for Biological Sciences (BIO) - Division of Biological Infrastructure (DBI)

Program officer

Krisztina Varga (email: kvarga@nsf.gov; phone: 703-292-8297)

1. Generalized Matrix Local Low Rank Representation by Random Projection and Submatrix Propagation

Keywords:
Approximation theory; Computation theory; Local low rank matrix; Low rank approximations; Low-rank matrices; Matrix; Matrix approximation; Random projections; Randomized matrix approximation; Representation learning; Sub-matrix detection; Submatrix

Dang, Pengtao;Zhu, Haiqi;Guo, Tingbo;Wan, Changlin;Zhao, Tong;Salama, Paul;Wang, Yijie;Cao, Sha;Zhang, Chi
29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)
2023
August 6, 2023 - August 10, 2023
Long Beach, CA, United States
Conference paper

Matrix low rank approximation is an effective method to reduce or eliminate the statistical redundancy of a matrix's components. Compared with traditional global low rank methods such as singular value decomposition (SVD), local low rank approximation methods are better suited to uncovering interpretable data structures when a clear duality exists between the rows and columns of the matrix. Local low rank approximation is equivalent to low rank submatrix detection. Unfortunately, existing local low rank approximation methods can detect only submatrices with a specific mean structure, which may miss a substantial number of true and interesting patterns. In this work, we develop a novel matrix computational framework called RPSP (Random Probing based submatrix Propagation) that provides an effective solution to the general matrix local low rank representation problem. RPSP detects local low rank patterns by growing them from small low rank submatrices, which are identified with a random projection approach. RPSP is supported by random projection theory. Experiments on synthetic data demonstrate that RPSP outperforms all state-of-the-art methods and can robustly and correctly identify the low rank matrices when the pattern has a mean similar to that of the background, the background noise is heteroscedastic, and multiple patterns are present in the data. On real-world datasets, RPSP also demonstrates its effectiveness in identifying interpretable local low rank matrices. © 2023 ACM.
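The abstract says that local low rank approximation amounts to detecting low rank submatrices, and that RPSP seeds its search with small submatrices whose low rank property is judged by random projection. The Python sketch below is not RPSP; it only illustrates, under our own assumptions, why random projection can expose a rank-1 submatrix for a fixed, hypothetical column set `cols`. If the rows in a set I satisfy A[i][j] = u_i * v_j on those columns, projecting each row onto two random Gaussian vectors r1 and r2 gives the 2-D points u_i * (v·r1, v·r2), which all lie on one line through the origin, so their angles modulo π coincide. The names `probe_angles`, `angular_gap`, `rank1_rows` and the tolerance `tol` are illustrative choices, not part of the paper.

```python
import math
import random


def probe_angles(matrix, cols, seed=0):
    """Project each row (restricted to `cols`) onto two random Gaussian
    vectors and return the angle of the projected 2-D point modulo pi."""
    rng = random.Random(seed)
    r1 = [rng.gauss(0.0, 1.0) for _ in cols]
    r2 = [rng.gauss(0.0, 1.0) for _ in cols]
    angles = []
    for row in matrix:
        p1 = sum(row[j] * w for j, w in zip(cols, r1))
        p2 = sum(row[j] * w for j, w in zip(cols, r2))
        # Modulo pi so that rows scaled by a negative u_i land on the same line.
        angles.append(math.atan2(p2, p1) % math.pi)
    return angles


def angular_gap(a, b):
    """Distance between two line directions, treating 0 and pi as equal."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def rank1_rows(matrix, cols, seed_row, tol=1e-6, seed=0):
    """Rows whose projections are collinear with those of `seed_row`; for
    random probes this almost surely means they share a rank-1 pattern
    with `seed_row` on `cols` (hypothetical helper, not RPSP itself)."""
    angles = probe_angles(matrix, cols, seed)
    ref = angles[seed_row]
    return [i for i, a in enumerate(angles) if angular_gap(a, ref) < tol]


if __name__ == "__main__":
    rng = random.Random(1)
    # Random background, then a planted rank-1 block u v^T on rows 0-3, cols 0-3.
    M = [[rng.uniform(0.0, 2.0) for _ in range(6)] for _ in range(8)]
    u = [1.0, 2.0, -0.5, 3.0]
    v = [0.5, 1.5, 2.0, -1.0]
    for i in range(4):
        for j in range(4):
            M[i][j] = u[i] * v[j]
    print(rank1_rows(M, [0, 1, 2, 3], seed_row=0))
```

In this noise-free toy setting the random background rows almost surely fall off the line, so only the planted rows 0 to 3 are expected to be returned. The propagation step that grows such seeds into larger submatrices, and the handling of noisy data, are outside the scope of this sketch.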

2. Bias Aware Probabilistic Boolean Matrix Factorization

Keywords:
Collaborative filtering; Matrix algebra; Matrix factorization; Probability distributions; Stochastic models; Bias levels; Boolean Matrix; Combinatorial problem; Datapoints; Dimensionality reduction; Factorization methods; Matrix factorizations; Noise models; Probabilistics; Real-world

Wan, Changlin;Dang, Pengtao;Zhao, Tong;Zang, Yong;Zhang, Chi;Cao, Sha
38th Conference on Uncertainty in Artificial Intelligence (UAI 2022)
2022
August 1, 2022 - August 5, 2022
Eindhoven, Netherlands
Conference paper

Boolean matrix factorization (BMF) is a combinatorial problem arising in a wide range of applications, including recommendation systems, collaborative filtering, and dimensionality reduction. Existing BMF methods usually assume a homoscedastic noise model; however, in real-world data, the deviations of observed data from their true values are almost surely diverse due to stochastic noise, so not every data point is equally suitable for fitting a model. In this case, it is not ideal to treat all data points as identically distributed. Motivated by these observations, we introduce a probabilistic BMF model, called bias aware BMF (BABF), that models the object-wise and feature-wise bias distributions separately. To the best of our knowledge, BABF is the first Boolean decomposition approach that considers feature-wise and object-wise bias in binary data. We conducted experiments on datasets with different levels of background noise, bias, and signal pattern size to test the effectiveness of our method in various scenarios. We demonstrated that our model outperforms state-of-the-art factorization methods in both accuracy and efficiency in recovering the original datasets, and that the inferred bias levels are highly significantly correlated with the true bias in both simulated and real-world datasets. © 2022 UAI. All Rights Reserved.
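To make the two ingredients of this abstract concrete, here is a minimal Python sketch, not the BABF model itself: the Boolean product that BMF approximates, Z[i][j] = OR over k of (U[i][k] AND V[k][j]), and a hypothetical heteroscedastic flip-noise likelihood in which each entry's flip probability depends on a per-row (object) bias and a per-column (feature) bias. The combination rule 1 - (1 - a_i)(1 - b_j) and the names `row_bias` and `col_bias` are our assumptions for illustration; the paper's actual probabilistic model may differ.

```python
import math


def boolean_product(U, V):
    """Boolean matrix product: Z[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    inner = len(V)
    cols = len(V[0])
    return [
        [int(any(U[i][k] and V[k][j] for k in range(inner))) for j in range(cols)]
        for i in range(len(U))
    ]


def flip_prob(row_bias, col_bias):
    """Hypothetical noise model (an assumption, not BABF's): an entry is
    flipped if an independent row-level or column-level corruption occurs."""
    return 1.0 - (1.0 - row_bias) * (1.0 - col_bias)


def log_likelihood(X, U, V, row_bias, col_bias):
    """Log-likelihood of binary data X given factors U, V and per-row /
    per-column bias levels; each combined flip probability must lie
    strictly between 0 and 1."""
    Z = boolean_product(U, V)
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            p = flip_prob(row_bias[i], col_bias[j])
            # Mismatched entries are explained as flips, matched ones as no flip.
            total += math.log(p if x != Z[i][j] else 1.0 - p)
    return total


if __name__ == "__main__":
    U = [[1, 0], [0, 1], [1, 1]]
    V = [[1, 1, 0], [0, 1, 1]]
    Z = boolean_product(U, V)
    print(Z)  # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
    noisy = [row[:] for row in Z]
    noisy[0][2] = 1  # flip one entry
    rb, cb = [0.1] * 3, [0.05] * 3
    print(log_likelihood(noisy, U, V, rb, cb) < log_likelihood(Z, U, V, rb, cb))  # True
```

Because every entry gets its own flip probability, a mismatch in a high-bias row or column costs less log-likelihood than the same mismatch in a low-bias one, which is the intuition behind not treating all data points as identically distributed.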
