Precision Medicine Big Data Management and Sharing Technology Platform
1. Cross-Linked Unified Embedding for cross-modality representation learning
- Keywords: Benchmarking; Cells; Cytology; Data integration; Genome; Modal analysis; Cross modality; Embeddings; Genomics; Global integration; Incomplete observation; Multi-modal; Multi-modal data; Multi-modal learning; Real-world; Single cells
- Tu, Xinming; Cao, Zhi-Jie; Xia, Chen-Rui; Mostafavi, Sara; Gao, Ge
- 36th Conference on Neural Information Processing Systems, NeurIPS 2022
- 2022
- November 28, 2022 - December 9, 2022
- New Orleans, LA, United States
- Conference
Multi-modal learning is essential for understanding information in the real world. Jointly learning from multi-modal data enables global integration of both shared and modality-specific information, but current strategies often fail when observations from certain modalities are incomplete or missing for part of the subjects. To learn comprehensive representations based on such modality-incomplete data, we present a semi-supervised neural network model called CLUE (Cross-Linked Unified Embedding). Extending from multi-modal VAEs, CLUE introduces the use of cross-encoders to construct latent representations from modality-incomplete observations. Representation learning for modality-incomplete observations is common in genomics. For example, human cells are tightly regulated across multiple related but distinct modalities such as DNA, RNA, and protein, jointly defining a cell's function. We benchmark CLUE on multi-modal data from single cell measurements, illustrating CLUE's superior performance in all assessed categories of the NeurIPS 2021 Multimodal Single-cell Data Integration Competition. While we focus on analysis of single cell genomic datasets, we note that the proposed cross-linked embedding strategy could be readily applied to other cross-modality representation learning problems. © 2022 Neural information processing systems foundation. All rights reserved.
2. Multi-label classification of fundus images based on graph convolutional network
- Keywords: Diabetic retinopathy; Fundus images; GCN; Multi-label; DIABETIC-RETINOPATHY; FLUORESCEIN ANGIOGRAPHY; PREVALENCE; LESIONS
- Cheng, Yinlin; Ma, Mengnan; Li, Xingyu; Zhou, Yi
- International Conference on Health Big Data and Artificial Intelligence
- 2021
- October 29 - November 1, 2020
- Guangzhou, People's Republic of China
- Conference
Background: Diabetic retinopathy (DR) is the most common and serious microvascular complication in the diabetic population. Computer-aided diagnosis from fundus images has become an established way of detecting retinal diseases, but detecting multiple lesions at once remains difficult. Methods: This study proposed a multi-label classification method based on the graph convolutional network (GCN) to detect 8 types of fundus lesions in color fundus images. We collected 7459 fundus images (1887 left eyes, 1966 right eyes) from 2282 patients (1283 women, 999 men) and labeled 8 types of lesions: laser scars, drusen, cup disc ratio (C/D > 0.6), hemorrhages, retinal arteriosclerosis, microaneurysms, hard exudates and soft exudates. We constructed a specialized corpus of the related fundus lesions, proposed a multi-label classification algorithm for fundus images based on that corpus, and trained the model on the collected data. Results: The average overall F1 score (OF1) and the average per-class F1 score (CF1) of the model were 0.808 and 0.792, respectively. The area under the ROC curve (AUC) of the proposed model reached 0.986, 0.954, 0.946, 0.957, 0.952, 0.889, 0.937 and 0.926 for detecting laser scars, drusen, cup disc ratio, hemorrhages, retinal arteriosclerosis, microaneurysms, hard exudates and soft exudates, respectively. Conclusions: The proposed model can detect a variety of lesions in color fundus images, which lays a foundation for assisting doctors in diagnosis and makes rapid, efficient large-scale screening of fundus lesions possible.
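To make the two summary metrics in the abstract above concrete, here is a minimal sketch of how an overall F1 (OF1) and a per-class F1 (CF1) might be computed for multi-label predictions. It assumes the convention common in GCN-based multi-label work, where OF1 pools counts over all labels and CF1 is the harmonic mean of class-averaged precision and recall; the abstract does not state which definition was used. The function name and the toy label matrices are hypothetical and are not the study's data.

```python
import numpy as np

def overall_and_per_class_f1(y_true, y_pred):
    """Return (OF1, CF1) for binary indicator matrices of shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = (y_true & y_pred).sum(axis=0).astype(float)  # true positives per label
    n_pred = y_pred.sum(axis=0).astype(float)         # predicted positives per label
    n_true = y_true.sum(axis=0).astype(float)         # actual positives per label

    # Overall precision and recall: counts pooled over all labels.
    overall_p = tp.sum() / max(n_pred.sum(), 1.0)
    overall_r = tp.sum() / max(n_true.sum(), 1.0)

    # Per-class precision and recall, then averaged over labels.
    class_p = np.mean(tp / np.maximum(n_pred, 1.0))
    class_r = np.mean(tp / np.maximum(n_true, 1.0))

    def f1(p, r):
        return 0.0 if p + r == 0 else 2 * p * r / (p + r)

    return f1(overall_p, overall_r), f1(class_p, class_r)

# Toy example (assumed data): 4 images, 3 hypothetical lesion labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
of1, cf1 = overall_and_per_class_f1(y_true, y_pred)
print(f"OF1={of1:.3f}, CF1={cf1:.3f}")
```

The two scores diverge when rare labels are detected less reliably than common ones, which is why multi-label studies usually report both.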
3. Research on epileptic EEG recognition based on improved residual networks of 1-D CNN and indRNN
- Keywords: Epilepsy; Residual network; CNN; indRNN; RCNN; CLASSIFICATION; PREDICTION; TERM
- Ma, Mengnan; Cheng, Yinlin; Wei, Xiaoyan; Chen, Ziyi; Zhou, Yi
- International Conference on Health Big Data and Artificial Intelligence
- 2021
- October 29 - November 1, 2020
- Guangzhou, People's Republic of China
- Conference
Background: Epilepsy is a disease of the nervous system that affects a large population worldwide. Traditional diagnosis has mostly depended on neurologists reading the electroencephalogram (EEG), which is time-consuming, inefficient, and subjective. In recent years, automatic epilepsy diagnosis from EEG by deep learning has attracted growing attention, but the potential of deep neural networks in seizure detection has not been fully developed. Methods: In this article, we used a one-dimensional convolutional neural network (1-D CNN) to replace the traditional convolutional neural network (CNN) in the residual network architecture. Moreover, we combined the independent recurrent neural network (indRNN) with the CNN to form a new residual network architecture, the independent convolutional recurrent neural network (RCNN). The model performs automatic diagnosis of epileptic EEG in three steps. First, the important features of the EEG are learned by the residual network architecture of the 1-D CNN. Then the relationships within the sequence are learned by the recurrent neural network. Finally, the model outputs the classification result. Results: On the small-sample data sets of Bonn University, our method was superior to the baseline methods and achieved 100% classification accuracy and 100% classification specificity. On noisy real-world data, our method also performed strongly. Conclusion: The proposed model can quickly and accurately identify the different periods of EEG under both ideal and real-world conditions, and can provide automatic detection capabilities for clinical epilepsy EEG detection. We hope it will contribute to the prediction of epileptic seizures from EEG.
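The recurrent step mentioned in the abstract above can be illustrated with a minimal sketch of an IndRNN layer, in which each hidden unit has its own scalar recurrent weight instead of a full recurrent matrix: h_t = relu(x_t W + u * h_{t-1} + b). This shows only that recurrence, not the paper's RCNN architecture; the function name, shapes, and random weights are assumptions chosen for illustration.

```python
import numpy as np

def indrnn_forward(x, w_in, u_rec, bias):
    """Run one IndRNN layer over a sequence.

    x:     (timesteps, n_features) input sequence, e.g. features from a 1-D CNN
    w_in:  (n_features, n_hidden) input weights
    u_rec: (n_hidden,) one recurrent weight per hidden unit
    bias:  (n_hidden,) bias term
    """
    h = np.zeros(u_rec.shape[0])
    outputs = []
    for x_t in x:
        # Elementwise recurrence: unit i only sees its own previous state.
        h = np.maximum(0.0, x_t @ w_in + u_rec * h + bias)
        outputs.append(h)
    return np.stack(outputs)

# Assumed toy dimensions: 5 timesteps, 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
w_in = rng.normal(size=(3, 4))
u_rec = rng.uniform(-1.0, 1.0, size=4)
bias = np.zeros(4)
states = indrnn_forward(x, w_in, u_rec, bias)
print(states.shape)
```

Because the recurrent weight is a vector rather than a matrix, the hidden units in one layer are independent of each other over time; mixing across units happens only through the input weights of the next layer.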
6. DRACP: a novel method for identification of anticancer peptides
- Keywords: Anticancer peptides; Deep belief network; Relevance vector machine; Random forest; Cancer; AMINO-ACID-COMPOSITION; TOOL
- Zhao, Tianyi; Hu, Yang; Zang, Tianyi
- Biological Ontologies and Knowledge Bases Workshop
- 2020
- November 18-21, 2019
- San Diego, CA
- Conference
Background: Millions of people suffer from cancer, but accurate early diagnosis and effective treatment remain difficult. Common treatments include surgery, radiotherapy and chemotherapy, all of which are harmful to patients. Recently, anticancer peptides (ACPs) have been identified as a potential way to treat cancer. Since ACPs are natural biologics, they are safer than other methods. However, finding ACPs experimentally is expensive, so we propose a new machine learning method to identify them. Results: First, we extracted features of ACPs from two aspects: the sequence and the chemical characteristics of amino acids. For the sequence, the average composition of the 20 amino acids was extracted. For the chemical characteristics, we classified amino acids into six groups based on the patterns of hydrophobic and hydrophilic residues. Then, a deep belief network was used to encode the features of ACPs. Finally, we proposed Random Relevance Vector Machines to identify the true ACPs. We call this method 'DRACP' and tested its performance on two independent datasets. Its AUC and AUPR are higher than 0.9 on both datasets. Conclusion: We developed a novel method named 'DRACP' and compared it with some traditional methods. The cross-validation results showed its effectiveness in identifying ACPs.
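The sequence feature described in the abstract above, the composition of the 20 standard amino acids, can be sketched as follows. This is a minimal illustration only: the function name and the example peptide are hypothetical, and the six-group chemical encoding is omitted because the abstract does not specify the groups.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(peptide):
    """Return a 20-element list with the fraction of each standard residue."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in peptide.upper():
        if residue in counts:  # skip non-standard symbols such as 'X'
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical 10-residue peptide used only to exercise the function.
features = amino_acid_composition("GLFDIVKKVV")
print(dict(zip(AMINO_ACIDS, features)))
```

A fixed-length vector like this lets peptides of different lengths be fed to the same downstream encoder or classifier.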
7. LncDisAP: A computation model for LncRNA-disease association prediction based on multiple biological datasets (Open Access)
- Wang, Yongtian; Juan, Liran; Peng, Jiajie; Zang, Tianyi; Wang, Yadong
- BMC Bioinformatics
- 2019
- Conference
Background: Over the past decades, a large number of long non-coding RNAs (lncRNAs) have been identified. Growing evidence has indicated that the mutation and dysregulation of lncRNAs play a critical role in the development of many complex human diseases. Consequently, identifying potential disease-related lncRNAs is an effective means to improve the quality of disease diagnostics and treatment, which is the motivation of this work. Here, we propose a computational model (LncDisAP) for potential disease-related lncRNA identification based on multiple biological datasets. First, the associations between lncRNA and different data sources are collected from different databases. With these data sources as dimensions, we calculate the functional associations between lncRNAs by the recommendation strategy of collaborative filtering. Subsequently, a disease-associated lncRNA functional network is built with functional similarities between lncRNAs as the weight. Ultimately, potential disease-related lncRNAs can be identified based on ranked scores derived by random walking with restart (RWR). Then, training sets and testing sets are extracted from two different versions of a disease-lncRNA dataset to assess the performance of LncDisAP on 54 diseases. Results: A lncRNA functional network is built based on the proposed computational model, and it contains 66,060 associations among 364 lncRNAs associated with 182 diseases in total. We extract 218 known disease-lncRNA pairs associated with 54 diseases to assess the network. As a result, the average AUC (area under the receiver operating characteristic curve) of LncDisAP is 78.08%. Conclusion: In this article, a computational model integrating multiple lncRNA-related biological datasets is proposed for identifying potential disease-related lncRNAs. The result shows that LncDisAP is successful in predicting novel disease-related lncRNA signatures. 
In addition, with several common cancers taken as case studies, we found some unknown lncRNAs that could be associated with these diseases through our network. These results suggest that this method can be helpful in improving the quality for disease diagnostics and treatment. © 2019 The Author(s).
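The ranking step named in the abstract above, random walking with restart (RWR), can be sketched on a small weighted network. This is a generic RWR iteration, not the authors' implementation: the restart probability of 0.5, the convergence tolerance, and the toy similarity matrix are all assumptions.

```python
import numpy as np

def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`."""
    w = np.asarray(weights, dtype=float)
    col_sums = w.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    transition = w / col_sums      # column-normalised transition matrix

    p0 = np.zeros(w.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # start distribution over known seed nodes

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy symmetric similarity network over 4 hypothetical lncRNAs; node 0 is the seed.
weights = [
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.5, 0.0],
    [0.1, 0.5, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
]
scores = random_walk_with_restart(weights, seeds=[0])
ranking = np.argsort(-scores)  # candidates ordered by decreasing score
print(scores, ranking)
```

Nodes that are strongly connected to the seed set accumulate higher scores, so sorting by score yields a candidate ranking of the kind the abstract describes.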
8. LncDisAP: A computation model for LncRNA-disease association prediction based on multiple biological datasets
- Keywords: Statistical tests; Diagnosis; Collaborative filtering; Large dataset; Computation theory; Computational methods; RNA; Network coding; Computational model; Disease associations; Functional associations; Functional similarity; Non-coding RNAs; Random walking with restart; Receiver operating characteristic curves; Recommendation strategies
- Wang, Yongtian; Juan, Liran; Peng, Jiajie; Zang, Tianyi; Wang, Yadong
- 2019
- Conference
9. Human mitochondrial genome compression using machine learning techniques
- Keywords: Compression; Human mitochondrial genomes; Machine learning
- Wang, Rongjie; Zang, Tianyi; Wang, Yadong
- IEEE International Conference on Bioinformatics and Biomedicine - Human Genomics
- 2019
- December 3-6, 2018
- Madrid, Spain
- Conference
Background: In recent years, with the development of high-throughput genome sequencing technologies, a large amount of genome data has been generated, which has caused widespread concern about data storage and transmission costs. However, how to effectively compress genome sequence data remains an unsolved problem. Results: In this paper, we propose a compression method using machine learning techniques (DeepDNA) for compressing human mitochondrial genome data. The experimental results show the effectiveness of our proposed method compared with other methods on the human mitochondrial genome data. Conclusions: The proposed compression method can be classified as a non-reference-based method, but its compression effect is comparable to that of reference-based methods. Moreover, our method achieves good compression results not only on population genomes with large redundancy, but also on single genomes with small redundancy. The codes of DeepDNA are available at .
10. Mining Pharmaceutical Product Data Related to Payment Pattern from the CMS Open Payments Data: A Case Study in Thoracic Surgery
- Keywords: Health expenditures; drug industry; medical informatics
- Na, Xu; Guo, Haihong; Wu, Sizhu; Li, Jiao
- 17th World Congress of Medical and Health Informatics
- 2019
- August 25-30, 2019
- Int Med Informat Assoc, Lyon, France
- Conference
This study used descriptive statistical analyses to investigate payment characteristics and to identify regularities among the highest-paying industries. Payments by 4.70% of the highest-paying industries (N=446) accounted for 85% of the total (US $72,458,304) in 2014-2016. A tiny minority of the highest-paying industries controls the majority of payments. Large payments from these industries are highly associated with a few specific products. Furthermore, payment patterns among the industries include concentration and diversification.
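The concentration finding in the abstract above rests on a simple descriptive statistic: the share of payers, taken from the largest down, needed to reach a given fraction of total payments. A minimal sketch follows; the function name and the payment amounts are invented for illustration and are not drawn from the CMS Open Payments data.

```python
def top_payer_share(payments, target=0.85):
    """Return the fraction of payers (largest first) needed to reach `target` of the total."""
    ordered = sorted(payments, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("payments must have a positive total")
    running = 0.0
    for count, amount in enumerate(ordered, start=1):
        running += amount
        if running >= target * total:
            return count / len(ordered)
    return 1.0

# Hypothetical payment totals for 10 payers (arbitrary units).
payments = [500, 300, 60, 40, 30, 30, 20, 10, 5, 5]
share = top_payer_share(payments)
print(f"{share:.0%} of payers account for at least 85% of payments")
```

A small share at a high target, as in the abstract's 4.70% for 85%, indicates that payments are concentrated in a few payers.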
