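Research on Ciphertext Retrieval in Cloud Environments Based on Keyword Extraction

Funding Source

National Natural Science Foundation of China (NSFC)

Principal Investigator

杨震 (Yang Zhen)

Funded Institution

Beijing University of Technology

Approval Year

2016

Approval Date

Not disclosed

Project Number

61671030

Project Level

National

Research Period

Unknown / Unknown

Funding Amount

CNY 580,000

Discipline

Information Sciences - Electronics and Information Systems - Information Systems and System Security

Discipline Code

F-F01-F0102

Funding Category

General Program

Keywords

Searchable Encryption Scheme; Keyword Indexing; Keyword Extraction; Retrieval Risk Analysis; Query Expansion; Cloud Information Retrieval

Participants

才智; 庄俊玺; 王坚; 曹怀虎; 姚应哲; 李超阳; 陈伟桐; 李怡德

Participating Institution

Central University of Finance and Economics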

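Project Proposal Abstract: Cloud computing has profoundly changed how modern information services compute, store, and manage data, and a growing share of information is now stored on remote cloud servers. Because users and cloud service providers lack mutual trust, user data must be encrypted before it is stored in the cloud. Precisely because documents sit in the cloud in encrypted form, traditional retrieval models fail, since they cannot interpret the documents, and cloud information retrieval faces a severe challenge. To address this problem, the project first studies retrieval risk modeling for cloud information retrieval based on the Bayesian risk model: by treating cloud retrieval as a special information retrieval problem, it formulates cloud retrieval as minimum-risk modeling within the Bayesian risk framework. Second, building on this, it studies mechanisms for keyword analysis, keyword extraction, and index construction for cloud documents in cloud computing scenarios, combining the spatial distribution of words with their statistical properties to achieve very high-precision keyword extraction. Third, it studies the design of searchable encryption protocols that support retrieval over extremely short texts, with the aim of improving cloud retrieval performance while protecting user privacy and information security. Finally, the project will build a prototype financial cloud information retrieval system for validation and establish an information retrieval corpus that can provide samples for research of this kind.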
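For orientation, minimum-risk retrieval is usually written in the standard Bayesian decision-theoretic form below. The notation is assumed here for illustration and is not taken from the proposal: $q$ is the query, $\mathcal{C}$ the document collection, $a$ a retrieval action (for example, a ranking), $\theta$ the unobserved relevance state, and $L$ a loss function.

```latex
R(a \mid q, \mathcal{C}) = \int_{\Theta} L(a, \theta)\, p(\theta \mid q, \mathcal{C})\, d\theta,
\qquad
a^{*} = \arg\min_{a} R(a \mid q, \mathcal{C})
```

In the cloud setting described above, the posterior over $\theta$ would have to be estimated from whatever the server can observe about encrypted documents, which is what makes the problem harder than plaintext retrieval.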

Application Abstract: With the rapid growth of internet usage and decentralized computing, the storage and management characteristics of modern information services have started a new trend, with more and more sensitive information being transferred to the cloud. Unfortunately, because of the mutual distrust between the data owner and the cloud service provider, data usually has to be encrypted prior to outsourcing, both to preserve privacy and to protect it from unsolicited access, which makes it very challenging to use the data effectively for document retrieval. Because an encrypted document in the cloud is incomprehensible, cloud retrieval model definition, keyword index building, and searchable encryption scheme design all face great challenges. To address these challenges, in this work, after a review of the current research literature, we first build a cloud information retrieval framework and formalize its retrieval risk. Secondly, since existing searchable encryption schemes suffer from inappropriate keyword selection, a new keyword detection measure based on the spatial distribution of a particular word is proposed. Thirdly, we modify the current searchable encryption scheme to support state-of-the-art information retrieval methods, such as the vector space model, probabilistic modeling, and language modeling, whereas current solutions support only simple equality queries on encrypted data, which yield results only slightly better than random selection. In addition, a financial cloud information retrieval system and the corresponding corpus will be built based on the above theoretical research and deployed for practical use. This project, which has promising academic and practical value, will advance the modernization and scientific level of information retrieval technologies.
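To make the "simple equality queries" baseline concrete, here is a minimal Python sketch of a keyword index built from keyed hashes. It is a toy illustration, not one of the project's protocols: the function names, the demo key, and the sample documents are all hypothetical, and a deterministic token index like this leaks search and access patterns.

```python
import hashlib
import hmac
from collections import defaultdict


def keyword_token(key: bytes, keyword: str) -> str:
    """Deterministic search token: HMAC-SHA256 of the lowercased keyword."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Client side: map each keyword token to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index[keyword_token(key, kw)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], token: str) -> set[str]:
    """Server side: exact-match lookup on tokens; plaintext keywords are never seen."""
    return index.get(token, set())


# Assumption: a secret key held only by the data owner (demo value).
key = b"hypothetical-demo-key"
# Assumption: keywords already extracted per document (toy data).
docs = {
    "doc1": ["cloud", "retrieval"],
    "doc2": ["encryption", "cloud"],
    "doc3": ["finance"],
}
index = build_index(key, docs)
hits = search(index, keyword_token(key, "Cloud"))
print(sorted(hits))  # ['doc1', 'doc2']
```

The lookup returns an unranked set of matches, which illustrates the limitation the abstract points to: an exact-match index cannot by itself support ranked models such as the vector space model.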

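Funded Province

Beijing

Project Final Report (Full Text)

Cloud computing has profoundly changed how modern information services compute, store, and manage data, and a growing share of information is now stored on remote cloud servers. Because users and cloud service providers lack mutual trust, user data must be encrypted before it is stored in the cloud. Precisely because documents sit in the cloud in encrypted form, traditional retrieval models fail, since they cannot interpret the documents, and cloud information retrieval faces a severe challenge. After four years of work, the research team followed the project plan closely, met the project's research objectives, and obtained the following results:

1. The project studied retrieval risk modeling for cloud information retrieval based on the Bayesian risk model. By treating cloud retrieval as a special information retrieval problem, it formulated cloud retrieval as minimum-risk modeling within the Bayesian risk framework.

2. Building on this, it studied mechanisms for keyword analysis, keyword extraction, and index construction for cloud documents in cloud computing scenarios. A method combining the spatial distribution of words with their statistical properties achieved very high-precision keyword extraction from cloud documents.

3. It studied the design of searchable encryption protocols that support retrieval over extremely short texts, improving cloud retrieval performance while protecting user privacy and information security.

In addition, standardization work was carried out for a typical cloud computing scenario, the industrial internet. As lead editor, the principal investigator proposed the international standard "Information technology - Security techniques - Security reference model for industrial internet platforms", which was approved as a study period (SP) project at the 2018 meeting of ISO/IEC JTC1 SC27, the international working group for cybersecurity standardization, and became a new work item proposal (NP24392) at the 2019 meeting in France. It is the first international standard project initiated by China in the industrial internet field.

The team developed several information content retrieval systems, including a temporal text summarization system, a microblog recommendation system, and an emergency event analysis system, and achieved strong results at the Text REtrieval Conference (TREC), including first place on an individual performance metric in Round A of the Incident Streams Track at TREC 2019.

To date, the project has published 11 papers in venues including IEEE Transactions on Vehicular Technology, IEEE Transactions on Neural Networks and Learning Systems, and Acta Electronica Sinica, of which 10/10 are indexed by SCI/EI, with more than 120 citations by others. The team served as lead editor of 1 international standard (draft) and 1 national standard; filed 17 national invention patent applications, 4 of which have been granted; registered 5 software copyrights; and hosted the IEEE ICIVC'20 international conference. Part of the research results received the first prize of the 2017 Wu Wenjun Artificial Intelligence Science and Technology Award. The project has trained 1 professor/doctoral supervisor, 1 associate professor, and 1 postdoctoral researcher; 1 member was selected as a Great Wall Scholar; and 19 graduate students were trained (4 doctoral students and 15 master's students).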
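The report does not spell out its keyword measure, so the following Python sketch only illustrates the general idea of scoring words by spatial distribution. It assumes one simple stand-in: the ratio of the standard deviation to the mean of the gaps between successive occurrences of a word, with a minimum-count filter as the statistical component. Words that cluster in bursts score high, while evenly spread words score near zero. The function name and toy token stream are hypothetical.

```python
import statistics
from collections import defaultdict


def clustering_scores(tokens: list[str], min_count: int = 3) -> dict[str, float]:
    """Score each sufficiently frequent word by how unevenly it is spread in the text."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)

    scores: dict[str, float] = {}
    for word, pos in positions.items():
        if len(pos) < min_count:  # statistical filter: skip rare words
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        # Spread of the gaps relative to their mean; 0.0 means perfectly even spacing.
        scores[word] = statistics.pstdev(gaps) / statistics.mean(gaps)
    return scores


# Toy token stream (assumption): "the" is evenly spaced, "cipher" occurs in a burst.
tokens = [f"w{i}" for i in range(30)]
for i in range(0, 30, 5):
    tokens[i] = "the"
for i in (1, 2, 3, 27):
    tokens[i] = "cipher"

scores = clustering_scores(tokens)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

In this toy stream the bursty word ranks above the evenly spaced function word, which is the behavior a spatial-distribution measure is meant to capture.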

  • 2. Disk Failure Prediction with Multiple Channel Convolutional Neural Network

    • Keywords:
    • Long short-term memory; Convolutional neural networks; Fault detection; Convolution; Learning systems; Convolutional neural network; Datacenter; Data center; Deep learning; Disk failure; Failures prediction; Multiple channel convolutional neural network based LSTM; Multiple channels; Network-based; Prediction horizon
    • Wu, Jian; Yu, Haiyang; Yang, Zhen; Yin, Ruiping
    • 2021 International Joint Conference on Neural Networks, IJCNN 2021
    • 2021
    • July 18, 2021 - July 22, 2021
    • Virtual, Shenzhen, China
    • Conference

    As data centers multiply, the number of disks in service grows rapidly, so predicting disk failures has become an important task for both academia and industry. Existing schemes predict disk failure over a short prediction horizon or with a short time window, and they do not perform well for a long prediction horizon with a long time window. In this paper, we propose a deep learning method that addresses these problems. We refine the Self-Monitoring, Analysis and Reporting Technology (SMART) attributes by using information entropy to select the attributes most relevant to prediction. We then propose the Multiple Channel Convolutional Neural Network based LSTM (MCCNN-LSTM) model to predict whether a given disk will fail in the next few days. We evaluate the MCCNN-LSTM model by comparing it with state-of-the-art methods. Extensive experiments show that our model improves the FDR (Fault Detection Rate) to 99.8% and reduces the FAR (False Alarm Rate) to 0.2%.
    © 2021 IEEE.
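The abstract says attributes are selected using information entropy but gives no formula. One common way to do this is to rank attributes by information gain with respect to the failure label, sketched below in Python. The discretized attribute values and labels are made-up toy data, and the paper's actual selection criterion may differ.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: list[str], labels: list[int]) -> float:
    """Reduction in label entropy after splitting on a discrete attribute."""
    total = len(labels)
    groups: dict[str, list[int]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (assumption): 1 = disk failed, 0 = disk healthy.
failed = [1, 1, 0, 0]
# Hypothetical discretized attributes: one tracks failures, one is unrelated.
informative = ["high", "high", "low", "low"]
uninformative = ["x", "y", "x", "y"]

print(information_gain(informative, failed))    # 1.0
print(information_gain(uninformative, failed))  # 0.0
```

Attributes would then be sorted by gain and the top-ranked ones kept as model inputs.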

  • 3. User Response-Based Fake News Detection on Social Media

    • Keywords:
    • Deep learning; Social networking (online); Fake detection; Information dissemination; Information retrieval; Random forests; Bag-of-words models; Categorical data; Communication platforms; Fake news detection; Information communication; Information sharing platforms; Mass scale; Social media; User's response
    • Kidu, Hailay; Misgna, Haile; Li, Tong; Yang, Zhen
    • 4th International Conference on Applied Informatics, ICAI 2021
    • 2021
    • October 28, 2021 - October 30, 2021
    • Buenos Aires, Argentina
    • Conference

    Social media has become a major information sharing and communication platform for individuals and organizations on a mass scale. Its ability to engage users to react to posted information through likes, shares, and comments has made it a preferred information sharing platform for many. However, content posted on social media is not filtered, fact-checked, or judged by an editorial body as it would be on a traditional news platform. Individuals, institutions, and communities who consume news from social media are therefore vulnerable to misinformation from malicious authors. In this work, we propose an approach that detects fake news by investigating users' reactions to a post composed by malicious authors. Using features extracted with the bag-of-words model and TF-IDF from text-based replies (comments), together with visual emotion responses in the form of categorical data, we built models that classify news as fake or real. We designed and conducted a series of experiments to evaluate the performance of our approach. The results show that the proposed approach outperforms the baseline in all six models. In particular, our models based on the random forest, logistic regression, and XGBoost algorithms achieve a precision of 0.97, a recall of 0.99, and an F1 of 0.98.
    © 2021, Springer Nature Switzerland AG.
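As a rough illustration of the bag-of-words and TF-IDF features mentioned in the abstract, the Python sketch below builds TF-IDF vectors from tokenized replies using only the standard library. It assumes a plain unsmoothed variant (term frequency times log(N/df)); the paper's exact weighting and tokenization are not stated, and the sample replies are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per non-empty document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents in which each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts for this document
        vectors.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, vectors


# Toy tokenized replies (assumption).
replies = [
    ["fake", "news", "fake"],
    ["real", "news"],
    ["news", "source"],
]
vocab, vectors = tfidf_vectors(replies)
print(vocab)  # ['fake', 'news', 'real', 'source']
print([round(x, 3) for x in vectors[0]])
```

A token that appears in every reply ("news" here) gets weight zero under this variant, while tokens concentrated in few replies are weighted up; vectors like these would then be fed to a classifier.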

  • 4. CC-loss: Channel correlation loss for image classification

    • Keywords:
    • Classification (of information); Structure (composition); Computer vision; Deep learning; Channel correlation; Classification datasets; Discriminative ability; Euclidean distance matrices; Feature distribution; Feature embedding; Learning models; State of the art
    • Song, Zeyu; Chang, Dongliang; Ma, Zhanyu; Li, Xiaoxu; Tan, Zheng-Hua
    • 25th International Conference on Pattern Recognition, ICPR 2020
    • 2020
    • January 10, 2021 - January 15, 2021
    • Virtual, Milan, Italy
    • Conference

    The loss function is a key component in deep learning models. A commonly used loss function for classification is the cross entropy loss, which is a simple yet effective application of information theory to classification problems. Based on this loss, many other loss functions have been proposed, e.g., by adding intra-class and inter-class constraints to enhance the discriminative ability of the learned features. However, these loss functions fail to consider the connections between the feature distribution and the model structure. To address this problem, we propose a channel correlation loss (CC-Loss) that constrains the specific relations between classes and channels while maintaining intra-class compactness and inter-class separability. CC-Loss uses a channel attention module to generate the channel attention of features for each sample in the training stage. Next, a Euclidean distance matrix is calculated to make the channel attention vectors associated with the same class become identical and to increase the difference between different classes. Finally, we obtain a feature embedding with good intra-class compactness and inter-class separability. Experimental results show that two different backbone models trained with the proposed CC-Loss outperform the state-of-the-art loss functions on three image classification datasets. © 2020 IEEE
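The abstract does not give the CC-Loss formula, so the Python sketch below shows only the ingredient it names: a Euclidean distance matrix over per-sample channel attention vectors, plus the mean same-class and different-class distances that such a loss would push down and up respectively. The attention vectors, labels, and function names are hypothetical.

```python
import math


def euclidean_distance_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise Euclidean distances between all vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def mean_intra_inter(dist: list[list[float]], labels: list[int]) -> tuple[float, float]:
    """Mean distance over same-class pairs and over different-class pairs."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy channel attention vectors (assumption): two samples per class.
attention = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
labels = [0, 0, 1, 1]

dist = euclidean_distance_matrix(attention)
intra, inter = mean_intra_inter(dist, labels)
print(intra, round(inter, 4))  # 0.0 1.4142
```

In this idealized case the same-class attention vectors are identical (distance 0) and the classes are well separated, which is the configuration the abstract describes as the training target.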

  • 5. Research on Fast Kernel Subspace Face Recognition Based on Deep Belief Network

    • Keywords:
    • Metadata; Classification (of information); Classification algorithm; Conventional techniques; Deep belief networks; Feature transformations; High dimensional spaces; Low-dimensional spaces; Subspace face recognition; Technical and fundamental analysis
    • Wang, Jian; Wang, Shi; Zhang, Wei
    • 2019 3rd International Conference on Electrical, Mechanical and Computer Engineering, ICEMCE 2019
    • 2019
    • August 9, 2019 - August 11, 2019
    • Guizhou, China
    • Conference

    Face recognition usually uses different features as input signals. Many conventional techniques are in use, including technical and fundamental analysis. In this paper, the sample data is mapped from a low-dimensional space to a high-dimensional space by the kernel method, which enables the classification algorithm to handle non-linear data and to address the small sample problem. In addition, a deep belief network is used for feature transformation and classification to mine feature information from high-dimensional face data. The experimental results show that the best recognition rate of the proposed algorithm on a specific face database reaches 96%.
    © Published under licence by IOP Publishing Ltd.
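The low-to-high-dimensional mapping done by a kernel method can be illustrated with a small Python check. The abstract does not name the kernel, so this sketch assumes a homogeneous degree-2 polynomial kernel on 2-D inputs purely for illustration: evaluating the kernel in the original space gives the same value as an inner product in the explicit 3-D feature space.

```python
import math


def poly2_kernel(x: tuple[float, float], y: tuple[float, float]) -> float:
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_feature_map(x: tuple[float, float]) -> tuple[float, float, float]:
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly2_kernel(x, y)
explicit_value = sum(a * b for a, b in zip(poly2_feature_map(x), poly2_feature_map(y)))
print(kernel_value, round(explicit_value, 6))  # 121.0 121.0
```

The kernel never constructs the high-dimensional vectors, which is why the approach stays tractable even when the implicit feature space is very large.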

  • 6. Analysis and Prediction of Satisfaction Index of Online Learning

    • Keywords:
    • E-learning; Students; Interaction evaluations; Learning satisfactions; Online platforms; Prediction accuracy; Process of learning; Social environment; Student teachers; Technological environment
    • Wang, Jian; Chai, Yanmei; Zhang, Wei; Zhang, Yuanyuan
    • 2019 3rd International Conference on Electrical, Mechanical and Computer Engineering, ICEMCE 2019
    • 2019
    • August 9, 2019 - August 11, 2019
    • Guizhou, China
    • Conference

    When students learn through an online platform, their learning satisfaction is an important factor in the effectiveness of online teaching. Building on existing research, this study proposes that three factors, namely cognitive level, technological environment, and social environment, constitute online learning satisfaction. Through statistical analysis, the key factors affecting learning satisfaction are identified, a selection model for identifying these key factors is established, and online learning satisfaction is fitted by prediction. The study shows that self-efficacy evaluation, teaching support evaluation, platform use evaluation, and student-teacher interaction evaluation have a great impact on students' learning satisfaction in the process of online learning. The prediction model built on this analysis achieves a prediction accuracy of 93%. In addition, this paper puts forward suggestions on how to strengthen and improve learning satisfaction.
    © Published under licence by IOP Publishing Ltd.

  • 7. Short-term Load Prediction of Cloud Computing Based on Fuzzy Information Granulation SVM

    • Keywords:
    • Support vector machines; Forecasting; Granulation; Fuzzy information granulation; Gravitational search algorithm (GSA); Information granulation; Regression predictions; Short term load predictions; Short term loads; Simulation training; Three parameters
    • Wang, Jian; Zhang, Yuanyuan
    • 2019 3rd International Conference on Electrical, Mechanical and Computer Engineering, ICEMCE 2019
    • 2019
    • August 9, 2019 - August 11, 2019
    • Guizhou, China
    • Conference

    To predict the variation range and trend of the short-term load in cloud computing, this paper proposes a prediction model based on an information granulation support vector machine (IGSVM). Historical load values are used as samples for simulation training, the Gravitational Search Algorithm (GSA) is used to optimize the parameters of the SVM, and regression prediction is performed on the three parameters of the triangular fuzzy particles, Low, R, and Up, to obtain the variation range and trend of the short-term load. The result is consistent with the actual situation, which verifies the validity of the model and provides a basis for actual operation and maintenance.
    © Published under licence by IOP Publishing Ltd.
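To show what the three granule parameters might look like, the Python sketch below splits a load series into fixed windows and summarizes each window as a (Low, R, Up) triple. The granulation rule is an assumption chosen for simplicity, not the paper's: R is the window median, and Low and Up are the means of the values at or below and at or above the median. The load values and function names are hypothetical.

```python
import statistics


def triangular_granule(window: list[float]) -> tuple[float, float, float]:
    """Summarize one window as (Low, R, Up) under the simple rule assumed here."""
    r = statistics.median(window)
    low = statistics.mean([v for v in window if v <= r])
    up = statistics.mean([v for v in window if v >= r])
    return low, r, up


def granulate(series: list[float], window_size: int) -> list[tuple[float, float, float]]:
    """Split the series into non-overlapping windows and granulate each one."""
    return [
        triangular_granule(series[i:i + window_size])
        for i in range(0, len(series) - window_size + 1, window_size)
    ]


# Toy historical load values (assumption).
load = [10.0, 12.0, 11.0, 15.0, 30.0, 28.0, 35.0, 31.0, 29.0, 33.0]
granules = granulate(load, window_size=5)
print(granules[0])  # (11.0, 12.0, 19.0)
```

Each of the three resulting sequences (Low, R, Up) would then be fed to its own regression model, an SVM in the paper, so that the predictions bracket the expected load range.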
