Collaborative Development and Runtime Support Environment for Large-Scale Parallel Application Software

Project source

国(略)研(略)((略)D(略)

Principal investigator

廖(略)

Funded institution

中(略)

Approval year

2(略)

Approval date

未(略)

Project number

2(略)YFB0202201

Project level

国(略)

Research period

未(略) (略)

Funding amount

0(略)万(略)

Discipline

高(略)算

Discipline code

未(略)

Fund category

“高(略)算” Key Special Project

Keywords

Exascale ; HPC ; Co-Design ; Application Development

Participants

杜(略)钟(略)郭(略)余(略)

Participating institutions

未(略)

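Project proposal abstract: Exascale (略), with complex architecture; (略) faces complex programming and parallel (略) shorten the exascale application development (略), establishing a (略) environment suited to exascale (略) is imperative. This paper (略) methods, multi-source data management (略) integrated runtime support optimization techniques (略) this paper completed (略) supporting collaborative development (略) users to connect to different domestic (略) environments; achieved (略) for structured and unstructured (略), studied fast data (略) data management combined with workflows (略) application customization and runtime optimization (略) application research to improve the runtime system (略) unified management of multiple clusters (略) application workflow management service (略) management to achieve computing task (略) has established a large-scale parallel application (略) prototype system, (略) to the public (略)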

Applicati(略): As exas(略)omputer h(略)scale sys(略)plex arch(略)ltiple ch(略)r develop(略)e of exas(略) such as (略)rogrammin(略)fficiency(略)l develop(略)risen.To (略) cycle of(略)pplicatio(略)nt and im(略)opment an(略)al effici(略)st soluti(略)rm a suit(略)ign and o(略)environme(略)cale supe(略)This pape(略)the inves(略)r critica(略)including(略)based co-(略)lopment a(略)ti-source(略)ement,and(略)optimizat(略) multi-le(略)e stacks (略)ystem,com(略)ary and s(略)mework.He(略)ieved int(略)elopment (略) for co-d(略)ed manage(略)th struct(略)structure(略)is achiev(略)stigated (略)ogy of fa(略)ex and qu(略)a managem(略)rating wo(略)nology;(i(略)echnology(略)pplicatio(略)tion and (略)n of oper(略)onment,in(略)the metho(略)e IO perf(略) big data(略)ns;(iii)r(略) multi-cl(略)ed manage(略)lly estab(略)main-orie(略)ow manage(略)e which c(略)high effi(略)uling of (略)asks in c(略)with data(略) and clus(略)ent.This (略) establis(略)type syst(略)borative (略) and oper(略)rge-scale(略)pplicatio(略)vailable (略)ic.

Funded province

广(略)

  • 1. Annual Report on Large Scale Parallel Co-design Software Development and Operational Environment

    • Keywords:
    • Exascale, HPC, Co-Design, Application Development
    • 廖湘科; 杜云飞; 钟康游; 郭贵鑫
    • Sun Yat-sen University; Sun Yat-sen University; Sun Yat-sen University; Sun Yat-sen University
    • 2019
    • Report

    China, the United States, Europe, and Japan are all sparing no effort to build an exascale computing ecosystem. That ecosystem includes not only the exascale machines themselves but also the methods and tools needed to exploit their computing power. Exascale supercomputers are very large in scale and complex in architecture, so software development for them faces challenges such as complex programming and low parallel development efficiency. To shorten the exascale application development cycle and improve development and runtime efficiency, a collaborative development and runtime support environment suited to exascale computing is needed. This paper investigates key problems including a component-based co-design development approach, multi-source data management, and integrated runtime optimization across the software stack (system, computing libraries, and software frameworks). The work (i) built an integrated development environment for co-design that lets users connect to the development and runtime environments of different domestic supercomputers; (ii) studied fast data indexing and query techniques for efficiently indexing and querying scientific big data managed on parallel file systems in high-performance computing systems, and studied workflow-integrated data management that uses data location information to optimize scheduling and supports converged workflows such as high-performance computing and big data analytics; (iii) studied techniques for customizing and optimizing the runtime environment for domain applications, optimized the container runtime system to support RDMA network virtualization, studied methods for improving runtime IO performance for IO-intensive applications such as big data and implemented an IO optimization software framework, implemented a multi-cluster unified management system and an uber-scheduler that schedules resources across different workload engines (Slurm and Kubernetes), and initially established a domain-oriented workflow management service that works with data management and cluster management to schedule computing tasks efficiently. The project has established a prototype system for collaborative development and runtime support of large-scale parallel applications, which is available to the public.

    ...
  • 2. Large Scale Parallel Co-design Software Development and Operational Environment Technology Report

    • Keywords:
    • Exascale, HPC, Co-Design, Application Development
    • 廖湘科; 杜云飞; 钟康游; 郭贵鑫; 余阳
    • Sun Yat-sen University; Sun Yat-sen University; Sun Yat-sen University; Sun Yat-sen University; Sun Yat-sen University
    • 2019
    • Report

    Exascale supercomputers are very large in scale and complex in architecture, so software development for them faces challenges such as complex programming and low parallel development efficiency. To shorten the exascale application development cycle and improve development and runtime efficiency, a collaborative development and runtime support environment suited to exascale computing is needed. This paper investigates key problems including a component-based co-design development approach, multi-source data management, and integrated runtime optimization across the software stack (system, computing libraries, and software frameworks). The work (i) built an integrated development environment for co-design that lets users connect to the development and runtime environments of different domestic supercomputers, achieved unified management of structured and unstructured databases, and studied fast data indexing and query techniques as well as workflow-integrated data management; (ii) studied techniques for customizing and optimizing the runtime environment for domain applications, and methods for improving runtime IO performance for big data applications; (iii) implemented unified management of multiple clusters and initially established a domain-oriented workflow management service that works with data management and cluster management to schedule computing tasks efficiently. The project has established a prototype system for collaborative development and runtime support of large-scale parallel applications, which is available to the public.

    ...