EarthCube Capabilities: OpenMindat - Open Access and Interoperable Mineralogy Data to Broaden Community Access and Advance Geoscience Research

Funding source

U.S. National Science Foundation (NSF)

Principal investigator

Xiaogang Ma

Awardee institution

Regents of the University of Idaho

Award number

2126315

Award year

2021

Award date

Not disclosed

Project level

National

Research period

Unknown / Unknown

Award amount

USD 792,475.00

Discipline

Not disclosed

Discipline code

Not disclosed

Award type

Standard Grant

Keywords

EarthCube; EXP PROG TO STIM COMP RES

Participants (AI)

林铭杰; 翁岳鹏; 陈明贤; 游舒羽; 马小刚; 张继吟

Participating institutions (AI)

Fujian Agriculture and Forestry University; University of Idaho

Project abstract: Mindat is a community-driven, free-access, online database that records information about all known mineral species and their worldwide distribution. Although all the data on the Mindat website are free for users to browse, the machine interface for data access and download has never been fully established. The OpenMindat project will, for the first time, allow automated querying and downloads from this data resource for academic research. This effort includes technical developments to establish open data access, research and training activities to advance data curation and data-driven geoscience discovery, and outreach activities to EarthCube and the broad geoscience communities. Opening the Mindat data for free academic use will encourage a new generation of research in geosciences as well as other disciplines. Mindat is already an important resource for geoscience education. Currently, it receives more than 3.5 million page views every month. These new data access tools in OpenMindat will make it easier for educational access to mineralogical data in the classroom, the laboratory, and even from home, allowing students greater opportunities to experiment with mineralogical data science.

The OpenMindat project will involve moving all appropriate Mindat data into an open science compatible license, building and operating a web-based platform for both automated queries and bulk data downloads, preparing all documentation on the use of this data, and building a suite of developer tools including packages in Python and R for direct data access from workflow platforms. OpenMindat will also deploy metadata standards to establish connections to EarthCube GeoCODES. The project will create several training positions and organize a list of engagement and outreach activities, with priorities given to underrepresented groups. Computational and statistical work on large mineralogy datasets has driven the recent studies in Mineral Evolution and Mineral Ecology of which the Mindat data was a critical component. OpenMindat will democratize this research allowing anyone wishing to utilize the Mindat data for research to do so immediately. Using machine learning techniques in combination with the OpenMindat dataset raises the possibility of finding previously unseen patterns in the mineralogical diversity on the Earth and beyond, such as comparing the mineral assemblages and localities on Earth with other planets. This project will illustrate the importance of collecting and providing certain information when analyzing mineral samples and thus cause a cultural shift in mineralogical data collection and sharing. Likewise, the studies in this project will be an example for how rapidly scientific discovery can move forward when the data are in place and coupled with advanced analytical techniques and data science expertise. The service and tools that will be developed as part of OpenMindat will themselves be open-sourced and potentially of benefit to other projects wishing to provide access to their data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • 1. A review of machine learning in geochemistry and cosmochemistry: Method improvements and applications

    • Keywords:
    • LIBS; XAFS; Mapping; Water; soil prediction; Molecular machine learning; Reactive-transport modeling; INDUCED BREAKDOWN SPECTROSCOPY; REACTIVE TRANSPORT MODELS; UNDISCOVERED MINERAL-DEPOSITS; ARTIFICIAL NEURAL-NETWORKS; SUPPORT VECTOR MACHINE; RANDOM FOREST; CENTRAL VALLEY; WATER-QUALITY; THEORETICAL CALCULATION; ISOTOPE FRACTIONATIONS

    The development of analytical and computational techniques and growing scientific funds collectively contribute to the rapid accumulation of geoscience data. The massive amount of existing data, the increasing complexity, and the rapid acquisition rates require novel approaches to efficiently discover scientific stories embedded in the data related to geochemistry and cosmochemistry. Machine learning methods can discover and describe the hidden patterns in intricate geochemical and cosmochemical big data. In recent years, considerable efforts have been devoted to the applications of machine learning methods in geochemistry and cosmochemistry. Here, we review the main applications including rock and sediment identification, digital mapping, water and soil quality prediction, and deep space exploration. Research method improvements, such as spectroscopy interpretation, numerical modeling, and molecular machine learning, are also discussed. Based on the up-to-date machine learning/deep learning techniques, we foresee the vast opportunities of implementing artificial intelligence and developing databases in geochemistry and cosmochemistry studies, as well as communicating geochemists/cosmochemists and data scientists.

    ...
  • 2. Knowledge graph construction and application in geosciences: A review

    • Keywords:
    • Knowledge graph; Open data; Machine learning; Artificial intelligence; Data science; SEMANTIC WEB; INFORMATION EXTRACTION; IMAGE-ANALYSIS; DATA RESOURCES; ONTOLOGY; CLASSIFICATION; REPRESENTATION; VISUALIZATION; PROVENANCE; CHALLENGES

    Knowledge graph (KG) is a topic of great interests to geoscientists as it can be deployed throughout the data life cycle in data-intensive geoscience studies. Nevertheless, comparing with the large amounts of publications on machine learning applications in geosciences, summaries and reviews of geoscience KGs are still limited. The aim of this paper is to present a comprehensive review of KG construction and implementation in geosciences. It consists of four major parts: 1) concepts relevant to KG and approaches for KG construction, 2) KG application in data collection, curation, and service, 3) KG application in data analysis, and 4) challenges and trends of geoscience KG creation and application in the near future. For each of the first three parts, a list of concepts, exemplar studies, and best practices are summarized. Those summaries are synthesized together in the challenge and trend analyses. As artificial intelligence and data science are thriving in geosciences, we hope this review of geoscience KGs can be of value to practitioners in data-intensive geoscience studies.

    ...
  • 3. A review of Earth Artificial Intelligence

    • Keywords:
    • Geosphere; Hydrology; Atmosphere; Artificial intelligence/machine learning; Big data; Cyberinfrastructure; LOGISTIC-REGRESSION; NEURAL-NETWORKS; SURFACE-WATER; PREDICTION; MODELS; OCEAN; ALGORITHM; CLASSIFICATION; ENSEMBLE; IDENTIFICATION

    In recent years, Earth system sciences are urgently calling for innovation on improving accuracy, enhancing model intelligence level, scaling up operation, and reducing costs in many subdomains amid the exponentially accumulated datasets and the promising artificial intelligence (AI) revolution in computer science. This paper presents work led by the NASA Earth Science Data Systems Working Groups and ESIP machine learning cluster to give a comprehensive overview of AI in Earth sciences. It holistically introduces the current status, technology, use cases, challenges, and opportunities, and provides all the levels of AI practitioners in geosciences with an overall big picture and to "blow away the fog to get a clearer vision" about the future development of Earth AI. The paper covers all the major spheres in the Earth system and investigates representative AI research in each domain. Widely used AI algorithms and computing cyberinfrastructure are briefly introduced. The mandatory steps in a typical workflow of specializing AI to solve Earth scientific problems are decomposed and analyzed. Eventually, it concludes with the grand challenges and reveals the opportunities to give some guidance and pre-warnings on allocating resources wisely to achieve the ambitious Earth AI goals in the future.

    ...