Developing AI-assisted distributed systems for spatio-temporal data

项目来源

日本学术振兴会基金(JSPS)

项目主持人

天方 大地

项目受资助机构

大阪大学

项目编号

24K14961

立项年度

2024

立项时间

未公开

研究期限

未知 / 未知

项目级别

国家级

受资助金额

4680000.00日元

学科

データベース関連

学科代码

未公开

基金类别

基盤研究(C)

关键词

時空間データ ; 分散処理システム ;

参与者

佐々木勇和

参与机构

未公开

项目标书摘要:Outline of Research at the Start:Webシステムの発展による時空間データの大規模化に伴い、複数の計算機を用いて時空間データを並列に処理する分散処理システムがプラットフォームとして一般的となっている。本研究では、さらなる深化と効率化を目指し、人工知能技術を活用した時空間データ分散処理システムを開発する。具体的には、(1)深層強化学習を用いた時空間データのパーティショニング技術の実現、(2)機械学習を用いたタスク割当て技術の実現、および(3)これらの技術を活用した時空間データ分散処理システムの開発、の3つの課題に取り組む。これらの技術を開発することにより、「ビッグデータ解析を瞬時に行える世界」の実現に貢献する。

  • 排序方式:
  • 1
  • /
  • 1.Random Sampling Over Spatial Range Joins

    • 关键词:
    • ;Computational costs;Geo-spatial;Geographic information;Geospatial point;Join;Location based;Random sample;Random sampling;Social networking services;Time efficient algorithms
    • Amagata, Daichi
    • 《41st IEEE International Conference on Data Engineering, ICDE 2025》
    • 2025年
    • May 19, 2025 - May 23, 2025
    • Hong Kong, China
    • 会议

    Spatial range joins have many applications, including geographic information systems, location-based social networking services, neuroscience, and visualization. However, joins incur not only expensive computational costs but also too large result sets. A practical and reasonable approach to alleviating these issues is to return random samples of the join results. Although this is promising and sufficient for many applications involving spatial range joins, efficiently computing random samples is not trivial. This is because we must obtain random join samples without running spatial range joins. We address this challenging problem for the first time and aim at designing a time- and space-efficient algorithm. First, we design two baseline algorithms that employ existing techniques for random sampling and show that they are not efficient. Then, we propose a new data structure that can deal with our problem in Õ(n+m+t) expected time and O(n+m) space, where n and m are the sizes of two point sets and t is the required number of samples. We conduct extensive experiments using four real spatial datasets, and the results demonstrate that our algorithm is significantly faster than the baselines in most tests. © 2025 IEEE.

    ...
  • 2.Top-k Range Search on Weighted Interval Data

    • 关键词:
    • Information management;Large datasets;Query processing;Analytical applications;Efficient managements;Interval data;Range search;Real-world datasets;Top-k query;Weighted intervals
    • Amagata, Daichi;Lee, Jimin
    • 《19th International Symposium on Spatial and Temporal Data, SSTD 2025》
    • 2025年
    • August 25, 2025 - August 27, 2025
    • Osaka, Japan
    • 会议

    Weighted intervals are ubiquitous because many objects are associated with temporal and numeric dimensions. As interval datasets are usually large, efficient management and processing of large weighted interval data are required. This paper addresses the problem of top-k range search on weighted interval data, which retrieves k intervals with the largest weight among a set of intervals overlapping a given query interval. It finds important analytical applications for vehicles, events, and cryptocurrencies. Existing algorithms for range search on interval data are inefficient for this problem, because they need to search for all intervals that overlap a given query interval. To overcome this inefficiency issue, we first provide a baseline algorithm and then propose two data structures and their associated algorithms. Our first proposed algorithm is practically fast but requires O(n log k) time, where n is the number of intervals, whereas the other requires less than O(n log k) time. We conduct extensive experiments on real-world datasets, and the results show that our algorithms outperform baseline techniques in most cases. © 2025 Copyright held by the owner/author(s)

    ...
  • 3.PolyCard: A learned cardinality estimator for intersection queries on spatial polygons

    • 关键词:
    • Cardinality estimation; Query optimization; Spatial polygon; Machinelearning
    • Ji, Yuchen;Amagata, Daichi;Sasaki, Yuya;Hara, Takahiro
    • 《JOURNAL OF INTELLIGENT INFORMATION SYSTEMS》
    • 2025年
    • 期刊

    How can we estimate the result size for a given query on complex spatial objects like polygons? Estimating a query's result size, also known as the cardinality estimation, plays a significant role in query scheduling and optimization. Accurate and fast cardinality estimation substantially improves query efficiency. Existing compatible solutions, mainly histogram-based, deal with polygons as their minimal bounding rectangles for easier processing, which leads to inaccurate estimation. To address this issue, we present PolyCard, a learned cardinality estimator for intersection queries on spatial polygons. We successfully apply learning techniques to spatial polygons with variable sizes. PolyCard has the following properties. (i) Accurate: PolyCard improves 30% accuracy compared with existing solutions, (ii) Fast: PolyCard takes only 4 microseconds for an estimation, and (iii) Stable: PolyCard is robust against datasets and queries of different cardinalities. Our experiments on four real-world datasets of millions of polygons demonstrate the efficiency and effectiveness of PolyCard.

    ...
  • 4.An Efficient Framework forApproximate Nearest Neighbor Search onHigh-Dimensional Multi-metric Data

    • 关键词:
    • Graph embeddings;Large datasets;Approximate kNN search;Approximate Nearest Neighbor Search;High-dimensional;Higher dimensions;Higher-dimensional;Metric spaces;Multi-metric data;Multi-metrics;Multi-modal data;Near neighbor searches
    • Uemura, Reon;Amagata, Daichi;Hara, Takahiro
    • 《17th International Conference on Similarity Search and Applications, SISAP 2024》
    • 2025年
    • November 4, 2024 - November 6, 2024
    • Providence, RI, United states
    • 会议

    Many objects are often multi-modal data consisting of images, videos, documents, etc., where each model exists in a different metric space. Similarity searches on multi-modal data are widely employed in information retrieval and machine learning. In these applications, each modal is usually represented as dense high-dimensional data (e.g., an embedding vector). This paper addresses the problem of nearest neighbor search (NNS) on high-dimensional multi-metric data. Although some techniques for NNS on multi-metric data exist, they do not consider high-dimensional data and query-dependent weights in multi-metric spaces. A straightforward yet fast algorithm for "exact" NNS on high-dimensional multi-metric data is linear scan, but obtaining exact results is slow on large datasets. We therefore propose an efficient framework for approximate NNS on high-dimensional multi-metric data. For each metric space, we build the same type of proximity graph that allows any search-start node. An approximate NN is found by traversing the proximity graphs while carefully selecting start nodes to avoid redundant node accesses. We conduct experiments on real-world high-dimensional multi-metric data, and the results demonstrate that our framework outperforms state-of-the-art algorithms. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

    ...
  • 排序方式:
  • 1
  • /