Developing AI-assisted distributed systems for spatio-temporal data
项目来源
项目主持人
项目受资助机构
立项年度
立项时间
项目编号
项目级别
研究期限
受资助金额
学科
学科代码
基金类别
关键词
参与者
参与机构
1.Random Sampling Over Spatial Range Joins
- 关键词:
- ;Computational costs;Geo-spatial;Geographic information;Geospatial point;Join;Location based;Random sample;Random sampling;Social networking services;Time efficient algorithms
- Amagata, Daichi
- 《41st IEEE International Conference on Data Engineering, ICDE 2025》
- 2025年
- May 19, 2025 - May 23, 2025
- Hong Kong, China
- 会议
Spatial range joins have many applications, including geographic information systems, location-based social networking services, neuroscience, and visualization. However, joins incur not only expensive computational costs but also too large result sets. A practical and reasonable approach to alleviating these issues is to return random samples of the join results. Although this is promising and sufficient for many applications involving spatial range joins, efficiently computing random samples is not trivial. This is because we must obtain random join samples without running spatial range joins. We address this challenging problem for the first time and aim at designing a time- and space-efficient algorithm. First, we design two baseline algorithms that employ existing techniques for random sampling and show that they are not efficient. Then, we propose a new data structure that can deal with our problem in Õ(n+m+t) expected time and O(n+m) space, where n and m are the sizes of two point sets and t is the required number of samples. We conduct extensive experiments using four real spatial datasets, and the results demonstrate that our algorithm is significantly faster than the baselines in most tests. © 2025 IEEE.
...2.Top-k Range Search on Weighted Interval Data
- 关键词:
- Information management;Large datasets;Query processing;Analytical applications;Efficient managements;Interval data;Range search;Real-world datasets;Top-k query;Weighted intervals
- Amagata, Daichi;Lee, Jimin
- 《19th International Symposium on Spatial and Temporal Data, SSTD 2025》
- 2025年
- August 25, 2025 - August 27, 2025
- Osaka, Japan
- 会议
Weighted intervals are ubiquitous because many objects are associated with temporal and numeric dimensions. As interval datasets are usually large, efficient management and processing of large weighted interval data are required. This paper addresses the problem of top-k range search on weighted interval data, which retrieves k intervals with the largest weight among a set of intervals overlapping a given query interval. It finds important analytical applications for vehicles, events, and cryptocurrencies. Existing algorithms for range search on interval data are inefficient for this problem, because they need to search for all intervals that overlap a given query interval. To overcome this inefficiency issue, we first provide a baseline algorithm and then propose two data structures and their associated algorithms. Our first proposed algorithm is practically fast but requires O(n log k) time, where n is the number of intervals, whereas the other requires less than O(n log k) time. We conduct extensive experiments on real-world datasets, and the results show that our algorithms outperform baseline techniques in most cases. © 2025 Copyright held by the owner/author(s)
...3.An Efficient Framework forApproximate Nearest Neighbor Search onHigh-Dimensional Multi-metric Data
- 关键词:
- Graph embeddings;Large datasets;Approximate kNN search;Approximate Nearest Neighbor Search;High-dimensional;Higher dimensions;Higher-dimensional;Metric spaces;Multi-metric data;Multi-metrics;Multi-modal data;Near neighbor searches
- Uemura, Reon;Amagata, Daichi;Hara, Takahiro
- 《17th International Conference on Similarity Search and Applications, SISAP 2024》
- 2025年
- November 4, 2024 - November 6, 2024
- Providence, RI, United states
- 会议
Many objects are often multi-modal data consisting of images, videos, documents, etc., where each model exists in a different metric space. Similarity searches on multi-modal data are widely employed in information retrieval and machine learning. In these applications, each modal is usually represented as dense high-dimensional data (e.g., an embedding vector). This paper addresses the problem of nearest neighbor search (NNS) on high-dimensional multi-metric data. Although some techniques for NNS on multi-metric data exist, they do not consider high-dimensional data and query-dependent weights in multi-metric spaces. A straightforward yet fast algorithm for "exact" NNS on high-dimensional multi-metric data is linear scan, but obtaining exact results is slow on large datasets. We therefore propose an efficient framework for approximate NNS on high-dimensional multi-metric data. For each metric space, we build the same type of proximity graph that allows any search-start node. An approximate NN is found by traversing the proximity graphs while carefully selecting start nodes to avoid redundant node accesses. We conduct experiments on real-world high-dimensional multi-metric data, and the results demonstrate that our framework outperforms state-of-the-art algorithms. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
...
