Worst-Case-Optimal Similarity Joins on Graph Databases
- Arroyuelo, Diego 2
- Bustos, Benjamin 3
- Gómez-Brandón, Adrián 5
- Hogan, Aidan 1
- Navarro, Gonzalo 1
- Reutter, Juan 4
- 1 DCC, University of Chile & IMFD, Santiago, Chile
- 2 DCC, Escuela de Ingeniería, Pontificia Universidad Católica & IMFD, Santiago, Chile
- 3 DCC, University of Chile & IMFD, Santiago, Chile, Chile
- 4 DCC, Escuela de Ingeniería, Pontificia Universidad Católica & Instituto de Ingeniería Matemática y Computaciónal, Pontificia Universidad Católica, Santiago, Chile
-
5
Universidade da Coruña
info
ISSN: 2836-6573
Año de publicación: 2024
Volumen: 2
Número: 1
Páginas: 1-26
Tipo: Artículo
Otras publicaciones en: Proceedings of the ACM on Management of Data
Resumen
We extend the concept of worst-case optimal equijoins in graph databases to the case where some nodes arerequired to be within the ?-nearest neighbors (?-NN) of others under some similarity function. We modelthe problem by superimposing the database graph with the ?-NN graph and show that a variant of LeapfrogTrieJoin (LTJ) implemented over a compact data structure called the Ring can be seamlessly extended tointegrate similarity clauses with the equijoins in the LTJ query process, retaining worst-case optimality inmany relevant cases. Our experiments on a benchmark that combines Wikidata and IMGpedia show that ourenhanced LTJ algorithm outperforms by a considerable margin a baseline that first applies classic LTJ andthen completes the query by applying the similarity predicates. The difference is more pronounced on querieswhere the similarity clauses are more densely connected to the query, becoming of an order of magnitude insome cases.CCS Concepts: • Theory of computation → Database query processing and optimization (theory);Data structures and algorithms for data management.Additional Key Words and Phrases: Worst-case optimal joins; Leapfrog Triejoin; graph patterns; graphdatabases; graph indexing; similarity joins; nearest-neighbor graphs
Información de financiación
Financiadores
-
Ministerio de Ciencia e Innovación
- MCIN/AEI/10.13039/501100011033 TED2021-129245B-C21
- EI/10.13039/5011000-11033: grant PID2020-114635RB-I00
- A-EI/10.13039/501100011033 PID2022-141027NB- C21
-
Agencia Nacional de Investigación y Desarrollo
- CN17_002
- Fondecyt 122192
- fondecyt 1-230755
- fondecyt 1221799
Referencias bibliográficas
- 10.1145/2902251.2902289
- 10.1145/3034786.3056105
- 10.14778/3184470.3184473
- 10.1145/3104031
- 10.1145/1322432.1322433
- 10.1145/3448016.3457256
- 10.1145/3514231
- B. Arsintescu. How LIquid connects everything so our members can do anything. LinkedIn Engineering Blog, 2023. https://engineering.linkedin.com/blog/2023/how-liquid-connects-everything-so-our-members-can-do-anything.
- 10.1137/110859440
- 10.3233/SW-2012-0065
- B. R. Bebee, D. Choi, A. Gupta, A. Gutmans, A. Khandelwal, Y. Kiran, S. Mallidi, B. McGaughy, M. Personick, K. Rajan, S. Rondelli, A. Ryazanov, M. Schmidt, K. Sengupta, B. B. Thompson, D. Vaidya, and S. Wang. Amazon Neptune: Graph data management in the Cloud. In Proc. ISWC Posters & Demonstrations, 2018.
- 10.1145/3308558.3313472
- 10.1109/ICDE.2006.9
- W. W. Cohen and H. Hirsh. Joins that generalize: Text classification using WHIRL. In Proc. 4th International Conference on Knowledge Discovery and Data Mining (KDD), pages 169--173, 1998.
- 10.24432/C5CC9H
- 10.1145/2948992.2949016
- 10.1371/journal.pone.0074113
- 10.1145/3514221.3526057
- 10.1016/0925-7721(95)00009-7
- 10.1145/1963405.1963487
- 10.1007/978-3-319-68204-4_8
- 10.1007/978-3-030-62419-4_12
- 10.14778/3407790.3407797
- 10.1145/3398682.3399162
- 10.1007/978-3-319-67162-8_26
- 10.1007/978-3-319-07959-2_28
- R. Grossi, A. Gupta, and J. Vitter. High-order entropy-compressed text indexes. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 841--850, 2003.
- 10.14778/2733004.2733010
- 10.1080/00031305.1998.10480559
- 10.1145/276304.276326
- 10.1007/978-3-642-30284-8_53
- 10.1007/978-3-030-30793-6_15
- 10.1109/ICDE53745.2022.00247
- 10.1007/978-3-540-76298-0_22
- 10.24432/C50S4B
- M. Koklu and I. A. Ozkan. Multiclass classification of dry beans using computer vision and machine learning techniques. Computers and Electronics in Agriculture, 174:article 105507, 2020.
- 10.1007/978-3-030-00668-6_23
- 10.1145/3446980
- 10.14778/3554821.3554894
- 10.1007/3-540-62034-6_35
- 10.1016/j.jda.2013.07.004
- 10.1145/3196959.3196990
- 10.1145/3180143
- 10.1145/2764947.2764948
- 10.1007/11764298_8
- 10.3233/SW-170285
- 10.1109/ICDE.2010.5447873
- 10.1145/1978802.1978813
- 10.14778/3397230.3397250
- 10.14778/3476249.3476306
- 10.1007/BF02187718
- T. L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In Proc. International Conference on Database Theory (ICDT), pages 96--106, 2014.
- 10.1145/2629489
- 10.1109/CVPR.2012.6247790
- 10.1109/ICDE.1996.492202
- 10.1007/0-387-29151-2
- 10.1109/GEOINFORMATICS.2010.5567605
- Y.-M. Zhang, K. Huang, G. Geng, and C.-L. Liu. Fast knn graph construction with locality sensitive hashing. In Machine Learning and Knowledge Discovery in Databases, pages 660--674. Springer, 2013.