Popularity-Based Ranking for Fast Approximate kNN Search

Antol, Matej; Dohnal, Vlastislav

doi:10.15388/Informatica.2017.118

Informatica

Popularity-Based Ranking for Fast Approximate kNN Search

Volume 28, Issue 1 (2017), pp. 1–21

Matej Antol Vlastislav Dohnal

https://doi.org/10.15388/Informatica.2017.118

Pub. online: 1 January 2017 Type: Research Article

Open Access

Received
1 October 2016

Accepted
1 February 2017

Published
1 January 2017

Abstract

Similarity searching has become widely available in many on-line archives of multimedia data. Users accessing such systems look for data items similar to their specific query object and typically refine results by re-running the search with a query from the results. We study this issue and propose a mechanism of approximate kNN query evaluation that incorporates statistics of accessing index data partitions. Apart from the distance between database objects, it also considers the prior query answers to prioritize index partitions containing frequently retrieved data, so evaluating repetitive similar queries more efficiently. We verify this concept in a number of experiments.

References

Amato, G., Rabitti, F., Savino, P., Zezula, P. (2003). Region proximity in metric spaces and its use for approximate similarity search. ACM Transactions on Information Systems (TOIS 203), 2003(21), 192–227.

Antol, M., Dohnal, V. (2016). Optimizing query performance with inverted cache in metric spaces. In: Proceedings of ADBIS 2016: 20th East-European Conference on Advances in Databases and Information Systems, pp. 1–14. Accepted for publication. For the reviewers, it is available at https://www.fi.muni.cz/ xdohnal/adbis2016.pdf.

Barrios, J.M., Bustos, B., Skopal, T. (2014). Analyzing and dynamically indexing the query set. Information Systems, 45, 37–47.

Batko, M., Falchi, F., Lucchese, C., Novak, D., Perego, R., Rabitti, F., Sedmidubsky, J., Zezula, P. (2009). Building a web-scale image similarity search system. Multimedia Tools and Applications, 47(3), 599–629.

Beecks, C., Skopal, T., Schöffmann, K., Seidl, T. (2011). Towards large-scale multimedia exploration. In: Proceedings of 5th International Workshop on Ranking in Databases (DBRank 2011), Seattle, WA, USA, pp. 31–33.

Böhm, C., Berchtold, S., Keim, D.A. (2001). Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Computing Surveys, 33(3), 322–373.

Brisaboa, N.R., Cerdeira-Pena, A., Gil-Costa, V., Marin, M., Pedreira, O. (2015). Efficient similarity search by combining indexing and caching strategies. In: Proceedings of SOFSEM 2015: Theory and Practice of Computer Science: 41st International Conference on Current Trends in Theory and Practice of Computer Science. Springer, Berlin, pp. 486–497.

Budikova, P., Batko, M., Zezula, P. (2011). Evaluation platform for content-based image retrieval systems. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (Eds.), Proceedings of International Conference on Theory and Practice of Digital Libraries, Research and Advanced Technology for Digital Libraries (TPDL). Springer, Berlin, pp. 130–142.

Chávez, E., Navarro, G., Baeza-Yates, R.A., Marroquín, J.L. (2001). Searching in metric spaces. ACM Computing Surveys (CSUR 2001), 33(3), 273–321.

Ciaccia, P., Patella, M., Zezula, P. (1997). M-tree: an efficient access method for similarity search in metric spaces. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (Eds.), Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece, August 25–29, 1997. Morgan Kaufmann, pp. 426–435.

Deepak, P., Prasad, M.D. (2015). Operators for Similarity Search: Semantics, Techniques and Usage Scenarios. Springer.

Esuli, A. (2012). Use of permutation prefixes for efficient and scalable approximate similarity search. Information Processing & Management, 48(5), 889–902.

Falchi, F., Lucchese, C., Orlando, S., Perego, R., Rabitti, F. (2009). Caching content-based queries for robust and efficient image retrieval. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09. ACM, New York, pp. 780–790.

Houle, M.E., Nett, M. (2015). Rank-based similarity search: reducing the dimensional dependence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 136–150.

Houle, M.E., Sakuma, J. (2005). Fast approximate similarity search in extremely high-dimensional data sets. In: Proceedins of 21st International Conference on Data Engineering (ICDE), pp. 619–630.

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T. (2014). Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22Nd ACM International Conference on Multimedia, MM ’14. ACM, New York, pp. 675–678.

Lew, M.S., Sebe, N., Djeraba, C., Jain, R. (2006). Content-based multimedia information retrieval: State of the art and challenges. The ACM Transactions on Multimedia Computing, Communications, and Applications, 2(1), 1–19.

Novak, D., Batko, M., Zezula, P. (2011). Metric index: an efficient and scalable solution for precise and approximate similarity search. Information Systems, 36.

Novak, D., Zezula, P. (2014). Rank aggregation of candidate sets for efficient similarity search. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (Eds.), Proceedings of 25th International Conference on Database and Expert Systems Applications (DEXA). Springer International Publishing, Cham, pp. 42–58.

Oliveira, P.H., Traina, C., Kaster, D.S. (2015). Improving the pruning ability of dynamic metric access methods with local additional pivots and anticipation of information. In: Proceedings of 19th East European Conference on Advances in Databases and Information Systems (ADBIS), Poitiers, France, September 8–11, 2015. Springer International Publishing, Cham, pp. 18–31.

Pramanik, S., Li, J., Ruan, J., Bhattacharjee, S.K. (1999). Efficient search scheme for very large image databases. In: Beretta, G.B., Schettini, R. (Eds.), Proceedings of the International Society for Optical Engineering (SPIE) on Internet Imaging, San Jose, California, USA, January 26, 2000, Vol. 3964. The International Society for Optical Engineering, pp. 79–90.

Sadit Tellez, E., Chávez, E. (2012). The list of clusters revisited. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (Eds.), Proceedings of 4th Mexican Conference on Pattern Recognition (MCPR). Springer, Berlin, pp. 187–196.

Samet, H. (2006). Foundations of Multidimensional And Metric Data Structures. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann.

Skopal, T., Hoksza, D. (2007). Improving the performance of m-tree family by nearest-neighbor graphs. In: Ioannidis, Y., Novikov, B., Rachev, B. (Eds.), Proceedings of 11th East European Conference on Advances in Databases and Information Systems (ADBIS), Varna, Bulgaria, September 29–October 3, 2007. Springer, Berlin, pp. 172–188.

Skopal, T., Lokoc, J., Bustos, B. (2012). D-cache: universal distance cache for metric access methods. IEEE Transactions on Knowledge and Data Engineering, 24(5), 868–881.

Skopal, T., Pokorný, J., Snášel, V. (2005). Nearest neighbours search using the PM-Tree. In Lecture Notes in Computer Science: Vol. 3453. Proceedings of the 10th International Conference on Database Systems for Advanced Applications (DASFAA), Beijing, China, April 17–20, 2005. Springer, pp. 803–815.

Vilar, J.M. (1995). Reducing the overhead of the AESA metric-space nearest neighbour searching algorithm. Information Processing Letters, 56(5), 265–271.

Zezula, P., Amato, G., Dohnal, V., Batko, M. (2005). Similarity Search: The Metric Space Approach. Advances in Database Systems, Vol. 32. Springer.

Biographies

Antol Matej

xantol@fi.muni.cz

M. Antol is a PhD student at the Faculty of Informatics, Masaryk University in Brno, Czech Republic. His area of interest is data analytics, namely optimization of hierarchical structures and their performance in metric spaces.

Dohnal Vlastislav

dohnal@fi.muni.cz

V. Dohnal is an associate professor at the Faculty of Informatics, Masaryk University. He has been interested in database systems and information retrieval. In particular, indexing principles and mechanisms for unstructured data modelled as a metric space.

Full article Cited by

Open access article under the CC BY license.

Keywords

kNN query approximate search query popularity index structure metric space

Metrics

since January 2020

2343

Article info
views

759

Full article
views

642

PDF
downloads

349

XML
downloads

RSS

Authors

Abstract

References

Biographies

Export citation

Copy and paste formatted citation

Download citation in file