Informatica logo


Login Register

  1. Home
  2. Issues
  3. Volume 28, Issue 1 (2017)
  4. Popularity-Based Ranking for Fast Approx ...

Informatica

Information Submit your article For Referees Help ATTENTION!
  • Article info
  • Full article
  • Cited by
  • More
    Article info Full article Cited by

Popularity-Based Ranking for Fast Approximate kNN Search
Volume 28, Issue 1 (2017), pp. 1–21
Matej Antol   Vlastislav Dohnal  

Authors

 
Placeholder
https://doi.org/10.15388/Informatica.2017.118
Pub. online: 1 January 2017      Type: Research Article      Open accessOpen Access

Received
1 October 2016
Accepted
1 February 2017
Published
1 January 2017

Abstract

Similarity searching has become widely available in many on-line archives of multimedia data. Users accessing such systems look for data items similar to their specific query object and typically refine results by re-running the search with a query from the results. We study this issue and propose a mechanism of approximate kNN query evaluation that incorporates statistics of accessing index data partitions. Apart from the distance between database objects, it also considers the prior query answers to prioritize index partitions containing frequently retrieved data, so evaluating repetitive similar queries more efficiently. We verify this concept in a number of experiments.

References

 
Amato, G., Rabitti, F., Savino, P., Zezula, P. (2003). Region proximity in metric spaces and its use for approximate similarity search. ACM Transactions on Information Systems (TOIS 203), 2003(21), 192–227.
 
Antol, M., Dohnal, V. (2016). Optimizing query performance with inverted cache in metric spaces. In: Proceedings of ADBIS 2016: 20th East-European Conference on Advances in Databases and Information Systems, pp. 1–14. Accepted for publication. For the reviewers, it is available at https://www.fi.muni.cz/ xdohnal/adbis2016.pdf.
 
Barrios, J.M., Bustos, B., Skopal, T. (2014). Analyzing and dynamically indexing the query set. Information Systems, 45, 37–47.
 
Batko, M., Falchi, F., Lucchese, C., Novak, D., Perego, R., Rabitti, F., Sedmidubsky, J., Zezula, P. (2009). Building a web-scale image similarity search system. Multimedia Tools and Applications, 47(3), 599–629.
 
Beecks, C., Skopal, T., Schöffmann, K., Seidl, T. (2011). Towards large-scale multimedia exploration. In: Proceedings of 5th International Workshop on Ranking in Databases (DBRank 2011), Seattle, WA, USA, pp. 31–33.
 
Böhm, C., Berchtold, S., Keim, D.A. (2001). Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Computing Surveys, 33(3), 322–373.
 
Brisaboa, N.R., Cerdeira-Pena, A., Gil-Costa, V., Marin, M., Pedreira, O. (2015). Efficient similarity search by combining indexing and caching strategies. In: Proceedings of SOFSEM 2015: Theory and Practice of Computer Science: 41st International Conference on Current Trends in Theory and Practice of Computer Science. Springer, Berlin, pp. 486–497.
 
Budikova, P., Batko, M., Zezula, P. (2011). Evaluation platform for content-based image retrieval systems. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (Eds.), Proceedings of International Conference on Theory and Practice of Digital Libraries, Research and Advanced Technology for Digital Libraries (TPDL). Springer, Berlin, pp. 130–142.
 
Chávez, E., Navarro, G., Baeza-Yates, R.A., Marroquín, J.L. (2001). Searching in metric spaces. ACM Computing Surveys (CSUR 2001), 33(3), 273–321.
 
Ciaccia, P., Patella, M., Zezula, P. (1997). M-tree: an efficient access method for similarity search in metric spaces. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (Eds.), Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece, August 25–29, 1997. Morgan Kaufmann, pp. 426–435.
 
Deepak, P., Prasad, M.D. (2015). Operators for Similarity Search: Semantics, Techniques and Usage Scenarios. Springer.
 
Esuli, A. (2012). Use of permutation prefixes for efficient and scalable approximate similarity search. Information Processing & Management, 48(5), 889–902.
 
Falchi, F., Lucchese, C., Orlando, S., Perego, R., Rabitti, F. (2009). Caching content-based queries for robust and efficient image retrieval. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09. ACM, New York, pp. 780–790.
 
Houle, M.E., Nett, M. (2015). Rank-based similarity search: reducing the dimensional dependence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 136–150.
 
Houle, M.E., Sakuma, J. (2005). Fast approximate similarity search in extremely high-dimensional data sets. In: Proceedins of 21st International Conference on Data Engineering (ICDE), pp. 619–630.
 
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T. (2014). Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22Nd ACM International Conference on Multimedia, MM ’14. ACM, New York, pp. 675–678.
 
Lew, M.S., Sebe, N., Djeraba, C., Jain, R. (2006). Content-based multimedia information retrieval: State of the art and challenges. The ACM Transactions on Multimedia Computing, Communications, and Applications, 2(1), 1–19.
 
Novak, D., Batko, M., Zezula, P. (2011). Metric index: an efficient and scalable solution for precise and approximate similarity search. Information Systems, 36.
 
Novak, D., Zezula, P. (2014). Rank aggregation of candidate sets for efficient similarity search. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (Eds.), Proceedings of 25th International Conference on Database and Expert Systems Applications (DEXA). Springer International Publishing, Cham, pp. 42–58.
 
Oliveira, P.H., Traina, C., Kaster, D.S. (2015). Improving the pruning ability of dynamic metric access methods with local additional pivots and anticipation of information. In: Proceedings of 19th East European Conference on Advances in Databases and Information Systems (ADBIS), Poitiers, France, September 8–11, 2015. Springer International Publishing, Cham, pp. 18–31.
 
Pramanik, S., Li, J., Ruan, J., Bhattacharjee, S.K. (1999). Efficient search scheme for very large image databases. In: Beretta, G.B., Schettini, R. (Eds.), Proceedings of the International Society for Optical Engineering (SPIE) on Internet Imaging, San Jose, California, USA, January 26, 2000, Vol. 3964. The International Society for Optical Engineering, pp. 79–90.
 
Sadit Tellez, E., Chávez, E. (2012). The list of clusters revisited. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (Eds.), Proceedings of 4th Mexican Conference on Pattern Recognition (MCPR). Springer, Berlin, pp. 187–196.
 
Samet, H. (2006). Foundations of Multidimensional And Metric Data Structures. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann.
 
Skopal, T., Hoksza, D. (2007). Improving the performance of m-tree family by nearest-neighbor graphs. In: Ioannidis, Y., Novikov, B., Rachev, B. (Eds.), Proceedings of 11th East European Conference on Advances in Databases and Information Systems (ADBIS), Varna, Bulgaria, September 29–October 3, 2007. Springer, Berlin, pp. 172–188.
 
Skopal, T., Lokoc, J., Bustos, B. (2012). D-cache: universal distance cache for metric access methods. IEEE Transactions on Knowledge and Data Engineering, 24(5), 868–881.
 
Skopal, T., Pokorný, J., Snášel, V. (2005). Nearest neighbours search using the PM-Tree. In Lecture Notes in Computer Science: Vol. 3453. Proceedings of the 10th International Conference on Database Systems for Advanced Applications (DASFAA), Beijing, China, April 17–20, 2005. Springer, pp. 803–815.
 
Vilar, J.M. (1995). Reducing the overhead of the AESA metric-space nearest neighbour searching algorithm. Information Processing Letters, 56(5), 265–271.
 
Zezula, P., Amato, G., Dohnal, V., Batko, M. (2005). Similarity Search: The Metric Space Approach. Advances in Database Systems, Vol. 32. Springer.

Biographies

Antol Matej
xantol@fi.muni.cz

M. Antol is a PhD student at the Faculty of Informatics, Masaryk University in Brno, Czech Republic. His area of interest is data analytics, namely optimization of hierarchical structures and their performance in metric spaces.

Dohnal Vlastislav
dohnal@fi.muni.cz

V. Dohnal is an associate professor at the Faculty of Informatics, Masaryk University. He has been interested in database systems and information retrieval. In particular, indexing principles and mechanisms for unstructured data modelled as a metric space.


Full article Cited by PDF XML
Full article Cited by PDF XML

Copyright
© 2017 Vilnius University
by logo by logo
Open access article under the CC BY license.

Keywords
kNN query approximate search query popularity index structure metric space

Metrics
since January 2020
1260

Article info
views

531

Full article
views

512

PDF
downloads

238

XML
downloads

Export citation

Copy and paste formatted citation
Placeholder

Download citation in file


Share


RSS

INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University

About

  • About journal

For contributors

  • OA Policy
  • Submit your article
  • Instructions for Referees
    •  

    •  

Contact us

  • Institute of Data Science and Digital Technologies
  • Vilnius University

    Akademijos St. 4

    08412 Vilnius, Lithuania

    Phone: (+370 5) 2109 338

    E-mail: informatica@mii.vu.lt

    https://informatica.vu.lt/journal/INFORMATICA
Powered by PubliMill  •  Privacy policy