Informatica

Geometric MDS Performance for Large Data Dimensionality Reduction and Visualization
Volume 33, Issue 2 (2022), pp. 299–320
Gintautas Dzemyda, Martynas Sabaliauskas, Viktor Medvedev

https://doi.org/10.15388/22-INFOR491
Pub. online: 14 June 2022; Type: Research Article; Open Access

Received: 1 February 2022
Accepted: 1 June 2022
Published: 14 June 2022

Abstract

Multidimensional scaling (MDS) is a widely used technique for mapping data from a high-dimensional space to a lower-dimensional one and for visualizing data. Recently, a new method, Geometric MDS, has been developed to minimize the MDS stress function by an iterative procedure in which the coordinates of a particular point in the projected space are moved to a new, analytically defined position. Such a change in position is easy to interpret geometrically. Moreover, the coordinates of the points in the projected space may be recalculated simultaneously, i.e. in parallel, independently of each other. This paper has several objectives. Two implementations of Geometric MDS are proposed and analysed experimentally. The parallel implementation is developed for multithreaded multi-core processors. The sequential implementation is optimized for computational speed, enabling it to solve large data problems, and it is compared with the SMACOF version of MDS. Python codes for both Geometric MDS and SMACOF are presented to highlight the differences between the two methods. The comparison covers several aspects: the relative performance of Geometric MDS and SMACOF depending on the projection dimension and the data size, and the computation time. Geometric MDS usually finds a lower stress than SMACOF when the dimensionality of the projected space is smaller.
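The iterative procedure described above can be sketched in Python with NumPy. This is not the authors' published code. The sketch assumes the raw stress S(Y) = Σ_{i<j} (d*_ij − d_ij)², where d*_ij are distances in the original space and d_ij distances in the projected space, and it assumes that the analytically defined new position of a point Y_j is the mean of the "anchor points" A_i = Y_i + d*_ij (Y_j − Y_i) / ‖Y_j − Y_i‖ over all i ≠ j, i.e. the points lying at the original distance from each Y_i along the current direction towards Y_j. All function names, the zero-distance guard `eps` and the random data are hypothetical. For contrast, `smacof_step` is the standard SMACOF Guttman transform with unit weights. The toy run at the end uses one random start and a fixed number of iterations, so it says nothing about the performance results reported in the paper.

```python
import numpy as np


def pairwise_distances(Y):
    """Euclidean distance matrix between the rows of Y (shape m x d)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(D_star, Y):
    """Raw stress (assumed form): sum over pairs i < j of (d*_ij - d_ij)^2."""
    D = pairwise_distances(Y)
    iu = np.triu_indices(len(Y), k=1)
    return float(((D_star[iu] - D[iu]) ** 2).sum())


def anchor_center(D_star, Y, j, eps=1e-12):
    """Assumed Geometric MDS move of point j: the mean of the anchor points
    A_i = Y_i + d*_ij * (Y_j - Y_i) / ||Y_j - Y_i|| over all i != j."""
    others = np.arange(len(Y)) != j
    diff = Y[j] - Y[others]                        # rows are Y_j - Y_i
    dist = np.linalg.norm(diff, axis=1)
    unit = diff / np.maximum(dist, eps)[:, None]   # hypothetical guard: coincident points get a zero direction
    anchors = Y[others] + D_star[j, others][:, None] * unit
    return anchors.mean(axis=0)


def geometric_mds_sequential(D_star, Y, sweeps=50):
    """Move the points one at a time; each move sees the latest positions."""
    Y = Y.copy()
    for _ in range(sweeps):
        for j in range(len(Y)):
            Y[j] = anchor_center(D_star, Y, j)
    return Y


def geometric_mds_parallel_step(D_star, Y, eps=1e-12):
    """Recompute every point from the same old configuration. The rows do
    not depend on each other, so they could be split across cores."""
    m = len(Y)
    diff = Y[:, None, :] - Y[None, :, :]                 # diff[j, i] = Y_j - Y_i
    dist = np.linalg.norm(diff, axis=-1)
    unit = diff / np.maximum(dist, eps)[..., None]
    anchors = Y[None, :, :] + D_star[..., None] * unit   # anchors[j, i] = A_i for point j
    mask = ~np.eye(m, dtype=bool)                        # drop the i == j terms
    return (anchors * mask[..., None]).sum(axis=1) / (m - 1)


def smacof_step(D_star, Y, eps=1e-12):
    """Standard SMACOF Guttman transform with unit weights: Y_new = B(Y) Y / m."""
    m = len(Y)
    D = pairwise_distances(Y)
    B = -np.where(D > eps, D_star / np.maximum(D, eps), 0.0)  # off-diagonal: -d*_ij / d_ij
    np.fill_diagonal(B, -B.sum(axis=1))                         # each row of B sums to zero
    return B @ Y / m


# Hypothetical example: 60 random points in 10 dimensions mapped to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
D_star = pairwise_distances(X)      # distances in the original space
Y0 = rng.normal(size=(60, 2))       # random starting configuration

Y_geo = geometric_mds_sequential(D_star, Y0, sweeps=30)
Y_smacof = Y0.copy()
for _ in range(30):
    Y_smacof = smacof_step(D_star, Y_smacof)

print("initial stress:", stress(D_star, Y0))
print("Geometric MDS: ", stress(D_star, Y_geo))
print("SMACOF:        ", stress(D_star, Y_smacof))
```

Under these assumptions, each single-point move minimizes a majorizing function of the stress terms that involve Y_j, so the sequential sweep never increases the stress. The parallel step computes every new row from the same old configuration, which is what makes the rows independent; here it is only vectorized with NumPy, not multithreaded, and the sketch makes no claim that it decreases the stress monotonically.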


Biographies

Dzemyda Gintautas
gintautas.dzemyda@mif.vu.lt

G. Dzemyda received the doctoral degree in technical sciences (PhD) in 1984 and the degree of Doctor Habilius in 1997 from Kaunas University of Technology. He was conferred the title of professor at Kaunas University of Technology (1998) and Vilnius University (2018). He currently works at Vilnius University, Institute of Data Science and Digital Technologies, as director of the Institute, head of the Cognitive Computing Group, professor and principal researcher. His research interests cover visualization of multidimensional data, optimization theory and applications, data mining, multiple criteria decision support, neural networks and image analysis. He is the author of more than 260 scientific publications, two monographs and five textbooks.

Sabaliauskas Martynas
martynas.sabaliauskas@mif.vu.lt

M. Sabaliauskas was awarded the degree of doctor of technical sciences at Vilnius University in 2017. At present, he is an assistant professor at the Vilnius University Institute of Data Science and Digital Technologies. His research interests include problems of multidimensional scaling, computational mathematics, graph theory and game theory.

Medvedev Viktor
viktor.medvedev@mif.vu.lt

V. Medvedev is a senior researcher at the Institute of Data Science and Digital Technologies, Vilnius University. He received the doctoral degree in computer science (PhD) from the Institute of Mathematics and Informatics jointly with Vilnius Gediminas Technical University in 2008. His research interests include artificial intelligence, neural networks, multidimensional data, dimensionality reduction, image processing, data mining and parallel computing.



Copyright
© 2022 Vilnius University
Open access article under the CC BY license.

Keywords
dimensionality reduction, multidimensional scaling, Geometric MDS, large-scale data, multi-core implementation, SMACOF, Python codes

Funding
This research has received funding from the Research Council of Lithuania (LMTLT), agreement No. S-MIP-20-19.


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University
