Pub. online: 1 Jan 2018 | Type: Research Article | Open Access
Journal:Informatica
Volume 29, Issue 4 (2018), pp. 633–650
Abstract
In recent years, Wireless Sensor Networks (WSNs) have received great attention because of their important applications in many areas. Consequently, improving their performance and efficiency, especially with respect to energy awareness, is of great interest. In this paper, we propose a lifetime-improving, fixed-clustering, energy-aware routing protocol for WSNs named Load Balancing Cluster Head (LBCH). LBCH mainly aims at reducing the energy consumption in the network and balancing the workload over all nodes. A novel method for selecting initial cluster heads (CHs) is proposed. In addition, the network nodes are evenly distributed into clusters to build clusters of balanced size. Finally, a novel scheme is proposed to rotate the CH role depending on the energy and location information of each node in each cluster. A multihop technique is used to minimize the communication distance between CHs and the base station (BS), thus saving node energy. To evaluate the performance of LBCH, a thorough simulation was conducted and the results were compared with related protocols (i.e. ACBEC-WSNs-CD, Adaptive LEACH-F, LEACH-F, and RRCH). The simulations showed that LBCH outperforms the other protocols for both continuous-data and event-based data models at different network densities. LBCH achieved average improvements in the ranges of 2–172%, 18–145.5%, 10.18–62%, and 63–82.5% over the compared protocols in terms of number of alive nodes, first node died (FND), network throughput, and load balancing, respectively.
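The CH-selection idea sketched in the abstract (prefer nodes with high residual energy and a favourable position within their cluster) can be illustrated roughly as follows. The node fields and the energy-minus-distance score below are illustrative assumptions for this sketch, not the paper's actual LBCH formula.

```python
def select_cluster_heads(nodes, num_clusters):
    """Pick one cluster head per cluster, favouring nodes with high
    residual energy that sit near the cluster centroid.
    `nodes` maps node id -> {'energy', 'x', 'y', 'cluster'}."""
    heads = {}
    for c in range(num_clusters):
        members = [n for n, info in nodes.items() if info['cluster'] == c]
        if not members:
            continue
        # cluster centroid, used to estimate intra-cluster distance
        cx = sum(nodes[n]['x'] for n in members) / len(members)
        cy = sum(nodes[n]['y'] for n in members) / len(members)

        def score(n):
            d = ((nodes[n]['x'] - cx) ** 2 + (nodes[n]['y'] - cy) ** 2) ** 0.5
            # hypothetical score: reward energy, penalize distance to centroid
            return nodes[n]['energy'] - d

        heads[c] = max(members, key=score)
    return heads
```

Rotating the CH role between rounds then amounts to re-running the selection with updated residual energies, so that no single node drains first.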
Pub. online: 1 Jan 2017 | Type: Research Article | Open Access
Journal:Informatica
Volume 28, Issue 1 (2017), pp. 105–130
Abstract
Analysing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, current approaches for data analysis are no longer efficient. This is particularly true for clustering algorithms where distance calculations between pairs of points dominate overall time: the more data points are in the dataset, the bigger the share of time needed for distance calculations.
Data analysis and clustering, however, are rarely straightforward: parameters need to be determined and tuned first. Entirely accurate results are thus rarely needed, and we can instead sacrifice a little precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach based on approximating DBSCAN. More specifically, we propose two measures to reduce the distance-calculation overhead and consequently approximate DBSCAN: (1) locality-sensitive hashing to approximate and speed up distance calculations, and (2) representative point selection to reduce the number of distance calculations.
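The first measure can be sketched as p-stable-style locality-sensitive hashing: points are hashed to grid cells of random 1-D projections, and an eps-neighbourhood query is then restricted to a point's own bucket instead of scanning the whole dataset. The parameters and function below are illustrative assumptions, not the ADvaNCE implementation.

```python
import math
import random

def lsh_buckets(points, bucket_width=1.0, num_projections=2, seed=0):
    """Hash each point to the grid cell of its random 1-D projections.
    Points landing in the same cell are likely close, so DBSCAN-style
    range queries can be approximated within each bucket."""
    rng = random.Random(seed)
    dim = len(points[0])
    # random projection directions and offsets (p-stable LSH style)
    dirs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_projections)]
    offs = [rng.uniform(0, bucket_width) for _ in range(num_projections)]
    buckets = {}
    for idx, p in enumerate(points):
        key = tuple(
            math.floor((sum(a * b for a, b in zip(p, d)) + o) / bucket_width)
            for d, o in zip(dirs, offs)
        )
        buckets.setdefault(key, []).append(idx)
    return buckets
```

In practice several independent hash tables are combined so that near neighbours split across a cell boundary in one table are still found in another; that refinement is omitted here for brevity.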
The experiments show that the resulting clustering algorithm is more scalable than the state of the art as datasets become bigger. Compared with the most recent approximation technique for DBSCAN, our approach is in general one order of magnitude faster (up to 30× in our experiments) as the size of the dataset increases.
Journal:Informatica
Volume 25, Issue 4 (2014), pp. 563–580
Abstract
Clustering is one of the best-known unsupervised learning methods, with the aim of discovering structures in the data. This paper presents a distance-based Sweep-Hyperplane Clustering Algorithm (SHCA), which uses sweep-hyperplanes to quickly locate each point's approximate nearest neighbourhood. Furthermore, a new distance-based dynamic model, based on -tree hierarchical space partitioning, extends SHCA's capability to find clusters that are not well separated and have arbitrary shape and density. Experimental results on large, noisy synthetic and real multidimensional datasets demonstrate the effectiveness of the proposed algorithm.
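The sweep idea of restricting a point's neighbour search to points that are nearby in a sorted ordering can be sketched in one dimension as follows. This is a simplified stand-in for the paper's sweep-hyperplane ordering, with an illustrative window parameter; it is not SHCA itself.

```python
def approx_neighbours(points, axis=0, window=3):
    """Sweep along one coordinate: after sorting by that coordinate,
    each point's approximate nearest neighbour is taken only from the
    points within a fixed window in the sorted order."""
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    pos = {i: r for r, i in enumerate(order)}
    neigh = {}
    for i in range(len(points)):
        r = pos[i]
        # candidate neighbours: a small window around i in sweep order
        cand = order[max(0, r - window):r] + order[r + 1:r + 1 + window]
        neigh[i] = min(
            cand,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
        )
    return neigh
```

Sorting costs O(n log n) and each lookup examines only O(window) candidates, which is what makes sweep-based neighbourhood location fast at the price of approximation.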
Pub. online: 1 Jan 2014 | Type: Research Article | Open Access
Journal:Informatica
Volume 25, Issue 2 (2014), pp. 265–282
Abstract
The aim of this study is to predict the energy generated by a solar thermal system. To achieve this, a hybrid intelligent system was developed based on local regression models with low complexity and high accuracy. The input data are divided into clusters using Self-Organizing Maps; a local model is then created for each cluster. Different regression techniques were tested and the best one was chosen. The novel hybrid regression system based on local models is empirically verified on a real dataset obtained from the solar thermal system of a bioclimatic house.
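The cluster-then-regress structure can be sketched with one least-squares line per cluster. This is a lightweight stand-in for the paper's SOM-plus-local-regression hybrid: the cluster assignment is taken as given, and simple linear regression replaces the candidate regression techniques the authors compare.

```python
def fit_local_models(samples, targets, assign):
    """Fit one least-squares line per cluster.
    `assign[i]` is the cluster label of scalar sample i."""
    models = {}
    for c in set(assign):
        xs = [samples[i] for i in range(len(samples)) if assign[i] == c]
        ys = [targets[i] for i in range(len(samples)) if assign[i] == c]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
        models[c] = (slope, my - slope * mx)  # (slope, intercept)
    return models

def predict(models, assign_fn, x):
    """Route a query to its cluster's local model."""
    slope, intercept = models[assign_fn(x)]
    return slope * x + intercept
```

The design point is that each local model only has to be accurate on its own region of the input space, which keeps the individual models simple.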
Pub. online: 1 Jan 2011 | Type: Research Article | Open Access
Journal:Informatica
Volume 22, Issue 1 (2011), pp. 1–10
Abstract
Estimation and modelling problems, as they arise in many data analysis areas, often turn out to be unstable and/or intractable by standard numerical methods. Such problems frequently occur when fitting large data sets to a certain model and in predictive learning. Heuristics are general recommendations based on practical statistical evidence, in contrast to a fixed set of rules that cannot vary but are guaranteed to give the correct answer. Although the use of these methods has become more standard in several fields of science, their use for estimation and modelling in statistics appears to be still limited. This paper surveys a set of problem-solving strategies, guided by heuristic information, that are expected to be used more frequently. The use of recent advances in different fields of large-scale data analysis is promoted, focusing on applications in medicine, biology and technology.
Pub. online: 1 Jan 2010 | Type: Research Article | Open Access
Journal:Informatica
Volume 21, Issue 3 (2010), pp. 455–470
Abstract
In this article, a method is proposed for analysing thermovision-based video data that characterize the dynamics of the temperature anisotropy of heart tissue in a spatial domain. Many cardiac rhythm disturbances are currently treated by applying destructive energy sources. One of the most common sources, and the related methodology, is the radio-frequency ablation procedure. However, the risk of complications, including arrhythmia recurrence, remains rather high. The drawback of the methodology is that such a destruction procedure cannot be monitored in the visual spectrum, which makes it impossible to control the ablation efficiency. To understand the nature of possible complications and to control the treatment process, thermovision can be used. The aim of the study was to analyse possible mechanisms of these complications and to measure and determine optimal radio-frequency ablation parameters, based on the analysis of video data acquired using thermovision.
Pub. online: 1 Jan 2009 | Type: Research Article | Open Access
Journal:Informatica
Volume 20, Issue 2 (2009), pp. 187–202
Abstract
In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data according to two sampling distributions. The first distribution corresponds to the high-density zones of the data-element distribution and is thus associated with the cluster cores. The second, associated with the cluster margins, is related to the low-density zones. The samples are clustered and the two obtained partitions are compared. The partitions are considered consistent if the obtained clusters are similar. The resemblance is measured by the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. We use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, this statistic is normally distributed. Thus, it can be expected that the true number of clusters corresponds to the statistic's empirical distribution that is closest to normal. Numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
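The core quantity, the number of minimal-spanning-tree edges joining points from different samples, can be computed directly. The sketch below uses a plain Prim's algorithm on the complete Euclidean graph; it shows only the raw cross-edge count, not the normalization of the full Friedman-Rafsky statistic.

```python
def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    O(n^3) as written; fine for a small illustrative example."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # cheapest edge leaving the current tree
        a, b = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        edges.append((a, b))
        in_tree.add(b)
    return edges

def cross_edge_count(points, labels):
    """Number of MST edges joining points from different samples
    (labels 0/1): the ingredient of the Friedman-Rafsky test."""
    return sum(1 for a, b in mst_edges(points) if labels[a] != labels[b])
```

When the two samples are drawn from the same distribution, cross edges are plentiful; well-separated samples produce very few, which is what makes the count informative about cluster structure.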
Pub. online: 1 Jan 2008 | Type: Research Article | Open Access
Journal:Informatica
Volume 19, Issue 3 (2008), pp. 377–390
Abstract
We investigate the applicability of quantitative methods to discovering the most fundamental structural properties of the most reliable political data in Lithuania, namely the voting data of the Lithuanian Parliament. The two most widely used techniques of structural data analysis (clustering and multidimensional scaling) are compared. We draw some technical conclusions which can serve as recommendations for a more purposeful application of these methods.
Pub. online: 1 Jan 2007 | Type: Research Article | Open Access
Journal:Informatica
Volume 18, Issue 2 (2007), pp. 187–202
Abstract
In this paper, the relative multidimensional scaling method is investigated. This method is designed to visualize large multidimensional data. It encompasses the application of multidimensional scaling (MDS) to a so-called basic vector set and the subsequent mapping of the remaining vectors of the analyzed data set. In the original algorithm of relative MDS, the visualization process is divided into three steps: the set of basis vectors is constructed using the k-means clustering method; this set is projected onto the plane using the MDS algorithm; and the remaining data are visualized using the relative mapping algorithm. We propose a modification that differs from the original algorithm in the strategy for selecting the basis vectors. The experimental investigation has shown that the modification surpasses the original algorithm in visualization quality and computational expense. The conditions under which the efficiency of relative MDS exceeds that of standard MDS are estimated.
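The relative-mapping step, placing each remaining vector on the plane so that its distances to the already-mapped basis points match the original high-dimensional distances, can be sketched as gradient descent on a simple stress function. The function name, the plain squared-error stress, and the fixed learning rate are assumptions for illustration, not the paper's exact algorithm.

```python
def relative_map(x, basis_high, basis_low, steps=200, lr=0.05):
    """Place one high-dimensional vector `x` in 2-D so that its planar
    distances to the mapped basis points `basis_low` match its original
    distances to `basis_high` (gradient descent on squared stress)."""
    d_high = [sum((a - b) ** 2 for a, b in zip(x, bh)) ** 0.5
              for bh in basis_high]
    y = [0.0, 0.0]
    for _ in range(steps):
        gx = gy = 0.0
        for (bx, by), dh in zip(basis_low, d_high):
            dl = ((y[0] - bx) ** 2 + (y[1] - by) ** 2) ** 0.5 or 1e-9
            coeff = 2 * (dl - dh) / dl  # gradient of (dl - dh)^2
            gx += coeff * (y[0] - bx)
            gy += coeff * (y[1] - by)
        y[0] -= lr * gx
        y[1] -= lr * gy
    return y
```

Because each remaining vector is placed independently against the fixed basis layout, this step parallelizes trivially and costs only O(|basis|) per point, which is what makes relative MDS attractive for large data sets.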
Pub. online: 1 Jan 2002 | Type: Research Article | Open Access
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 485–500
Abstract
This paper presents model-based forecasting of the Lithuanian education system for the period 2001–2010. To obtain satisfactory forecasting results, the development of the models used for these aims should be grounded in interactive data mining. The development process is usually accompanied by the formulation of assumptions underlying the methods or models, and the accessibility and reliability of the data sources should be verified; special data mining of the data sources may verify these assumptions. Interactive data mining was applied to the data stored in the Lithuanian teachers' database and to other sources representing the state of the education system and demographic changes in Lithuania. The models cover the estimation of data quality in the databases, the analysis of the flows of teachers and pupils, the clustering of schools, a model of the dynamics of the pedagogical staff and pupils, and a quality analysis of teachers. The main results of the forecasting and of the integrated analysis of the Lithuanian teachers' database with other data reflecting the state of the education system and demographic changes in Lithuania are presented.