Pub. online: 1 Jan 2018 | Type: Research Article | Open Access
Journal:Informatica
Volume 29, Issue 4 (2018), pp. 633–650
Abstract
In recent years, Wireless Sensor Networks (WSNs) have received great attention because of their important applications in many areas. Consequently, improving their performance and efficiency, especially with respect to energy awareness, is of great interest. In this paper, we propose a lifetime-improving, fixed-clustering, energy-aware routing protocol for WSNs named Load Balancing Cluster Head (LBCH). LBCH mainly aims at reducing the energy consumption in the network and balancing the workload over all nodes. A novel method for selecting initial cluster heads (CHs) is proposed. In addition, the network nodes are evenly distributed into clusters to build clusters of balanced size. Finally, a novel scheme is proposed to rotate the CH role depending on the energy and location information of each node in each cluster. A multihop technique is used to minimize the communication distance between CHs and the base station (BS), thus saving node energy. To evaluate the performance of LBCH, a thorough simulation was conducted and the results were compared with related protocols (i.e. ACBEC-WSNs-CD, Adaptive LEACH-F, LEACH-F, and RRCH). The simulations showed that LBCH outperforms the other protocols for both continuous-data and event-based data models at different network densities. LBCH achieved average improvements in the ranges of 2–172%, 18–145.5%, 10.18–62%, and 63–82.5% over the compared protocols in terms of number of alive nodes, first node died (FND), network throughput, and load balancing, respectively.
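The CH-selection idea sketched in the abstract (prefer nodes with high residual energy and a favourable position within their cluster) can be illustrated roughly as follows. The node fields and the energy-minus-distance score below are illustrative assumptions for this sketch, not the paper's actual LBCH formula.

```python
def select_cluster_heads(nodes, num_clusters):
    """Pick one cluster head per cluster, favouring nodes with high
    residual energy that sit near the cluster centroid.
    `nodes` maps node id -> {'energy', 'x', 'y', 'cluster'}."""
    heads = {}
    for c in range(num_clusters):
        members = [n for n, info in nodes.items() if info['cluster'] == c]
        if not members:
            continue
        # cluster centroid, used to estimate intra-cluster distance
        cx = sum(nodes[n]['x'] for n in members) / len(members)
        cy = sum(nodes[n]['y'] for n in members) / len(members)

        def score(n):
            d = ((nodes[n]['x'] - cx) ** 2 + (nodes[n]['y'] - cy) ** 2) ** 0.5
            # hypothetical score: reward energy, penalize distance to centroid
            return nodes[n]['energy'] - d

        heads[c] = max(members, key=score)
    return heads
```

Rotating the CH role between rounds then amounts to re-running the selection with updated residual energies, so that no single node drains first.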
Pub. online: 1 Jan 2017 | Type: Research Article | Open Access
Journal:Informatica
Volume 28, Issue 1 (2017), pp. 105–130
Abstract
Analysing massive amounts of data and extracting value from it has become key across different disciplines. As the amounts of data grow rapidly, current approaches for data analysis are no longer efficient. This is particularly true for clustering algorithms where distance calculations between pairs of points dominate overall time: the more data points are in the dataset, the bigger the share of time needed for distance calculations.
Data analysis and clustering, however, are rarely straightforward: parameters need to be determined and tuned first. Entirely accurate results are thus rarely needed, and we can instead sacrifice a little precision in the final result to accelerate the computation. In this paper we develop ADvaNCE, a new approach based on approximating DBSCAN. More specifically, we propose two measures to reduce the distance-calculation overhead and consequently approximate DBSCAN: (1) locality-sensitive hashing to approximate and speed up distance calculations, and (2) representative point selection to reduce the number of distance calculations.
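The first measure can be sketched as p-stable-style locality-sensitive hashing: points are hashed to grid cells of random 1-D projections, and an eps-neighbourhood query is then restricted to a point's own bucket instead of scanning the whole dataset. The parameters and function below are illustrative assumptions, not the ADvaNCE implementation.

```python
import math
import random

def lsh_buckets(points, bucket_width=1.0, num_projections=2, seed=0):
    """Hash each point to the grid cell of its random 1-D projections.
    Points landing in the same cell are likely close, so DBSCAN-style
    range queries can be approximated within each bucket."""
    rng = random.Random(seed)
    dim = len(points[0])
    # random projection directions and offsets (p-stable LSH style)
    dirs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_projections)]
    offs = [rng.uniform(0, bucket_width) for _ in range(num_projections)]
    buckets = {}
    for idx, p in enumerate(points):
        key = tuple(
            math.floor((sum(a * b for a, b in zip(p, d)) + o) / bucket_width)
            for d, o in zip(dirs, offs)
        )
        buckets.setdefault(key, []).append(idx)
    return buckets
```

In practice several independent hash tables are combined so that near neighbours split across a cell boundary in one table are still found in another; that refinement is omitted here for brevity.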
The experiments show that the resulting clustering algorithm is more scalable than the state of the art as datasets become bigger. Compared with the most recent approximation technique for DBSCAN, our approach is in general one order of magnitude faster (up to 30× in our experiments) as the size of the dataset increases.
Journal:Informatica
Volume 25, Issue 4 (2014), pp. 563–580
Abstract
Clustering is one of the best-known unsupervised learning methods, with the aim of discovering structures in the data. This paper presents a distance-based Sweep-Hyperplane Clustering Algorithm (SHCA), which uses sweep-hyperplanes to quickly locate each point's approximate nearest neighbourhood. Furthermore, a new distance-based dynamic model, based on -tree hierarchical space partitioning, extends SHCA's capability to find clusters that are not well separated and have arbitrary shape and density. Experimental results on large, noisy synthetic and real multidimensional datasets demonstrate the effectiveness of the proposed algorithm.
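The sweep idea of restricting a point's neighbour search to points that are nearby in a sorted ordering can be sketched in one dimension as follows. This is a simplified stand-in for the paper's sweep-hyperplane ordering, with an illustrative window parameter; it is not SHCA itself.

```python
def approx_neighbours(points, axis=0, window=3):
    """Sweep along one coordinate: after sorting by that coordinate,
    each point's approximate nearest neighbour is taken only from the
    points within a fixed window in the sorted order."""
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    pos = {i: r for r, i in enumerate(order)}
    neigh = {}
    for i in range(len(points)):
        r = pos[i]
        # candidate neighbours: a small window around i in sweep order
        cand = order[max(0, r - window):r] + order[r + 1:r + 1 + window]
        neigh[i] = min(
            cand,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
        )
    return neigh
```

Sorting costs O(n log n) and each lookup examines only O(window) candidates, which is what makes sweep-based neighbourhood location fast at the price of approximation.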
Pub. online: 1 Jan 2014 | Type: Research Article | Open Access
Journal:Informatica
Volume 25, Issue 2 (2014), pp. 265–282
Abstract
The aim of this study is to predict the energy generated by a solar thermal system. To achieve this, a hybrid intelligent system was developed based on local regression models with low complexity and high accuracy. The input data are divided into clusters using Self-Organizing Maps; a local model is then created for each cluster. Different regression techniques were tested and the best one was chosen. The novel hybrid regression system based on local models is empirically verified on a real dataset obtained from the solar thermal system of a bioclimatic house.
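The cluster-then-regress structure can be sketched with one least-squares line per cluster. This is a lightweight stand-in for the paper's SOM-plus-local-regression hybrid: the cluster assignment is taken as given, and simple linear regression replaces the candidate regression techniques the authors compare.

```python
def fit_local_models(samples, targets, assign):
    """Fit one least-squares line per cluster.
    `assign[i]` is the cluster label of scalar sample i."""
    models = {}
    for c in set(assign):
        xs = [samples[i] for i in range(len(samples)) if assign[i] == c]
        ys = [targets[i] for i in range(len(samples)) if assign[i] == c]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
        models[c] = (slope, my - slope * mx)  # (slope, intercept)
    return models

def predict(models, assign_fn, x):
    """Route a query to its cluster's local model."""
    slope, intercept = models[assign_fn(x)]
    return slope * x + intercept
```

The design point is that each local model only has to be accurate on its own region of the input space, which keeps the individual models simple.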
Pub. online: 1 Jan 2011 | Type: Research Article | Open Access
Journal:Informatica
Volume 22, Issue 1 (2011), pp. 1–10
Abstract
Estimation and modelling problems, as they arise in many data analysis areas, often turn out to be unstable and/or intractable by standard numerical methods. Such problems frequently occur when fitting large data sets to a certain model and in predictive learning. Heuristics are general recommendations based on practical statistical evidence, in contrast to a fixed set of rules that cannot vary but are guaranteed to give the correct answer. Although the use of these methods has become more standard in several fields of science, their use for estimation and modelling in statistics appears to be still limited. This paper surveys a set of problem-solving strategies, guided by heuristic information, that are expected to be used more frequently. The use of recent advances in different fields of large-scale data analysis is promoted, focusing on applications in medicine, biology and technology.
Pub. online: 1 Jan 2010 | Type: Research Article | Open Access
Journal:Informatica
Volume 21, Issue 3 (2010), pp. 455–470
Abstract
In this article, a method is proposed for analysing thermovision-based video data that characterize the dynamics of the temperature anisotropy of heart tissue in a spatial domain. Many cardiac rhythm disturbances are currently treated by applying destructive energy sources. One of the most common sources, and the related methodology, is the radio-frequency ablation procedure. However, the risk of complications, including arrhythmia recurrence, remains rather high. The drawback of the methodology is that such a destruction procedure cannot be monitored in the visual spectrum, which makes it impossible to control the ablation efficiency. To understand the nature of possible complications and to control the treatment process, thermovision can be used. The aim of the study was to analyse possible mechanisms of these complications and to measure and determine optimal radio-frequency ablation parameters, based on the analysis of video data acquired using thermovision.
Pub. online: 1 Jan 2009 | Type: Research Article | Open Access
Journal:Informatica
Volume 20, Issue 2 (2009), pp. 187–202
Abstract
In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data according to two sampling distributions. The first distribution corresponds to the high-density zones of the data-element distribution and is thus associated with the cluster cores. The second, associated with the cluster margins, is related to the low-density zones. The samples are clustered and the two obtained partitions are compared. The partitions are considered consistent if the obtained clusters are similar. The resemblance is measured by the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. We use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, this statistic is normally distributed. Thus, it can be expected that the true number of clusters corresponds to the statistic's empirical distribution that is closest to normal. Numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
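The core quantity, the number of minimal-spanning-tree edges joining points from different samples, can be computed directly. The sketch below uses a plain Prim's algorithm on the complete Euclidean graph; it shows only the raw cross-edge count, not the normalization of the full Friedman-Rafsky statistic.

```python
def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.
    O(n^3) as written; fine for a small illustrative example."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # cheapest edge leaving the current tree
        a, b = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        edges.append((a, b))
        in_tree.add(b)
    return edges

def cross_edge_count(points, labels):
    """Number of MST edges joining points from different samples
    (labels 0/1): the ingredient of the Friedman-Rafsky test."""
    return sum(1 for a, b in mst_edges(points) if labels[a] != labels[b])
```

When the two samples are drawn from the same distribution, cross edges are plentiful; well-separated samples produce very few, which is what makes the count informative about cluster structure.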
Pub. online: 1 Jan 2008 | Type: Research Article | Open Access
Journal:Informatica
Volume 19, Issue 3 (2008), pp. 377–390
Abstract
We investigate the applicability of quantitative methods to discovering the most fundamental structural properties of the most reliable political data in Lithuania, namely the voting data of the Lithuanian Parliament. The two most widely used techniques of structural data analysis (clustering and multidimensional scaling) are compared. We draw some technical conclusions which can serve as recommendations for a more purposeful application of these methods.
Pub. online: 1 Jan 2007 | Type: Research Article | Open Access
Journal:Informatica
Volume 18, Issue 2 (2007), pp. 187–202
Abstract
In this paper, the relative multidimensional scaling method is investigated. This method is designed to visualize large multidimensional data. It encompasses the application of multidimensional scaling (MDS) to a so-called basic vector set and the subsequent mapping of the remaining vectors of the analyzed data set. In the original algorithm of relative MDS, the visualization process is divided into three steps: the set of basis vectors is constructed using the k-means clustering method; this set is projected onto the plane using the MDS algorithm; and the remaining data are visualized using the relative mapping algorithm. We propose a modification that differs from the original algorithm in the strategy for selecting the basis vectors. The experimental investigation has shown that the modification surpasses the original algorithm in visualization quality and computational expense. The conditions under which the efficiency of relative MDS exceeds that of standard MDS are estimated.
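The relative-mapping step, placing each remaining vector on the plane so that its distances to the already-mapped basis points match the original high-dimensional distances, can be sketched as gradient descent on a simple stress function. The function name, the plain squared-error stress, and the fixed learning rate are assumptions for illustration, not the paper's exact algorithm.

```python
def relative_map(x, basis_high, basis_low, steps=200, lr=0.05):
    """Place one high-dimensional vector `x` in 2-D so that its planar
    distances to the mapped basis points `basis_low` match its original
    distances to `basis_high` (gradient descent on squared stress)."""
    d_high = [sum((a - b) ** 2 for a, b in zip(x, bh)) ** 0.5
              for bh in basis_high]
    y = [0.0, 0.0]
    for _ in range(steps):
        gx = gy = 0.0
        for (bx, by), dh in zip(basis_low, d_high):
            dl = ((y[0] - bx) ** 2 + (y[1] - by) ** 2) ** 0.5 or 1e-9
            coeff = 2 * (dl - dh) / dl  # gradient of (dl - dh)^2
            gx += coeff * (y[0] - bx)
            gy += coeff * (y[1] - by)
        y[0] -= lr * gx
        y[1] -= lr * gy
    return y
```

Because each remaining vector is placed independently against the fixed basis layout, this step parallelizes trivially and costs only O(|basis|) per point, which is what makes relative MDS attractive for large data sets.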
Pub. online: 1 Jan 2002 | Type: Research Article | Open Access
Journal:Informatica
Volume 13, Issue 4 (2002), pp. 485–500
Abstract
This paper presents model-based forecasting of the Lithuanian education system for the period 2001–2010. To obtain satisfactory forecasting results, the development of the models used for these aims should be grounded in interactive data mining. The development process is usually accompanied by the formulation of assumptions underlying the methods or models, and the accessibility and reliability of the data sources should be verified; special data mining of the data sources may verify these assumptions. Interactive data mining was applied to the data stored in the Lithuanian teachers' database and to other sources representing the state of the education system and demographic changes in Lithuania. The models cover the estimation of data quality in the databases, the analysis of the flows of teachers and pupils, the clustering of schools, a model of the dynamics of the pedagogical staff and pupils, and a quality analysis of teachers. The main results of the forecasting and of the integrated analysis of the Lithuanian teachers' database with other data reflecting the state of the education system and demographic changes in Lithuania are presented.