Pub. online: 19 Nov 2024 | Type: Research Article | Open Access
Journal: Informatica
Volume 35, Issue 4 (2024), pp. 883–908
Abstract
There are various deep neural network (DNN) architectures and methods for augmenting time series data, but not all of them can be adapted to specific datasets. This article explores the development of deep learning models for time series, applies data augmentation methods to conveyor belt (CB) tension signal data, and investigates the influence of these methods on the accuracy of CB state classification. CB systems are among the essential elements of production processes, enabling the smooth transportation of various industrial items, so their analysis is highly important. For this work, multi-domain tension signals from five CB load weight conditions (0.5 kg, 1 kg, 2 kg, 3 kg, 5 kg) and one damaged belt condition were collected and analysed. Four DNN models, based on the fully convolutional network (FCN), the convolutional neural network combined with long short-term memory (CNN-LSTM), the residual network (ResNet), and the InceptionTime architectures, were developed and applied to the classification of CB states. Different time series augmentations, such as random Laplace noise, drifted Gaussian noise, uniform noise, and magnitude warping, were applied to the collected data during the study. Furthermore, new CB tension signals were generated using a TimeVAE model. The study has shown that DNN models based on the FCN, ResNet, and InceptionTime architectures can classify CB states accurately. The research has also shown that various data augmentation methods can improve the accuracy of these models; for example, the combined addition of random Laplace and drifted Gaussian noise improved the FCN model's baseline (without augmentation) classification accuracy on 2.0 s signals by 4.5%, to 92.6% ± 1.54%. The FCN model demonstrated the best accuracy and classification performance despite having the fewest trainable parameters, underscoring the importance of selecting and optimizing the right architecture when developing models for specific tasks.
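For illustration, a minimal NumPy sketch of three of the augmentations named above is given below. The parameter values, and the linear-interpolation form of magnitude warping (often implemented with cubic splines instead), are assumptions rather than the authors' exact settings.

```python
import numpy as np

def add_laplace_noise(x, scale=0.01, rng=None):
    """Add i.i.d. Laplace noise to a 1-D signal."""
    if rng is None:
        rng = np.random.default_rng()
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)

def add_drifted_gaussian_noise(x, sigma=0.01, drift=0.05, rng=None):
    """Add Gaussian noise whose mean drifts linearly along the signal."""
    if rng is None:
        rng = np.random.default_rng()
    trend = np.linspace(0.0, drift, num=x.shape[0])
    return x + trend + rng.normal(loc=0.0, scale=sigma, size=x.shape)

def magnitude_warp(x, sigma=0.2, knots=4, rng=None):
    """Scale the signal by a smooth random curve interpolated between knots."""
    if rng is None:
        rng = np.random.default_rng()
    knot_positions = np.linspace(0, x.shape[0] - 1, num=knots + 2)
    knot_values = rng.normal(loc=1.0, scale=sigma, size=knots + 2)
    warp = np.interp(np.arange(x.shape[0]), knot_positions, knot_values)
    return x * warp
```

Each transform preserves the signal length, so augmented copies can be mixed directly into the training set alongside the original tension signals.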
Journal: Informatica
Volume 35, Issue 2 (2024), pp. 227–253
Abstract
Like other disciplines, machine learning is currently facing a reproducibility crisis that hinders the advancement of scientific research. Researchers face difficulties reproducing key results due to missing critical details and the disconnection between publications and the associated models, data, parameter settings, and experimental results. To promote transparency and trust in research, solutions are needed that improve the accessibility of models and data, facilitate experiment tracking, and allow auditing of experimental results. Blockchain technology, characterized by its decentralization, data immutability, cryptographic hash functions, consensus algorithms, robust security measures, access control mechanisms, and innovative smart contracts, offers a compelling pathway for developing such solutions. To address the reproducibility challenges in machine learning, we present a novel concept of a blockchain-based platform that operates on a peer-to-peer network. This network comprises organizations and researchers actively engaged in machine learning research and seamlessly integrates various machine learning research and development frameworks. To validate the viability of the proposed concept, we implemented a blockchain network using the Hyperledger Fabric infrastructure and conducted experimental simulations in several scenarios to thoroughly evaluate its effectiveness. By fostering transparency and facilitating collaboration, our proposed platform has the potential to significantly improve reproducible research in machine learning and can be adapted to other domains within artificial intelligence.
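As a rough illustration of the client-side step such a platform would need before anything reaches the ledger, the sketch below fingerprints a model and a dataset with SHA-256 and bundles them with parameters and metrics into a single payload. The Hyperledger Fabric chaincode and submission API are not reproduced here, and the function names are hypothetical.

```python
import hashlib
import json

def sha256_of_file(path):
    """Stream a file through SHA-256 so large models/datasets fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def experiment_record(model_path, data_path, params, metrics):
    """Bundle immutable artifact fingerprints and metadata into one payload
    that a ledger transaction could store for later audit."""
    return json.dumps({
        "model_sha256": sha256_of_file(model_path),
        "data_sha256": sha256_of_file(data_path),
        "params": params,
        "metrics": metrics,
    }, sort_keys=True)
```

Because the hashes are content-derived, any later change to the model or data would fail verification against the recorded payload, which is the property the platform relies on for auditability.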
Pub. online: 17 May 2022 | Type: Research Article | Open Access
Journal: Informatica
Volume 33, Issue 2 (2022), pp. 247–277
Abstract
One of the biggest difficulties in the telecommunications industry is retaining customers and preventing churn. In this article, we review the most recent research on churn detection for telecommunication companies. Selected machine learning methods are applied to publicly available datasets, partially reproducing the results of other authors, and then to the private Moremins company dataset. Next, we extend the analysis to cover existing research gaps: the differences between churn definitions are analysed, and it is shown that the accuracy reported in other studies is better due to some false assumptions, i.e. labelling rules derived from the definition lead to very good classification accuracy, but this does not imply that such churn detection is useful for subsequent customer retention. The main outcome of the research is a detailed analysis of the impact of differences in churn definitions on the final result; it was shown that the impact of labelling rules derived from definitions can be large. The data in this study consist of call detail records (CDRs) and other daily aggregated user data; 11,000 user entries over 275 days were analysed. Six different classification methods were applied, all of them giving similar results; one of the best results was achieved using the Gradient Boosting Classifier, with an accuracy of 0.832, F-measure of 0.646, and recall of 0.769.
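A minimal scikit-learn sketch of the best-performing setup is shown below. Since the Moremins dataset is private, a synthetic, class-imbalanced stand-in from make_classification is used, and all parameter choices are illustrative rather than the study's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the private CDR-derived features; the real study
# used 11,000 user entries over 275 days with labels derived from a
# churn definition.
X, y = make_classification(n_samples=11000, n_features=20,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy: ", accuracy_score(y_test, pred))
print("F-measure:", f1_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
```

Reporting F-measure and recall alongside accuracy matters here because churn is a minority class: a classifier that never predicts churn can still score high accuracy while being useless for retention.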
Journal: Informatica
Volume 12, Issue 3 (2001), pp. 455–468
Abstract
This paper describes a preliminary algorithm for epilepsy prediction based on visual perception tests and digital electroencephalograph (EEG) data analysis. A special machine learning algorithm and a signal processing method are used. The algorithm is tested on real data from epileptic and healthy persons treated at Kaunas Medical University Clinics, Lithuania. A detailed examination of the results shows that computerized visual perception testing and automated data analysis could be used for diagnosing brain damage.
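The abstract does not specify the signal processing method used. Purely as an illustration of a common EEG feature-extraction step, the sketch below estimates average band power with a Welch periodogram; the sampling rate, frequency bands, and placeholder signal are assumptions, not details from the paper.

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, band):
    """Average power of an EEG channel within a frequency band (Welch PSD)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

# Illustrative usage: delta/theta/alpha/beta powers as classifier features.
fs = 256  # assumed sampling rate in Hz
eeg = np.random.default_rng(0).normal(size=fs * 10)  # placeholder signal
features = [band_power(eeg, fs, b)
            for b in [(0.5, 4), (4, 8), (8, 13), (13, 30)]]
```

Features of this kind, computed per channel and per test condition, would then feed the machine learning stage mentioned in the abstract.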