Informatica logo


Login Register

  1. Home
  2. Issues
  3. Volume 34, Issue 3 (2023)
  4. Multi-Querying: A Subsequence Matching A ...

Informatica

Information Submit your article For Referees Help ATTENTION!
  • Article info
  • Full article
  • Related articles
  • Cited by
  • More
    Article info Full article Related articles Cited by

Multi-Querying: A Subsequence Matching Approach to Support Multiple Queries
Volume 34, Issue 3 (2023), pp. 557–576
Wen Liu   Mingrui Ma   Peng Wang  

Authors

 
Placeholder
https://doi.org/10.15388/23-INFOR519
Pub. online: 28 June 2023      Type: Research Article      Open accessOpen Access

Received
1 August 2022
Accepted
1 May 2023
Published
28 June 2023

Abstract

The widespread use of sensors has resulted in an unprecedented amount of time series data. Time series mining has experienced a particular surge of interest, among which, subsequence matching is one of the most primary problem that serves as a foundation for many time series data mining techniques, such as anomaly detection and classification. In literature there exist many works to study this problem. However, in many real applications, it is uneasy for users to accurately and clearly elaborate the query intuition with a single query sequence. Consequently, in this paper, we address this issue by allowing users to submit a small query set, instead of a single query. The multiple queries can embody the query intuition better. In particular, we first propose a novel probability-based representation of the query set. A common segmentation is generated which can approximate the queries well, in which each segment is described by some features. For each feature, the corresponding values of multiple queries are represented as a Gaussian distribution. Then, based on the representation, we design a novel distance function to measure the similarity of one subsequence to the multiple queries. Also, we propose a breadth-first search strategy to find out similar subsequences. We have conducted extensive experiments on both synthetic and real datasets, and the results verify the superiority of our approach.

References

 
Abanda, A., Mori, U., Lozano, J.A. (2019). A review on distance based time series classification. Data Mining and Knowledge Discovery, 33(2), 378–412. https://doi.org/10.1145/3514221.3526183.
 
Boniol, P., Palpanas, T. (2020). Series2graph: Graph-based subsequence anomaly detection for time series. Proceedings of the VLDB Endowment, 13(12), 1821–1834. https://doi.org/10.14778/3407790.3407792.
 
Boniol, P., Linardi, M., Roncallo, F., Palpanas, T. (2020). Automated anomaly detection in large sequences. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, pp. 1834–1837. https://doi.org/10.1109/ICDE48307.2020.00182.
 
Boniol, P., Meftah, M., Remy, E., Palpanas, T. (2022). DCAM: dimension-wise class activation map for explaining multivariate data series classification. In: Proceedings of the 2022 International Conference on Management of Data, SIGMOD ’22. Association for Computing Machinery, New York, NY, USA, pp. 1175–1189. https://doi.org/10.1145/3514221.3526183.
 
Chen, Y., Nascimento, M.A., Ooi, B.C., Tung, A.K.H. (2007). SpADe: on shape-based pattern detection in streaming time series. In: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15–20, 2007, pp. 786–795. https://doi.org/10.1109/ICDE.2007.367924.
 
Dau, H.A., Bagnall, A., Kamgar, K., Yeh, C.-C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Keogh, E. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica, 6(6), 1293–1305. https://doi.org/10.1109/JAS.2019.1911747.
 
Eravci, B., Ferhatosmanoglu, H. (2013). Diversity based relevance feedback for time series search. Proceedings of the VLDB Endowment, 7(2), 109–120. https://doi.org/10.14778/2732228.2732230.
 
Hu, W., Letson, F., Barthelmie, R.J., Pryor, S.C. (2018). Wind gust characterization at wind turbine relevant heights in moderately complex terrain. Journal of Applied Meteorology and Climatology, 57(7), 1459–1476. https://doi.org/10.1175/JAMC-D-18-0040.1.
 
Iwana, B.K., Uchida, S. (2020). Time series classification using local distance-based features in multi-modal fusion networks. Pattern Recognition, 97, 107024. https://doi.org/10.1016/j.patcog.2019.107024. https://www.sciencedirect.com/science/article/pii/S0031320319303279.
 
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3), 263–286. https://doi.org/10.1007/PL00011669.
 
Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T. (2018). Coconut: a scalable bottom-up approach for building data series indexes. Proceedings of the VLDB Endowment, 11(6), 677–690. https://doi.org/10.14778/3184470.3184472.
 
Li, Y., Tang, B., U, L.H., Yiu, M.L., Gong, Z. (2017). Fast subsequence search on time series data. In: Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21–24, 2017, pp. 514–517. https://doi.org/10.5441/002/edbt.2017.58. https://openproceedings.org/2017/conf/edbt/paper-369.pdf.
 
Linardi, M., Palpanas, T. (2018). Scalable, variable-length similarity search in data series: the ULISSE approach. Proceedings of the VLDB Endowment, 11(13), 2236–2248. https://doi.org/10.14778/3275366.3284968.
 
Muthumanickam, P.K., Vrotsou, K., Cooper, M., Johansson, J. (2016). Shape grammar extraction for efficient query-by-sketch pattern matching in long time series. In: 11th IEEE Conference on Visual Analytics Science and Technology, IEEE VAST 2016, Baltimore, MD, USA, October 23–28, 2016. IEEE Computer Society, pp. 121–130. https://doi.org/10.1109/VAST.2016.7883518.
 
Papapetrou, P., Athitsos, V., Potamias, M., Kollios, G., Gunopulos, D. (2011). Embedding-based subsequence matching in time-series databases. ACM Transactions on Database Systems, 36(3), 17–11739.
 
Rakthanmanon, T., Campana, B.J.L., Mueen, A., Batista, G.E.A.P.A., Westover, M.B., Zhu, Q., Zakaria, J., Keogh, E.J. (2012). Searching and mining trillions of time series subsequences under dynamic time warping. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’12, Beijing, China, August 12–16, 2012. ACM, pp. 262–270. https://doi.org/10.1145/2339530.2339576.
 
Shieh, J., Keogh, E. (2008). ISAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08. Association for Computing Machinery, New York, NY, USA, pp. 623–631. https://doi.org/10.1145/1401890.1401966.
 
Wang, H., Li, C., Sun, H., Guo, Z., Bai, Y. (2018). Shapelet classification algorithm based on efficient subsequence matching. Data Science Journal, 17(6), 1–12. https://doi.org/10.5334/dsj-2018-006.
 
Wang, Q., Whitmarsh, S., Navarro, V., Palpanas, T. (2022). IEDeaL: a deep learning framework for detecting highly imbalanced interictal epileptiform discharges. Proceedings of the VLDB Endowment, 16(3), 480–490. https://doi.org/10.14778/3570690.3570698.
 
Wasay, A., Wei, X., Dayan, N., Idreos, S. (2017). Data canopy: accelerating exploratory statistical analysis. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14–19, 2017, ACM, pp. 557–572. https://doi.org/10.1145/3035918.3064051.
 
Wu, J., Wang, P., Pan, N., Wang, C., Wang, W., Wang, J. (2019). KV-match: a subsequence matching approach supporting normalization and time warping. In: 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8–11, 2019. IEEE, pp. 866–877. https://doi.org/10.1109/ICDE.2019.00082.
 
Zoumpatianos, K., Idreos, S., Palpanas, T. (2016). ADS: the adaptive data series index. VLDB Endowment, 25(6), 843–866. https://doi.org/10.1007/s00778-016-0442-5.

Full article Related articles Cited by PDF XML
Full article Related articles Cited by PDF XML

Copyright
© 2023 Vilnius University
by logo by logo
Open access article under the CC BY license.

Keywords
multiple queries subsequence matching time series mining fault diagnosis

Funding
This work is supported by NSFC under grant 61962058, the Tianshan Talent of Xinjiang Uygur Autonomous Region – Young Top Talents in Science and Technology (2022TSYCCY0008), Integration of Industry and Education-Joint Laboratory of Data Engineering and Digital Mine (2019QX0035), Bayingolin Mongolian Autonomous Prefecture Science and Technology Research Program (202117), Natural Science Foundation of Xinjiang Uygur Autonomous Region (2019D01A30), and Scientific Research Program of the Higher Education Institution of Xinjiang (XJEDU2018Y056).

Metrics
since January 2020
486

Article info
views

202

Full article
views

247

PDF
downloads

65

XML
downloads

Export citation

Copy and paste formatted citation
Placeholder

Download citation in file


Share


RSS

INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952
  • Copyright © 2023 Vilnius University

About

  • About journal

For contributors

  • OA Policy
  • Submit your article
  • Instructions for Referees
    •  

    •  

Contact us

  • Institute of Data Science and Digital Technologies
  • Vilnius University

    Akademijos St. 4

    08412 Vilnius, Lithuania

    Phone: (+370 5) 2109 338

    E-mail: informatica@mii.vu.lt

    https://informatica.vu.lt/journal/INFORMATICA
Powered by PubliMill  •  Privacy policy