Multi-Querying: A Subsequence Matching Approach to Support Multiple Queries

Liu, Wen; Ma, Mingrui; Wang, Peng

doi:10.15388/23-INFOR519

Informatica

Multi-Querying: A Subsequence Matching Approach to Support Multiple Queries

Volume 34, Issue 3 (2023), pp. 557–576

Wen Liu Mingrui Ma Peng Wang

https://doi.org/10.15388/23-INFOR519

Pub. online: 28 June 2023 Type: Research Article

Open Access

Received
1 August 2022

Accepted
1 May 2023

Published
28 June 2023

Abstract

The widespread use of sensors has resulted in an unprecedented amount of time series data. Time series mining has experienced a particular surge of interest, among which, subsequence matching is one of the most primary problem that serves as a foundation for many time series data mining techniques, such as anomaly detection and classification. In literature there exist many works to study this problem. However, in many real applications, it is uneasy for users to accurately and clearly elaborate the query intuition with a single query sequence. Consequently, in this paper, we address this issue by allowing users to submit a small query set, instead of a single query. The multiple queries can embody the query intuition better. In particular, we first propose a novel probability-based representation of the query set. A common segmentation is generated which can approximate the queries well, in which each segment is described by some features. For each feature, the corresponding values of multiple queries are represented as a Gaussian distribution. Then, based on the representation, we design a novel distance function to measure the similarity of one subsequence to the multiple queries. Also, we propose a breadth-first search strategy to find out similar subsequences. We have conducted extensive experiments on both synthetic and real datasets, and the results verify the superiority of our approach.

References

Abanda, A., Mori, U., Lozano, J.A. (2019). A review on distance based time series classification. Data Mining and Knowledge Discovery, 33(2), 378–412. https://doi.org/10.1145/3514221.3526183.

Boniol, P., Palpanas, T. (2020). Series2graph: Graph-based subsequence anomaly detection for time series. Proceedings of the VLDB Endowment, 13(12), 1821–1834. https://doi.org/10.14778/3407790.3407792.

Boniol, P., Linardi, M., Roncallo, F., Palpanas, T. (2020). Automated anomaly detection in large sequences. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, pp. 1834–1837. https://doi.org/10.1109/ICDE48307.2020.00182.

Boniol, P., Meftah, M., Remy, E., Palpanas, T. (2022). DCAM: dimension-wise class activation map for explaining multivariate data series classification. In: Proceedings of the 2022 International Conference on Management of Data, SIGMOD ’22. Association for Computing Machinery, New York, NY, USA, pp. 1175–1189. https://doi.org/10.1145/3514221.3526183.

Chen, Y., Nascimento, M.A., Ooi, B.C., Tung, A.K.H. (2007). SpADe: on shape-based pattern detection in streaming time series. In: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15–20, 2007, pp. 786–795. https://doi.org/10.1109/ICDE.2007.367924.

Dau, H.A., Bagnall, A., Kamgar, K., Yeh, C.-C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Keogh, E. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica, 6(6), 1293–1305. https://doi.org/10.1109/JAS.2019.1911747.

Eravci, B., Ferhatosmanoglu, H. (2013). Diversity based relevance feedback for time series search. Proceedings of the VLDB Endowment, 7(2), 109–120. https://doi.org/10.14778/2732228.2732230.

Hu, W., Letson, F., Barthelmie, R.J., Pryor, S.C. (2018). Wind gust characterization at wind turbine relevant heights in moderately complex terrain. Journal of Applied Meteorology and Climatology, 57(7), 1459–1476. https://doi.org/10.1175/JAMC-D-18-0040.1.

Iwana, B.K., Uchida, S. (2020). Time series classification using local distance-based features in multi-modal fusion networks. Pattern Recognition, 97, 107024. https://doi.org/10.1016/j.patcog.2019.107024. https://www.sciencedirect.com/science/article/pii/S0031320319303279.

Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3), 263–286. https://doi.org/10.1007/PL00011669.

Kondylakis, H., Dayan, N., Zoumpatianos, K., Palpanas, T. (2018). Coconut: a scalable bottom-up approach for building data series indexes. Proceedings of the VLDB Endowment, 11(6), 677–690. https://doi.org/10.14778/3184470.3184472.

Li, Y., Tang, B., U, L.H., Yiu, M.L., Gong, Z. (2017). Fast subsequence search on time series data. In: Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21–24, 2017, pp. 514–517. https://doi.org/10.5441/002/edbt.2017.58. https://openproceedings.org/2017/conf/edbt/paper-369.pdf.

Linardi, M., Palpanas, T. (2018). Scalable, variable-length similarity search in data series: the ULISSE approach. Proceedings of the VLDB Endowment, 11(13), 2236–2248. https://doi.org/10.14778/3275366.3284968.

Muthumanickam, P.K., Vrotsou, K., Cooper, M., Johansson, J. (2016). Shape grammar extraction for efficient query-by-sketch pattern matching in long time series. In: 11th IEEE Conference on Visual Analytics Science and Technology, IEEE VAST 2016, Baltimore, MD, USA, October 23–28, 2016. IEEE Computer Society, pp. 121–130. https://doi.org/10.1109/VAST.2016.7883518.

Papapetrou, P., Athitsos, V., Potamias, M., Kollios, G., Gunopulos, D. (2011). Embedding-based subsequence matching in time-series databases. ACM Transactions on Database Systems, 36(3), 17–11739.

Rakthanmanon, T., Campana, B.J.L., Mueen, A., Batista, G.E.A.P.A., Westover, M.B., Zhu, Q., Zakaria, J., Keogh, E.J. (2012). Searching and mining trillions of time series subsequences under dynamic time warping. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’12, Beijing, China, August 12–16, 2012. ACM, pp. 262–270. https://doi.org/10.1145/2339530.2339576.

Shieh, J., Keogh, E. (2008). ISAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08. Association for Computing Machinery, New York, NY, USA, pp. 623–631. https://doi.org/10.1145/1401890.1401966.

Wang, H., Li, C., Sun, H., Guo, Z., Bai, Y. (2018). Shapelet classification algorithm based on efficient subsequence matching. Data Science Journal, 17(6), 1–12. https://doi.org/10.5334/dsj-2018-006.

Wang, Q., Whitmarsh, S., Navarro, V., Palpanas, T. (2022). IEDeaL: a deep learning framework for detecting highly imbalanced interictal epileptiform discharges. Proceedings of the VLDB Endowment, 16(3), 480–490. https://doi.org/10.14778/3570690.3570698.

Wasay, A., Wei, X., Dayan, N., Idreos, S. (2017). Data canopy: accelerating exploratory statistical analysis. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14–19, 2017, ACM, pp. 557–572. https://doi.org/10.1145/3035918.3064051.

Wu, J., Wang, P., Pan, N., Wang, C., Wang, W., Wang, J. (2019). KV-match: a subsequence matching approach supporting normalization and time warping. In: 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8–11, 2019. IEEE, pp. 866–877. https://doi.org/10.1109/ICDE.2019.00082.

Zoumpatianos, K., Idreos, S., Palpanas, T. (2016). ADS: the adaptive data series index. VLDB Endowment, 25(6), 843–866. https://doi.org/10.1007/s00778-016-0442-5.

Full article Related articles Cited by

Open access article under the CC BY license.

Keywords

multiple queries subsequence matching time series mining fault diagnosis

Funding

This work is supported by NSFC under grant 61962058, the Tianshan Talent of Xinjiang Uygur Autonomous Region – Young Top Talents in Science and Technology (2022TSYCCY0008), Integration of Industry and Education-Joint Laboratory of Data Engineering and Digital Mine (2019QX0035), Bayingolin Mongolian Autonomous Prefecture Science and Technology Research Program (202117), Natural Science Foundation of Xinjiang Uygur Autonomous Region (2019D01A30), and Scientific Research Program of the Higher Education Institution of Xinjiang (XJEDU2018Y056).

Metrics

since January 2020

1047

Article info
views

648

Full article
views

504

PDF
downloads

164

XML
downloads

RSS

Authors

Abstract

References

Export citation

Copy and paste formatted citation

Download citation in file