Advancing Research Reproducibility in Machine Learning through Blockchain Technology
Volume 35, Issue 2 (2024), pp. 227–253
Pub. online: 11 April 2024
Type: Research Article
Open Access
Received
1 February 2024
Accepted
1 April 2024
Published
11 April 2024
Abstract
Like other disciplines, machine learning is currently facing a reproducibility crisis that hinders the advancement of scientific research. Researchers have difficulty reproducing key results because critical details are missing and publications are disconnected from their associated models, data, parameter settings, and experimental results. To promote transparency and trust in research, solutions are needed that improve the accessibility of models and data, facilitate experiment tracking, and allow experimental results to be audited. Blockchain technology, characterized by decentralization, data immutability, cryptographic hash functions, consensus algorithms, robust security measures, access control mechanisms, and smart contracts, offers a compelling pathway for developing such solutions. To address the reproducibility challenges in machine learning, we present a novel concept of a blockchain-based platform that operates on a peer-to-peer network. This network comprises organizations and researchers actively engaged in machine learning research and integrates various machine learning research and development frameworks. To validate the viability of the proposed concept, we implemented a blockchain network using the Hyperledger Fabric infrastructure and conducted experimental simulations in several scenarios to evaluate its effectiveness. By fostering transparency and facilitating collaboration, the proposed platform has the potential to significantly improve reproducible research in machine learning and can be adapted to other domains within artificial intelligence.