Journal "Modern High Technologies" (Sovremennye naukoemkie tekhnologii)

1812-7320

Limited Liability Company "Publishing House "Academy of Natural History""

10.17513/snt.40780

ART-40780

MODELS OF SOFTWARE MANAGEMENT FOR DISTRIBUTED DATA PROCESSING SYSTEMS IN COMPUTER NETWORKS: COMPARATIVE ANALYSIS AND DEVELOPMENT OF AN INTEGRATIVE ARCHITECTURE

Nugaev

R. K.

rinat@nugaev.net

Sirazetdinov

R. T.

Russian Federation

Accredited Private Higher Education Institution "Moscow Financial and Law University of the Moscow Financial and Law Academy"

28 05 2026

5 90 100

This is an open-access article distributed under the terms of the CC BY 4.0 license.

https://top-technologies.ru/ru/article/view?id=40780

The article addresses the problem of software management of distributed data processing systems under increasingly demanding requirements for adaptability, flexible service level agreements, and support for heterogeneous resources. The aim of the study is to carry out a systematic comparative analysis of key software management architectures (MapReduce, Hadoop/YARN/HDFS, Spark, Flink, Kafka Streams), taking into account their advantages, limitations, and operational metrics, and to substantiate the feasibility of an integrative architectural model. The methods used were comparative analysis, modeling of operational scenarios, formalization of architectural solutions using directed acyclic graphs, load testing, and statistical evaluation of efficiency. The results show that classical and modern architectures differ in flexibility, resilience, and scalability, and that a transition to hybrid, multi-agent, and machine-learning-optimized models can significantly improve the adaptability, resource efficiency, and fault tolerance of distributed systems. The developed integrative model combines a multi-agent scheduler, a dynamic process orchestrator based on directed acyclic graphs, task clustering with machine learning algorithms, and hybrid control of execution engines; the authors report that the approach proved effective in typical operational scenarios. The authors conclude that further development of software management for distributed data processing requires the integration of self-tuning and intelligent models that provide a high level of reliability and adaptability in modern information platforms.
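The abstract names a process orchestrator based on directed acyclic graphs but does not describe its implementation. The sketch below is not the authors' model; it only illustrates the general idea of running tasks in dependency order, using Python's standard `graphlib` module. The function `run_dag`, the task names, and the sample data are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run callables in dependency order (illustrative sketch only).

    tasks: dict mapping a task name to a callable.
    deps:  dict mapping a task name to the set of names it depends on.
    Each task receives the results of its dependencies as positional
    arguments, ordered alphabetically by dependency name.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    order, results = [], {}
    while sorter.is_active():
        # Every task returned here has all of its dependencies finished,
        # so a real orchestrator could dispatch this batch in parallel.
        for name in sorted(sorter.get_ready()):
            inputs = [results[d] for d in sorted(deps.get(name, ()))]
            results[name] = tasks[name](*inputs)
            order.append(name)
            sorter.done(name)
    return order, results


# Hypothetical pipeline: ingest -> clean -> total, ingest -> count,
# and report depends on both count and total.
TASKS = {
    "ingest": lambda: [3, 1, 2],
    "clean": lambda xs: sorted(xs),
    "count": lambda xs: len(xs),
    "total": lambda xs: sum(xs),
    "report": lambda count, total: total / count,
}
DEPS = {
    "clean": {"ingest"},
    "count": {"ingest"},
    "total": {"clean"},
    "report": {"count", "total"},
}

order, results = run_dag(TASKS, DEPS)
print(order)
print(results["report"])
```

Tasks that become ready in the same batch (here `clean` and `count`) have no mutual dependencies, which is the property a scheduler would exploit to place them on different workers.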

distributed computing; software management; clustering; multi-agent systems; directed acyclic graph orchestrator; streaming data processing; Apache Spark; Apache Flink; fault tolerance; service level agreement; hybrid architecture; load forecasting
