Trofimov Ilya Distributed systems for machine learning


Machine learning is a rapidly developing area of research. Many machine learning and data science applications involve very large datasets that are hard to process on a single computer, or that take a very long time to process on one machine. Training on only a subsample of the dataset typically degrades the quality of the model's predictions. Distributed computational systems are used to solve this problem. The most popular approaches to building software for such systems rely on the following computational models: Map/Reduce, Spark, graph computational models, and the parameter server architecture. This paper reviews such systems and analyzes their advantages and disadvantages with respect to machine learning applications. Systems for training artificial neural networks are discussed separately.


Machine learning, data science, big data, distributed systems.

PP. 56-69.



