Symbiotic Organisms Search Optimization to Predict Optimal Thread Count for Multi-threaded Applications

Sachin H. Malave
Subhash K. Shinde

Abstract

Multicore systems have emerged as a cost-effective way to meet the growing demand for high-performance, low-energy computing. Thread management has long been a concern for developers, because its overheads reduce the overall throughput of multicore processor systems. One of the most complex problems on multicore processors is determining the optimal number of threads with which to execute a multithreaded program. To address this issue, this paper proposes a novel solution based on a modified symbiotic organisms search (MSOS) algorithm, a bio-inspired algorithm used for optimization in various engineering domains. The technique mimics the mutualism, commensalism and parasitism behaviours observed between organisms to search the available space for optimal solutions. The algorithm is simulated on an NVIDIA DGX server with Intel Xeon E5-2698 v4 processors, using the PARSEC 3.0 benchmark suite. The results show that setting the thread count equal to the number of processors available in the system is not necessarily the best strategy for maximizing speedup when running multithreaded programs. It was also observed that running programs with the optimal thread count substantially decreases execution time and saves energy, because fewer processors than are available in the system are used.
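To make the three phases concrete, the sketch below applies the standard (unmodified) SOS update rules of Cheng and Prayogo (2014) to a single integer variable, the thread count. It is not the authors' MSOS, whose modifications are not described in this abstract. The cost function `modeled_runtime` is a hypothetical toy model (serial time, plus parallel work divided across threads, plus a linear per-thread overhead) standing in for measured PARSEC execution times; its constants, and the names `sos_thread_count`, `pop_size`, `iterations` and `max_threads`, are illustrative assumptions. Because the toy model charges an overhead for every thread, its minimum lies well below `max_threads`, at 14 threads (since sqrt(100 / 0.5) ≈ 14.1), which loosely mirrors the observation that using every available processor is not necessarily fastest.

```python
import random


def modeled_runtime(threads: int) -> float:
    """Hypothetical toy cost model (not measured data).

    serial work + parallel work split across threads + per-thread overhead.
    """
    serial, parallel, overhead = 2.0, 100.0, 0.5
    return serial + parallel / threads + overhead * threads


def sos_thread_count(cost, max_threads, pop_size=10, iterations=100, seed=0):
    """Standard SOS search over integer thread counts in [1, max_threads]."""
    rng = random.Random(seed)

    def clamp(x: float) -> int:
        # Round a continuous SOS update back onto a valid thread count.
        return max(1, min(max_threads, int(round(x))))

    def partner(i: int) -> int:
        # Uniformly pick an organism index j != i (requires pop_size >= 2).
        j = rng.randrange(pop_size - 1)
        return j if j < i else j + 1

    eco = [rng.randint(1, max_threads) for _ in range(pop_size)]
    fit = [cost(x) for x in eco]

    def try_replace(k: int, cand: int) -> None:
        # Greedy selection: keep the candidate only if it lowers the cost.
        c = cost(cand)
        if c < fit[k]:
            eco[k], fit[k] = cand, c

    for _ in range(iterations):
        for i in range(pop_size):
            best = eco[min(range(pop_size), key=lambda k: fit[k])]

            # Mutualism: i and a partner j both move toward the best organism,
            # relative to their mutual vector scaled by benefit factors of 1 or 2.
            j = partner(i)
            mutual = (eco[i] + eco[j]) / 2
            bf1, bf2 = rng.randint(1, 2), rng.randint(1, 2)
            new_i = clamp(eco[i] + rng.random() * (best - mutual * bf1))
            new_j = clamp(eco[j] + rng.random() * (best - mutual * bf2))
            try_replace(i, new_i)
            try_replace(j, new_j)

            # Commensalism: i benefits from a partner j; j is unaffected.
            j = partner(i)
            try_replace(i, clamp(eco[i] + rng.uniform(-1, 1) * (best - eco[j])))

            # Parasitism: with one dimension, the parasite vector is a fresh
            # random thread count; it replaces host j only if it is better.
            j = partner(i)
            try_replace(j, rng.randint(1, max_threads))

    k = min(range(pop_size), key=lambda k: fit[k])
    return eco[k], fit[k]


best, runtime = sos_thread_count(modeled_runtime, max_threads=64)
print(best, runtime)
```

In a real tuner, `cost` would launch the benchmark with the candidate thread count and time it, so every evaluation is expensive; that is one reason a population-based search with greedy replacement, rather than an exhaustive sweep, can be attractive when the range of candidate thread counts is large.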

How to Cite
Malave, S. H., & Shinde, S. K. (2022). Symbiotic Organisms Search Optimization to Predict Optimal Thread Count for Multi-threaded Applications. International Journal on Recent and Innovation Trends in Computing and Communication, 10(12), 83–91. https://doi.org/10.17762/ijritcc.v10i12.5889