{"id":12803,"date":"2021-11-21T17:08:14","date_gmt":"2021-11-21T22:08:14","guid":{"rendered":"https:\/\/carleton.ca\/scs\/?page_id=12803"},"modified":"2026-06-02T14:59:25","modified_gmt":"2026-06-02T18:59:25","slug":"tr-95-23-performance-impact-of-run-queue-organization-and-synchronization-on-large-scale-numa-multiprocessor-systems","status":"publish","type":"page","link":"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1995\/tr-95-23-performance-impact-of-run-queue-organization-and-synchronization-on-large-scale-numa-multiprocessor-systems\/","title":{"rendered":"TR-95-23: Performance Impact of Run Queue Organization and Synchronization on Large-Scale NUMA Multiprocessor Systems"},"content":{"rendered":"\n<section class=\"w-screen px-6 cu-section cu-section--white ml-offset-center md:px-8 lg:px-14\">\n    <div class=\"space-y-6 cu-max-w-child-5xl  md:space-y-10 cu-prose-first-last\">\n\n            <div class=\"cu-textmedia flex flex-col lg:flex-row mx-auto gap-6 md:gap-10 my-6 md:my-12 first:mt-0 max-w-5xl\">\n        <div class=\"justify-start cu-textmedia-content cu-prose-first-last\" style=\"flex: 0 0 100%;\">\n            <header class=\"font-light prose-xl cu-pageheader md:prose-2xl cu-component-updated cu-prose-first-last\">\n                                    <h1 class=\"cu-prose-first-last font-semibold !mt-2 mb-4 md:mb-6 relative after:absolute after:h-px after:bottom-0 after:bg-cu-red after:left-px text-3xl md:text-4xl lg:text-5xl lg:leading-[3.5rem] pb-5 after:w-10 text-cu-black-700 not-prose\">\n                        TR-95-23: Performance Impact of Run Queue Organization and Synchronization on Large-Scale NUMA Multiprocessor Systems\n                    <\/h1>\n                \n                                \n                            <\/header>\n\n                    <\/div>\n\n            <\/div>\n\n    <\/div>\n<\/section>\n\n<p>Carleton University<br>\n<a 
href=\"https:\/\/carleton.ca\/scs\/research\/scs-technical-reports\/technical-reports-1995\/\">Technical Report<\/a> TR-95-23<br>\nNovember 1995<\/p>\n\n\n\n<h2 id=\"performance-impact-of-run-queue-organization-and-synchronization-on-large-scale-numa-multiprocessor-systems\" class=\"wp-block-heading tr_t1\">Performance Impact of Run Queue Organization and Synchronization on Large-Scale NUMA Multiprocessor Systems<\/h2>\n\n\n\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">\n<div class=\"tr_t3\">Sivarama P. Dandamudi &amp; S. P. Cheng<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div>\n<h3>Abstract<\/h3>\n<\/div>\n\n\n\n<div class=\"tr_abstract\">\n<p>The goal of this paper is to study the impact of run queue organization on the performance of synchronization methods in multiprocessor systems. Two run queue organizations are considered: distributed and hierarchical. The performance impact of spinning and blocking synchronization methods on these two run queue organizations is studied. We use two canonical workload types that require task synchronization: lock-accessing and barrier-synchronization workloads. The results presented here show that, when fine-grain synchronization is required, the distributed organization is better. However, for large-granularity tasks, the performance of the distributed organization is unacceptable and the hierarchical organization should be used. Note that the distributed organization is embedded in the hierarchical organization. Thus, for coarse-granularity parallel applications, the hierarchical organization with its load-sharing feature can be used; for fine-granularity parallel applications, the hierarchy of queues can be circumvented and round-robin task assignment can be done on processor-local queues, as in the distributed organization. 
Therefore, the hierarchical organization is useful in general-purpose large-scale shared-memory multiprocessors.<\/p>\n<\/div>\n\n\n\n<p><a href=\"https:\/\/carleton.ca\/scs\/wp-content\/uploads\/sites\/260\/TR-95-23.pdf\">TR-95-23.pdf<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Carleton University Technical Report TR-95-23 November 1995 Performance Impact of Run Queue Organization and Synchronization on Large-Scale NUMA Multiprocessor Systems Sivarama P. Dandamudi &amp; S. P. Cheng Abstract The goal of this paper is to study the impact of run queue organization on the performance of synchronization methods in multiprocessor systems. Two run queue organizations [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":11736,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"_cu_dining_location_slug":"","footnotes":"","_links_to":"","_links_to_target":""},"cu_page_type":[],"class_list":["post-12803","page","type-page","status-publish","hentry"],"acf":{"cu_post_thumbnail":false},"_links":{"self":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/comments?post=12803"}],"version-history":[{"count":1,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12803\/revisions"}],"predecessor-version":[{"id":12805,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/12803\/revisions\/12805"}],"up":[{"embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/pages\/11736"}],"wp:attachment":[{"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/media?parent=12803"}],"wp:term":[{"taxonomy":"
cu_page_type","embeddable":true,"href":"https:\/\/carleton.ca\/scs\/wp-json\/wp\/v2\/cu_page_type?post=12803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}