Overview of Simulations on Shared- and Distributed-Memory Computers
The following terms occur frequently when describing the hardware for cluster computing and shared-memory parallel computing:
A cluster is defined as any networked system of computers. A single cluster is composed of a set of computing hosts, often interchangeably called physical nodes, which are networked together via interconnects. When instances of a software program such as COMSOL Multiphysics running on a cluster need to communicate with each other, they do so via MPI, the message-passing interface. Each computing host has one or more CPUs, and each CPU has multiple cores. All of the CPUs on a single host (and all of the cores within these CPUs) can share the same memory space, so each host is a shared-memory computer. A compute node running on a host with multicore processors uses the physical processor cores for shared-memory parallelism. For example, a host with two quad-core processors has eight available cores.
The entire cluster, on the other hand, is a distributed-memory computer. In this case, the distributed computing takes place across compute nodes. A compute node is a process running on the operating system, and multiple compute nodes can be assigned to run on a single host. A COMSOL Multiphysics instance resides in each compute node and communicates with the other compute nodes using MPI.
When a single COMSOL Multiphysics model is solved on a single host, no information is passed over the interconnects, and MPI does not need to be used. This is desirable because the data transfer speed between hosts via the interconnects is slower than the data transfer within a single host. When a single model requires so much memory that it cannot be solved on a single host, multiple hosts have to be allocated to the model, and MPI is used to share data between the processes running on different hosts. The interconnect speed can then become a significant computational bottleneck, so it is typically desirable to minimize the number of hosts used per model. At least one compute node must run on each host. Depending on the specific hardware and the COMSOL Multiphysics model being solved, it may be beneficial to assign two or more compute nodes per host, as in the example following the option list below.
When a single COMSOL Multiphysics model contains a parametric sweep, it is possible to solve each case of that sweep completely independently of the others. In this scenario, using the Distributed Parametric Sweep functionality is strongly recommended because relatively little data is passed via MPI: only the problem definition and the solution. Since each host can contain several CPUs, each of which can have many cores, it is also possible to solve several cases of a parametric sweep on the same host. This is worthwhile if the individual cases have low computational requirements.
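As an illustration, and not an excerpt from the COMSOL documentation, a distributed parametric sweep on a Linux cluster might be launched along the following lines, using the command-line options listed below. The file names hostfile.txt, sweep_model.mph, and sweep_solved.mph are placeholders, and the option values assume four hosts, each running two compute nodes with four cores apiece:

   comsol batch -f hostfile.txt -nn 8 -nnhost 2 -np 4 -inputfile sweep_model.mph -outputfile sweep_solved.mph

This sketch assumes that the Distributed Parametric Sweep option is enabled in the study settings of the model file and that the comsol batch launcher is available; the exact launcher name and arguments can vary between platforms and COMSOL versions.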
The following command-line options are relevant for controlling how a problem is divided over a cluster:
-mpihosts: This option specifies the names of the hosts that will be used during the solution.
-f: This option specifies the path to the hostfile that will be used during the solution.
-nn: This option specifies the total number of compute nodes, that is, the number of COMSOL Multiphysics instances that are created. The instances communicate with each other via MPI.
-nnhost: This option specifies the number of COMSOL Multiphysics instances that are allocated to run on each host.
-np: This option specifies the number of cores used by each instance of COMSOL Multiphysics.
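To make the interplay of these options concrete, consider a hypothetical cluster of four hosts, each with two eight-core CPUs, that is, 16 cores per host. A single large model that does not fit in the memory of one host could then be run with one compute node per host using all of its cores; the file names below are again placeholders, and the exact syntax may differ between platforms and versions:

   comsol batch -f hostfile.txt -nn 4 -nnhost 1 -np 16 -inputfile large_model.mph -outputfile large_model_solved.mph

Here -nn 4 creates four COMSOL Multiphysics instances in total, -nnhost 1 places one instance on each host, and -np 16 lets each instance use all 16 cores of its host. Alternatively, -nn 8 -nnhost 2 -np 8 would run two compute nodes per host with eight cores each, which, as noted above, may perform better for some hardware and models.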
For more information and example scripts, see the knowledge base article Running COMSOL® in parallel on clusters: https://www.comsol.com/support/knowledgebase/1001/.
The Introduction to COMSOL Multiphysics includes a tutorial on how to build the busbar geometry. See the PDF file included with COMSOL Multiphysics.