Challenge 3 Resources
Although the nodes in Challenges 1 and 2 were not shared, the system interconnect and global file system were, of necessity, shared as in any large cluster. The resulting resource contention caused the amount of CPU time used within the wall-clock time limit to vary from run to run. Although small, on the order of a few percent, this variation could affect the relative performance of two runs. It did not, however, affect the final order of any of the Entrants. In an effort to reduce resource contention even further, Challenge 3 runs will use only a single node and only local, memory-based storage.
Each Challenge 3 node is powered by dual 32-core AMD EPYC 7502 CPUs, for a total of 64 cores (hyperthreading is turned off; there are only 64 threads, not 128) and 2.56 Teraflops per node. The L3 cache on each CPU is 128 MB. The CPUs run at 2.5 GHz with the ability to boost to 3.35 GHz, a new potential source of run-to-run variation since the boost is controlled by temperature. Each node is equipped with 256 GB of octa-channel DDR4-3200 memory for a single-core memory bandwidth of 22 GB/sec, 512 GB of local NVMe /scratch storage (over 1 GB/second read/write), and an HDR-100 Mellanox InfiniBand (100 Gb/s) network connection. Although it should not be a factor since only one node may be used, the network topology is a 2:1 blocking fat tree using 3 "leaf" switches, each with 32 HDR-100 ports down to compute nodes and 9 HDR-200 ports up to "spine" switches. There are 3 "spine" switches with ports available for 7 additional leaf switches.
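The 2.56 Teraflops figure and the 2:1 blocking ratio quoted above can be reproduced with simple arithmetic. The sketch below is illustrative only. It assumes 16 double-precision floating point operations per core per cycle (two 256-bit fused multiply-add units, an assumption about the CPU core that is not stated above) and uses the 2.5 GHz base clock rather than the boost clock. The function names are hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=16):
    """Theoretical peak in GFlops: cores x clock (GHz) x FLOPs per cycle.

    flops_per_cycle=16 is an assumption (2 FMA units x 4 doubles x 2 ops).
    """
    return cores * clock_ghz * flops_per_cycle


def blocking_ratio(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of downlink to uplink bandwidth on one leaf switch."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


# Two 32-core CPUs per node at the 2.5 GHz base clock.
node_gflops = peak_gflops(cores=2 * 32, clock_ghz=2.5)
print(f"Peak per node: {node_gflops / 1000:.2f} Teraflops")

# 32 HDR-100 ports down (100 Gb/s each), 9 HDR-200 ports up (200 Gb/s each).
ratio = blocking_ratio(down_ports=32, down_gbps=100, up_ports=9, up_gbps=200)
print(f"Leaf blocking ratio: {ratio:.2f}:1")
```

Under these assumptions the per-node peak comes out at 2.56 Teraflops, and the leaf ratio is about 1.78:1, which is what the nominal "2:1" describes.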
The operating system is CentOS 7.8.2003 (Core); http://bay.uchicago.edu/centos-vault/centos/7.8.2003/isos/x86_64/
Challenge 1 and 2 Resources
Challenge 1 and 2 submissions could request up to six 24-core nodes with 64 GiB of memory (a total of 144 cores), the same nodes and interconnect described below. The default number of nodes is 1.
To request more than 1 node, put the following in submission.conf as described on the Languages page.
srun -N x where x is a positive integer less than or equal to 6.
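As a small illustration of the constraint above, the sketch below validates a requested node count and formats the option string. The function name is hypothetical. Only the `srun -N x` form, the default of 1, and the limit of 6 come from the text; the exact syntax expected in submission.conf is the one described on the Languages page.

```python
MAX_NODES = 6  # upper limit stated for Challenge 1 and 2 submissions


def srun_node_option(nodes=1):
    """Return the node-request string for a valid node count (1 to 6).

    Hypothetical helper: it only mirrors the 'srun -N x' form quoted above.
    """
    if isinstance(nodes, bool) or not isinstance(nodes, int):
        raise TypeError("node count must be an integer")
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between 1 and {MAX_NODES}")
    return f"srun -N {nodes}"


print(srun_node_option(4))
```

Rejecting out-of-range values before submission avoids spending a run on a request the platform would not accept.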
The Challenge 1 and 2 Evaluation Platform hardware consisted of a multi-node cluster with a Mellanox 4x FDR InfiniBand interconnect (54.54 Gigabits/second data bandwidth; 4.5 GB/s possible with block sizes >6 million double words) in a 2:1 oversubscribed fat tree. Sets of 24 nodes have 12 uplinks into the large core InfiniBand switch. Each node has two Intel Xeon E5-2670 v3 (Haswell) CPUs (Turbo Boost NOT enabled), each with 12 cores (24 cores per node) and a clock speed of 2.30 GHz, for a peak floating point performance of 883.2 GFlops per node; 64 GiB of 4-channel 2133 MHz DDR4 SDRAM memory (68 Gigabytes/second); and 480 GB Intel® SSD 530 Series disk drives with SATA 3.0 6 Gbit/sec interfaces and transfer rates in the 300 MB/sec range. SSD performance varies as a function of the number of bad memory blocks on the drive. The SSD is available in the /scratch directory.
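The headline numbers in this paragraph follow from a few standard formulas, sketched below for reference. The constants marked as assumptions are not stated in the text: 16 double-precision FLOPs per core per cycle for Haswell, 8 bytes per transfer on each DDR4 channel, and an FDR lane signaling rate of 14.0625 Gb/s with 64b/66b encoding. The function names are hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=16):
    """Theoretical peak in GFlops (flops_per_cycle=16 is an assumption)."""
    return cores * clock_ghz * flops_per_cycle


def dram_bandwidth_gbs(channels, mega_transfers_per_sec, bytes_per_transfer=8):
    """Theoretical DRAM bandwidth in GB/s (8 bytes per transfer assumed)."""
    return channels * mega_transfers_per_sec * bytes_per_transfer / 1000


def fdr_data_rate_gbps(lanes=4, lane_gbps=14.0625, encoding=64 / 66):
    """FDR data rate in Gb/s after 64b/66b encoding overhead (assumed values)."""
    return lanes * lane_gbps * encoding


print(f"Peak per node:    {peak_gflops(24, 2.30):.1f} GFlops")
print(f"Memory bandwidth: {dram_bandwidth_gbs(4, 2133):.1f} GB/s")
print(f"4x FDR data rate: {fdr_data_rate_gbps():.2f} Gb/s")
```

With these assumptions the results land at roughly 883.2 GFlops, 68 GB/s, and 54.5 Gb/s, in line with the figures quoted above.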
An Entrant could use up to 6 nodes for an MPI submission. Please see the submission.conf section on the Languages page for details. Nodes will be under the sole control of the Entrant and not shared with other users. The hardware configuration of all nodes is identical.
Results from each submission are gzipped and placed in tar files on dtn2.pnl.gov, an externally facing gateway for transferring large amounts of data efficiently. It is connected to the Internet via multiple 10 gigabit per second links to the Pacific Northwest Gigapop and the Seattle Internet Exchange. It is not uncommon to see transfer rates over 1 Gbps to various sites around the world.
The Evaluation Platform uses the CentOS 7.4.1708 operating system with kernel-3.10.0-693.5.2.el7.x86_64.
Requests for additional software will be considered. Please include requests on your registration form or send them to the GO Operations Team with subject line "Request for Additional Software".
Large memory and GPUs were available for demonstration runs only.
The Competition has access to three (3) nodes, each with a large (1 Terabyte) amount of DRAM. Contact the GO Operations Team with subject line "Large Memory" for more information. These are not standard platform nodes and may not have a performance profile similar to the standard nodes, but we are open to investigating their use with willing volunteers.
Using the directive "srun_options = -p gpu -N 1" in submission.conf, it is possible to access 8 NVIDIA Tesla K80 GPU nodes with the same basic hardware configuration as the evaluation nodes. Use of GPUs is for demonstration purposes only for now.
Other GPUs are potentially available, but since they break the uniform evaluation model the Competition is striving for, there is no assurance that such requests would be granted for other than demonstration purposes. Send requests to the GO Operations Team with subject line "GPU request".
The other GPU nodes all use dual Intel Broadwell E5-2620 v4 CPUs at 2.10 GHz with 16 cores per node and 64 GB of 2133 MHz DDR4 memory per node (4 nodes have 384 GB of memory), with disk, interconnect, and OS the same as the Evaluation Platform nodes. There are 31 nodes with NVIDIA P100 12GB PCI-e based GPUs, 6 nodes with dual NVIDIA V100 16GB PCI-e based GPUs, and one NVIDIA DGX-2.