High-Performance Computing (HPC) Facility: Speed

Since December 2018, Gina Cody School of Engineering and Computer Science faculty, students, postdocs, and staff, have had access to a powerful HPC facility called Speed, which has been optimised for compute jobs that are multi-core aware, require a large memory space, or are iteration intensive.

Who can use Speed?

Speed is a free for use by all Gina Cody School of Engineering and Computer Science faculty, staff, and students. However, students require the approval of their supervisor and/ or instructor. If you are a member of GCS, please follow the instructions for requesting access in our user manual.

Non-GCS faculty, staff or students require sponsorship from a member of the GCS faculty or approval from the school's Dean's office. Concordians without an affiliation with GCS are encouraged to contact the Advanced Research Computing team.

What does Speed Consist of?

Twenty four (24) 32-core compute nodes, each with 512 GB of memory and approximately 1 TB of local volatile-scratch disk space.
Twelve (12) NVIDIA Tesla P6 GPUs, with 16 GB of memory (compatible with the CUDA, OpenGL, OpenCL, and Vulkan APIs).
4 VIDPRO nodes, with 6 P6 cards, and 6 V100 cards (32GB), and 256GB of RAM.
7 new SPEED2 servers with 64 CPU cores each 4x A100 80 GB GPUs, partitioned into 4x 20GB each; larger local storage for TMPDIR.
One AMD FirePro S7150 GPU, with 8 GB of memory (compatible with the Direct X, OpenGL, OpenCL, and Vulkan APIs).

Job Management is handled by the Slurm Workload Manager.

The "cluster" mounts multi-TB, NFS-provided storage, which serves both persistent-scratch data (not backed up) and persistent-store data (backed up).

What software is availble on Speed?

Linux OS which supports containers
- Nodes are running either Scientific Linux 7 or Almalinux 9. All nodes to be migrated to Almalinux 9.
Singularity (supports conversion from Docker containers), various machine- and specifically deep learning frameworks, Conda, Ubuntu Lambda Stack, TensorFlow, OpenCV, OpenMPI, OpenISS, MARF, OpenFOAM
Commercial tools, subject to licensing, Fluent, MATLAB, Ansys, and many others.

This infrastructure is continuously maintained by dedicated and professional AITS staff for sysadmin, applications, storage, and networking needs.

In alignment with the University’s digital strategy and open learning, GCS ENCS Network, Security, and HPC (NAG) group began to release some HPC resources for Speed publicly, including job submission script samples. The GCS HPC users / community are encouraged to contribute their own scripts, tricks, and hints via pull requests or report issues with the existing ones on our GitHub page:

● GCS NAG Speed GitHub Page

For more information, please e-mail: rt-ex-hpc@encs.concordia.ca

There are ongoing plans and work in progress for future expansion of Speed compute, GPU, and storage capabilities.

Documentation

User Manual and GitHub

The Speed User Manual is available in PDF or as HTML at the Speed HPC Facility GitHub. The facility's GitHub also contains sample job scripts.