Helping meet the data reduction needs of the SKA through data parallelisation and scaling up of CASA

The team at the Cavendish Laboratory of the University of Cambridge set out to build upon CASA (Common Astronomy Software Applications) so that processing can be distributed easily across conventional HPC clusters using a shared file system. The result is “SWIFTCASA”, a task-based parallelisation of the standard radio astronomy data reduction package “CASA”, with which the team was able to scale CASA up to the limits of storage throughput required by SKA. The design is simple: CASA is used as an in-process Python module, invoked through SWIFT/T’s existing mechanisms for calling Python functions.

CASA is probably the most widely used package for interferometric data analysis today and is the official data reduction package for both ALMA and JVLA.

Relevant Features

The most relevant feature of the SWIFT language is its use of strictly single-assignment data structures, which makes it easy for SWIFT to infer the dataflow semantics of the programme.
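
To illustrate why single assignment makes dataflow easy to infer, here is a minimal sketch in Python rather than in SWIFT itself. It is not SWIFTCASA code: the class `SingleAssignment` and the helper `dataflow_call` are hypothetical names invented for this example. Each variable is written exactly once, so a task's dependencies are fully described by the variables it reads, and a scheduler can run any task as soon as its inputs have been assigned.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleAssignment:
    """Hypothetical write-once variable: one assignment, many reads."""

    def __init__(self):
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        with self._lock:
            if self._ready.is_set():
                raise RuntimeError("variable already assigned")
            self._value = value
            self._ready.set()

    def get(self):
        # Readers block until the single write has happened.
        self._ready.wait()
        return self._value


def dataflow_call(pool, func, *inputs):
    """Schedule func on the pool; it proceeds once all inputs are assigned."""
    output = SingleAssignment()

    def run():
        # Reading the inputs is what expresses the data dependency.
        output.set(func(*(v.get() for v in inputs)))

    pool.submit(run)
    return output


def demo():
    with ThreadPoolExecutor(max_workers=4) as pool:
        a = dataflow_call(pool, lambda: 2)
        b = dataflow_call(pool, lambda: 3)
        # c depends on a and b; nothing else needs to be declared.
        c = dataflow_call(pool, lambda x, y: x + y, a, b)
        d = dataflow_call(pool, lambda x: x * 10, c)
        return d.get()
```

Calling `demo()` returns 50. The tasks producing `a` and `b` are independent and may run concurrently, while `c` and `d` wait only on the values they consume. For brevity the sketch does not handle exceptions raised inside a task.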

What User Experience (UX) has been achieved?

SWIFTCASA enables such parallelisation while remaining easy for astronomers to use, thanks to a simple and clear “scripting” language.

Benefits

  • More efficient use of scientist/analyst time when the analysis is not fully automated;
  • More efficient use of valuable computing resources, e.g., very fast storage;
  • More timely response to time-critical phenomena, whether they are of astronomical origin (i.e., transient sources) or in the telescope (e.g., a fault developing which is subtly corrupting the data).

Further information, including the software patches needed to build SWIFTCASA, is available at: http://www.mrao.cam.ac.uk/~bn204/publications/2017/2017-08-casaswift.pdf

SWIFTCASA is directly interoperable with CASA and with standard HPC environments, including SLURM scheduling, MPI libraries and the LUSTRE filesystem.

How does this help SKA?

SWIFTCASA provides the intensive task-based parallelism required by SKA and, in particular, by SKA’s precursor instrument, the “Hydrogen Epoch of Reionization Array” (HERA). It addresses the difficulties faced with traditional software based on partitioning of the input data in time and frequency.
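
For readers unfamiliar with partitioning input data in time and frequency, the idea can be sketched in a few lines of Python. This is an illustration only, not code from CASA or SWIFTCASA: the function names, the nested-list data layout and the per-block sum are all assumptions made for the example. The data grid is tiled into rectangular time/frequency blocks, each block is processed independently, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def partition(n_time, n_freq, time_step, freq_step):
    """Yield (time_slice, freq_slice) pairs that tile an n_time x n_freq grid."""
    for t0, f0 in product(range(0, n_time, time_step),
                          range(0, n_freq, freq_step)):
        yield (slice(t0, min(t0 + time_step, n_time)),
               slice(f0, min(f0 + freq_step, n_freq)))


def block_sum(data, block):
    """Stand-in for a real per-block reduction step: sum one block."""
    t_sl, f_sl = block
    return sum(sum(row[f_sl]) for row in data[t_sl])


def parallel_sum(data, time_step, freq_step, workers=4):
    """Process every block independently, then combine the partial results."""
    n_time, n_freq = len(data), len(data[0])
    blocks = list(partition(n_time, n_freq, time_step, freq_step))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda b: block_sum(data, b), blocks))
    return sum(partial), len(blocks)


# Toy data: 4 time samples by 5 frequency channels.
data = [[t * 10 + f for f in range(5)] for t in range(4)]
total, n_blocks = parallel_sum(data, time_step=3, freq_step=2)
```

With the toy grid above, `total` is 340 and `n_blocks` is 6 (two time ranges by three frequency ranges). The blocks at the edges are smaller than the rest, a first hint of the load-balancing issues a fixed partitioning can introduce.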

Contact: Dr Bojan Nikolic, email: b [dot] nikolic [at] mrao [dot] cam [dot] ac [dot] uk