Big [enough] Compute
Big Compute 20 took place last week in San Francisco, bringing together the full spectrum of current players in the cloud compute space and those looking to utilize the capability they provide. It's run by Rescale, a cloud compute provider focused on engineering applications, spanning the range of compute scales and pretty much the whole remit of computer aided engineering (CAE) software. Rescale was founded in 2011, and its growth to date is symptomatic of the growth of remote compute in the CAE world, where you can now access an unprecedented scale of compute capability. The growth of the cloud in general across the current incumbents (Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP)) has been well known for some time, with consistent growth across the infrastructure, platform and software as a service landscapes. The high performance computing (HPC) element of the cloud has lagged this top-line growth but is now expected to be one of the drivers of growth in the coming years. This is reflected in the prominence of HPC examples in an AWS re:Invent 2019 keynote, which cited Formula One aerodynamics development using computational fluid dynamics (CFD) and the usability of AWS ParallelCluster (the AWS HPC cluster management tool-set), as well as in the recent hires AWS and Azure have made of people with strong HPC backgrounds. For HPC to deliver at scale, a key component is a high-speed interconnect between compute instances. It's interesting that GCP appears not to have implemented such a capability (Azure uses InfiniBand and AWS its own Elastic Fabric Adapter); maybe that's due to GCP's focus on machine learning rather than HPC for scientific applications.
Let's take a look at how this abundance of compute capability can help product development programmes which use physics and engineering simulation, in harmony with physical testing, to predict component and/or product performance. Even before the drive towards HPC, the cloud was able to offer high levels of scalability: CAE models that could run on a single cloud instance in a reasonable amount of time could be scaled out so that many models ran simultaneously, allowing a whole suite of design of experiments runs to be completed in the time it takes to do a single model run. Accessible HPC on the cloud allows the same approach to be applied to more complex CAE models. For example, in computational fluid dynamics there are significant accuracy benefits to running models as "transient", so that they capture how things change over time, as opposed to "steady-state", where the model only provides details of the time-averaged behaviour. Transient models are much more resource-hungry to run, and this is where HPC comes into play.
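As a rough illustration of the scale-out idea, here is a Python sketch that builds a small full-factorial design of experiments and compares wall-clock time when the runs are queued one after another versus launched all at once. The design variables, their levels and the 6-hour run time are hypothetical assumptions, and every run is assumed to take the same time, which real CFD cases rarely do.

```python
import math
from itertools import product

# HYPOTHETICAL design variables and levels for a CFD study (illustration only).
design_space = {
    "inlet_velocity_m_s": [10.0, 20.0, 30.0],
    "wing_angle_deg": [0.0, 2.5, 5.0],
    "ride_height_mm": [40, 60],
}

def full_factorial(space):
    """Return every combination of the design-variable levels as a dict."""
    names = list(space)
    return [dict(zip(names, levels)) for levels in product(*space.values())]

def wall_clock_hours(n_runs, hours_per_run, concurrent_slots):
    """Wall-clock time when runs are launched in batches of `concurrent_slots`.

    Assumes (hypothetically) that every run takes exactly `hours_per_run`.
    """
    batches = math.ceil(n_runs / concurrent_slots)
    return batches * hours_per_run

cases = full_factorial(design_space)
print(len(cases))                             # 18
print(wall_clock_hours(len(cases), 6.0, 1))   # 108.0 (one after another)
print(wall_clock_hours(len(cases), 6.0, 18))  # 6.0 (all at once)
```

The whole suite finishes in the time of a single run only when there are as many concurrent slots as cases; with fewer slots the runs fall into batches.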
The increased availability and ease of use of cloud HPC further reduces the timescales for utilising simulation tools, meaning design space characterization and optimization are now very much in scope, even at the higher end of model complexity and accuracy. From the perspective of using HPC on the cloud, the continuing trend is to turn timescale constraints into cost constraints: you can now run as many models simultaneously as you can afford to have clusters running at the same time. With timescale removed from project considerations (in the scope of performing multiple CAE simulations), the focus becomes how to characterize your design space and optimize performance cost-effectively.
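A minimal cost model makes the shift from timescale to cost concrete. Assuming billing per core-hour and identical run times (both simplifications), the total compute cost of a campaign depends only on how many runs you do, not on how many run at once, while wall-clock time falls as concurrency rises; the budget then caps how many runs you can afford. The 288 cores per run, 8-hour run time and $0.05 per core-hour price are hypothetical figures, not quotes from any provider.

```python
import math

# HYPOTHETICAL figures, not real provider pricing.
CORES_PER_RUN = 288
HOURS_PER_RUN = 8.0
PRICE_PER_CORE_HOUR = 0.05

def campaign(n_runs, max_concurrent,
             cores_per_run=CORES_PER_RUN,
             hours_per_run=HOURS_PER_RUN,
             price_per_core_hour=PRICE_PER_CORE_HOUR):
    """Return (wall-clock hours, compute cost) for n identical runs."""
    wall_clock = math.ceil(n_runs / max_concurrent) * hours_per_run
    # Per-core-hour billing: cost does not depend on max_concurrent.
    cost = n_runs * cores_per_run * hours_per_run * price_per_core_hour
    return wall_clock, cost

def affordable_runs(budget,
                    cores_per_run=CORES_PER_RUN,
                    hours_per_run=HOURS_PER_RUN,
                    price_per_core_hour=PRICE_PER_CORE_HOUR):
    """Number of whole runs a budget covers under the same cost model."""
    return math.floor(budget / (cores_per_run * hours_per_run * price_per_core_hour))

for slots in (1, 10, 40):
    hours, cost = campaign(40, slots)
    print(f"{slots:>2} concurrent: {hours:6.1f} h wall-clock, ${cost:,.0f}")
print(affordable_runs(10_000))  # 86
```

Every line of the loop reports the same $4,608; only the wall-clock time changes, which is the sense in which time becomes a cost question.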
The timescale/cost balance shift from the perspective of cloud HPC use is clear, but from the project management perspective there are of course many other considerations. HPC capability is available almost out of the box, but running your commercial CAE software on multiple clusters has license cost implications, with additional licenses required not only for every simulation but for every core of every CPU in every simulation. The legacy commercial CAE licensing model doesn't fit well with this abundance of compute, despite the recent shift towards on-demand license costing ("elastic" licensing) through CAE HPC platform providers such as Rescale, which only charge for the time used. It's here that open source has real strength, allowing cloud HPC to be used at any scale with no license costs.
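Extending the same back-of-envelope approach, per-core licensing adds a term that scales with exactly the same core-hours as the compute bill. This sketch assumes an "elastic" license charged per core-hour; the $0.10 license rate and the other figures are hypothetical, and real commercial license terms vary widely and may not be priced this way at all.

```python
def campaign_cost(n_runs, cores_per_run, hours_per_run,
                  compute_per_core_hour, license_per_core_hour=0.0):
    """Total cost of n identical runs.

    license_per_core_hour=0.0 models open source software (no license fee).
    """
    core_hours = n_runs * cores_per_run * hours_per_run
    return core_hours * (compute_per_core_hour + license_per_core_hour)

# HYPOTHETICAL rates: $0.05/core-hour compute, $0.10/core-hour elastic license.
commercial = campaign_cost(40, 288, 8.0, 0.05, license_per_core_hour=0.10)
open_source = campaign_cost(40, 288, 8.0, 0.05)
print(f"commercial: ${commercial:,.0f}, open source: ${open_source:,.0f}")
# → commercial: $13,824, open source: $4,608
```

Because the license term is multiplied by cores × hours, every extra core or extra run adds to the license bill as well as the compute bill; with open source only the compute term remains.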
It's good to remember that open source isn't "free": developing the skills to use open source CAE software effectively has a cost, and that can be a barrier to implementation. From a project success perspective it's also important to have access to people with the skills to integrate the CAE process and results effectively into the development programme. There's much scope for this to be thorough and highly enabling (running a set of simulations to validate the models and decide which model settings are best suited can also be achieved in similarly short timescales), but making those initial set-up choices and automating the workflow so that all the simulations really do run at the same time requires domain knowledge and experience.
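Automating the launch is largely plumbing, but it is where the promise of running everything at once is kept or lost. Here is a minimal sketch that writes one case directory per design point and dispatches the cases through a bounded pool using Python's standard `concurrent.futures` module. The directory layout, the `params.json` file and the `run_case` placeholder are hypothetical; in a real workflow `run_case` would submit a cluster job or call the solver (for example with `subprocess.run`), which is why threads are sufficient here: the heavy lifting would happen in external processes.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

def prepare_case(root: Path, case_id: int, params: dict) -> Path:
    """Create a case directory and write its parameters (HYPOTHETICAL layout)."""
    case_dir = root / f"case_{case_id:03d}"
    case_dir.mkdir(parents=True, exist_ok=True)
    (case_dir / "params.json").write_text(json.dumps(params))
    return case_dir

def run_case(case_dir: Path):
    """HYPOTHETICAL placeholder for launching the solver in case_dir.

    A real workflow would run the solver here, e.g. via
    subprocess.run([...], cwd=case_dir, check=True). This stub only checks
    that the case was set up, so the sketch runs anywhere.
    """
    params = json.loads((case_dir / "params.json").read_text())
    return case_dir.name, "ready" if params else "missing params"

def run_sweep(cases, root: Path, max_concurrent: int) -> dict:
    """Set up every case, then run up to max_concurrent of them at a time."""
    dirs = [prepare_case(root, i, p) for i, p in enumerate(cases)]
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(run_case, d) for d in dirs]
        for fut in as_completed(futures):
            name, status = fut.result()
            results[name] = status
    return results

with tempfile.TemporaryDirectory() as tmp:
    sweep = [{"wing_angle_deg": a} for a in (0.0, 2.5, 5.0)]
    print(sorted(run_sweep(sweep, Path(tmp), max_concurrent=3).items()))
    # → [('case_000', 'ready'), ('case_001', 'ready'), ('case_002', 'ready')]
```

Setting `max_concurrent` to the number of cases is what makes all the simulations genuinely run at the same time; a smaller value trades wall-clock time for fewer simultaneous clusters.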
We have experience developing and using open source software workflows on the cloud and HPC, tailored to providing design characterization and optimization in the sort of timescales that product development currently demands. If you'd like to find out more about how we could apply these tools to help with your product development challenges, or to share your experiences of utilizing these approaches, we'd love to hear from you.