To achieve higher utilisation, cloud providers offer VMs with GPUs as lower-cost transient cloud resources. Transient VMs can be revoked at short notice and vary in their availability. This poses challenges to distributed machine learning (ML) jobs, …
Virtually all public clouds today are run by single providers, which creates near-monopolies and inefficient markets and hinders innovation at the infrastructure level. There are current proposals to change this by creating open architectures that …
Existing cloud provisioning schemes allocate resources to batch processing systems at deployment time and only change this allocation at run time in response to unexpected events such as server failures. We observe that MapReduce-like jobs are time- …