Dataset: IXITiny (https://torchio.readthedocs.io/_modules/torchio/datasets/ixi.html#IXITiny)
Machine config: 8 x NVIDIA A100-SXM4-40GB
Model architecture: UNet
With DLOP:
Configuration: {batch_size: 1028, prefetch_factor: 8, num_workers: 16}
Average throughput: 242.69 samples/sec
Without DLOP:
Configuration: {batch_size: 1028, prefetch_factor: 2, num_workers: 32}
Average throughput: 18.87 samples/sec
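The two configurations above read like keyword arguments for PyTorch's `torch.utils.data.DataLoader` (`batch_size`, `prefetch_factor`, `num_workers`). The sketch below assumes that mapping and shows one way an average samples/sec figure could be measured. The helper `measure_throughput` and its `step`, `count_fn` and `max_batches` parameters are hypothetical, not part of DLOP or PyTorch, and the benchmark itself does not state how its numbers were collected.

```python
import itertools
import time
from typing import Callable, Iterable, Optional

# Assumption: the benchmark's configurations written as keyword arguments for
# torch.utils.data.DataLoader. How DLOP actually consumes them is not stated.
WITH_DLOP = {"batch_size": 1028, "prefetch_factor": 8, "num_workers": 16}
WITHOUT_DLOP = {"batch_size": 1028, "prefetch_factor": 2, "num_workers": 32}


def measure_throughput(
    batches: Iterable,
    step: Optional[Callable[[object], None]] = None,
    count_fn: Callable[[object], int] = len,
    max_batches: Optional[int] = None,
) -> float:
    """Hypothetical helper: average samples/sec over one pass of `batches`.

    Fetching each batch happens inside the timed window, so loader cost is
    included. `step` (e.g. a training step) also runs inside the window;
    `count_fn` returns the number of samples in one batch (len by default).
    """
    n_samples = 0
    start = time.perf_counter()
    # islice(batches, None) iterates everything; otherwise stop after max_batches.
    for batch in itertools.islice(batches, max_batches):
        if step is not None:
            step(batch)
        n_samples += count_fn(batch)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing window on very small inputs.
    return n_samples / elapsed if elapsed > 0 else float("inf")
```

With PyTorch installed, `DataLoader(dataset, **WITH_DLOP)` would build a loader with the first configuration; passing the training step as `step` times training rather than loading alone. Note that PyTorch only accepts `prefetch_factor` when `num_workers > 0`. For batches that are dicts (as TorchIO loaders yield), supply a `count_fn` that returns the batch dimension instead of relying on `len`.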
Speedup: ~12.9x (242.69 / 18.87) over the same setup without DLOP.
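The speedup is the ratio of the two average throughputs reported above. A quick check of that arithmetic:

```python
# Average throughputs reported above, in samples/sec.
with_dlop = 242.69
without_dlop = 18.87

# Speedup = throughput with DLOP / throughput without DLOP.
speedup = with_dlop / without_dlop
print(f"speedup: {speedup:.2f}x")  # prints "speedup: 12.86x"
```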