Several tools are available for profiling PyTorch code.
PyTorch Profiler
The profiler records almost everything that happens in PyTorch code, such as CPU and GPU activity, memory usage, and call stacks. It can also export enough information for viewing in TensorBoard, or a simple JSON trace that can be opened in the Chrome browser at chrome://tracing.
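As a rough illustration of the workflow described above, here is a minimal sketch that profiles a single forward pass on the CPU and writes a Chrome-compatible trace. The toy model, the input shape, the "forward_pass" label, and the trace.json file name are placeholders chosen for this example, not anything from our actual code.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Hypothetical toy model and input, used only for illustration.
model = torch.nn.Linear(128, 64)
inputs = torch.randn(32, 128)

# Profile CPU activity only; ProfilerActivity.CUDA can be added
# to the list when a GPU is available.
with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,   # track tensor memory allocations
    record_shapes=True,    # record operator input shapes
) as prof:
    # Label this region so it shows up under its own name in the results.
    with record_function("forward_pass"):
        model(inputs)

# Summary table of the most expensive operators by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# JSON trace that can be loaded in Chrome at chrome://tracing.
trace_path = "trace.json"  # assumed output file name
prof.export_chrome_trace(trace_path)
```

The printed table gives a quick overview in the terminal, while the exported trace is better suited to inspecting the timeline of individual operators.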
TensorBoard
After installing TensorBoard, SummaryWriter can be imported from torch.utils.tensorboard. SummaryWriter can write images, audio, time series, or any other scalar in chronological order, so it is possible to keep track of any parameter across epochs or other specific steps. The results can then be viewed in TensorBoard by pointing the tensorboard command at the log directory and tunneling over ssh.
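A minimal sketch of logging a scalar per epoch might look like the following. The tag name "train/loss" and the synthetic loss values are made up for illustration; in real training code the logged value would come from the model. The ./log/ directory is an assumed location for the event files.

```python
import math
from torch.utils.tensorboard import SummaryWriter

# Directory where the event files are written (assumed location).
log_dir = "./log/"
writer = SummaryWriter(log_dir=log_dir)

for epoch in range(10):
    # Placeholder value standing in for a real training loss.
    fake_loss = math.exp(-0.3 * epoch)
    # Each call records one point of the "train/loss" curve at this step.
    writer.add_scalar("train/loss", fake_loss, global_step=epoch)

# Flush pending events to disk and release the file handle.
writer.close()
```

SummaryWriter also provides methods such as add_image and add_audio for other data types; they follow the same pattern of a tag, a value, and a step.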
To run all of this on our cluster, the following commands are needed:
>> tensorboard --logdir=./log/
>> ssh -CNL localhost:6006:localhost:6006 nedc_008