OSU Benchmarking with Python and PyTorch

From Define Wiki

Note: this runs the benchmarks on CPU only (Gloo backend), but it still exercises the Python libraries.


python3.9 -m venv ~/venvs/pytorch-osu-testing
source ~/venvs/pytorch-osu-testing/bin/activate
pip install -U pip
mkdir -p ~/scratch
cd ~/scratch
git clone https://github.com/Algebraic-Programming/pytorch-hccl-tests.git
cd pytorch-hccl-tests
pip install -r requirements_dev.txt
make install

# the install should end with a line like this
Successfully installed pandas-1.3.5 python-dateutil-2.9.0.post0 pytorch-hccl-tests-0.1.15 pytz-2024.1 six-1.16.0

# the requirements pull in a fairly old PyTorch, so upgrade it (ROCm 6.0 wheels here)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

# on node 1 (the commands on the two nodes differ only in --node_rank):
torchrun --master_addr=172.16.16.42 --master_port=29500 --node_rank 0 --nnodes 2 --nproc_per_node 1 pytorch_hccl_tests/cli.py --benchmark latency --device cpu

# then on node 2:
torchrun --master_addr=172.16.16.42 --master_port=29500 --node_rank 1 --nnodes 2 --nproc_per_node 1 pytorch_hccl_tests/cli.py --benchmark latency --device cpu
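
Since the two launch lines differ only in --node_rank, they can be generated rather than typed by hand. The following is a minimal sketch, not part of pytorch-hccl-tests: the helper name torchrun_command is hypothetical, and its defaults are simply the values used in the commands above.

```python
import shlex


def torchrun_command(node_rank, master_addr="172.16.16.42", master_port=29500,
                     nnodes=2, nproc_per_node=1,
                     benchmark="latency", device="cpu"):
    """Build the torchrun command line for one node (hypothetical helper).

    Defaults mirror the commands on this page; only node_rank changes
    from node to node.
    """
    args = [
        "torchrun",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "--node_rank", str(node_rank),
        "--nnodes", str(nnodes),
        "--nproc_per_node", str(nproc_per_node),
        "pytorch_hccl_tests/cli.py",
        "--benchmark", benchmark,
        "--device", device,
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell
    return shlex.join(args)


if __name__ == "__main__":
    # Print one command per node; run each on the matching machine.
    for rank in range(2):
        print(torchrun_command(rank))
```

Each printed line is meant to be run from the pytorch-hccl-tests checkout on the corresponding node, with the virtualenv activated.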


(pytorch-osu-testing) [antony@gpu5 pytorch-hccl-tests]$ torchrun --master_addr=172.16.16.42 --master_port=29500 --node_rank 0 --nnodes 2 --nproc_per_node 1 pytorch_hccl_tests/cli.py --benchmark latency --device cpu
[2024-06-20 16:18:47,241] {cli.py:67} INFO - ******************************
[2024-06-20 16:18:47,241] {cli.py:68} INFO - Selected benchmark : latency
[2024-06-20 16:18:47,241] {cli.py:69} INFO - Input device param : cpu
[2024-06-20 16:18:47,241] {cli.py:70} INFO - Input dtype param  : float
[2024-06-20 16:18:47,241] {cli.py:71} INFO - Global rank        : 0
[2024-06-20 16:18:47,241] {cli.py:72} INFO - Local rank         : 0
[2024-06-20 16:18:47,241] {cli.py:73} INFO - ******************************
[2024-06-20 16:18:47,241] {commons.py:108} INFO - Init distributed env device: cpu / local_rank 0
[2024-06-20 16:18:47,289] {commons.py:162} INFO - Python version: 3.9.18
[2024-06-20 16:18:47,289] {commons.py:163} INFO - PyTorch version: 2.3.1+rocm6.0
[2024-06-20 16:18:47,290] {commons.py:164} INFO - PyTorch MPI enabled?: False
[2024-06-20 16:18:47,290] {commons.py:165} INFO - PyTorch CUDA enabled?: True
[2024-06-20 16:18:47,290] {commons.py:166} INFO - PyTorch NCCL enabled?: True
[2024-06-20 16:18:47,290] {commons.py:167} INFO - PyTorch Gloo enabled?: True
[2024-06-20 16:18:47,290] {commons.py:174} WARNING - ********************************************************************************
[2024-06-20 16:18:47,290] {commons.py:175} WARNING - * PyTorch Ascend (NPU) is NOT installed.
[2024-06-20 16:18:47,290] {commons.py:176} WARNING - * You must install PyTorch Ascend Adaptor from https://gitee.com/ascend/pytorch. *
[2024-06-20 16:18:47,290] {commons.py:179} WARNING - ********************************************************************************
[2024-06-20 16:18:47,290] {commons.py:181} INFO - Using device *cpu* with *gloo* backend
[2024-06-20 16:18:47,290] {commons.py:182} INFO - World size: 2
[2024-06-20 16:18:47,290] {osu_util_mpi.py:32} INFO - # PyTorch Benchmark Latency Test
[2024-06-20 16:18:47,290] {osu_util_mpi.py:33} INFO - # Size (B) Elapsed Time (ms)
[2024-06-20 16:18:48,455] {osu_latency.py:72} INFO - 0                      52.13
[2024-06-20 16:18:49,650] {osu_latency.py:72} INFO - 4                      54.28
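
To compare runs, the result lines can be pulled out of the log. This is a minimal sketch that assumes the log format shown above (result lines tagged osu_latency.py, followed by the size in bytes and the elapsed time in ms); the function name parse_latency_log is hypothetical, and the format may differ in other versions of the tool.

```python
import re

# Matches result lines such as:
# [2024-06-20 16:18:48,455] {osu_latency.py:72} INFO - 0      52.13
# Assumption: the format is the one in the captured output on this page.
RESULT_RE = re.compile(r"\{osu_latency\.py:\d+\} INFO - (\d+)\s+([\d.]+)\s*$")


def parse_latency_log(text):
    """Return a list of (size_bytes, elapsed_ms) tuples found in the log."""
    results = []
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results


sample = """\
[2024-06-20 16:18:47,290] {osu_util_mpi.py:33} INFO - # Size (B) Elapsed Time (ms)
[2024-06-20 16:18:48,455] {osu_latency.py:72} INFO - 0                      52.13
[2024-06-20 16:18:49,650] {osu_latency.py:72} INFO - 4                      54.28
"""

rows = parse_latency_log(sample)
for size, ms in rows:
    print(f"{size:>8} B  {ms:8.2f} ms")
```

The header line is skipped because it comes from osu_util_mpi.py rather than osu_latency.py.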