ho22joshua committed on
Commit 81104f5 · 1 Parent(s): c32e406

updated readme
Files changed (1)
  1. root_gnn_dgl/README.md +21 -1
root_gnn_dgl/README.md CHANGED
@@ -110,4 +110,24 @@ python scripts/inference.py \
    --branch 'GNN_Score'
  ```
 
- You can also input a list as the `--config` and the `--branch` to simultaneously apply multiple models onto the same set of samples. An example on how to do this in shell script is in the `run_demo.sh` file.
+ You can also pass a list to both `--config` and `--branch` to apply multiple models to the same set of samples simultaneously. An example shell script is provided in `run_demo.sh`.
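Passing multiple models in one call might look like the following sketch, assuming space-separated argparse-style lists; the config file names and branch names here are placeholders, and `run_demo.sh` in the repo shows the actual invocation:

```shell
# apply two models to the same samples in a single run
# (model_a.yaml / model_b.yaml and the branch names are illustrative)
python scripts/inference.py \
  --config 'model_a.yaml' 'model_b.yaml' \
  --branch 'GNN_Score_A' 'GNN_Score_B'
```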
+
+ ## Running Jobs + Parallelization
+
+ Perlmutter job scripts are located in `jobs/`. Job scripts are separated into 3 categories: `pre_data`, `training`, and `inference`.
+
+ The different shell scripts show how to request GPU or CPU nodes from Perlmutter, which are required for running jobs.
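A minimal sketch of a Perlmutter batch script requesting GPU nodes; the account name, walltime, and training entry point are placeholders, not taken from the repo's `jobs/` scripts:

```shell
#!/bin/bash
#SBATCH -A <account>        # hypothetical NERSC allocation
#SBATCH -C gpu              # request GPU nodes (use -C cpu for CPU jobs)
#SBATCH -q regular
#SBATCH -N 1
#SBATCH -t 02:00:00
#SBATCH --gpus-per-node=4

# placeholder entry point; see the scripts under jobs/ for the real commands
srun python scripts/train.py
```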
120
+
121
+ ### Data Prep Parallelization
122
+
123
+ The preparation of data can be parallelized across several threads on a CPU. The parallelization is handled by python's `concurrent.futures.ThreadPoolExecutor`.
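A minimal sketch of the `ThreadPoolExecutor` pattern; the file names and the per-file function are illustrative, not from the repo:

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # placeholder for the real per-file data-prep work
    return f"processed {path}"

files = ["sample_0.root", "sample_1.root", "sample_2.root"]

# map the prep function over the inputs on a pool of worker threads;
# pool.map preserves the input order in its results
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_file, files))

print(results)
```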
+
+ ### Training Parallelization
+
+ Parallelization of GNN training is implemented with `torch.nn.parallel.DistributedDataParallel`. The job submission script is in `jobs/training/multinode/submit.sh`.
+
+ When running multinode training, remember to pass the `--multinode` runtime argument to the training script.
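A generic `DistributedDataParallel` setup sketch, not the repo's actual training script; the model, the training loop, and the environment-variable handling are placeholders, and the script must be launched with `torchrun` or `srun` (one process per GPU), which set `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun/srun export RANK, WORLD_SIZE, LOCAL_RANK for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(16, 1).cuda(local_rank)  # placeholder for the GNN
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(10):                      # placeholder training loop
        x = torch.randn(32, 16, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                      # gradients all-reduced across ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each rank computes gradients on its own shard of the data; `DDP` averages them across ranks during `backward()`, so every replica takes the same optimizer step.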
+
+ ### Inference Parallelization
+
+ Model inference is parallelized with `mpi4py` (currently not listed in the conda environment requirements).
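A sketch of the usual `mpi4py` pattern for spreading inference inputs across ranks; the sample list and the per-sample work are placeholders, not from the repo, and the script would be launched with e.g. `srun` or `mpirun -n 4 python infer_mpi.py`:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# hypothetical list of input samples; each rank takes a round-robin share
samples = [f"sample_{i}.root" for i in range(100)]
my_samples = samples[rank::size]

# placeholder for running the model on this rank's samples
local_results = [len(s) for s in my_samples]

# gather the per-rank result lists on rank 0
all_results = comm.gather(local_results, root=0)
if rank == 0:
    print(f"collected results from {size} ranks")
```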