Commit 81104f5
Parent(s): c32e406
updated readme

root_gnn_dgl/README.md (+21 -1)
```
python scripts/inference.py \
    ...
    --branch 'GNN_Score'
```
You can also pass a list to both `--config` and `--branch` to apply multiple models to the same set of samples simultaneously. An example of how to do this in a shell script is in the `run_demo.sh` file.
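As a rough illustration of such an invocation (the config paths and branch names below are placeholders, and `run_demo.sh` should be treated as the authoritative example):

```shell
# Hypothetical multi-model invocation: both models are applied to the same
# samples in a single pass. Replace the config paths and branch names with
# real ones; see run_demo.sh for the actual syntax used by this repo.
python scripts/inference.py \
    --config configs/model_a.yaml configs/model_b.yaml \
    --branch 'GNN_Score_A' 'GNN_Score_B'
```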
## Running Jobs + Parallelization
Perlmutter job scripts are located in `jobs/` and are separated into three categories: `pre_data`, `training`, and `inference`.
The different shell scripts show how to request the GPU or CPU nodes from Perlmutter that are required for running jobs.
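As a sketch of what such a request looks like (the queue, node count, and time limit below are placeholders; the scripts in `jobs/` are authoritative):

```shell
#!/bin/bash
# Minimal Perlmutter GPU job skeleton. All values are placeholders.
#SBATCH -C gpu              # request GPU nodes (use "-C cpu" for CPU nodes)
#SBATCH -q regular          # queue / QOS
#SBATCH -N 1                # number of nodes
#SBATCH --gpus-per-node=4   # Perlmutter GPU nodes have 4 GPUs each
#SBATCH -t 01:00:00         # wall time

srun python scripts/inference.py   # replace with the real command + arguments
```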
### Data Prep Parallelization
Data preparation can be parallelized across several CPU threads; the parallelization is handled by Python's `concurrent.futures.ThreadPoolExecutor`.
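A minimal sketch of the pattern (the `prepare` function here is a stand-in for the repo's actual per-sample processing):

```python
from concurrent.futures import ThreadPoolExecutor

def prepare(sample):
    # Stand-in for real per-sample data-preparation work.
    return sample * 2

samples = list(range(8))

# Fan the samples out across worker threads; map() preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(prepare, samples))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Because the executor's `map` preserves input order, downstream steps can rely on results lining up with their input samples.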
### Training Parallelization
Parallelization of GNN training is implemented with `torch.nn.parallel.DistributedDataParallel`. The job submission script is in `jobs/training/multinode/submit.sh`.
When running multinode training, remember to pass the `--multinode` run-time argument to the training script.
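A hedged sketch of what the launch line inside such a job script might look like (the training-script name and per-node process count are placeholders; `jobs/training/multinode/submit.sh` is the authoritative version):

```shell
# Launch one torchrun per node; torchrun then spawns one worker per GPU and
# sets up the process group that DistributedDataParallel relies on.
srun --nodes="$SLURM_NNODES" --ntasks-per-node=1 \
    torchrun \
        --nnodes="$SLURM_NNODES" \
        --nproc_per_node=4 \
        --rdzv_backend=c10d \
        --rdzv_endpoint="$MASTER_ADDR:29500" \
        scripts/train.py --multinode   # script name is a placeholder
```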
### Inference Parallelization
Model inference is parallelized with `mpi4py` (currently not listed in the conda environment requirements).
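A minimal sketch of the usual `mpi4py` pattern, assuming inference is parallelized by giving each MPI rank a disjoint slice of the input files (the MPI driver lines are commented out so the sketch does not require an MPI launch):

```python
def split_work(items, rank, size):
    """Round-robin assignment of work items to MPI ranks."""
    return items[rank::size]

# With mpi4py (launched via e.g. `srun -n 4 python infer.py`) the driver is:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   my_items = split_work(all_files, comm.Get_rank(), comm.Get_size())
#   ...run model inference on my_items...

# Sanity check: every item lands on exactly one rank.
chunks = [split_work(list(range(10)), r, 4) for r in range(4)]
assert sorted(sum(chunks, [])) == list(range(10))
```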