---
license: mit
pipeline_tag: reinforcement-learning
---

# RLOR: A Flexible Framework of Deep Reinforcement Learning for Operation Research

1️⃣ First work to incorporate an end-to-end vehicle routing model into a modern RL platform (CleanRL)

⚡ Speeds up the training of the Attention Model by 8 times (25 hours → 3 hours)

🔎 A flexible framework for developing *model*, *algorithm*, *environment*, and *search* for operation research

## News

- 24/03/2023: We release our paper on [arXiv](https://arxiv.org/abs/2303.13117)!
- 20/03/2023: We release demo and pretrained checkpoints!
- 10/03/2023: We release our codebase!

## Demo
We provide inference demos as Colab notebooks:

| Environment | Search       | Demo                                                         |
| ----------- | ------------ | ------------------------------------------------------------ |
| TSP         | Greedy       | <a target="_blank" href="https://colab.research.google.com/github/cpwan/RLOR/blob/main/demo/tsp_search.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| CVRP        | Multi-Greedy | <a target="_blank" href="https://colab.research.google.com/github/cpwan/RLOR/blob/main/demo/cvrp_search.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |

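Here, *Greedy* decodes one trajectory by always picking the best-scoring next node, while *Multi-Greedy* (in the spirit of POMO) runs one greedy rollout per start node and keeps the best tour. A toy, self-contained sketch of the two search modes, with a nearest-neighbor rule standing in for the learned attention policy (all names and the heuristic are illustrative, not the repo's actual code):

```python
# Toy sketch: "greedy" decodes one trajectory; "multi-greedy" decodes one
# trajectory per start node and keeps the best. A nearest-neighbor rule
# stands in for the learned policy's per-step scores.
import math

def tour_length(coords, tour):
    # Total length of the closed tour.
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def greedy_rollout(coords, start):
    # Greedy search: at each step, pick the best-scoring (here: nearest) node.
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda j: math.dist(coords[tour[-1]], coords[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def multi_greedy(coords):
    # Multi-greedy search: one greedy trajectory per start node, keep the best.
    tours = [greedy_rollout(coords, s) for s in range(len(coords))]
    return min(tours, key=lambda t: tour_length(coords, t))

coords = [(0, 0), (0, 1), (1, 1), (1, 0), (0.5, 0.5)]
best = multi_greedy(coords)
single = greedy_rollout(coords, 0)
print(tour_length(coords, best) <= tour_length(coords, single))  # → True
```

By construction, multi-greedy can never return a worse tour than a single greedy rollout, since the single rollout is among its candidates.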
## Installation
### Conda
```shell
conda env create -n <env name> -f environment.yml
# The environment.yml was generated with:
# conda env export --no-builds > environment.yml
```
Creating the environment can take a few minutes.
### Optional dependency
`wandb`

Refer to the [wandb quickstart guide](https://docs.wandb.ai/quickstart) for installation.

## File structures
All the major implementations are under the [rlor](./rlor) folder.
```shell
./rlor
├── envs
│   ├── tsp_data.py # load pre-generated data for evaluation
│   ├── tsp_vector_env.py # define the (vectorized) gym environment
│   ├── cvrp_data.py
│   └── cvrp_vector_env.py
├── models
│   ├── attention_model_wrapper.py # wrap the refactored attention model for CleanRL
│   └── nets # contains the refactored attention model
└── ppo_or.py # implementation of PPO with the attention model for operation research problems
```

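The `*_vector_env.py` files define environments that step a whole batch of problem instances at once. As a rough, hypothetical sketch of the idea (not the repo's actual class, observation format, or API), a vectorized TSP-style environment could look like:

```python
# Hypothetical sketch of a vectorized routing environment: every call
# operates on a batch of `num_envs` independent problem instances.
import numpy as np

class ToyVectorEnv:
    def __init__(self, num_envs=4, num_nodes=10, seed=0):
        self.num_envs, self.num_nodes = num_envs, num_nodes
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # One set of random 2-D node coordinates per environment instance.
        self.coords = self.rng.random((self.num_envs, self.num_nodes, 2))
        self.visited = np.zeros((self.num_envs, self.num_nodes), dtype=bool)
        self.current = np.zeros(self.num_envs, dtype=int)
        self.visited[np.arange(self.num_envs), self.current] = True
        return {"coords": self.coords, "mask": self.visited}

    def step(self, actions):
        # `actions` holds one next-node index per environment.
        prev = self.coords[np.arange(self.num_envs), self.current]
        nxt = self.coords[np.arange(self.num_envs), actions]
        reward = -np.linalg.norm(nxt - prev, axis=-1)  # negative step cost
        self.current = actions
        self.visited[np.arange(self.num_envs), actions] = True
        done = self.visited.all(axis=1)
        return {"coords": self.coords, "mask": self.visited}, reward, done, {}

env = ToyVectorEnv()
obs = env.reset()
obs, reward, done, info = env.step(np.ones(env.num_envs, dtype=int))
print(reward.shape)  # → (4,)
```

Batching the rollout this way is what lets a single policy forward pass serve many instances at once, which is the main source of the training speedup.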
[ppo_or.py](./ppo_or.py) is a modified copy of [cleanrl/ppo.py](https://github.com/vwxyzjn/cleanrl/blob/28fd178ca182bd83c75ed0d49d52e235ca6cdc88/cleanrl/ppo.py). To see what changed, use diff:
```shell
# if diff is not installed: apt install diffutils
diff --color ppo.py ppo_or.py
```

## Training an OR model with PPO
### TSP
```shell
python ppo_or.py --num-steps 51 --env-id tsp-v0 --env-entry-point envs.tsp_vector_env:TSPVectorEnv --problem tsp
```
### CVRP
```shell
python ppo_or.py --num-steps 60 --env-id cvrp-v0 --env-entry-point envs.cvrp_vector_env:CVRPVectorEnv --problem cvrp
```
### Enable WandB
Add the `--track` argument to enable tracking with WandB:
```shell
python ppo_or.py ... --track
```

### Where is the TSP data?
It can be generated from the [official repo](https://github.com/wouterkool/attention-learn-to-route) of the attention-learn-to-route paper. You may modify [./envs/tsp_data.py](./envs/tsp_data.py) to update the data path accordingly.

## Acknowledgements
The neural network model is refactored and developed from [Attention, Learn to Solve Routing Problems!](https://github.com/wouterkool/attention-learn-to-route).

The idea of multiple-trajectory training/inference is from [POMO: Policy Optimization with Multiple Optima for Reinforcement Learning](https://proceedings.neurips.cc/paper/2020/hash/f231f2107df69eab0a3862d50018a9b2-Abstract.html).

The RL environments are defined with [OpenAI Gym](https://github.com/openai/gym/tree/0.23.1).

The PPO algorithm implementation is based on [CleanRL](https://github.com/vwxyzjn/cleanrl).