The ML-repo Zen
In order of significance
- Put a progress bar on everything that takes time - use tqdm!
- Replace train-eval-test logic with (train-eval)/predict.
- Make resumable everything that can be resumed.
- Checkpoint all big intermediate calculations.
- Make model loading possible in two lines of code (import and load) and inference in three.
- Make inference possible from raw data (numpy arrays, text, etc.).
- Allow batched computation at inference.
- Learn to package: you could use argparse :)
- Enable the issues section for community discussion, even if you want to unsubscribe from it yourself.
- Don’t hesitate to require some knowledge of Python (or any language), but don’t require knowledge of how you think.
- Support multiple cudatoolkit versions: don’t crash someone’s PC with a reckless environment.yml file.
- Run a memory stress test on a synthetic dataset and report the results in the README.
- Don’t settle for a proof of concept; try to provide something useful - this will totally prove your concept.
- Make your work citable.
- Make docker train easy, and make batch computation possible for docker predict.
- Respect creators: don’t ask someone to google-search for you, and don’t expect an answer to a question that is outside the scope of the repo.
- Respect users: the above, plus answer their questions if you can, or else let them know that you can’t.
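
The points about progress bars, resumability, checkpointing and argparse can be sketched together. This is only a minimal illustration, not a prescribed layout: `expensive_step`, the JSON checkpoint format, the `stop_after` parameter and the command-line flag names are all hypothetical, and tqdm is treated as optional so the sketch still runs without it.

```python
import argparse
import json
import os

try:
    from tqdm import tqdm  # progress bar, if installed
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: no progress bar, just iterate.
        return iterable


def expensive_step(x):
    """Hypothetical stand-in for a slow computation."""
    return x * x


def save_checkpoint(path, state):
    # Write to a temporary file first, then rename, so an interrupted
    # write never leaves a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    # Start from scratch when no checkpoint exists yet.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_index": 0, "results": []}


def run(items, checkpoint_path, save_every=10, stop_after=None):
    """Process items, resuming from the checkpoint if one exists.

    stop_after (hypothetical) simulates an interruption after that
    many newly processed items.
    """
    state = load_checkpoint(checkpoint_path)
    start = state["next_index"]
    done = 0
    for i in tqdm(range(start, len(items)), initial=start, total=len(items)):
        if stop_after is not None and done >= stop_after:
            break
        state["results"].append(expensive_step(items[i]))
        state["next_index"] = i + 1
        done += 1
        if state["next_index"] % save_every == 0:
            save_checkpoint(checkpoint_path, state)
    save_checkpoint(checkpoint_path, state)
    return state["results"]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Resumable toy computation")
    parser.add_argument("--n-items", type=int, default=100)
    parser.add_argument("--checkpoint", default="checkpoint.json")
    parser.add_argument("--save-every", type=int, default=10)
    args = parser.parse_args(argv)
    results = run(list(range(args.n_items)), args.checkpoint, args.save_every)
    print(f"computed {len(results)} results")
```

Calling `main()` from an `if __name__ == "__main__":` guard turns this into a script; running it a second time with the same `--checkpoint` picks up where the previous run stopped instead of recomputing.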
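
The "two lines to load, three to infer" idea, together with inference from raw data and batching, might look like the sketch below. `ToyModel`, its JSON weights file and its word-counting "model" are purely hypothetical placeholders; the point is only the shape of the interface: a `load` classmethod, and a `predict` that accepts either one raw string or a list of them.

```python
import json


class ToyModel:
    """Hypothetical model: scores raw text by summing per-word weights."""

    def __init__(self, weights):
        self.weights = weights

    @classmethod
    def load(cls, path):
        # One call restores everything needed for inference.
        with open(path) as f:
            return cls(json.load(f))

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.weights, f)

    def _score(self, text):
        # Preprocessing lives inside the model, so callers pass raw text.
        return sum(self.weights.get(tok, 0.0) for tok in text.lower().split())

    def predict(self, inputs, batch_size=32):
        """Accept a single string or an iterable of strings."""
        single = isinstance(inputs, str)
        texts = [inputs] if single else list(inputs)
        scores = []
        # Process in fixed-size batches so large inputs stay bounded in memory.
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            scores.extend(self._score(t) for t in batch)
        return scores[0] if single else scores


# Intended usage, assuming the class lives in a module named toy_model
# and weights.json was written earlier with ToyModel.save:
#
#   from toy_model import ToyModel
#   model = ToyModel.load("weights.json")
#   scores = model.predict(["good movie", "bad plot"])
```

Returning a scalar for a single input and a list for a batch is one design choice among several; always returning a list is a reasonable alternative if you prefer a uniform return type.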