This repository is no longer maintained; the latest version is at CGCL-codes/naturalcc.
NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models for many software engineering tasks, e.g., code summarization, code retrieval, and code clone detection. Our vision is to bridge the gap between programming languages and natural language through machine learning techniques.
About us: XCodeMind
NaturalCC demo page: NCC demo
This repository is an ongoing project, and you are welcome to take part in its development. If you run into a bug or any other problem while using it, feel free to contact us and we will do our best to help. If you would like to merge your own workflow into this project, please submit a pull request.
The project is inspired by fairseq, and we thank its authors for that work.
- mixed precision training
- multi-gpu training
- raw/bin data reading/writing
TBC...
Currently, we have processed the following datasets:
TBC...
Please wait.
- PyTorch version >= 1.4.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and NCCL
- For faster training, install NVIDIA's apex library with the --cuda_ext and --deprecated_fused_adam options
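The requirements above can be checked before installing anything else. The script below is a minimal sketch, not part of NaturalCC: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and introduced only for illustration. It assumes that PyTorch reports its version through `torch.__version__` and that `torch.cuda.is_available()` is a reasonable stand-in for "an NVIDIA GPU is present"; it does not check NCCL or apex.

```python
import importlib.util
import re
import sys


def parse_version(version):
    """Turn a version string such as '1.4.0+cu100' into a tuple like (1, 4, 0)."""
    parts = []
    # Drop any local build suffix ("+cu100") and keep the leading digits of each field.
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum` (both are version strings)."""
    found, required = parse_version(version), parse_version(minimum)
    # Pad with zeros so that "1.4" and "1.4.0" compare as equal.
    width = max(len(found), len(required))
    found += (0,) * (width - len(found))
    required += (0,) * (width - len(required))
    return found >= required


def check_environment():
    """Collect human-readable problems; an empty list means the basics look fine."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch  # imported lazily so the script still runs without PyTorch

    if not meets_minimum(torch.__version__, "1.4.0"):
        problems.append("PyTorch >= 1.4.0 is required, found " + torch.__version__)
    if not torch.cuda.is_available():
        problems.append("no CUDA-capable GPU detected (needed for training new models)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet requirement and nothing when the basic checks pass.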
1) Install apex to support half-precision training.
git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
git clone https://github.com/xcodemind/naturalcc
cd naturalcc
pip install -r requirements.txt
# or install with conda
# conda install --yes --file requirements.txt
install.md describes installation in a virtual environment in detail. If you run into problems during installation, refer to that file.
# build for development
python setup.py build_ext --inplace
# install
pip install --editable ./
NaturalCC is MIT-licensed. The license applies to the pre-trained models as well.
Please cite as: xxx