Currently under development
A solver for the Winograd Schema Challenge in Portuguese. This repository also proposes Portuguese translations of the original Winograd Schema Challenge.
Preliminary results were presented in a conference paper: Melo, Gabriela Souza de; Imaizumi, Vinicius A.; Cozman, Fabio Gagliardi. Winograd Schemas in Portuguese. In: Encontro Nacional de Inteligência Artificial e Computacional, 2019.
This project has not been tested on machines without a CUDA GPU available.
A Dockerfile is available and may be used with `docker build -t wsc_port .` followed by `nvidia-docker run -it -v $PWD/models:/code/models wsc_port <desired_command>` (e.g. `nvidia-docker run -it -v $PWD/models:/code/models wsc_port python -m src.main`).
The docker-compose file defines a few different options for running the code, which can be run with commands such as `docker-compose run <service_name>` (e.g. `docker-compose run train`). For the jupyter-server service, use `docker-compose run --service-ports jupyter-server` (the password for accessing its webpage is `root`).
For running outside of the Docker container, Conda is required.
`conda env create -f environment.yml`

The Makefile contains some of the commands used to run the code. These commands must be run from inside the environment.

- `make dev-init`. This command also makes sure `make processed-data` is run, which prepares the data needed to train the model.
- `make reduced-processed-data` reduces the size of each of these splits.
- `make corpus` speeds up the first run of the code (but is not necessary).
- `make train` trains a model.
- `make winograd-test` runs the evaluation on the Winograd Schema Challenge.
- `make generate` runs the language model for text generation.

The code runs for both the English and Portuguese cases; this setting is controlled by the variable `PORTUGUESE` in `src.consts`.
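
As a rough illustration of how a single boolean flag such as `PORTUGUESE` can switch language-specific resources, here is a minimal sketch. The file names and the `corpus_file` helper are hypothetical and are not taken from this repository; only the flag name comes from `src.consts`.

```python
# Hypothetical stand-in for the flag that the project defines in src.consts.
PORTUGUESE = True

# Hypothetical file names; the real paths used by the project may differ.
CORPUS_FILES = {True: "corpus_pt.txt", False: "corpus_en.txt"}


def corpus_file(portuguese: bool = PORTUGUESE) -> str:
    """Return the corpus file name for the selected language."""
    return CORPUS_FILES[portuguese]


print(corpus_file())       # uses the module-level flag
print(corpus_file(False))  # explicitly request the English resource
```

Keeping the switch in one constant means every component reads the same setting, at the cost of editing the source file to change language.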
Run tests with `make tests`, which is equivalent to `pytest --cov=src tests/`. Use `pytest --cov=src --cov-report=html tests/` to generate an HTML coverage report. Both require the pytest and pytest-cov packages. If there are import errors, run `pip install -e .` to install the package locally from source.
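
Before running the tests, it can help to check whether the `src` package is importable at all. The sketch below uses only the standard library; the `is_importable` helper is an illustrative name and not part of this project.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if not is_importable("src"):
    # Same remedy as suggested above for import errors.
    print("Package 'src' not found; try `pip install -e .` from the repository root.")
```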
This repository also contains code for generating the Winograd Schema Collection JSON from the original HTML file, ready to be used by the solver. To generate it, run `python -m src.winograd_collection_manipulation.wsc_subsets_generation`. To generate the version with translated names, run `python -m src.winograd_collection_manipulation.name_replacer` after that first command. These commands are not needed to run the solver, since the JSON file is already present in this repository. The code is made available in case it helps with translating the Challenge into other languages.
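
To give an idea of what a name-replacement step might look like, here is a minimal sketch. It is not the implementation in `name_replacer`; the schema fields (`sentence`, `pronoun`, `candidates`, `answer`) and the name mapping are assumptions made for illustration only.

```python
import json
import re

# Hypothetical mapping from English names to Portuguese names.
NAME_MAP = {"John": "João", "Mary": "Maria"}


def replace_names(text: str, name_map: dict) -> str:
    """Replace whole-word occurrences of each mapped name in `text`."""
    if not name_map:
        return text
    # Longest names first, so a longer name wins over one that is its prefix.
    names = sorted(name_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda match: name_map[match.group(1)], text)


def translate_schema_names(schema: dict, name_map: dict) -> dict:
    """Apply the replacement to every string or list-of-strings field."""
    translated = {}
    for key, value in schema.items():
        if isinstance(value, list):
            translated[key] = [replace_names(item, name_map) for item in value]
        else:
            translated[key] = replace_names(value, name_map)
    return translated


# Hypothetical schema entry; the real JSON layout may differ.
schema = {
    "sentence": "John thanked Mary because she helped him.",
    "pronoun": "she",
    "candidates": ["John", "Mary"],
    "answer": "Mary",
}

translated = translate_schema_names(schema, NAME_MAP)
print(json.dumps(translated, ensure_ascii=False, indent=2))
```

Matching on word boundaries keeps a name such as "John" from being replaced inside a longer word like "Johnny", and applying the same mapping to the sentence, candidates, and answer keeps the entry internally consistent.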
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`.
├── README.md <- The top-level README for developers using this project.
├── environment.yml <- Contains project's requirements, generated from Anaconda environment.
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported.
│
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── githooks <- Contains githooks scripts being used for development. Git hook directory for repo needs to be set to this folder.
│
├── models <- Trained and serialized models, model predictions, or model summaries. Gitignored due to their size.
│
├── notebooks <- Jupyter notebooks, used during experimentation and testing.
│
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module.
└── tests <- Tests module, using Pytest.
Project based on the cookiecutter data science project template. #cookiecutterdatascience