
portuguese_wsc


Winograd Schema Challenge in Portuguese

Currently under development.

Solver for the Winograd Schema Challenge in Portuguese. A Portuguese translation of the original Winograd Schema Challenge is also proposed here.

Preliminary results were presented in a conference paper: Melo, Gabriela Souza de; Imaizumi, Vinicius A.; Cozman, Fabio Gagliardi. Winograd Schemas in Portuguese. In: Encontro Nacional de Inteligência Artificial e Computacional, 2019.

Project Setup

This project has not been tested on machines without a CUDA GPU.
A Dockerfile is available. Build the image with `docker build -t wsc_port .` and then run `nvidia-docker run -it -v $PWD/models:/code/models wsc_port <desired_command>` (e.g. `nvidia-docker run -it -v $PWD/models:/code/models wsc_port python -m src.main`).
The docker-compose file contains several other options for running the code, which can be run with `docker-compose run <service_name>` (e.g. `docker-compose run train`). For the Jupyter server, run `docker-compose run --service-ports jupyter-server` (use `root` to access the web page).
To run outside the Docker container, Conda is required.
- To create the conda environment: `conda env create -f environment.yml`
The Makefile contains some of the commands used to run the code. These commands must be run inside the environment.
- To set up the project's running environment: `make dev-init`. This command also runs `make processed-data`, which prepares the data needed to train the model.
  - The data for the corpus in use is organized as follows:
    - Raw data: files used to generate the final Winograd Schema Challenge schema collection JSONs
    - External data: compressed XML file downloaded from Wikipedia's dump archive
    - Interim data: TXT files extracted from the above. They may or may not be split into smaller files.
    - Processed data: TXT files containing the text divided into train, test and validation splits. Also contains the generated Winograd Schema Challenge schema collection JSONs.
      - Additionally, `make reduced-processed-data` reduces the size of each split.
- `make corpus` speeds up the first run of the code (but is not required).
- `make train` trains the model.
- `make winograd-test` runs the evaluation on the Winograd Schema Challenge.
- `make generate` uses the language model to generate text.
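As a rough illustration of what the processed-data step produces, the sketch below divides a list of text lines into train, validation and test splits and then truncates each split, in the spirit of `make reduced-processed-data`. The function names, the split fractions and the shuffling strategy are assumptions made for this example; they are not taken from the project's code.

```python
import random


def split_lines(lines, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle lines and divide them into train/valid/test lists.

    The fractions and the seed are illustrative defaults, not project values.
    """
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "valid": shuffled[:n_valid],
        "test": shuffled[n_valid:n_valid + n_test],
        "train": shuffled[n_valid + n_test:],
    }


def reduce_splits(splits, max_lines):
    """Keep at most max_lines lines of every split (hypothetical helper)."""
    return {name: rows[:max_lines] for name, rows in splits.items()}


corpus = [f"sentence {i}" for i in range(100)]
splits = split_lines(corpus)
reduced = reduce_splits(splits, max_lines=5)
print({name: len(rows) for name, rows in splits.items()})
print({name: len(rows) for name, rows in reduced.items()})
```

A reduced set of splits like this is mainly useful for checking that the training pipeline runs end to end before committing to a full run.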
The code runs for both the English and the Portuguese cases; this setting is controlled by the variable `PORTUGUESE` in `src.consts`.
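A single boolean flag of this kind is typically read wherever language-specific resources are chosen. The sketch below shows one possible pattern; the local `PORTUGUESE` variable is a stand-in for the real one, and the `corpus_dir` helper and directory names are hypothetical and may not match how the project lays out its data.

```python
from pathlib import Path

# Stand-in for the flag described above; the real one lives in src.consts.
PORTUGUESE = True


def corpus_dir(portuguese, base=Path("data/processed")):
    """Return a per-language data directory (directory names are assumed)."""
    return base / ("portuguese" if portuguese else "english")


print(corpus_dir(PORTUGUESE))
```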
Run the tests with `make tests`, which is equivalent to `pytest --cov=src tests/`. To generate an HTML coverage report, use `pytest --cov=src --cov-report=html tests/`. The pytest and pytest-cov packages are required. If there are import errors, run `pip install -e .` to install the package locally from the source code.

Winograd Collection Generation

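This repository contains code that generates the Winograd Schema Collection JSONs from the original HTML file, so that they are ready to be used by the solver. The generation is done by running `python -m src.winograd_collection_manipulation.wsc_subsets_generation`. To generate the versions with translated names, run `python -m src.winograd_collection_manipulation.name_replacer` after the first command. Since the JSON files already exist in this repository, these commands do not need to be invoked in order to run the solver. The code is kept available, however, in case it helps with translating the challenge into other languages.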
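To make the name-replacement step concrete, here is a minimal sketch of swapping English proper names for Portuguese ones in a schema entry. The JSON field names, the name mapping and both helper functions are invented for illustration; the actual format produced by `wsc_subsets_generation` and the logic in `name_replacer` may differ.

```python
import json
import re

# Hypothetical mapping from English names to Portuguese ones.
NAME_MAP = {"John": "João", "Mary": "Maria"}


def replace_names(text, name_map):
    """Replace whole-word occurrences of each mapped name in text."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, name_map)) + r")\b")
    return pattern.sub(lambda match: name_map[match.group(1)], text)


def translate_names(entry, name_map):
    """Apply replace_names to every string (or list of strings) in an entry."""
    translated = {}
    for key, value in entry.items():
        if isinstance(value, str):
            translated[key] = replace_names(value, name_map)
        elif isinstance(value, list):
            translated[key] = [replace_names(item, name_map) for item in value]
        else:
            translated[key] = value
    return translated


# Illustrative schema entry; the field names are assumptions.
schema = {
    "sentence": "John thanked Mary because she helped him.",
    "pronoun": "she",
    "candidates": ["John", "Mary"],
    "correct": "Mary",
}

translated = translate_names(schema, NAME_MAP)
print(json.dumps(translated, ensure_ascii=False, indent=2))
```

Matching on word boundaries keeps longer names such as "Johnson" from being partially rewritten.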

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`.
├── README.md          <- The top-level README for developers using this project.
├── environment.yml    <- Contains project's requirements, generated from Anaconda environment.
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── githooks           <- Contains githooks scripts being used for development. Git hook directory for repo needs to be set to this folder.
│
├── models             <- Trained and serialized models, model predictions, or model summaries. Gitignored due to their size.
│
├── notebooks          <- Jupyter notebooks, used during experimentation and testing.
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module.
└── tests              <- Tests module, using Pytest.

Project based on the cookiecutter data science project template. #cookiecutterdatascience

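References

Code for the language model is based on PyTorch's word-level language modeling RNN example.
Code for parallelization of the PyTorch model is based on the PyTorch-Encoding package, with help from this Medium post.
The idea of using a language model to solve the Winograd Schema Challenge is based on the paper "A Simple Method for Commonsense Reasoning" by Trieu H. Trinh and Quoc V. Le, 2018.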
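
The language-model approach can be sketched as follows: substitute each candidate for the ambiguous pronoun, score every resulting sentence with a language model, and pick the candidate whose sentence is most probable. In the example below a toy bigram model with add-one smoothing stands in for the RNN language model; the tiny corpus, the `[MASK]` placeholder and all function names are assumptions made for illustration and are not the project's implementation.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Sum of add-one-smoothed bigram log-probabilities of the sentence."""
    tokens = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def resolve(template, candidates, unigrams, bigrams):
    """Substitute each candidate for [MASK] and return the best-scoring one."""
    scores = {
        cand: log_prob(template.replace("[MASK]", cand), unigrams, bigrams)
        for cand in candidates
    }
    return max(scores, key=scores.get), scores


# Toy corpus, invented for this example.
toy_corpus = [
    "the trophy is too big",
    "the suitcase is too small",
    "the trophy is heavy",
]
unigrams, bigrams = train_bigram(toy_corpus)

template = "the trophy does not fit in the suitcase because [MASK] is too big"
best, scores = resolve(template, ["the trophy", "the suitcase"], unigrams, bigrams)
print(best, scores)
```

With such a small corpus the winner only reflects word-pair frequencies; a model trained on a large corpus is what makes the comparison meaningful in practice.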
