text-similarity
===============

ANLI
----

Original website, https://huggingface.co/datasets/anli

download
~~~~~~~~

1. https://huggingface.co/datasets/mesolitica/translated-MNLI/resolve/main/anli.translated.jsonl

Citation
~~~~~~~~

.. code:: bibtex

   @InProceedings{nie2019adversarial,
       title={Adversarial NLI: A New Benchmark for Natural Language Understanding},
       author={Nie, Yixin
                 and Williams, Adina
                 and Dinan, Emily
                 and Bansal, Mohit
                 and Weston, Jason
                 and Kiela, Douwe},
       booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
       year = "2020",
       publisher = "Association for Computational Linguistics",
   }

MNLI
----

Original website, https://cims.nyu.edu/~sbowman/multinli/

download
~~~~~~~~

1. https://huggingface.co/datasets/mesolitica/translated-MNLI/resolve/main/translated-mnli-train.jsonl
2. https://huggingface.co/datasets/mesolitica/translated-MNLI/resolve/main/translated-mnli-dev_mismatched.jsonl
3. https://huggingface.co/datasets/mesolitica/translated-MNLI/resolve/main/translated-mnli-dev_matched.jsonl

Citation
~~~~~~~~

.. code:: bibtex

   @InProceedings{N18-1101,
       author = "Williams, Adina
                 and Nangia, Nikita
                 and Bowman, Samuel",
       title = "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference",
       booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)",
       year = "2018",
       publisher = "Association for Computational Linguistics",
       pages = "1112--1122",
       location = "New Orleans, Louisiana",
       url = "http://aclweb.org/anthology/N18-1101"
   }

Quora
-----

download
~~~~~~~~

1. part1, https://f000.backblazeb2.com/file/malay-dataset/text-similarity/quora/0-100k.json
2. part2, https://f000.backblazeb2.com/file/malay-dataset/text-similarity/quora/100k-200k.json
3. part3, https://f000.backblazeb2.com/file/malay-dataset/text-similarity/quora/200k-300k.json
4. part4, https://f000.backblazeb2.com/file/malay-dataset/text-similarity/quora/300k-400k.json
5. part5, https://f000.backblazeb2.com/file/malay-dataset/text-similarity/quora/400k-500k.json

Citation
~~~~~~~~

.. code:: bibtex

   @misc{kaggle,
       title={Quora Question Pairs},
       url={https://www.kaggle.com/c/quora-question-pairs},
       journal={Kaggle}
   }

SNLI
----

Original website, https://nlp.stanford.edu/projects/snli/

download
~~~~~~~~

1. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/snli_1.0_train.json.translate
2. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/snli_1.0_test.json.translate
3. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/snli_1.0_dev.json.translate

Citation
~~~~~~~~

.. code:: bibtex

   Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015.
   A large annotated corpus for learning natural language inference.
   In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

how-to
~~~~~~

1. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part1.json
2. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part2.json
3. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part3.json
4. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part4.json
5. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part5.json
6. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part6.json
7. https://f000.backblazeb2.com/file/malay-dataset/text-similarity/snli/part7.json
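
As a rough illustration of how one of the listed files might be read, the sketch below streams a few records from a ``.jsonl`` URL. It assumes, based only on the file extension, that the ANLI and MNLI files hold one JSON object per line; the record fields are not documented here, so nothing is assumed about their keys. The names ``parse_jsonl``, ``stream_jsonl`` and ``MNLI_DEV_MATCHED`` are hypothetical helpers introduced for this example, and the ``.json`` and ``.json.translate`` files may use a different layout.

```python
import json
from typing import Iterable, Iterator
from urllib.request import urlopen

# URL copied from the MNLI download list; the constant name is illustrative.
MNLI_DEV_MATCHED = (
    "https://huggingface.co/datasets/mesolitica/translated-MNLI/"
    "resolve/main/translated-mnli-dev_matched.jsonl"
)


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON value per non-empty line.

    Assumes the JSON Lines layout (one JSON value per line).
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def stream_jsonl(url: str, limit: int = 5) -> list:
    """Fetch `url` and return at most `limit` parsed records.

    The response is read line by line, so the whole file does not
    need to be held in memory to inspect the first few records.
    """
    records = []
    with urlopen(url) as response:
        text_lines = (raw.decode("utf-8") for raw in response)
        for record in parse_jsonl(text_lines):
            records.append(record)
            if len(records) >= limit:
                break
    return records
```

Calling ``stream_jsonl(MNLI_DEV_MATCHED, limit=3)`` would download from the network and return up to three records, which is a cheap way to inspect the actual field names before writing any training code against them.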