tatabahasa

Contents

  • Ayat Aktif - Pasif
    • download
  • Kesalahan Tatabahasa Choice
    • download
  • Kesalahan Tatabahasa for Seq2Seq
    • download

Ayat Aktif - Pasif

Generated using ChatGPT 4.

download

  1. https://huggingface.co/datasets/mesolitica/chatgpt-wikipedia-qa/resolve/main/wikipedia-qa.jsonl

Kesalahan Tatabahasa Choice

Generated using ChatGPT 3.5.

download

  1. https://huggingface.co/datasets/mesolitica/kesalahan-tatabahasa-choice/resolve/main/kesalahan-tatabahasa-choice.jsonl

Kesalahan Tatabahasa for Seq2Seq

Data generation script: https://github.com/mesolitica/malaya/tree/master/session/tatabahasa/prepare-dataset

download

  1. https://huggingface.co/datasets/mesolitica/kesalahan-tatabahasa/resolve/main/train2.jsonl

  2. https://huggingface.co/datasets/mesolitica/kesalahan-tatabahasa/resolve/main/train.jsonl

  3. https://huggingface.co/datasets/mesolitica/kesalahan-tatabahasa/resolve/main/test.jsonl
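The files listed on this page are JSON Lines (`.jsonl`), meaning one JSON object per line. A minimal sketch of reading one of them with the Python standard library follows. The helper names `iter_jsonl` and `stream_jsonl` are hypothetical and are not part of any mesolitica package. The record fields are not documented here, so the sketch assumes nothing about them.

```python
import json
from typing import Any, Iterable, Iterator
from urllib.request import urlopen

# URL copied from the download list above.
TEST_URL = (
    "https://huggingface.co/datasets/mesolitica/"
    "kesalahan-tatabahasa/resolve/main/test.jsonl"
)


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def stream_jsonl(url: str) -> Iterator[Any]:
    """Hypothetical helper: stream a remote JSONL file (needs network access)."""
    with urlopen(url) as response:
        # The response object iterates over raw byte lines.
        yield from iter_jsonl(raw.decode("utf-8") for raw in response)
```

Calling `next(stream_jsonl(TEST_URL))` would fetch the first record, assuming the URL is reachable. Inspecting its keys is a reasonable way to discover the schema before processing the whole file.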

By mesolitica
© Copyright 2020, mesolitica.