segmentation

Segmentation

Augmented datasets for building custom segmentation models for Malay text.

Download

  1. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-iium.tsv

  2. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-multisentences-iium.tsv

  3. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-multisentences-news.tsv

  4. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-multisentences-wiki.tsv

  5. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-news.tsv

  6. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-short-iium.tsv

  7. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-short-news.tsv

  8. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-short-wiki.tsv

  9. https://f000.backblazeb2.com/file/malay-dataset/segmentation/segmentation-wiki.tsv

  10. https://f000.backblazeb2.com/file/malay-dataset/segmentation/test-set-segmentation.json
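
The files listed above can be fetched and inspected with the Python standard library. The following is a minimal sketch, not part of the original project: the helper names (`dataset_url`, `download`, `read_tsv_rows`) and the `data` target directory are hypothetical, and the column layout of the TSV files is not documented here, so rows are returned as plain lists of strings without assuming what each column means.

```python
import csv
import urllib.request
from pathlib import Path

# Base URL and file names are copied from the download list above.
BASE_URL = "https://f000.backblazeb2.com/file/malay-dataset/segmentation/"
TSV_FILES = [
    "segmentation-iium.tsv",
    "segmentation-multisentences-iium.tsv",
    "segmentation-multisentences-news.tsv",
    "segmentation-multisentences-wiki.tsv",
    "segmentation-news.tsv",
    "segmentation-short-iium.tsv",
    "segmentation-short-news.tsv",
    "segmentation-short-wiki.tsv",
    "segmentation-wiki.tsv",
]
TEST_SET = "test-set-segmentation.json"


def dataset_url(filename: str) -> str:
    """Build the full download URL for one file name (hypothetical helper)."""
    return BASE_URL + filename


def download(filename: str, target_dir: str = "data") -> Path:
    """Download one file into target_dir, skipping files already on disk."""
    target = Path(target_dir) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(dataset_url(filename), str(target))
    return target


def read_tsv_rows(path, limit=None):
    """Read a tab-separated file into a list of rows (lists of strings).

    The meaning of each column is an open question here, so no header
    or schema is assumed; quotes are treated as ordinary characters.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows
```

For example, `read_tsv_rows(download("segmentation-short-wiki.tsv"), limit=5)` would fetch one of the smaller-sounding files and return its first five rows, which is a reasonable way to check the actual column layout before writing a training pipeline around it.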

Citation

@misc{Malay-Dataset,
note = {We gather Bahasa Malaysia corpus! Segmentation Augmentation},
author = {Husein, Zolkepli},
title = {Malay-Dataset},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huseinzol05/malay-dataset/tree/master/segmentation}}
}

By mesolitica
© Copyright 2020, mesolitica.