embedding

Contents

  • ada-002
    • download
  • bge-large-en
    • download
  • Instructions pair
    • download

ada-002

download

  1. https://huggingface.co/datasets/mesolitica/OpenAI-embedding-ada-002/resolve/main/ada-002-b.cari.com.my.jsonl

  2. https://huggingface.co/datasets/mesolitica/OpenAI-embedding-ada-002/resolve/main/ada-002-carigold.jsonl

  3. https://huggingface.co/datasets/mesolitica/OpenAI-embedding-ada-002/resolve/main/ada-002-facebook.jsonl

  4. https://huggingface.co/datasets/mesolitica/OpenAI-embedding-ada-002/resolve/main/ada-002-lowyat.jsonl

  5. https://huggingface.co/datasets/mesolitica/OpenAI-embedding-ada-002/resolve/main/ada-002-twitter.jsonl
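The files above are JSON Lines, so each can be read one record per line without loading the whole file into memory. The following is a minimal sketch using only the Python standard library. The helper names (`file_url`, `iter_jsonl`, `preview`) are hypothetical, and the record schema is not documented on this page, so the sketch only prints each record's keys instead of assuming particular field names.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Base URL and file names copied from the download list above.
BASE_URL = (
    "https://huggingface.co/datasets/mesolitica/"
    "OpenAI-embedding-ada-002/resolve/main"
)
FILES = [
    "ada-002-b.cari.com.my.jsonl",
    "ada-002-carigold.jsonl",
    "ada-002-facebook.jsonl",
    "ada-002-lowyat.jsonl",
    "ada-002-twitter.jsonl",
]


def file_url(name: str) -> str:
    """Build the download URL for one file in the list."""
    return f"{BASE_URL}/{name}"


def iter_jsonl(lines: Iterable[str]) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def preview(url: str, limit: int = 3) -> None:
    """Stream a remote JSONL file and print the keys of the first few records.

    Assumption: each line is a JSON object. The field names are not
    documented here, so they are inspected rather than hard-coded.
    """
    with urllib.request.urlopen(url) as resp:
        text_lines = (raw.decode("utf-8") for raw in resp)
        for i, record in enumerate(iter_jsonl(text_lines)):
            if i >= limit:
                break
            if isinstance(record, dict):
                print(sorted(record.keys()))
            else:
                print(type(record).__name__)
```

Calling `preview(file_url(FILES[0]))` streams the first file and stops after three records, which is a cheap way to check the field names before downloading everything.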

bge-large-en

download

Full list at https://huggingface.co/datasets/mesolitica/bge-large-en-embedding/tree/main

Instructions pair

download

All datasets at https://huggingface.co/datasets/mesolitica/instructions-pair-mining/tree/main
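Because the bge-large-en and instructions pair repositories are linked only by their file listing pages, it can help to enumerate their files programmatically. The sketch below assumes the Hugging Face Hub exposes dataset metadata at `https://huggingface.co/api/datasets/<repo_id>` with a `siblings` list whose entries carry an `rfilename` field. That endpoint and response shape are assumptions not stated on this page, and the function names are hypothetical. The download URL pattern follows the `resolve/main` links used for ada-002 above.

```python
import json
import urllib.request
from typing import List

# Assumed Hub metadata endpoint; not documented on this page.
API_URL = "https://huggingface.co/api/datasets/{repo_id}"


def filenames_from_info(info: dict) -> List[str]:
    """Extract sorted file names from a dataset metadata dict.

    Assumption: files are listed under "siblings" as {"rfilename": ...}.
    """
    return sorted(s["rfilename"] for s in info.get("siblings", []))


def download_urls(repo_id: str, filenames: List[str]) -> List[str]:
    """Build direct download URLs using the resolve/main pattern."""
    return [
        f"https://huggingface.co/datasets/{repo_id}/resolve/main/{name}"
        for name in filenames
    ]


def list_repo_files(repo_id: str) -> List[str]:
    """Fetch the metadata for one dataset repository and return its file names."""
    with urllib.request.urlopen(API_URL.format(repo_id=repo_id)) as resp:
        return filenames_from_info(json.load(resp))
```

For example, `download_urls(repo, list_repo_files(repo))` with `repo = "mesolitica/bge-large-en-embedding"` or `repo = "mesolitica/instructions-pair-mining"` would give one URL per file, if the endpoint behaves as assumed.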

By mesolitica
© Copyright 2020, mesolitica.