tagging#
Description#
"""
Descriptions of the supported OntoNotes 5 entity tags. See https://spacy.io/api/annotation#named-entities
"""
d = [
    {'Tag': 'PERSON', 'Description': 'People, including fictional.'},
    {
        'Tag': 'NORP',
        'Description': 'Nationalities or religious or political groups.',
    },
    {
        'Tag': 'FAC',
        'Description': 'Buildings, airports, highways, bridges, etc.',
    },
    {
        'Tag': 'ORG',
        'Description': 'Companies, agencies, institutions, etc.',
    },
    {'Tag': 'GPE', 'Description': 'Countries, cities, states.'},
    {
        'Tag': 'LOC',
        'Description': 'Non-GPE locations, mountain ranges, bodies of water.',
    },
    {
        'Tag': 'PRODUCT',
        'Description': 'Objects, vehicles, foods, etc. (Not services.)',
    },
    {
        'Tag': 'EVENT',
        'Description': 'Named hurricanes, battles, wars, sports events, etc.',
    },
    {'Tag': 'WORK_OF_ART', 'Description': 'Titles of books, songs, etc.'},
    {'Tag': 'LAW', 'Description': 'Named documents made into laws.'},
    {'Tag': 'LANGUAGE', 'Description': 'Any named language.'},
    {
        'Tag': 'DATE',
        'Description': 'Absolute or relative dates or periods.',
    },
    {'Tag': 'TIME', 'Description': 'Times smaller than a day.'},
    {'Tag': 'PERCENT', 'Description': 'Percentage, including "%".'},
    {'Tag': 'MONEY', 'Description': 'Monetary values, including unit.'},
    {
        'Tag': 'QUANTITY',
        'Description': 'Measurements, as of weight or distance.',
    },
    {'Tag': 'ORDINAL', 'Description': '"first", "second", etc.'},
    {
        'Tag': 'CARDINAL',
        'Description': 'Numerals that do not fall under another type.',
    },
]
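The list above can be turned into a lookup table keyed by tag. This is a minimal sketch that assumes only the `Tag`/`Description` structure shown above; `tag_to_description` and `describe_tag` are hypothetical names introduced for illustration, and only three of the eighteen entries are repeated to keep the example short.

```python
# Excerpt of the tag list above (three of the eighteen entries).
d = [
    {'Tag': 'PERSON', 'Description': 'People, including fictional.'},
    {'Tag': 'GPE', 'Description': 'Countries, cities, states.'},
    {'Tag': 'TIME', 'Description': 'Times smaller than a day.'},
]

# Map each tag to its description so a tag can be looked up directly.
tag_to_description = {row['Tag']: row['Description'] for row in d}


def describe_tag(tag):
    """Return the description for `tag`, or None if the tag is unknown.

    Hypothetical helper, not part of any library. Tags are upper-cased
    first so that 'gpe' and 'GPE' resolve to the same entry.
    """
    return tag_to_description.get(tag.upper())
```

A model prediction such as `'gpe'` can then be explained with `describe_tag('gpe')`, while an unknown label returns `None` instead of raising.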
Download translated Ontonotes#
https://f000.backblazeb2.com/file/malay-dataset/tagging/ontonotes5/translated-ontonotes5.json
Processed Ontonotes#
Train set, https://f000.backblazeb2.com/file/malay-dataset/tagging/ontonotes5/processed-train-ontonotes5.json
Test set, https://f000.backblazeb2.com/file/malay-dataset/tagging/ontonotes5/processed-test-ontonotes5.json
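One way to read the processed files into memory is sketched below, using only the Python standard library. `load_json` is a hypothetical helper name and the 60-second timeout is an arbitrary assumption. The internal layout of the JSON files is not documented here, so the sketch makes no assumption about their keys.

```python
import json
from urllib.request import urlopen

BASE = 'https://f000.backblazeb2.com/file/malay-dataset/tagging/ontonotes5/'
TRAIN_URL = BASE + 'processed-train-ontonotes5.json'
TEST_URL = BASE + 'processed-test-ontonotes5.json'


def load_json(url, timeout=60):
    """Fetch `url` and parse the response body as UTF-8 encoded JSON.

    Hypothetical helper; the timeout value is an assumption, not a
    documented requirement of the host.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

Calling `load_json(TRAIN_URL)` downloads the whole file, so it is worth inspecting the type and top-level structure of the result before relying on particular fields.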
Augmented#
Prefix for the files below, https://f000.backblazeb2.com/file/malay-dataset/tagging/ontonotes5/
augmentation-address-ontonotes5.json
augmentation-event-ontonotes5.json
augmentation-fac-ontonotes5.json
augmentation-gpe-ontonotes5.json
augmentation-language-ontonotes5.json
augmentation-law-ontonotes5.json
augmentation-loc-ontonotes5.json
augmentation-norp-ontonotes5.json
augmentation-org-ontonotes5.json
augmentation-person-ontonotes5.json
augmentation-product-ontonotes5.json
augmentation-work-of-art-ontonotes5.json
Train test set, ontonotes5-train-test.json
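The full download URLs can be assembled from the prefix and the file names listed above. The sketch below assumes the naming pattern `augmentation-<entity>-ontonotes5.json` seen in that list, and assumes that `ontonotes5-train-test.json` sits under the same prefix. `augmentation_url` is a hypothetical helper name.

```python
PREFIX = 'https://f000.backblazeb2.com/file/malay-dataset/tagging/ontonotes5/'

# Entity names taken from the augmentation file list above.
ENTITIES = [
    'address', 'event', 'fac', 'gpe', 'language', 'law',
    'loc', 'norp', 'org', 'person', 'product', 'work-of-art',
]


def augmentation_url(entity):
    """Build the URL of one augmentation file (hypothetical helper)."""
    return f'{PREFIX}augmentation-{entity}-ontonotes5.json'


AUGMENTATION_URLS = [augmentation_url(entity) for entity in ENTITIES]

# Assumption: the combined file is served from the same prefix.
TRAIN_TEST_URL = PREFIX + 'ontonotes5-train-test.json'
```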
Entities#
Original source, https://github.com/yusufsyaifudin/indonesia-ner, extended with augmentation.
Download#
https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/entities-data-v4.json
https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/event-augmentation.json
https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/law-augmentation.json
https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/location-augmentation.json
https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/name-augmentation.json
https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/org-augmentation.json
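To keep local copies of these files rather than parsing them in memory, the responses can be streamed straight to disk. This is a minimal standard-library sketch; `local_name` and `download` are hypothetical helper names, and the target directory and timeout are assumptions.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

BASE = 'https://huggingface.co/datasets/mesolitica/NER-augmentation/resolve/main/'

# File names taken from the download list above.
FILES = [
    'entities-data-v4.json',
    'event-augmentation.json',
    'law-augmentation.json',
    'location-augmentation.json',
    'name-augmentation.json',
    'org-augmentation.json',
]


def local_name(url):
    """Return the last path component of `url`, used as the local file name."""
    return Path(urlparse(url).path).name


def download(url, directory='.', timeout=60):
    """Stream `url` into `directory` and return the path of the saved file."""
    target = Path(directory) / local_name(url)
    with urlopen(url, timeout=timeout) as response, open(target, 'wb') as out:
        # copyfileobj copies in chunks, so the file is never held fully in memory.
        shutil.copyfileobj(response, out)
    return target
```

Looping over `FILES` and calling `download(BASE + name)` would fetch every file into the current directory.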
Part-of-Speech#
Original source, https://github.com/UniversalDependencies/UD_Indonesian-GSD, with augmentation included.