@inproceedings{khankasikam-muansuwqan-2005-thai,
title = "{T}hai Word Segmentation a Lexical Semantic Approach",
author = "Khankasikam, Krisda and
Muansuwqan, Nuttanart",
booktitle = "Proceedings of Machine Translation Summit X: Posters",
month = sep # " 13-15",
year = "2005",
address = "Phuket, Thailand",
url = "https://aclanthology.org/2005.mtsummit-posters.2/",
pages = "331--338",
abstract = "In Thai, word boundaries are not explicitly marked; therefore, word segmentation is needed to determine word boundaries in Thai sentences. Many applications of Thai language processing require word segmentation. Several approaches to Thai word segmentation, such as maximal matching, longest matching, and n-gram models, do not take semantics into consideration. This paper presents a Thai word segmentation system, based on a semantic corpus, that is composed of four steps: generating all possible candidates, proper noun consideration, semantic tagging, and semantic checking. The first three steps are conducted using a dictionary. Semantic checking is carried out using a corpus-based approach. Finally, we assign semantic scores to the segmented words and select the candidates with the highest semantic scores. To assign semantic scores, we use a Thai proper noun database and a semantic corpus derived from the ORCHID corpus. This approach is more reliable than approaches that do not take meaning into consideration and achieves an accuracy of 96-99{\%}, depending on the characteristics of the input and the dictionary used in the segmentation."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="khankasikam-muansuwqan-2005-thai">
<titleInfo>
<title>Thai Word Segmentation a Lexical Semantic Approach</title>
</titleInfo>
<name type="personal">
<namePart type="given">Krisda</namePart>
<namePart type="family">Khankasikam</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nuttanart</namePart>
<namePart type="family">Muansuwqan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2005-sep 13-15</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of Machine Translation Summit X: Posters</title>
</titleInfo>
<originInfo>
<place>
<placeTerm type="text">Phuket, Thailand</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In Thai, word boundaries are not explicitly marked; therefore, word segmentation is needed to determine word boundaries in Thai sentences. Many applications of Thai language processing require word segmentation. Several approaches to Thai word segmentation, such as maximal matching, longest matching, and n-gram models, do not take semantics into consideration. This paper presents a Thai word segmentation system, based on a semantic corpus, that is composed of four steps: generating all possible candidates, proper noun consideration, semantic tagging, and semantic checking. The first three steps are conducted using a dictionary. Semantic checking is carried out using a corpus-based approach. Finally, we assign semantic scores to the segmented words and select the candidates with the highest semantic scores. To assign semantic scores, we use a Thai proper noun database and a semantic corpus derived from the ORCHID corpus. This approach is more reliable than approaches that do not take meaning into consideration and achieves an accuracy of 96-99%, depending on the characteristics of the input and the dictionary used in the segmentation.</abstract>
<identifier type="citekey">khankasikam-muansuwqan-2005-thai</identifier>
<location>
<url>https://aclanthology.org/2005.mtsummit-posters.2/</url>
</location>
<part>
<date>2005-sep 13-15</date>
<extent unit="page">
<start>331</start>
<end>338</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Thai Word Segmentation a Lexical Semantic Approach
%A Khankasikam, Krisda
%A Muansuwqan, Nuttanart
%S Proceedings of Machine Translation Summit X: Posters
%D 2005
%8 sep 13 15
%C Phuket, Thailand
%F khankasikam-muansuwqan-2005-thai
%X In Thai, word boundaries are not explicitly marked; therefore, word segmentation is needed to determine word boundaries in Thai sentences. Many applications of Thai language processing require word segmentation. Several approaches to Thai word segmentation, such as maximal matching, longest matching, and n-gram models, do not take semantics into consideration. This paper presents a Thai word segmentation system, based on a semantic corpus, that is composed of four steps: generating all possible candidates, proper noun consideration, semantic tagging, and semantic checking. The first three steps are conducted using a dictionary. Semantic checking is carried out using a corpus-based approach. Finally, we assign semantic scores to the segmented words and select the candidates with the highest semantic scores. To assign semantic scores, we use a Thai proper noun database and a semantic corpus derived from the ORCHID corpus. This approach is more reliable than approaches that do not take meaning into consideration and achieves an accuracy of 96-99%, depending on the characteristics of the input and the dictionary used in the segmentation.
%U https://aclanthology.org/2005.mtsummit-posters.2/
%P 331-338
[Thai Word Segmentation a Lexical Semantic Approach](https://aclanthology.org/2005.mtsummit-posters.2/) (Khankasikam & Muansuwqan, MTSummit 2005)