@inproceedings{bod-1993-monte,
title = "{M}onte {C}arlo Parsing",
author = "Bod, Rens",
editor = "Bunt, Harry and
Berwick, Robert and
Church, Ken and
Joshi, Aravind and
Kaplan, Ronald and
Kay, Martin and
Lang, Bernard and
Nagao, Makoto and
Nijholt, Anton and
Steedman, Mark and
Thompson, Henry and
Tomita, Masaru and
Vijay-Shanker, K. and
Wilks, Yorick and
Wittenburg, Kent",
booktitle = "Proceedings of the Third International Workshop on Parsing Technologies",
month = aug # " 10-13",
year = "1993",
address = "Tilburg, Netherlands and Durbuy, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/1993.iwpt-1.2",
pages = "1--12",
abstract = "In stochastic language processing, we are often interested in the most probable parse of an input string. Since there can be exponentially many parses, comparing all of them is not efficient. The Viterbi algorithm (Viterbi, 1967; Fujisaki et al., 1989) provides a tool to calculate in cubic time the most probable derivation of a string generated by a stochastic context free grammar. However, in stochastic language models that allow a parse tree to be generated by different derivations {--} like Data Oriented Parsing (DOP) or Stochastic Lexicalized Tree-Adjoining Grammar (SLTAG) {--} the most probable derivation does not necessarily produce the most probable parse. In such cases, a Viterbi-style optimisation does not seem feasible to calculate the most probable parse. In the present article we show that by incorporating Monte Carlo techniques into a polynomial time parsing algorithm, the maximum probability parse can be estimated as accurately as desired in polynomial time. Monte Carlo parsing is not only relevant to DOP or SLTAG, but also provides for stochastic CFGs an interesting alternative to Viterbi. Unlike the current versions of Viterbi style optimisation (Fujisaki et al., 1989; Jelinek et al., 1990; Wright et al., 1991), Monte Carlo parsing is not restricted to CFGs in Chomsky Normal Form. For stochastic grammars that are parsable in cubic time, the time complexity of estimating the most probable parse with Monte Carlo turns out to be $O(n^2\varepsilon^{-2})$, where $n$ is the length of the input string and $\varepsilon$ the estimation error. In this paper we will treat Monte Carlo parsing first of all in the context of the DOP model, since it is especially here that the number of derivations generating a single tree becomes dramatically large. Finally, a Monte Carlo Chart parser is used to test the DOP model on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments indicate 96{\%} test set parsing accuracy.",
}