Rabih Zbib


2020

Reformulating Information Retrieval from Speech and Text as a Detection Problem
Damianos Karakos | Rabih Zbib | William Hartmann | Richard Schwartz | John Makhoul
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

In the IARPA MATERIAL program, information retrieval (IR) is treated as a hard detection problem: the system has to output a single global ranking over all queries and apply a hard threshold to this global list to produce all the hypothesized relevant documents. This means that how queries are ranked relative to each other can have a dramatic impact on performance. In this paper, we study such a performance measure, the Average Query Weighted Value (AQWV), which is a combination of miss and false alarm rates. AQWV requires that the same detection threshold be applied to all queries. Hence, detection scores of different queries should be comparable, and achieving that requires a score normalization technique of the kind commonly used in keyword spotting from speech. We describe unsupervised methods for score normalization, which are borrowed from the speech field and adapted for IR, and demonstrate that they greatly improve AQWV on the task of cross-language information retrieval (CLIR) on three low-resource languages used in MATERIAL. We also present a novel supervised score normalization approach that gives additional gains.

2019

Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval
Lingjun Zhao | Rabih Zbib | Zhuolin Jiang | Damianos Karakos | Zhongqiang Huang
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

We propose a weakly supervised neural model for Ad-hoc Cross-lingual Information Retrieval (CLIR) from low-resource languages. Low-resource languages often lack relevance annotations for CLIR, and when such annotations are available, the training data usually has limited coverage of possible queries. In this paper, we design a model that does not require relevance annotations; instead, it is trained on samples extracted from translation corpora as weak supervision. This model relies on an attention mechanism to learn spans in the foreign sentence that are relevant to the query. We report experiments on two low-resource languages, Swahili and Tagalog, trained on fewer than 100k parallel sentences each. The proposed model achieves an improvement of 19 MAP points over using CNNs for feature extraction, 12 points over machine translation-based CLIR, and up to 6 points over probabilistic CLIR models.

2015

Statistical Machine Translation Features with Multitask Tensor Networks
Hendra Setiawan | Zhongqiang Huang | Jacob Devlin | Thomas Lamar | Rabih Zbib | Richard Schwartz | John Makhoul
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Fast and Robust Neural Network Joint Models for Statistical Machine Translation
Jacob Devlin | Rabih Zbib | Zhongqiang Huang | Thomas Lamar | Richard Schwartz | John Makhoul
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Rabih Zbib | Gretchen Markiewicz | Spyros Matsoukas | Richard Schwartz | John Makhoul
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Zhongqiang Huang | Jacob Devlin | Rabih Zbib
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Machine Translation of Arabic Dialects
Rabih Zbib | Erika Malchiodi | Jacob Devlin | David Stallard | Spyros Matsoukas | Richard Schwartz | John Makhoul | Omar F. Zaidan | Chris Callison-Burch
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2010

Decision Trees for Lexical Smoothing in Statistical Machine Translation
Rabih Zbib | Spyros Matsoukas | Richard Schwartz | John Makhoul
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Ibrahim Badr | Rabih Zbib | James Glass
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Segmentation for English-to-Arabic Statistical Machine Translation
Ibrahim Badr | Rabih Zbib | James Glass
Proceedings of ACL-08: HLT, Short Papers