Unicode Cross-language Interactive Retrieval System

Arabic Information Retrieval and Related Literature 




   Last Update Nov. 08, 2007  
  1. Voorhees, E. (2002). Overview of TREC 2002. NIST TREC 2002 Proceedings. pp. 1-15.
  2. Voorhees, E. and Harman, D. (2001) Overview of TREC 2001, NIST TREC 2001 Proceedings pp. 1-15.
  3. Fredric C. Gey, Douglas W. Oard (2001)  The TREC-2001 Cross-Language Information
    Retrieval Track: Searching Arabic Using English, French or Arabic Queries
     NIST  TREC 2001 Proceedings pp. 16-25
  4. Shereen Khoja, Roger Garside and Gerry Knowles (2001) An Arabic Tagset for the Morphosyntactic Tagging of Arabic. Corpus Linguistics 2001, Lancaster University, Lancaster, UK, March 2001
  5. Kui Lam Kwok, Qiang Deng (2003) GeoName: a system for back-transliterating pinyin place names  Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic References pp. 26-30
  6. Mohammed Aljlayl and Ophir Frieder. Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, September 9-13, 2001,  New Orleans, LA USA. ACM 2001. pp. 295 - 302
  7. M. Aljlayl, O. Frieder, and D. Grossman, (2002) On Arabic-English Cross-Language Information Retrieval: A Machine Translation Approach. IEEE Third Int'l Conf. on Information Technology: Coding and Computing (ITCC), Las Vegas, Nevada, April 2002
  8. Attardi, G., S. Di Marco and F. Sebastiani. 1998. Automated Generation of Category-Specific Thesauri for Interactive Query Expansion. In Joseph Fong (ed.), Proceedings of IDC'99, 9th International Database Conference on Heterogeneous and Internet Databases, Hong Kong, CN, 1999, pp. 429-432.
  9. Grefenstette, G. 1992. Use of Syntactic Context to Produce Term Association Lists for Text Retrieval. In Proceedings of the 15th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Copenhagen, Denmark, ed. N. Belkin, P. Ingwersen and A. M. Pejtersen: pp. 89-97. New York: ACM Press.
  10. Ide, E. (1971). New experiments in relevance feedback. In G. Salton (Ed.), The Smart System-- experiments in automatic document processing (pp. 337-354). Englewood Cliffs, NJ: Prentice-Hall, Inc.
  11. Qiu, Y., 1993. Concept Based Query Expansion. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval.
  12. Schütze, H. and J. Pedersen. 1997. A Cooccurrence-based Thesaurus and Two Applications to Information Retrieval. Information Processing and Management 33, no. 3: pp. 307-318.
  13. Schütze, H.; Pedersen, J. O. (1995) Information Retrieval Based on Word Senses. Proceedings of the Symposium on Document Analysis and Information Retrieval 4 pp. 161-175.
  14. Sanderson, M. (1994) Word sense disambiguation and information retrieval. Proceedings of the 17th ACM SIGIR Conference, Pages 142-151.
  15. Sanderson, M. (2000) Retrieving with good sense in Information Retrieval Vol 2, No 1 pp. 49-69.
  16. Voorhees, E. M. (1993). Using WordNet to disambiguate word senses for text retrieval. Appeared in Proceedings of ACM SIGIR Conference (16): pp. 171-180.
  17. Uzuner, O., Katz, B., Yuret, D. (1999) Word sense disambiguation for information retrieval. In Proceedings of the 1999 16th National Conference on Artificial Intelligence (AAAI-99).
  18. Yarowsky, D. (1995) Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. Proceedings of ACL 95.
  19. Yarowsky, D. (1992) Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora. Proceedings of COLING-92.
  20. Krovetz, R; Croft, W. B. 1992. Lexical Ambiguity and Information Retrieval in ACM Transactions on Information Retrieval Vol 10 Issue 1
  21. Christopher M. Stokoe and John Tait (2002) Automated Word Sense Disambiguation for Internet Information Retrieval Proceedings of Text Retrieval Conference (TREC 2002), National Institute of Standards and Technology, Gaithersburg, MD, USA pp 743-745. November 2002.
  22. David Walker, 2001. Query Expansion using Thesauri:Previous Approaches and Possible New Directions IS-242: Information Retrieval Systems University of California, Los Angeles June 12, 2001
  23. Y. Qiu. Automatic Query Expansion Based on A Similarity Thesaurus. PhD Thesis, Swiss Federal Institute of Technology (ETH), 1995.
  24. Imai, Hisao, Nigel Collier and Jun'ichi Tsujii. (1999). A Combined Query Expansion Approach for Information Retrieval. In the Proceedings of Genome Informatics. Tokyo, Japan. Universal Academy Press Inc.
  25. Jinxi Xu, W. Bruce Croft (2000) Improving the Effectiveness of Information Retrieval with Local Context Analysis. ACM Transactions on Information Systems
  26. William Hersh, M.D., Susan Price, M.D., Larry Donohoe, M.L.I.S. Assessing Thesaurus-Based Query Expansion Using the UMLS Metathesaurus. Division of Medical Informatics & Outcomes Research, Oregon Health Sciences University, Portland, Oregon, USA
  27. Stefan Klink, Armin Hust, Markus Junker, Andreas Dengel. 2002. Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts Proceedings of DAS 2002, 5th International Workshop on Document Analysis Systems
  28. Qiu, Y. and Frei, H.-P. (1994). Improving the retrieval effectiveness by a similarity thesaurus. Technical Report 225, ETH Zurich, Department of Computer Science.
  29. Ballesteros, L., and Croft, B. Phrasal Translation and Query Expansion Techniques for Cross-language Information Retrieval. SIGIR 1997, 84-91.
  30. Xu, J. and Croft, W. B. Query Expansion using Local and Global Document Analysis. The 19th Annual International ACM SIGIR 1996, Zurich, Switzerland, Pages 4-11.
  31. Angel F. Zazo, Carlos G. Figuerola, José Luis A. Berrocal and Emilio Rodríguez, 2002. Term Expansion using Stemming and Thesauri in Spanish. CLEF 2002 Workshop 19-20 September, Rome, Italy.
  32. Fatiha Sadat, Masatoshi Yoshikawa, and Shunsuke Uemura. 2002. The Role of Query Expansion Techniques in French-English Information Retrieval. Journées Science and Technology Workshop 2002 (JST2002), November 17-19, 2002.
  33. Larkey, Leah S., Ballesteros, Lisa, and Connell, Margaret. (2002) Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis In Proceedings of the 25th Annual International Conference on Research and Development in Information Retrieval (SIGIR 2002), Tampere, Finland, August 11-15, 2002, pp. 275-282.
  34. Larkey, Leah S. and Connell, Margaret, (2002) Arabic Information Retrieval at UMass in TREC-10 In Voorhees, E.M. & Harman, D.K. (Eds.) The Tenth Text Retrieval Conference, TREC 2001 NIST Special Publication 500-250, pp. 562-570.
  35. Hideo Joho, Claire Coverson, Mark Sanderson, Micheline Hancock-Beaulieu (2002) Hierarchical presentation of expansion terms. SAC 2002: 645-649
  36. Aitao Chen and Fredric Gey. Building an Arabic Stemmer for Information Retrieval. In: Proceedings of the Eleventh Text REtrieval Conference (TREC 2002). National Institute of Standards and Technology, Nov 18-22, 2002.
  37. Leah S. Larkey, James Allan, Margaret E. Connell, Alvaro Bolivar, Courtney Wade (2002) UMass at TREC 2002: Cross Language and Novelty Tracks
  38. Marco De Boni (2001) Word Sense Disambiguation for Information Retrieval http://www-users.cs.york.ac.uk/~mdeboni/research/wn_disambiguation.html Date Feb 12,04
  39. M. Shamim Khan, Sebastian Khor (2004) Enhanced Web document retrieval using automatic query expansion. Journal of the American Society for Information Science and Technology, Vol. 55 Number 1 pp. 29-40, 2004
  40. Ahmed Abdelali. (2004). Localization in Modern Standard Arabic. Journal of the American Society for Information Science and Technology, Vol. 55 Number 1 pp. 23-28, 2004
  41. Yaser Al-Onaizan, Kevin Knight (2002). Translating Named Entities Using Monolingual and Bilingual Resources. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 400-408.
  42. Gökhan Tür, Dilek Hakkani-Tür, Kemal Oflazer, Name Tagging Using Lexical, Contextual and Morphological Information, In Proceedings of the Workshop on Information Extraction Meets Corpus Linguistics at Second International Conference on Language Resources and Evaluation (LREC 2000), May 2000, Athens, Greece.
  43. Jinxi Xu, Alexander Fraser, Ralph M. Weischedel (2001) TREC 2001 Cross-lingual Retrieval at BBN NIST TREC 2001 Proceedings pp. 68-77
  44. Kevin P. Scannell (2003) Automatic thesaurus generation for minority languages: an Irish example, in Actes de la 10e conférence TALN, Batz-sur-Mer, 11-14 June 2003. pp. 203-212.
  45. Donald C. Comeau, W. John Wilbur Non-word identification or spell checking without a dictionary Journal Of The American Society For Information Science And Technology, Vol. 55 Number 2 pp.169-177. 2004
  46. Dror Kamir, Naama Soreq, Yoni Neeman A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew ACL-2002 Workshop on Computational Approaches to Semitic Languages. University of Pennsylvania, Thursday 11 July, 2002.  pp. 58-66.
  47. Yaser Al-Onaizan, Kevin Knight Machine Transliteration of Names in Arabic Texts  ACL-2002 Workshop on Computational Approaches to Semitic Languages. University of Pennsylvania, Thursday 11 July, 2002.  pp. 9-21.
  48. Amati, G., Carpineto, C., and Romano, G.: FUB at TREC-10 Web Track: a probabilistic framework for topic relevance term weighting. Proceedings of the Tenth Text REtrieval Conference (TREC-10), NIST Special Publication 500-250, pp. 182-191.
  49. Belkin N. J., and W. B. Croft (1987), "Retrieval Techniques," in Annual Review of Information Science and Technology, ed. M. Williams. New York: Elsevier Science Publishers, pp. 109-145.
  50. Information Retrieval and Information Filtering (IRIF), Spring 1996: Introduction to Course. http://www.ida.liu.se/labs/iislab/courses/IRIF/IRIF_introduktion.html Introductory notes for course, by Juha Takkinen (juhta@ida.liu.se).
  51. Chia-Hui Chang, Ching-Chi Hsu. Integrating Query Expansion and Conceptual Relevance Feedback for Personalized Web Information Retrieval. Computer Networks 30(1-7): 621-623 (1998)
  52. CLSP NSF 2002 Workshop: Novel Speech Recognition Models for Arabic. Final Report, Final Presentation, Opening Day.
  53. Martin Braschler, Bärbel Ripplinger. How Effective is Stemming and Decompounding for German Text Retrieval? Information Retrieval, 7, 291-316, 2004
  54. Kevin Daimi, (2001) Identifying Syntactic Ambiguities in Single-Parse Arabic Sentence. Journal of Computers and the Humanities 35: 333-349.
  55.  http://www.darislam.com/home/alfekr/data/feker3/10.htm visited on Aug. 18, 2004
  56. Oard, D.W., Gey, F.C. (2002) The TREC 2002 Arabic/English CLIR Track, NIST TREC 2002 Proceedings. pp. 16-26.
  57. AbdulJaleel, N., Corrada-Emmanuel, A., Li, Q., Liu, X., Wade, C. and Allan, J. (2003) UMASS at TREC 2003: HARD and QA. In Proceedings of the Twelfth Text Retrieval Conference (TREC 2003). NIST, 2003.
  58. Black, W. J. and El-Kateb, S. (2004) A Prototype English-Arabic Dictionary Based on WordNet. In Sojka et al. Proceedings of the Second International WordNet Conference GWC 2004, Brno, Czech Republic, January 20-23, 2004. pages 67-74.
  59. De Roeck, A.N., & Al-Fares, W. (2000). A morphologically sensitive clustering algorithm for identifying Arabic roots. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong, China.
  60. Mustafa H. S., Al-Radaideh Q. A. (2004) Using N-Grams for Arabic Text Searching. Journal of the American Society for Information Science and Technology. 55(11), p 1002-1007.
  61. Sanderson, M. (1997) Word Sense Disambiguation and Information Retrieval. PhD Thesis. Technical Report (TR-1997-7) of the Department of Computing Science at the University of Glasgow, Glasgow G12 8QQ, UK, 1997.
  62. Hmeidi, I., Kanaan, G. and M. Evens (1997) Design and Implementation of Automatic Indexing for Information Retrieval with Arabic Documents. Journal of the American Society for Information Science, 48/10, pp. 867-881.
  63. Al-Sughaiyer I., Al-Kharashi I. (2004) Arabic morphological analysis techniques: A comprehensive survey. Journal of the American Society for Information Science and Technology. 55(3), pp 189-213.
  64. Abu-Salem, H., Al-Omari, M., & Evens, M. E. (1999). Stemming methodologies over individual query words for an Arabic information retrieval system. Journal of the American Society for Information Science, 50(6), pp 524-529.
  65. Al-Kharashi, I. A. and Evens, M. W. (1994) Comparing words, stems, and roots as index terms in an Arabic information retrieval system. Journal of the American Society for Information Science (JASIS) 45(8), pp 548-560.
  66. Darwish K, Doermann D, Jones R, Oard D & Rautiainen M (2001) TREC-10 experiments at University of Maryland CLIR and video. Proc. Text REtrieval Conference TREC10, Gaithersburg, MD, pp 549-562.
  67. Suleiman H. Mustafa. A morphology-driven string matching approach to Arabic text searching. Journal of Systems and Software, Volume 67, Issue 2, 15 August 2003, Pages 77-87.
  68. Braschler, M., Peters C. Cross-Language Evaluation Forum: Objectives, Results, Achievements. Information Retrieval, 7, April 2004, Special Issue on CLEF. pp. 7-31
  69. Bertoldi, N., Federico, M. Statistical Models for Monolingual and Bilingual Information Retrieval. Information Retrieval, 7, April 2004, Special Issue on CLEF. pp. 53-72
  70. Abu-Salem, H. (1992) A Microcomputer Based Arabic Bibliographic Information Retrieval System with Relational Thesauri (Arabic IRS). Ph.D. Dissertation, Computer Science Department, Illinois Institute of Technology, Chicago, IL.
  71. Alshenifey, Mohammed. (1998) Lexical-semantic relations in Arabic text. Ph.D. Dissertation, Computer Science Department, Illinois Institute of Technology, Chicago, IL.
  72. Mohammed Aljlayl, Ophir Frieder, (2002) On Arabic Search: Improving the Retrieval Effectiveness via a Light Stemmer Approach, ACM International Conference on Information and Knowledge Management, November 2002.
  73. Hiemstra, Djoerd. (2001) Using Language Models for Information Retrieval. Ph.D. Thesis, Centre for Telematics and Information Technology, University of Twente, January 2001, ISSN 1381-3617 (no. 01-32), ISBN 90-75296-05-3
  74. Imad A. Al-Sughaiyer, Ibrahim A. Al-Kharashi (2004) Arabic Morphological Analysis Techniques: A Comprehensive Survey. Journal of the American Society for Information Science and Technology, Volume 55, Issue 3 (February 2004) pp. 189-213. ISSN: 1532-2882.
  75. (1981) . 2. 1981. 
  76. Kim, S.-B., Seo, H.-C. and Rim, H.-C., Information retrieval using word senses: root sense tagging approach. SIGIR 2004, Sheffield, UK, July 2004.
  77. Salminen, M. (2004) Word senses in information retrieval. Research Seminar on Intelligent Systems, University of Helsinki, Autumn 2004.
  78. Azzopardi, L., Girolami, M and van Rijsbergen, C.J. (2003) Investigating the Relationship between Language Model Perplexity and IR Precision-Recall Measures. In the Proceedings of the 26th Annual ACM Conference on Research and Development in Information Retrieval, SIGIR, Toronto, Canada.
  79. Qujiang Peng, Takeshi Ito, Teiji Furugori. (2001) Word Sense Disambiguation with a Corpus-Based Semantic Network. NLPRS 2001: 75-82
  80. Jong-Hoon Oh, Key-Sun Choi (2002) Word Sense Disambiguation using Static and Dynamic Sense Vectors. COLING 2002
  81. Keh-Jiann, Chen. Jia-Ming, You. (2002) A Study on Word Similarity using Context Vector Models. Computational Linguistics and Chinese Language Processing. Vol. 7 , No. 2, August 2002, pp. 37-58
  82. Filip Ginter, Jorma Boberg, Jouni Jarvinen, Tapio Salakoski. (2004) New Techniques for Disambiguation in Natural Language and Their Application to Biological Text. Journal of Machine Learning Research 5, 2004, pp. 605-621.
  83. Lin, J. and Gunopulos, D. (2003) Dimensionality Reduction by Random Projection and Latent Semantic Indexing. In proceedings of the Text Mining Workshop, at the 3rd SIAM International Conference on Data Mining. San Francisco, CA. May 1-3, 2003.
  84. Zobel, J. (1998) How Reliable Are the Results of Large-Scale Information Retrieval Experiments? SIGIR 1998: 307-314.
  85. Nigam, K., Lafferty, J. and McCallum, A. (1999) Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61-67, Stockholm, Sweden, August 1999.
  86. Sawaf, H. Zaplo, J. and Ney, H. (2001) Statistical Classification Methods for Arabic News Articles. Arabic Natural Language Processing Workshop, ACL'2001. Toulouse, France, July 2001.
  87. Kurland, O. and Lee, L. (2004) Corpus structure, language models, and ad hoc information retrieval. Proceedings of SIGIR, pp. 194-201, 2004.
  88. Della Mea, V. and Mizzaro S. (2004) Measuring retrieval effectiveness: A new proposal and a first experimental validation. Journal of the American Society for Information Science and Technology 55(6). pp.530-543.
  89. Voorhees, E. (2000) Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management, Volume 36(5). pp. 697-716.
  90. Clarkson, P.R. and Rosenfeld, R. (1997) Statistical Language Modeling Using the CMU-Cambridge Toolkit. Proceedings ESCA Eurospeech 1997
  91. De Marneffe, M. Dupont, P. (2004) Comparative study of statistical word sense discrimination techniques. JADT 2004: 7th International Conference on the Statistical Analysis of Textual Data. (ppt)
  92. Widdows, D. (2003) A Mathematical Model for Context and Word-Meaning. The Fourth International and Interdisciplinary Conference on Modeling and Using Context, Stanford, California, June 2003, pages 369-382
  93. Schütze, H. (1998) Automatic Word Sense Discrimination. Computational Linguistics 24(1). pp. 97-133.
  94. Li, Z. Lu, X. and Shi W. (2003) Process Variation Dimension Reduction Based on SVD ISCAS 2003
  95. Sarwar, B. Karypis, G. Konstan, J. and Riedl, J. (2000) Application of Dimensionality Reduction in Recommender System - A Case Study. Technical Report CS-TR 00-043, Computer Science and Engineering Dept., University of Minnesota, July 2000.
  96. Gale, W., Church, K., Yarowsky, D. (1992) Discrimination Decisions for 100,000-Dimensional Spaces. AT&T Statistical Research Report No. 103.
  97. Ney, H. (1997) Corpus-Based Statistical Methods in Speech and Language Processing. In Corpus-Based Methods in Language and Speech Processing. edited by Steve Young and Gerrit Bloothooft. pp. 1-26. ISBN 0-7923-4463-4
  98. Cronen-Townsend, S. Zhou, Y. and Croft, W.B. (2002) Predicting Query Performance, in the Proceedings of ACM SIGIR 2002. pp. 299-306, 2002.
  99. Furnas G. W., Landauer T. K., Gomez L. M., Dumais S. T. (1987) The Vocabulary Problem in Human-System Communication: an Analysis and a Solution; Bell Communications Research 1987.
  100. Berglund, Y. (2000) Utilising present-day English corpora: a case study concerning expressions of future. ICAME Journal 24, 25-64.
  101. Goweder, A. (2004) The role of stemming in IR: the case of Arabic. PhD dissertation, University of Essex.
  102. Khoja, Shereen (2003) An Automatic Arabic Part-of-Speech Tagger. PhD Thesis, Lancaster University, November 2003.
  103. Al-Fares, Waleed. (2001) Arabic root-based clustering: An algorithm for identifying roots based on n-grams and morphological similarity. Ph.D. dissertation. September 2001.
  104. Ponte, Jay. (1998) A Language Modeling Approach to Information Retrieval. Ph.D. dissertation UMass Amherst 1998.
  105. Ponte, J. M. and Croft, W. B. (1998) A language modeling approach to information retrieval system. In Proc. ACM SIGIR 98, New York, 1998, pp. 275-281.
  106. Berger, A. and Lafferty, J. (1999) Information retrieval as statistical translation. In Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR)
  107. Berger, A. (1999) Information Retrieval and Information Theory. Thesis proposal. School of Computer Science, Carnegie Mellon University.
  108. Berger, A. (2001) Statistical Machine Learning for Information Retrieval. PhD Thesis. School of Computer Science, Carnegie Mellon University  
  109. Hideo Fujii. (1997) An Investigation of the Linguistic Characteristics of Japanese Information Retrieval, Ph.D. Dissertation. UMass Amherst 1997.
  110. Rayson, P. (2003). Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. Ph.D. thesis, Lancaster University.
  111. Dunning, Ted. (1993). Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, Volume 19, number 1, pp. 61-74.
  112. Mona T. Diab (2003) Word Sense Disambiguation within a Multilingual Framework. Thesis. Linguistics Department. University of Maryland, College Park.
  113. Van Mol, Mark (2000). Exploring annotated Arabic corpora, preliminary results, in Corpora and Natural Language Processing, proceedings of the International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering and Industrial Applications, Monastir, pp. 94-98.
  114. Mark Van Mol (2003) Variation in Modern Standard Arabic in Radio News Broadcasts A Synchronic Descriptive Investigation into the Use of Complementary Particles. Peeters Publishers, 2003. ISBN: 9042911581 Introduction.
  115. Mark Van Mol Summary of Project at the Institute for Modern Languages of the Katholieke Universiteit Leuven.
  116. Kyung-Soon Lee; Kyo Kageura; Key-Sun Choi (2002) Implicit Ambiguity Resolution Using Incremental Clustering in Korean-to-English Cross-Language Information Retrieval. COLING 2002: The 17th International Conference on Computational Linguistics. 2002.
  117. Stokoe C., Oakes M. P., Tait J. (2003) Word sense disambiguation in information retrieval revisited. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. Toronto, Canada. pp. 159-166. ISBN:1-58113-646-3.
  118. Ballesteros, L. and Croft, W.B. Resolving ambiguity for cross-language retrieval. SIGIR '98, 64-71, 1998.
  119. Gonzalo, J. Penas A. and Verdejo F. (1999) Lexical Ambiguity and Information Retrieval Revisited. Proceedings of the 1999 Joint SIGDAT Conference on EMNLP and VLC, Maryland; pp.195-202.
  120. S-B. Kim, H-C. Seo, H-C. Rim (2004) Information retrieval using word senses: root sense tagging approach. SIGIR 2004.
  121. Adam Kilgarriff and Gregory Grefenstette. (2003) Introduction to the Special Issue on the Web as Corpus. Computational Linguistics 29(3) - Special Issue on the Web as Corpus - September 2003. pp. 333-348.
  122. Philip Resnik, Noah A. Smith. (2003) The Web as a Parallel Corpus. Computational Linguistics 29(3) - Special Issue on the Web as Corpus - September 2003. pp. 349-380.
  123. Celina Santamaria, Julio Gonzalo and Felisa Verdejo. (2003) Automatic Association of Web Directories to Word Senses. Computational Linguistics 29(3) - Special Issue on the Web as Corpus - September 2003. pp. 485-502.
  124. Darwish, K. (2002). Building a shallow Arabic morphological analyzer in one day. In Proceedings of the Association for Computational Linguistics (ACL-02), 40th Anniversary Meeting. pp. 47-54.
  125. Moukdad, H. and Large, A. (2001). Information retrieval from full-text Arabic databases: can search engines designed for English do the job? Libri 51:2:63-74.
  126. MacMullen W.J., (2003) Requirements Definition and Design Criteria for Test Corpora in Information Science, SILS Tech. Report 2003-03, School of Information and Library Science, Univ. of North Carolina at Chapel Hill, 2003.

  127. Voorhees, Ellen M., (2003) Overview of TREC 2003. TREC 2003: 1-13

  128. Cronen-Townsend, S., Zhou, Y., and Croft, W.B., (2004) A Language Modeling Framework for Selective Query Expansion. CIIR Technical Report. IR-338.

  129. Cronen-Townsend, S., Zhou, Y., and Croft, W.B., (2004) A Framework for Selective Query Expansion. a poster presentation, in the Proceedings of CIKM'04, pp.236-237

  130. Kekalainen, J. Jarvelin, K. (1998) The impact of query structure and query expansion on retrieval performance. Proceedings of the 21st annual international ACM SIGIR. pp. 130 - 137

  131. Alemayehu, N. (2003) Analysis of performance variation using query expansion. JASIST. Volume 54, Issue 5 , Pages 379 - 391

  132. Moukdad, H. (2004) Lost in Cyberspace: How Do Search Engines Handle Arabic Queries? Annual Conference of the Canadian Association for Information Science. Canada, June 3 - 5, 2004

  133. Stenmark, D. (2005) Query Expansion on a Corporate Intranet: Using LSI to Increase Precision in Explorative Search. Proceedings of the 38th Hawaii International Conference on System Sciences - 2005.

  134. Mitra, M., Singhal A., and Buckley. C., (1998) Improving automatic query expansion. In ACM SIGIR 98, Melbourne Australia, 1998. pp. 206-214

  135. Mano H. and Ogawa, Y., (2001) Selecting Expansion Terms in Automatic Query Expansion. in Proceedings of SIGIR 01, ACM Press, New Orleans, LA, 2001, pp. 390-391.

  136. Sahami, M. 1998. Using Machine Learning to Improve Information Access. PhD Thesis, Stanford University, Computer Science Department. STAN-CS-TR-98-1615.

  137. Spink, A., Wolfram, D., Jansen, B. J., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science, 53(2), 226-234.

  138. Susanne M. Humphrey, Willie J. Rogers, Halil Kilicoglu, Dina Demner-Fushman, Thomas C. Rindflesch (2006) Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: Preliminary experiment. Journal of the American Society for Information Science, 57(1), 96-113.

  139. Suleiman H. Mustafa (2005) Word-oriented approximate string matching using occurrence heuristic tables: A heuristic for searching Arabic text. Journal of the American Society for Information Science, 56(14), 1504-1511.

  140. Chau, M., Fang, X., Liu Sheng R. O., (2005) Analysis of the query logs of a Web site search engine. Journal of the American Society for Information Science, 56(13), 1363-1376.

  141. Monika Henzinger, Bay-Wei Chang, Brian Milch, and Sergey Brin (2003) Query-Free News Search. Proc. 12th World Wide Web Conference, 2003, pages 1-10.

  142. Agichtein, E., Cucerzan, S., (2005) Predicting accuracy of extracting information from unstructured text collections. CIKM 2005: 413-420

  143. Gao, J., Nie, J., Wu, G., and Cao, G. (2004) Dependence language model for information retrieval. In SIGIR-2004. Sheffield, UK, July 25-29, 2004.

  144. Srikanth, M. & Srihari, R. (2002). Biterm language models for document retrieval. In Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval, (pp. 425--426), Tampere, Finland.

  145. Srikanth M., Srihari, R., (2003) Incorporating Query Term Dependencies in Language Models for Document Retrieval. SIGIR '03, July 28-August 1, 2003, Toronto, Canada.

  146. Zhai, C., and Lafferty, J., (2001) A study of smoothing methods for language models applied to Ad Hoc information retrieval. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, New Orleans, Louisiana, 2001.

  147. Heenan C. (2002) A Review of Academic Research on Information Retrieval. Engineering Informatics Group. Stanford University. August 6, 2002.

  148. April Kontostathis and William M. Pottenger (2006) A framework for understanding Latent Semantic Indexing (LSI) performance. Information Processing and Management 42 (2006) 56-73.

  149. Bernard J. Jansen and Amanda Spink (2006) How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing and Management 42 (2006) 248-263

  150. Lisa A. Ballesteros. Resolving Ambiguity for Cross-language Retrieval: A Dictionary Approach. Dissertation, Department of Computer Science, University of Massachusetts at Amherst, September 2001. Advisor: W. Bruce Croft.

  151. Jia-Long Wu (2005) Unified Language System for Engineering Design (ULSED): A Framework and Automation Tools for Better Design Information Retrieval. PhD Dissertation. Mechanical Engineering Dept. University of California, Berkeley. 2005.

  152. D. Ravishankar, K. Thirunarayan, and T. Immaneni. (2005) A Modular Approach to Document Indexing and Semantic Search, In: Proceedings of the IASTED International Conference on Web Technologies, Applications and Services, pp. 165 -170,  July 2005.

  153. Turney, Peter (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In De Raedt, Luc and Flach, Peter, Eds. Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), pp. 491-502, Freiburg, Germany.

  154. Yan, X., Li, X. and Song, D. (2004) A Correlation Analysis on LSA and HAL Semantic Space Models. In Proceedings of International Symposium on Computational and Information Sciences (CIS'2004), LNCS 3314, pp 710-717.

  155. Bai, J., Song, D., Bruza, P.D., Nie, J.Y., Cao, G. (2005) Query Expansion Using Term Relationships in Language Models for Information Retrieval. In Proceedings of The 14th International ACM Conference on Information and Knowledge Management (CIKM 2005), pp. 688-695.

  156. Ozcan, Rifat and Aslandogan, Y. Alp (2005) Concept-Based Information Access. ITCC (1) 2005: 794-799.

  157. Y. Jing and W. B. Croft. An association thesaurus for information retrieval. In RIAO 94 Conference Proceedings, pages 146--160, New York, October 1994.

  158. Cui, H., Wen, J.-R., Nie, J.-Y., and Ma, W.-Y., Query Expansion by Mining User Logs, IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, pp. 829-839.

  159. B. Schiffman and K. R. McKeown. Experiments in automated lexicon building for text searching. In COLING-2000, 2000.

  160. F.A. Grootjen and Th.P. van der Weide. Conceptual Query Expansion. Technical Report NIII-R0406, Nijmegen Institute for Information and Computing Sciences, University of Nijmegen, Nijmegen, The Netherlands, EU, 2004.

  161. Terra, E., Clarke C. (2005) Scoring missing terms in information retrieval tasks. CIKM 2004. pp. 50-58

  162. Gordon M., & Pathak P. (1999). Finding information on the World Wide Web: The retrieval effectiveness of search engines. Information Processing & Management, 35(2), 141-180.

  163. Savoy, J. (2001) Information Retrieval on the Web. From www.unine.ch/info/Gi/Papers/SI.pdf

  164. Spink, A. (2002). A user centered approach to evaluating human interaction with Web search engines: an exploratory study. Information Processing & Management, 38(3), 410-426.

  165. Siatri, R. (1998) Information seeking in electronic environment: a comparative investigation among computer scientists in British and Greek Universities. Information Research, Volume 4 No. 2 October 1998

  166. Carpineto, C., De Mori, R., Romano, G., Bigi, B. (2001) An Information-Theoretic Approach to Automatic Query Expansion. ACM Transactions on Information Systems, Vol. 19, No. 1, January 2001, Pages 1-27.

  167. Sparck Jones, K., Walker, S., Robertson, S. E., (2000) A probabilistic model of information retrieval: development and comparative experiments - Part 1. Inf. Process. Manage. 36(6): 779-808

  168. Sparck Jones, K., Walker, S., Robertson, S. E., (2000) A probabilistic model of information retrieval: development and comparative experiments - Part 2. Inf. Process. Manage. 36(6): 809-840

  169. Gale, W., Church, K., and Yarowsky, D. (1992) One sense per discourse. In Proc. of the DARPA Speech and Natural Language Workshop.

  170. McNamee, P. and Mayfield, J. (2004) Character N-gram Tokenization for European Language Text Retrieval. Information Retrieval, 7:73-97.

  171. Hull, D. A. (1995) Information Retrieval using Statistical Classification. PhD thesis, Stanford University, 1995.

  172. Schatz, B., Johnson, E., and Cochrane, P. (1996) Interactive term suggestion for users of digital libraries: Using subject thesauri and co-occurrence lists for information retrieval. In Proceedings of the First ACM Conference on Digital Libraries, pages 126--133, Bethesda, Maryland.

  173. Sebastiani, F. (2001) Interactive Query Expansion with Automatically Generated Category-Specific Thesauri. in Amita G. Chin (ed.), Text Databases and Document Management: Theory and Practice. Idea Group Publishing. pp.103-117.

  174. Nelson, G. (2006) The core and periphery of world Englishes: a corpus-based exploration. World Englishes, Vol. 25, No. 1, pp. 115-129, 2006.

  175. Billerbeck, B., Zobel, J. (2004) Questioning Query Expansion: An Examination of Behaviour and Parameters. ADC 2004: 69-76.

  176. Jensen, E. C. (2006) Repeatable Evaluation of Information Retrieval Effectiveness in Dynamic Environments. PhD Thesis. Illinois Institute of Technology.

  177. Beitzel, S. M., Jensen, E. C., Chowdhury, A., and Grossman, D. (2004) Hourly Analysis of a Very Large Topically Categorized Web Query Log. In Proceedings of the 2004 ACM Conference on Research and Development in Information Retrieval (SIGIR-2004), Sheffield, UK, July 2004.

  178. Xie, Y., O'Hallaron, D. Locality in Search Engine Queries and Its Implications for Caching. Infocom 2002.

  179. Spink, A., Ozmutlu, S., Ozmutlu, H.C., and Jansen, B.J. U.S. versus European web searching trends. SIGIR Forum 36(2), 32-38, 2002.

  180. Zhang, Z., and Nasraoui, O. (2006) Mining Search Engine Query Logs for Query Recommendation. WWW 2006, May 22-26, 2006, Edinburgh, Scotland.

  181. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.

  182. Schamber, L. (1994) Relevance and Information Behavior. Annual Review of Information Science and Technology. 29:3-48.

  183. Cormack, G., Palmer, C., Clarke, C. (1998) Efficient Construction of Large Test Collections. In Proceedings of the 21st ACM SIGIR Conference, pp. 282-289.

  184. Arshad, A. (2004) Beyond Concordance Lines: Using Concordances to Investigate Language Development. Internet Journal of e-Language Learning & Teaching, 1(1), January 2004, 43-51.

  185. Kilgarriff, A., and Rose, T. (1998) Measures for Corpus Similarity and Homogeneity. Proc. 3rd Int. Conf. Empirical Methods in NLP. pp. 46-52. Granada.

  186. Clarkson, P., Robinson, T. (1999) Towards improved language model evaluation measures. In: Proc. Eurospeech, p. 2707.

  187. Azzopardi, L., Girolami, M., and van Rijsbergen, C.J. (2004) Topic Based Language Models for ad hoc Information Retrieval. In Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary.

  188. Tao, T., Wang, X., Mei, Q., and Zhai, C. (2006) Language Model Information Retrieval with Document Expansion. Proceedings of HLT/NAACL.

  189. Yoshioka, M., and Haraguchi, M. (2005) On a Combination of Probabilistic and Boolean IR Models for WWW Document Retrieval. ACM Transactions on Asian Language Information Processing, Vol. 4, No. 3, September 2005, Pages 340-356.

  190. Lee, Y., Papineni, K., Roukos, S., Emam, O., and Hassan, H. 2003. Language model based Arabic word segmentation. In Proceedings of the 41st Annual Meeting on Association For Computational Linguistics - Volume 1 (Sapporo, Japan, July 07 - 12, 2003). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, pp. 399-406.

  191. Zitouni, I., Sorensen, J., Luo, X., and Florian, R. (2005) The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution. Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages. June 2005. Ann Arbor, Michigan. ACL. pp. 63--70.

Improving IR/Clustering

  1. Liu, X. and Croft, W. B. 2004. Cluster-based retrieval using language models. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Sheffield, United Kingdom, July 25 - 29, 2004). SIGIR '04. ACM Press, New York, NY, 186-193. DOI= http://doi.acm.org/10.1145/1008992.1009026
  2. Kurland, O. and Lee, L. 2004. Corpus structure, language models, and ad hoc information retrieval. Proceedings of SIGIR, 2004. pp. 194-201.
  3. Xu, J. and Croft, W. B. 1999. Cluster-Based Language Models for Distributed Retrieval. Research and Development in Information Retrieval. 1999. pp.254-261.


  1. TREC Data - Non-English Test Questions (Topics)
  2. Parallel Text English-Arabic (July 2002), 2190 URL pairs


  1. Genzel, D. and Charniak, E. 2002. Entropy rate constancy in text. In Proceedings of ACL 2002, Philadelphia. pp. 199-206.
  2. Genzel, D. and Charniak, E. 2003. Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 65-72.
  3. Goodman, J. T. 2001. A bit of progress in language modeling. Computer Speech and Language, 15:403-434.
  4. Leech, G. 1992. 100 million words of English: the British National Corpus. Language Research, 28(1):1-13.
  5. Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19:313-330.
  6. Plotkin, J. B. and Nowak, M. A. 2000. Language evolution and information theory. Journal of Theoretical Biology, pages 147-159.
  7. Brown, Peter F., Della Pietra, Vincent J., DeSouza, Peter V., and Lai, Jenifer C. (1992) Class-based n-gram Models of Natural Language. Computational Linguistics, vol. 18, no. 4, pp. 467-479.
  8. Rosenfeld, R. (1996) A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, vol. 10, pp. 187-228.
  9. Sethy, A., Georgiou, P., and Narayanan, S. (2005) Building topic specific language models from webdata using competitive models. In Proc. of EUROSPEECH, Interspeech, Lisbon, Portugal.
  10. Galescu, L., Ringger, E.K., and Allen, J. (1998) Rapid Language Model Development for New Task Domains. In Proceedings of the ELRA First International Conference on Language Resources and Evaluation (LREC). Granada, Spain.
  11. Rosenfeld, R. (2000) Two decades of statistical language modeling: where do we go from here?. Proc. IEEE, Vol 88, No 8, pp. 1270-1278.
  12. Gao, J., and Lin, C. (2004) Introduction to the special issue on statistical language modeling. ACM Transactions on Asian Language Information Processing (TALIP). 3(2). June 2004. pp. 87-93.
  13. Lavrenko, V., Croft, W. B., (2001) Relevance-Based Language Models. SIGIR 2001. pp.120-127.
  14. Sethy, A., Ramabhadran, B., and Narayanan, S., (2004) Measuring convergence in language model estimation using relative entropy. In Proceedings of ICSLP , Jeju, Korea, October 2004. 

Other References

  1. Improving IR Query Expansion and more.
  2. Abdelali, A., Cowie, J., Soliman, H. Building A Modern Standard Arabic Corpus. Paper to be published.
  3. Goweder, A. and De Roeck, A. Assessment of a significant Arabic corpus. Presented at the Arabic NLP Workshop at ACL/EACL 2001, Toulouse, France, 2001.
  4. Sarkar, A., De Roeck, A., Garthwaite, P. (16 February 2004) Easy measures for evaluating non-English corpora for language engineering. Some lessons from Arabic and Bengali. Technical Report Number 2004/05, Department of Computing, The Open University, Walton Hall, Milton Keynes MK7 6AA, United Kingdom. 2004.
  5. Joachim Griesbaum. (2004) Evaluation of three German search engines: Altavista.de, Google.de and Lycos.de. Information Research, Volume 9, No. 4, July 2004. ISSN 1368-1613
  6. Fletcher, William H. (2004) Making the Web More Useful as a Source for Linguistic Corpora. Language and Computers, 2 October 2004, vol. 52, no. 1, pp. 191-205(15)
  7. Mohamed Maamouri and Ann Bies (2004) Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools. Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, COLING 2004, Geneva, August 28, 2004.
  8. Mohamed Maamouri (1998) Arabic Diglossia and its Impact on the Quality of Education in the Arab Region. Human Development: Moving Forward Workshop. Mediterranean Development Forum. Marrakech, Morocco. September 3 - 6, 1998
  9. Cognitive models based on Latent Semantic Analysis a tutorial
  10. Quesada, J. Creating your own LSA space. In T. Landauer, D. McNamara, S. Dennis & W. Kintsch (Eds.), Latent Semantic Analysis: A road to meaning.
  11. Diab, Mona. The Feasibility of Bootstrapping an Arabic WordNet leveraging Parallel Corpora and an English WordNet. Proceedings of the Arabic Language Technologies and Resources, NEMLAR, Cairo 2004.
  12. Bente Maegaard. (2004) The NEMLAR project on Arabic language resources 9th EAMT Workshop, "Broadening horizons of machine translation and its applications", 26-27 April 2004, Malta; pp.124-128.
  13. Wynne, M (editor). 2005. Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow Books. Available online from http://ahds.ac.uk/linguistic-corpora/
  14. Chalhoub-Deville, M., Wigglesworth, G. (2005) Rater judgment and English language speaking proficiency. World Englishes, Vol. 24, No. 3. pp. 383-391.
  15. Avgerou, C. (2005) Doing critical research in information systems: some further thoughts. Information Systems Journal, Volume 15, Number 2, April 2005, pp. 103-109(7)
  16. Martin Romacker, Udo Hahn, Context-Based Ambiguity Management for Natural Language Processing, Proceedings of the Third International and Interdisciplinary Conference on Modeling and Using Context, pp. 184-197, July 27-30, 2001


  1. Kboubi, F., Chabi, H. A., Ben Ahmed, M. (2005) Table Recognition Evaluation and Combination Methods. Eighth International Conference on Document Analysis and Recognition (ICDAR 2005) proceedings. August 31-Sept 1, 2005. Seoul, Korea. pp. 1237-1241


  1. Localization in MSA -Presentation-

Links to Arabic References

  1. Standard Classical Arabic Course Syllabus (Ar, En, Gloss)

MT related Material

  1. Soudi, A., Cavalli-Sforza, V., & Jamari, A., A Prototype English-to-Arabic Interlingua-based MT System. Proceedings of the Workshop on Arabic Language Resources and Evaluation - Status and Prospects, 3rd International Conference on Language Resources and Evaluation (LREC 2002), Jun 1, 2002, Las Palmas de Gran Canaria, Spain.
  2. Ulrich Germann, Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect? ACL 2001 Workshop on Data-Driven Machine Translation. Toulouse, France, July 7, 2001
  3. Kevin Knight, Automating Knowledge Acquisition for Machine Translation. AI Magazine 18(4): Winter 1997, pp. 81-96
  4. Schafer, C. & Yarowsky, D. (2003) A Two-Level Syntax-Based Approach to Arabic-English Statistical Machine Translation. MT Summit IX Workshop Machine Translation for Semitic Languages: Issues and Approaches. New Orleans, Louisiana
  5. Elizabeth D. Liddy. (????) Cross Language Information Exploitation of Arabic. Center for Natural Language Processing School of Information Studies Syracuse University
  6. Smith, Noah A. and Michael E. Jahr (2000) Cairo: An Alignment Visualization Tool. In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece.
  7. Jain V. Improved Metrics For Machine Translation Evaluation. Department of Computer & Information Science University of Pennsylvania
  8. Culy, C. and Riehemann, S. Z. (2003) The Limits of N-Gram Translation Evaluation Metrics, in Proceedings of MT Summit IX.
  9. Alon Lavie, Erik Peterson, Katharina Probst, Shuly Wintner and Yaniv Eytani. (2004) Rapid prototyping of a transfer-based Hebrew-to-English Machine Translation system. Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-04).
  10. Alon Lavie, Katharina Probst, Erik Peterson, Stephan Vogel, Lori Levin, Ariadna Font-Llitjos, and Jaime Carbonell. (2004) A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources. Proceedings of the 9th Workshop of the European Association for Machine Translation (EAMT-04).
  11. Morneau, Rick (2006) The Lexical Semantics of a Machine Translation Interlingua. Draft date: July 11, 2006
  12. Nabhan R. A., Rafea A. A., Shaalan K. F. (2005) Enhancing Phrase Extraction From Word Alignments Using Morphology. Fifth Conference on Language Engineering. Faculty of Engineering, Ain Shams University. Cairo 9/2005. pp. 57-65. 

Useful Links for MT

  1. Bibliography for Statistical Alignment and Machine Translation

Contact Ahmed Abdelali

Ahmed Abdelali 1999-2012