Homework 7
Due: Oct. 30, 2013 (before class)
October 21, 2013
Problem 1: Query Expansion (40pt)
In this problem, you are asked to implement query expansion using the Lucene library. You will use the same document
collection and queries that were used in Homework 5, which can also be found at
http://www.cse.msu.edu/~cse484/hw/hw4.zip. The assignment consists of two parts. In the first part, you are asked to
implement a simple query expansion algorithm that expands a query with the most frequent words appearing in the
top-ranked documents. In the second part, you are asked to identify the limitation of the approach presented in the
first part and come up with your own solution to address it.
Part I (20pt) In this part, you are asked to implement a simple heuristic to expand the queries. To facilitate your
development, you are provided with template Java code that can be downloaded from
http://www.cse.msu.edu/~cse484/hw/hw7_code.zip. In the downloaded file, you will find two directories: IndexTREC and
BatchSearch. Files under IndexTREC are used for document indexing, and files under BatchSearch are used
for query expansion. You need to create a Java project for each directory and compile each into class files. Your
implementation of query expansion goes in the file BatchSearch.java under the BatchSearch directory, which contains
brief instructions for implementing query expansion. You need to accomplish the following tasks in this part
of the homework:
• Index the collection of documents using IndexTREC. Note that you should not re-use the index generated in the
previous assignments.
• For each query in /query/query.sgml, find the most related words as follows: first retrieve the top 100
ranked documents, then count, for each non-query word, the number of top-ranked documents in which it appears.
You will then return the 20 most frequent non-query words appearing in the top 100 ranked documents.
• Submit your code and the expanded query words for each query. Note that none of the expanded query words should
appear in the original query. Discuss your observations about the expanded query words.
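The counting step in the second task can be sketched as follows. This is only a minimal illustration, not the provided template: it assumes the top-ranked documents have already been retrieved and tokenized into lowercase terms, and the class name ExpansionSketch, the method expand, and the sample terms are all hypothetical. It uses no Lucene calls; in BatchSearch.java the term lists would come from the retrieved documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the provided template code.
class ExpansionSketch {

    /**
     * Returns up to k non-query terms, ordered by the number of top-ranked
     * documents that contain them (ties are broken alphabetically).
     * Assumes each document is already tokenized into normalized terms.
     */
    static List<String> expand(Set<String> queryTerms,
                               List<List<String>> topDocs,
                               int k) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : topDocs) {
            // Deduplicate within a document so that each document
            // contributes at most 1 to a term's count.
            Set<String> distinctTerms = new HashSet<>(doc);
            for (String term : distinctTerms) {
                if (!queryTerms.contains(term)) {
                    docFreq.merge(term, 1, Integer::sum);
                }
            }
        }
        List<String> candidates = new ArrayList<>(docFreq.keySet());
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Set<String> query = new HashSet<>(Arrays.asList("airbus", "subsidies"));
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("airbus", "government", "subsidies", "aircraft", "government"),
            Arrays.asList("aircraft", "boeing", "airbus"),
            Arrays.asList("government", "aircraft", "trade"));
        System.out.println(expand(query, docs, 3)); // prints [aircraft, government, boeing]
    }
}
```

For the assignment itself, topDocs would hold the top 100 ranked documents and k would be 20. Counting over the distinct terms of each document gives the number of documents a word appears in, rather than its total number of occurrences.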
Part II (20pt) Based on your observations from Part I, devise a better strategy for query expansion that alleviates the
limitation of the approach presented in Part I. You need to submit (1) a short description of your query expansion
strategy, (2) an implementation of your algorithm, and (3) the expanded query words, none of which may appear in the
original query.