Java程序辅导

C C++ Java Python Processing编程在线培训 程序编写 软件开发 视频讲解

客服在线QQ:2653320439 微信:ittutor Email:itutor@qq.com
wx: cjtutor
QQ: 2653320439
Hash Tables
Lab Assignment 4
Hugo Serrano
June 7, 2013
1 Introduction
In this lab, you will have to implement a hash table and test how different
hash function techniques impact the performance of such data structure.
Topics covered in this activity:
• Hash tables;
• Linked lists;
• Hash functions;
• Collision Resolution with Chaining
1.1 The Problem
The Postal Service is developing a new technology that aims to inform its users
at the time of posting, the estimated time for delivery of a mail. This technology
will be embedded in mailboxes. When a mail is inserted into the mailbox,
a scanner reads the origin and destination on the envelope, and the system
calculates the total distance that the letter will travel.
1.2 The System
For this task, you will have to implement the following components of the sys-
tem:
Storage system - for performance, all the data must be stored in a hash
table.
Search engine - the search engine will receive as input a ZIP code (key) and
will retrieve from the hash table the latitude and longitude (value) of the
region corresponding to the ZIP code.
Distance calculator - based on the coordinates of two ZIP codes (origin and
destination), the calculator must provide the distance between the two
points in miles or kilometers.
1
1.3 The Dataset
The dataset you will use for this task is a CSV file (zipcodes.csv) with three
columns, namely zip, latitude and longitude like in the example below. 1.
00501,40.81,-73.04
00544,40.81,-73.04
00601,18.16,-66.72
00602,18.38,-67.18
00603,18.43,-67.15
00604,18.43,-67.15
00605,18.43,-67.15
00606,18.18,-66.98
00610,18.28,-67.14
2 The Task
Your code must provide the following functions:
1. Insert new entries in the table;
2. Get the latitude and longitude values from the hash table for a given ZIP
code.
3. Read the CSV file and insert all the entries into the hash table using
the ZIP codes as keys;
4. Keys are strings, thus they must be prehashed so they can be represented
as nonnegative integers. For prehashing, you can use the hashCode method
for strings;
5. Hash the (already prehashed) keys using either division or multiplication
method;
6. Collisions must be resolved by separate chaining;
7. Print out the load factor α;
8. Measure and plot the search times for 1000 to 30000 ZIP codes in incre-
ments of 1000.
9. Print out the average length of the linked lists
µl =
M∑
i=0
length(ti)
M
1The file is available on http://my.fit.edu/~hbarbosafilh2011/teaching/zipcodes.csv
2
The distance d must be calculated using the haversine formula.
d = R× 2 arcsin
(√
sin2
(
φ2 − φ1
2
)
+ cos(φ1) cos(φ2) sin
2
(
λ2 − λ1
2
))
where R is the radius of the Earth (3961 miles or 6373 km), φ1 and φ2 are
the latitudes of the points 1 and 2 respectively, and λ1 and λ2 are the longitudes
of points 1 and 2 respectively. 2
2.1 Extra marks
You will receive extra marks for the following functionalities:
• Calculate the standard deviation σl of linked lists’ length (5 points);
• Plot the histogram with the frequencies of the lists’ lengths using some
plotting library for Java (e.g. JFree Chart) (5 points) ;
3 Grading
You will be graded based on the following criteria:
1. Correct Implementation (5 points for each functionality in the tasks list).
2. Complete understanding of the data structures - both the hash table
and the linked list -, their operations and respective implementations (20
points);
3. Quality of the hash function based on its hashing uniformity (10 points);
4. Execution (10 points)
5. Originality (5 points)
6. Comments (5 points)
7. Additional functionalities (5 points)
Additionally, the best unique hash function will receive 5 more extra marks.
The function must show good uniformity and must be unique. If two or more
students end up with the same best function, the marks will go to the second
best, and so on.
2http://en.wikipedia.org/wiki/Haversine_formula
3