CS 3114 Data Structures & Algorithms Project 1: File Operations and Parsing
Version 7.10 This is a purely individual assignment! 1
File Operations and Text Parsing in Java
This assignment involves implementing a small Java program that performs some basic file navigation and parsing tasks,
including the parsing of character strings.
The program will deal with two input files. The first input file, whose name will be supplied from the command line, contains
a collection of data records pertaining to geographical features, obtained from the website for the USGS Board on Geographic
Names (www.usgs.gov/core-science-systems/ngp/board-on-geographic-names/download-gnis-data)[1]. The file begins with a
descriptive header line, followed by a sequence of GIS records, one per line, which contain the following fields in the
indicated order:
Figure 1: Geographic Data Record Format

Name                    Type         Length/Decimals  Short Description
Feature ID              Integer      10               Permanent, unique feature record identifier
Feature Name            String       120              Official feature name
Feature Class           String       50               See Figure 3 later in this specification
State Alpha             String       2                The unique two letter alphabetic code for a US State
State Numeric           String       2                The unique two number code for a US State
County Name             String       100              The name of a county or county equivalent
County Numeric          String       3                The unique three number code for a county or county equivalent
Primary Latitude DMS    String       7                The official feature location
Primary Longitude DMS   String       8                The official feature location
Primary Latitude DEC    Real Number  11/7             The official feature location
Primary Longitude DEC   Real Number  12/7             The official feature location
Source Latitude DMS     String       7                Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
Source Longitude DMS    String       8                Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
Source Latitude DEC     Real Number  11/7             Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
Source Longitude DEC    Real Number  12/7             Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)
Elevation (meters)      Integer      5                Elevation in meters above (-below) sea level of the surface at the primary coordinates
Elevation (feet)        Integer      6                Elevation in feet above (-below) sea level of the surface at the primary coordinates
Map Name                String       100              Name of USGS base series topographic map containing the primary coordinates.
Date Created            String                        The date the feature was initially committed to the database.
Date Edited             String                        The date any attribute of an existing feature was last edited.

DMS-degrees/minutes/seconds
DEC-decimal degrees.
Note (primary and source coordinates): Records showing "Unknown" and zeros for the latitude and longitude DMS and
decimal fields, respectively, indicate that the coordinates of the feature are unknown.
They are recorded in the database as zeros to satisfy the format requirements of a
numerical data type. They are not errors and do not reference the actual geographic
coordinates at 0 latitude, 0 longitude.
Notes:
See https://geonames.usgs.gov/docs/metadata/gnis.xml for far more information than you need.
The type specifications used here have been modified from the source (URL above) to better reflect the realities of
your programming environment.
Latitude and longitude may be expressed in DMS (degrees/minutes/seconds, 0820830W) format, or DEC (real
number, -82.1417975) format. In DMS format, latitude will always be expressed using 6 digits followed by a single
character specifying the hemisphere, and longitude will always be expressed using 7 digits followed by a hemisphere
designator.
Although some fields are mandatory, some may be omitted altogether. Best practice is to treat every field as if it
may be left unspecified. Certain fields are necessary in order to index a record: the feature name and the primary
latitude and primary longitude. If a record omits any of those fields, you may discard the record, or index it as far as
possible.
In the GIS record file, each record will occur on a single line, and the fields will be separated by pipe ('|') symbols. Empty
fields will be indicated by a pair of pipe symbols with no characters between them. See the posted VA_Monterey.txt file
for many examples.
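As a minimal sketch of breaking one record into its fields (the class and method names are illustrative, not part of the assignment), String.split can be given a negative limit so that empty fields, including a trailing empty Date Edited field, are preserved rather than dropped:

```java
// Illustrative sketch; RecordSplitter is a hypothetical name, not a required class.
final class RecordSplitter {
    // Splits one GIS record line on '|'. The -1 limit keeps empty fields at the
    // end of the line, so every well-formed record yields the same field count.
    static String[] splitFields(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String record = "1481345|Asbury Church|Church|VA|51|Highland|091|382607N|0793312W|"
                + "38.4353981|-79.5533807|||||818|2684|Monterey|09/28/1979|";
        String[] fields = splitFields(record);
        System.out.println(fields.length);         // count includes the empty fields
        System.out.println(fields[1]);             // Feature Name
        System.out.println(fields[11].isEmpty());  // Source Latitude DMS is empty here
    }
}
```

Without the negative limit, split discards trailing empty strings, so records with and without a Date Edited value would produce arrays of different lengths.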
GIS record files are guaranteed to conform to the syntax described above, so there is no explicit requirement that you validate
the files. On the other hand, some error-checking during parsing may help you detect errors in your parsing logic.
A file can be thought of as a sequence of bytes, each at a unique offset from the beginning of the file, just like the cells of an
array. So, each GIS record begins at a unique offset from the beginning of the file.
File Formats and Line Termination
Each input and output file will be plain ASCII text.
Each line of a text file ends with a particular marker (known as the line terminator). In MS-DOS/Windows file systems, the
line terminator is a sequence of two ASCII characters (CR + LF, 0X0D0A). In Unix systems, the line terminator is a single
ASCII character (LF). Other systems may use other line termination conventions.
Why should you care? Which line termination is used has an effect on the file offsets for all but the first record in the data
file. As long as we’re all testing with files that use the same line termination, we should all get the same file offsets. But if
you change the file format (of the posted data files) to use different line termination, you will get different file offsets than are
shown in the posted log files. That would mean that a command script prepared for use with GIS record files in one format
will not work correctly with GIS record files in another format. Most good text editors will tell you what line termination is
used in an opened file, and also let you change the line termination scheme.
All that being said, as the project is auto-graded, all input files will have the same line termination, and the grading of correctness
of your searches will depend on whether you report the correct search results, not on the file offsets you report. On the other
hand, if your indexing logic does not produce the correct offsets, you will definitely have difficulties with a future
assignment.
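To make the effect of line termination concrete, here is a small sketch (the class name and the sample lines are made up for illustration) that computes where each line would start, assuming plain ASCII text with one byte per character:

```java
import java.util.Arrays;

// Illustrative only: shows how the terminator length shifts every offset after the first line.
final class OffsetDemo {
    // Returns the byte offset at which each line starts, assuming one byte per
    // character and the given terminator after every line.
    static long[] lineOffsets(String[] lines, String terminator) {
        long[] offsets = new long[lines.length];
        long position = 0;
        for (int i = 0; i < lines.length; i++) {
            offsets[i] = position;
            position += lines[i].length() + terminator.length();
        }
        return offsets;
    }

    public static void main(String[] args) {
        String[] lines = { "HEADER", "record one", "record two" };
        System.out.println(Arrays.toString(lineOffsets(lines, "\n")));    // LF:      [0, 7, 18]
        System.out.println(Arrays.toString(lineOffsets(lines, "\r\n")));  // CR + LF: [0, 8, 20]
    }
}
```

Each additional line adds one more byte of drift between the two formats, which is why offsets from a file saved with one convention do not match a copy saved with the other.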
Figure 2: Sample Geographic Data Records

Note that some record fields are optional, and that when there is no given value for a field, there are still delimiter
symbols for it. Each record occupies a single line; lines are never "wrapped" in the actual data files.

FEATURE_ID|FEATURE_NAME|FEATURE_CLASS|STATE_ALPHA|STATE_NUMERIC|COUNTY_NAME|COUNTY_NUMERIC|PRIMARY_LAT_DMS|PRIM_LONG_DMS|PRIM_LAT_DEC|PRIM_LONG_DEC|SOURCE_LAT_DMS|SOURCE_LONG_DMS|SOURCE_LAT_DEC|SOURCE_LONG_DEC|ELEV_IN_M|ELEV_IN_FT|MAP_NAME|DATE_CREATED|DATE_EDITED
1479116|Monterey Elementary School|School|VA|51|Roanoke (city)|770|371906N|0795608W|37.3183753|-79.9355857|||||323|1060|Roanoke|09/28/1979|09/15/2010
1481345|Asbury Church|Church|VA|51|Highland|091|382607N|0793312W|38.4353981|-79.5533807|||||818|2684|Monterey|09/28/1979|
1481852|Blue Grass|Populated Place|VA|51|Highland|091|383000N|0793259W|38.5001188|-79.5497702|||||777|2549|Monterey|09/28/1979|
1481878|Bluegrass Valley|Valley|VA|51|Highland|091|382953N|0793222W|38.4981745|-79.539492|382601N|0793800W|38.4337309|-79.6333833|759|2490|Monterey|09/28/1979|
1482110|Buck Hill|Summit|VA|51|Highland|091|381902N|0793358W|38.3173452|-79.5661577|||||1003|3291|Monterey SE|09/28/1979|
1482176|Burners Run|Stream|VA|51|Highland|091|382509N|0793409W|38.4192873|-79.5692144|382531N|0793538W|38.4252778|-79.5938889|848|2782|Monterey|09/28/1979|
1482324|Mount Carlyle|Summit|VA|51|Highland|091|381556N|0793353W|38.2656799|-79.5647682|||||698|2290|Monterey SE|09/28/1979|
1482434|Central Church|Church|VA|51|Highland|091|382953N|0793323W|38.4981744|-79.5564371|||||773|2536|Monterey|09/28/1979|
1482557|Claylick Hollow|Valley|VA|51|Highland|091|381613N|0793238W|38.2704021|-79.5439343|381733N|0793324W|38.2925|-79.5566667|573|1880|Monterey SE|09/28/1979|
1482785|Crab Run|Stream|VA|51|Highland|091|381707N|0793144W|38.2854018|-79.528934|381903N|0793415W|38.3175|-79.5708333|579|1900|Monterey SE|09/28/1979|
1482950|Davis Run|Stream|VA|51|Highland|091|381824N|0793053W|38.3067903|-79.5147671|382057N|0793505W|38.3491667|-79.5847222|601|1972|Monterey SE|09/28/1979|
1483281|Elk Run|Stream|VA|51|Highland|091|382936N|0793153W|38.4934524|-79.5314362|383121N|0793056W|38.5226185|-79.5156027|757|2484|Monterey|09/28/1979|
1483492|Forks of Waters|Locale|VA|51|Highland|091|382856N|0793031W|38.4823417|-79.5086575|||||705|2313|Monterey|09/28/1979|
1483527|Frank Run|Stream|VA|51|Highland|091|382953N|0793310W|38.4981744|-79.5528258|383304N|0793341W|38.5512285|-79.5614381|780|2559|Monterey|09/28/1979|
1483647|Ginseng Mountain|Summit|VA|51|Highland|091|382850N|0793139W|38.480675|-79.527547|||||978|3209|Monterey|09/28/1979|
1483860|Gulf Mountain|Summit|VA|51|Highland|091|382940N|0793103W|38.4945636|-79.5175468|||||1006|3300|Monterey|09/28/1979|
1483916|Hamilton Chapel|Church|VA|51|Highland|091|381740N|0793707W|38.2945677|-79.6186591|||||823|2700|Monterey SE|09/28/1979|
1484097|Highland High School|School|VA|51|Highland|091|382426N|0793444W|38.4071387|-79.5789333|||||879|2884|Monterey|09/28/1979|09/15/2010
1484099|Highland Wildlife Management Area|Park|VA|51|Highland|091|381905N|0793439W|38.3181785|-79.577547|||||954|3130|Monterey SE|09/28/1979|
. . .
Your program may be invoked from the command line, in either of the following ways:
java GISParser -index
java GISParser -search
Note the instructions given above imply your main class (the one that implements public static void main()) must
be named GISParser, and the GISParser class must not be in a package (Eclipse calls this the default package). If you
execute your program from within Eclipse, you can still specify command-line parameters; see the Eclipse for 3114 notes
for an example.
Computing Record File Offsets [30%]
java GISParser -index
Your program will parse the given GIS record file and write, to the file named by the final parameter, the file offset and
Feature Name field for each of the records found in the file, listed in the order the records occur in the input file. Here's the
beginning of a sample input file (lines are truncated for the purpose of this display):
Here's the beginning of the corresponding output file:
The offset of a record is the position of the first character of that record within the GIS record file. Implementing this
functionality will require solving some small puzzles that are related to the rest of the assignment:
how to determine whether a file exists and can be opened for reading
how to determine the offset of the current character in a text file
how to read a whole line of text at once from a text file
how to break a delimited string into its components
how to properly close a file when you are finished processing it[2]
The following Java classes provide functionality that was useful in the reference solution:
File
RandomAccessFile
FileWriter
String
Scanner
Formatter
That is not to say that other Java classes could not play a role, or that alternative approaches might not use these classes. It is
absolutely not necessary to parse the input data character-by-character in order to achieve the specified results; avoiding that
approach will not only lead you to a more efficient solution, but will also leave you with a better understanding of how to
make use of the Java library.
FEATURE_ID|FEATURE_NAME|FEATURE_CLASS|STATE_ALPHA|STATE_NUMERIC|COUNTY_ . . .
885513|Siegrest Draw|Valley|NM|35|Eddy|015|323815N|1043256W|32.6376116| . . .
885526|AAA Tank|Reservoir|NM|35|Eddy|015|321043N|1041456W|32.1786543|-1 . . .
885566|Adobe Draw|Valley|NM|35|Eddy|015|322820N|1042141W|32.4723375|-10 . . .
885567|Adobe Flat|Flat|NM|35|Eddy|015|322849N|1042119W|32.4803932|-104. . . .
885607|Alacran Hills|Range|NM|35|Eddy|015|322812N|1041055W|32.4701183|- . . .
885684|Alkali Lake|Lake|NM|35|Eddy|015|323039N|1041133W|32.5109371|-104 . . .
NM_EddyKnown.txt contains the following records:
265 Siegrest Draw
425 AAA Tank
553 Adobe Draw
710 Adobe Flat
829 Alacran Hills
952 Alkali Lake
CS 3114 Data Structures & Algorithms Project 1: File Operations and Parsing
Version 7.10 This is a purely individual assignment! 5
It's not necessary to align your output neatly, but it's good practice, and not difficult, to do so.
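The indexing pass described in this section can be sketched as follows. This is one possible approach under stated assumptions, not the reference solution: the class and method names are hypothetical, the column width is an arbitrary choice, and the sketch omits the heading line that appears at the top of the sample output. It relies on RandomAccessFile.readLine() and getFilePointer(), and holds only the current record in memory.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch; OffsetLister and its methods are hypothetical names.
final class OffsetLister {
    // Formats one output line from a record's offset and its raw text.
    static String formatEntry(long offset, String recordLine) {
        String[] fields = recordLine.split("\\|", -1);
        String name = fields.length > 1 ? fields[1] : "";  // Feature Name is the second field
        return String.format("%8d  %s", offset, name);
    }

    // Walks the file one line at a time, recording where each line began.
    static void listOffsets(File gisFile, File outFile) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(gisFile, "r");
             FileWriter out = new FileWriter(outFile)) {
            in.readLine();                         // skip the descriptive header line
            long offset = in.getFilePointer();     // offset of the first record
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    out.write(formatEntry(offset, line) + System.lineSeparator());
                }
                offset = in.getFilePointer();      // start of the next line
            }
        }   // try-with-resources closes both files, even if an exception is thrown
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java OffsetLister <GIS record file> <output file>");
            return;
        }
        listOffsets(new File(args[0]), new File(args[1]));
    }
}
```

Because getFilePointer() is read after each readLine() call, the recorded offset already accounts for whichever line terminator the file uses.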
Performing Record Searches [70%]
java GISParser -search
Invoked in the second way, your program will process the given commands file (and NOT write out a list of file offsets and
Feature Name values), and write its output to a file named by the final parameter. Each command must be echoed into the
output file, on a line by itself, numbered as shown in the posted log files. Following each echoed command, your program
will write one line reporting the results of carrying out that command, as described below.
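A minimal sketch of the echo-and-number loop is shown below. It assumes the commands file format specified later in this section (one command per line, comment lines starting with ';', possible blank lines); the class name is hypothetical, and the point where a real solution would carry out each search is marked with a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; CommandEcho is a hypothetical name.
final class CommandEcho {
    // Returns the log lines produced by echoing each command with a running number.
    // Comment lines and blank lines are skipped and do not consume a number.
    static List<String> echo(List<String> scriptLines) {
        List<String> log = new ArrayList<>();
        int number = 0;
        for (String raw : scriptLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";")) {
                continue;
            }
            number++;
            log.add(number + ": " + line);
            if (line.equals("quit")) {
                log.add("Exiting");
                break;                  // nothing after quit is processed
            }
            // A real solution would perform the search here and log one result line.
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList("; Report feature name:", "show_name 634", "", "quit");
        for (String entry : echo(script)) {
            System.out.println(entry);
        }
    }
}
```

In the actual program the lines would be read from the commands file one at a time rather than passed in as a list; the list is used here only to keep the sketch self-contained.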
The first input file will be a GIS record file, as described above. The second input file, whose name will also be supplied on
the command line, contains a sequence of search commands that must be processed. The only types of search that must be
supported are:
show_name
If the offset is valid (see below), write the Feature Name field for the record that occurs at that offset
show_latitude
show_longitude
If the offset is valid (see below), write the primary latitude or primary longitude, as specified, for the record that
occurs at that offset. The specified fields should be separated by whitespace (your choice as to what). If the
specified field is not included in the record, write "Coordinate is not given".
The Primary Latitude and Longitude fields are given in the GIS records in two formats, DMS and DEC. You should
parse the DMS version. Your output must reformat the latitude or longitude in a human-friendly manner.
Specifically:
1090224W should be written as 109d 2m 24s West
Note that latitude values will always consist of 6 decimal digits, followed by a hemisphere designator ('N' or 'S'), and
longitude values will always consist of 7 decimal digits, followed by a hemisphere designator ('W' or 'E'). The Java
String and Scanner classes both have useful methods for breaking out the relevant parts.
show_elevation
If the offset is valid (see below), write the elevation in feet field for the record that occurs at that offset. One
complication is that the elevation fields are optional; if the elevation is not given, write "Elevation is not given".
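The DMS reformatting required by show_latitude and show_longitude can be sketched with plain String methods. The class and method names are hypothetical. Only "West" appears in this specification; the other hemisphere names, and the choice to treat "Unknown" like a missing field, are assumptions of this sketch.

```java
// Illustrative sketch; DmsFormatter is a hypothetical name.
final class DmsFormatter {
    // Converts a DMS field such as "1090224W" or "323815N" to human-friendly text.
    static String format(String dms) {
        if (dms == null || dms.isEmpty() || dms.equals("Unknown")) {
            return "Coordinate is not given";   // "Unknown" handling is an assumption
        }
        char hemisphere = dms.charAt(dms.length() - 1);
        String digits = dms.substring(0, dms.length() - 1);  // 6 digits (lat) or 7 (long)
        int n = digits.length();
        int seconds = Integer.parseInt(digits.substring(n - 2));
        int minutes = Integer.parseInt(digits.substring(n - 4, n - 2));
        int degrees = Integer.parseInt(digits.substring(0, n - 4));
        return degrees + "d " + minutes + "m " + seconds + "s " + hemisphereName(hemisphere);
    }

    // Maps the hemisphere designator to a word; names other than "West" are assumed.
    static String hemisphereName(char designator) {
        switch (designator) {
            case 'N': return "North";
            case 'S': return "South";
            case 'E': return "East";
            case 'W': return "West";
            default:  throw new IllegalArgumentException("Bad hemisphere: " + designator);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("1090224W"));  // prints "109d 2m 24s West"
        System.out.println(format(""));          // prints "Coordinate is not given"
    }
}
```

Counting from the right-hand end of the digit string lets the same code handle the 6-digit latitude and 7-digit longitude forms, and Integer.parseInt drops the leading zeros.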
What about invalid offsets?
If the offset is not positive, write the error message “Offset is not positive”.
If the offset is larger than the length of the data file, write the error message “Offset is too large”.
If the offset is non-negative but does not correspond to the first character on a line of the file that contains a
GIS record, write the error message “Offset is unaligned”.
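The three checks above can be sketched without storing any offsets: a positive offset begins a line exactly when the byte just before it is a line feed, which holds for both LF and CR + LF files. The class name is hypothetical, and treating an offset equal to the file length as too large is an assumption of this sketch (the rule above says only "larger than").

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch; OffsetChecker is a hypothetical name.
final class OffsetChecker {
    // Returns null when the offset begins a line after the header line;
    // otherwise returns the error message to be logged.
    static String check(RandomAccessFile file, long offset) throws IOException {
        if (offset <= 0) {
            return "Offset is not positive";
        }
        if (offset >= file.length()) {          // assumption: no record can start at EOF
            return "Offset is too large";
        }
        file.seek(offset - 1);
        if (file.read() != '\n') {              // preceding byte must end the previous line
            return "Offset is unaligned";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java OffsetChecker <GIS record file> <offset>");
            return;
        }
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            String problem = check(file, Long.parseLong(args[1]));
            System.out.println(problem == null ? "Offset is valid" : problem);
        }
    }
}
```

Since offset 0 is rejected by the first check, the header line can never be mistaken for a record.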
The only other command is:
quit
Cease processing the commands file, log the message “Exiting”, close all files and exit the program.
Each command will occur on a line by itself. Lines beginning with a semi-colon character ';' are comments and should be
ignored. Blank lines are possible. The command file is guaranteed to conform to this specification, so you do not need to
worry about error-checking when reading it. Here is a short sample commands file:

    ; CS 3114: test script for GIS file parsing project.
    ;
    ; Report feature name:
    show_name 634
    ;
    show_longitude 9778
    ;
    ; Report feature elevation:
    show_elevation 1749
    ;
    ; Report feature latitude:
    show_latitude 11976
    ;
    ; Exit program
    quit

Here's a corresponding output sample:

    1: show_name 634
    Carlsbad
    2: show_longitude 9778
    104d 13m 34s West
    3: show_elevation 1749
    Unaligned offset
    4: show_latitude 11976
    Offset is too large
    5: quit
    Exiting

Additional sample input and output files will be supplied on the course website, along with a program to generate additional
samples, and a grading harness.
Under no circumstances may your program store more than one complete GIS record in memory at any given instant,
nor is your program allowed to store a collection of file offsets for the records in the file.
A Look to the Future
You may have noticed (at least) one bit of weirdness in the commands: it's unrealistic to think that a user who wanted to find
a specific record in a data file would already know the offset at which the record occurred in the file. It's much more realistic
to expect a user will want to find a record that matches a given criterion. For example:
    Show me a record that contains the feature name "Blacksburg".
    Show me a record that matches the coordinates (32 28 49N, 104 25 15W).
In order to satisfy queries like these, we would want to have indexing capabilities. All an index does is map a key value to a
location (or set of locations) where matching records can be found. We could answer the first sort of query if we had an index
that supplied a file offset (or set of file offsets) that matched the feature name "Blacksburg".
A future project in this course will involve building indexing features.
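To illustrate the idea of an index as a map from a key value to a set of file offsets, here is a minimal sketch. It is a look ahead only: storing a collection of offsets is not allowed in this project, the class name is hypothetical, and the offsets used in main are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a name index; NOT permitted in this project's solution.
final class NameIndex {
    private final Map<String, List<Long>> offsetsByName = new HashMap<>();

    // Records that a record with the given feature name starts at the given offset.
    void add(String featureName, long offset) {
        offsetsByName.computeIfAbsent(featureName, key -> new ArrayList<>()).add(offset);
    }

    // Returns every offset recorded for the name, or an empty list if there is none.
    List<Long> lookup(String featureName) {
        List<Long> found = offsetsByName.get(featureName);
        return found == null ? Collections.emptyList() : Collections.unmodifiableList(found);
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex();
        index.add("Blacksburg", 265L);    // invented offsets, for illustration only
        index.add("Blacksburg", 4021L);
        System.out.println(index.lookup("Blacksburg"));
        System.out.println(index.lookup("Monterey"));
    }
}
```

A lookup returns a list rather than a single offset because several records may share the same feature name.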
Test Data Generator and Output Comparator
The course website supplies a tar file with two useful tools for testing your solution. Both tools are packaged as executable
Java jar files, and require that you have installed the Java JDK[3] version 11.
The first tool, GISGenerator.jar, can be used to create test data and reference results. The tool is designed to be
executed from a command-line environment. Here is a sample session, using a Windows 10 terminal window, that shows the
tool at work. The same approach works on a Linux system. First, put the jar file, and the supplied GIS data master files into
a directory:

    Z:\J1.GISParser\testing > dir
    06/16/2020 10:23 PM 17,831 GISGenerator.jar
    06/09/2020 08:21 PM 218,541 NM_EddyKnown.txt
    06/09/2020 08:30 PM 7,021 NM_EddyUnknown.txt

The generator has hard-coded dependencies on the two data files shown in the display above, so those files must be present in
the same directory when you run the generator.
You can execute the jar file by invoking the Java VM from the command line; in this case, the tool will display help regarding
the command-line interface:

    Z:\J1.GISParser\testing > java -jar GISGenerator.jar
    Invocation: java -jar GISGenerator.jar dbfilename commandsfilename [-repeat]
    Creates database file and commands file for testing GISParser.
    Creates reference file of record offsets and feature names.
    Creates reference log file showing command processing results.

Running the tool with valid command-line parameters will produce test data and reference solution files:

    Z:\J1.GISParser\testing > java -jar GISGenerator.jar gisDB01.txt gisCommands01.txt
    Z:\J1.GISParser\testing >dir
    06/17/2020 07:37 PM 1,191 gisCommands01.txt
    06/17/2020 07:37 PM 12,427 gisDB01.txt
    06/16/2020 10:23 PM 17,831 GISGenerator.jar
    06/09/2020 08:21 PM 218,541 NM_EddyKnown.txt
    06/09/2020 08:30 PM 7,021 NM_EddyUnknown.txt
    06/17/2020 07:37 PM 3,373 RefOffsets.txt
    06/17/2020 07:37 PM 1,489 RefResults.txt

The created files are:
    gisDB01.txt          GIS records extracted from the two master files
    gisCommands01.txt    search commands suitable for using with the previous file
    RefOffsets.txt       (annotated) correct offsets for records in gisDB01.txt
    RefResults.txt       (annotated) correct results from performing the generated searches
The reference files are annotated with point values used by the comparison tool that is described next.
The second tool, LogComparator.jar, can be used to compare output files produced by your solution to reference
results, and generate scores. The tool is designed to be executed from a command-line environment. Here is a sample
session, using a Windows 10 terminal window, that shows the tool at work. The same approach works on a Linux system.
First, put the jar file, and the relevant GIS reference files and your output files into a directory:

    Z:\J1.GISParser\testing > dir
    06/03/2017 10:38 PM 5,140 LogComparator.jar
    06/17/2020 07:37 PM 3,373 RefOffsets.txt
    06/17/2020 07:37 PM 1,489 RefResults.txt
    06/17/2020 08:33 PM 2,870 stuOffsets01.txt
    06/17/2020 08:34 PM 1,227 stuSearches01.txt
    11 File(s) 286,942 bytes
    2 Dir(s) 45,703,204,864 bytes free

In this case, the stu* files were produced by my solution, using the test data files produced earlier by the generator tool.
Executing the comparison tool with no parameters displays instructions:

    Z:\Fall2020\3114\Projects\J1.GISParser\testing > java -jar LogComparator.jar
    Invoke: java -jar LogComparator.jar [
    The reference results file should be created with the posted generator tool.
    The student results file should be created by your solution, using the.
    the input file corresponding to the reference results file.
    The order of the file names on the command line does matter.

Invoking the comparison tool on the two offsets files[4]:

    Z:\Fall2020\3114\Projects\J1.GISParser\testing > java -jar LogComparator.jar 1 RefOffsets.txt stuOffsets01.txt
    Maximum score 700
    gisDB01.txt contains the following records:
    265 10231 Water Well
    389 Cole Place
    508 Hughes Ranch
    628 10216 Water Well
    752 05036 Water Well
    . . .
    1 >> Score: 700.00 / 700

The tool writes its results to the terminal window (which simplifies use in a script); you can use redirection to save the output
to a file and then open the file in a text editor. If there were any deductions, those would be shown in the output, next to the
lines that did not match. You can then examine the failures and use those to debug your solution.
The reference output files are annotated with a point value for each line:

    [ 0] RefOffsets.txt contains the following records:
    [ 0]
    [40] 265 10231 Water Well
    [10] 389 Cole Place
    [10] 508 Hughes Ranch
    [10] 628 10216 Water Well
    [10] 752 05036 Water Well
    . . .

Your output should NOT include point annotations. Sample GIS and command files will be supplied on the course website,
along with the corresponding correct results. You should test your solution with each of those samples. There is no guarantee
these will cover all the logical cases, so you should use the supplied tools to create additional test files.
Compiling and Running Your Solution from the Command Line
It's easy to execute your Java program from the command line (terminal window, if you prefer). The example here was
created using the Windows 10 terminal, but the procedure is essentially identical in Linux. This does assume that you have
installed the Java JDK.
To compile, simply invoke the Java compiler (javac) on your source files. I produced one solution that did not use
packages; in Eclipse terminology, each of my classes is in the default package. I can then compile with the command
javac *.java:

    Z:\J1.GISParser\soln>dir /W *.java
    LongitudeHemi.java Longitude.java cmdParser.java
    . . .
    GISParser.java
    Z:\J1.GISParser\soln>javac *.java
    Z:\J1.GISParser\soln>dir /W /A-D
    LongitudeHemi.java Longitude.java cmdParser.java
    GISParser.java GISParser.class LatitudeHemi.class
    Longitude.class cmdParser.class . . .

I then copied the test input files into the same directory and invoked the Java VM on my (compiled) top-level class:

    Z:\J1.GISParser\soln>java GISParser -index gisDB01.txt stuOffsets01.txt
    Z:\J1.GISParser\soln>java GISParser -search gisDB01.txt gisCmds01.txt stuSearches01.txt
    Z:\J1.GISParser\soln>dir *.txt
    01/11/2022 01:45 PM 12,030 gisDB01.txt
    01/11/2022 01:45 PM 949 gisCmds01.txt
    01/11/2022 02:12 PM 2,836 stuOffsets01.txt
    01/11/2022 02:13 PM 1,089 stuSearches01.txt

Now, I could use the comparison tool described earlier to compare my output to the reference output.
Grading
Solutions to this assignment will be graded using Java 11, as installed on the rlogin cluster. The evaluation will be performed
using command-line tools only. You should consult the tutorials on the course website, as well as the course staff, when you
have questions.
You may develop your solution on Windows or Linux, as you like, and you may use Eclipse or any other IDE you like. But,
it is your responsibility to ensure that you have tested your code with the posted testing/grading code on Windows or CentOS
8 using Java 11. Failure to do that is likely to result in a score of zero on this assignment.
Grading Code
The website has a link to a zip file containing tools you can use to test and grade the correctness of your solution. In fact,
these are the same tools that we will use to do the grading of your solution (as described earlier).
    readme.txt      explains how this all works
    gradeJ1.py      Python script that runs the grading tools[2]
    J1Tools.zip     zip file containing:
        GISGenerator.jar      Java test data generator tool
        NM_EddyKnown.txt      GIS record input files used by the generator tool
        NM_EddyUnknown.txt
        LogComparator.jar     comparison tool that scores your results
Since this is the exact code that will be used when we grade your solution, it is imperative that you make good use of this
code in your own testing. See the readme.txt file for detailed instructions.
Documentation
You should document your implementation in accordance with the Programming Standards page on the course website. It is
possible that your implementation will be evaluated for documentation, as well as for correctness of results. If so, your
submission will be evaluated by one of the TAs, who will assess a deduction (ideally zero) against your score from the
grading code.
Note that the evaluation of your project may depend substantially on the quality of your code and documentation.
Design Considerations
You should apply good object-oriented design principles in your project. Think through object responsibilities and
interactions, and sketch out your design before you start coding. The most common design shortcomings with an assignment
like this are to identify a too-small set of candidate classes, or to adopt a minimal design in order to reduce coding time. As
inspiration, I will tell you that my solution incorporates 8 distinct classes and 2 enumerated types, all of which play important
roles within the requirements of the assignment, and also in my solution for a future assignment.
Keep in mind that later projects in this course may build on this one. For example, it is likely that in a later project some other
part of the GIS database system will need to actually do something with various fields of the GIS records that are retrieved in
this assignment.
You should consider what possible errors might be encountered, and which ones you should check for. One common
example is the validation of command-line parameters. We will not test your solution with invalid parameters, but in reality
it's fairly common for a user to supply incorrect, or no, parameters. It's simple to test for the existence of files, and log an
error message and exit cleanly if expected files do not exist. As mentioned earlier, a GIS record may not specify a value for
every field. In other cases, but not this assignment, logically invalid values may be supplied; for example, a character string
(e.g., "Fred") where a numeric value is expected. A program that crashes, or even just computes nonsensical results, in such a
situation looks very unprofessional.
Finally, you should be careful about designing your solution to catch exceptions and write useful diagnostics to standard
output if an exception is caught, especially if it cannot be handled internally.
Checking for unexpected input, even if not strictly required, is simply good practice. So is catching exceptions, and handling
them internally; writing a program that crashes makes you look unprofessional. Providing meaningful error messages when
catching an exception, or detecting input-related problems, also helps you when debugging.
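As a sketch of the kind of up-front checking described above, the following validates the command-line parameters before any file is opened. The parameter order follows the sample session shown earlier (-index with a GIS file and an output file; -search with a GIS file, a commands file and an output file); the class name and the wording of the messages are hypothetical.

```java
import java.io.File;

// Illustrative sketch; ArgsCheck and its messages are hypothetical.
final class ArgsCheck {
    // Returns null if the parameters look usable, otherwise a description of the problem.
    static String validate(String[] args) {
        if (args.length == 0) {
            return "No parameters given";
        }
        String mode = args[0];
        int expected;
        if (mode.equals("-index")) {
            expected = 3;               // -index, GIS record file, output file
        } else if (mode.equals("-search")) {
            expected = 4;               // -search, GIS record file, commands file, output file
        } else {
            return "Unknown mode: " + mode;
        }
        if (args.length != expected) {
            return "Expected " + (expected - 1) + " file names after " + mode;
        }
        // Every parameter except the last names an input file that must be readable.
        for (int i = 1; i < expected - 1; i++) {
            File input = new File(args[i]);
            if (!input.isFile() || !input.canRead()) {
                return "Cannot read input file: " + args[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.out.println(problem);   // report and exit cleanly instead of crashing
            return;
        }
        System.out.println("Parameters look valid");
    }
}
```

Returning a message instead of throwing keeps the decision about how to log the problem, and whether to exit, in one place in main.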
What to Submit
For this assignment, you must submit a zip file containing all the Java source code files for your implementation (i.e., .java
files). Submit only the Java source files. Do not submit Java bytecode (i.e., .class) files. If you use packages in your
implementation (and that's good practice), your zip file must include the correct directory structure for those packages. That's
easy to verify by running the supplied grading script on the zip file you are planning to submit.
This assignment will be auto-graded using Java SE 11, and the posted testing/grading code. Using a different version of the
Java compiler/libraries, whether older or newer, may very well result in your solution failing testing.
Your submitted zip file will be placed in the appropriate subdirectory with the packaged test code, and will then be evaluated
by running the supplied testing script. We will make no accommodations for submissions that do not work with that script.
Warning: the requirement here is for a zip file. There is a standard utility for creating zip files on Linux systems (see
"man zip" and "man unzip" for details), and it's trivial to create a zip file in Windows. We will not accept
files in any other format (e.g., tar files, 7-zip'd files, gzip'd files, jar files, rar files, …). Such submissions will NOT work with
the supplied script, and will be discarded when we run the grading code ourselves.
Instructions, and the appropriate link, for submitting to the Curator are given in the Student Guide at the Curator website:
http://www.cs.vt.edu/curator/
You will be allowed to submit your solution several times, in order to make corrections. Your score will be determined by
testing your last submission. The Curator will not be used to grade your submissions in real time, but you will have already
done the grading using the posted code.
Some Java Gotchas
A character may not be a 1-byte character
Be careful of how Java deals with the notion of a "character". Some methods do not work in the manner you might expect.
For example, here's the beginning of the Oracle documentation for the method readChar() in the very useful
RandomAccessFile class:
public final char readChar()
throws IOException
Reads a character from this file.
So far, so good; looks like you could use this to read a single character from a text file… but here's the rest of the
documentation:
This method reads two bytes from the file, starting at the current file pointer. If the
bytes read, in order, are b1 and b2, where 0 <= b1, b2 <= 255, then the result is equal
to:
(char)((b1 << 8) | b2)
So, this method deals with two-byte representations of characters, which is certainly not what we have with an ASCII input
file. The writeChar() method writes two-byte representations of a character. Neither is appropriate for this project.
A quick examination of the other methods in RandomAccessFile will identify some alternatives that are more suitable for
your needs:
read() in its various incarnations
writeByte() and writeBytes()
readLine()
Do not assume that there are no other, equally useful, methods. Similar comments apply to other useful classes, like
FileWriter. Moral: be careful of the methods supplied by classes in the Java library. Always check the documentation,
carefully.
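The difference is easy to see on a small ASCII file. The sketch below is only an illustration: the class name ReadCharDemo, the helper readAsciiChar(), and the three-byte temporary file are made up for the demonstration. It contrasts readChar() with read() and readLine() from RandomAccessFile.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical demonstration class; not part of any supplied code.
class ReadCharDemo {

    // Reads the single byte at the given offset; returns -1 at end of file.
    static int readAsciiChar(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        return raf.read();    // one byte as an int in 0..255, or -1 at EOF
    }

    public static void main(String[] args) throws IOException {
        // Create a tiny ASCII file holding the three bytes 'A', 'B', '\n'.
        File tmp = File.createTempFile("ascii", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "AB\n".getBytes(StandardCharsets.US_ASCII));

        try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
            // readChar() consumes two bytes: (char)((0x41 << 8) | 0x42)
            char combined = raf.readChar();
            System.out.printf("readChar(): U+%04X, file pointer now %d%n",
                    (int) combined, raf.getFilePointer());

            // read() consumes exactly one byte, which is what an ASCII file needs.
            char first = (char) readAsciiChar(raf, 0);
            System.out.println("read(): " + first);

            // readLine() returns the text up to, but not including, the newline.
            raf.seek(0);
            System.out.println("readLine(): " + raf.readLine());
        }
    }
}
```

Here readChar() reports U+4142 and leaves the file pointer at 2, while read() returns 'A' and readLine() returns "AB".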
Escaping the pipe
Each GIS record is represented as a single line in the GIS data files. The most efficient approach is to grab the whole line as a
String, and then decompose the String. That can be accomplished in several ways, including using the various next()
methods in the Scanner class and using the split() method in the String class. Either way, you'd need to specify that
the delimiter between fields is a pipe character '|'. That turns out to be a little subtle.
Scanner objects use delimiters specified by a Pattern object set by a String; so does split(). In the syntax of the
Pattern class, the pipe character has special meaning (logical OR if you care). Therefore "|" will not work. An "escape"
is required. But, "\" is also special in String literals, so "\|" won't work either. The correct syntax would be something
like "\\|". Note the double backslashes!
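A minimal sketch of the split() approach follows. The class name PipeSplit, the helper fields(), and the sample record are all made up for illustration; the record is not taken from any real GIS file. Note one further assumption about what you want: by default split() silently drops trailing empty strings, so the sketch passes a limit of -1 to keep them, since a record may leave fields empty.

```java
import java.util.regex.Pattern;

// Hypothetical demonstration class; not part of any supplied code.
class PipeSplit {

    // Splits a record on literal pipe characters, keeping empty fields.
    static String[] fields(String record) {
        // "\\|" in Java source is the regex \| (a literal pipe).
        // The limit -1 tells split() to keep trailing empty strings.
        return record.split("\\|", -1);
    }

    public static void main(String[] args) {
        // Made-up sample line with two empty trailing fields.
        String record = "12345|Sample Peak|Summit|VA||";

        String[] parts = fields(record);
        System.out.println("escaped pipe, limit -1: " + parts.length + " fields");

        // Without the limit, the two trailing empty fields are discarded.
        System.out.println("escaped pipe, no limit: "
                + record.split("\\|").length + " fields");

        // Pattern.quote() builds an escaped pattern without hand-written backslashes.
        System.out.println("Pattern.quote:          "
                + record.split(Pattern.quote("|"), -1).length + " fields");

        // An unescaped "|" is an alternation of two empty patterns,
        // so it does not split on the pipes at all.
        System.out.println("unescaped pipe:         "
                + record.split("|").length + " pieces");
    }
}
```

With the sample line above, the escaped versions with limit -1 yield 6 fields, and the version without a limit yields 4.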
Pledge
Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically,
you must include the following pledge statement at the beginning of the file that contains main():
// On my honor:
//
// - I have not discussed the Java language code in my program with
// anyone other than my instructor or the teaching assistants
// assigned to this course.
//
// - I have not used Java language code obtained from another student,
// or any other unauthorized source, including the Internet, either
// modified or unmodified.
//
// - If any Java language code or documentation used in my program
// was obtained from another source, such as a text book or course
// notes, that has been clearly noted with a proper citation in
// the comments of my program.
//
// - I have not designed this program in such a way as to defeat or
// interfere with the normal operation of the supplied grading code.
//
//
//
I reserve the option of assigning a score of zero to any submission that is undocumented or
does not contain this statement.
Change Log
Version  Date    Page  Change(s)
7.00     Jan 18        Base version.
7.10     Jan 20  7     Corrected invocations of GISGenerator.jar in two shell displays.
Appendix: Using Packages
I also produced a solution using packages for the logical components of my design, and not placing my top-level class
(GISParser) into a package. In Eclipse, GISParser would be in the default package. Since all the components in the
subdirectories are imported by my top-level class, GISParser, I just need to invoke the Java compiler on that file:
Z:\J1.GISParser\soln > dir
06/11/2020 08:19 PM DS
06/11/2020 08:24 PM 6,604 GISParser.java
Z:\J1.GISParser\soln > tree
Z:.
└───DS
    └───J1
        ├───Components
        └───Types
Z:\J1.GISParser\soln > javac GISParser.java
Z:\J1.GISParser\soln > dir
06/11/2020 08:19 PM DS
06/17/2020 09:40 PM 5,385 GISParser.class
06/11/2020 08:24 PM 6,604 GISParser.java
Then, I can copy the test input files into the same directory and invoke the Java VM on my (compiled) top-level class:
Z:\J1.GISParser\soln > dir
06/11/2020 08:19 PM DS
06/17/2020 09:03 PM 1,191 gisCommands01.txt
06/17/2020 09:03 PM 12,427 gisDB01.txt
06/17/2020 09:40 PM 5,385 GISParser.class
06/11/2020 08:24 PM 6,604 GISParser.java
Z:\J1.GISParser\soln > java GISParser -index gisDB01.txt stuOffsets01.txt
Z:\J1.GISParser\soln > java GISParser -search gisDB01.txt gisCommands01.txt stuSearches01.txt
Z:\J1.GISParser\soln > dir
06/11/2020 08:19 PM DS
06/17/2020 09:03 PM 1,191 gisCommands01.txt
06/17/2020 09:03 PM 12,427 gisDB01.txt
06/17/2020 09:40 PM 5,385 GISParser.class
06/11/2020 08:24 PM 6,604 GISParser.java
06/17/2020 09:53 PM 2,870 stuOffsets01.txt
06/17/2020 09:53 PM 1,227 stuSearches01.txt
Finally, I can use the comparison tool described earlier to compare my output to the reference output, as shown before.
Notes
[1] The file format used on this site has changed a number of times since I first began using them. For the purposes of this
assignment, we will be using files that were selected a few years ago, and there is no guarantee these files will
correspond to the current format.
[2] Output written to a file stream using Library classes is typically buffered (stored in memory), not written to the file
immediately. This improves performance, and is invisible to your code. But, if your program crashes, or fails to close
an output file properly, it is possible that some output will be lost and never written to the file.
[3] The JDK can be obtained from: www.oracle.com/java/technologies/downloads/archive/
[4] The first parameter to the comparison tool is an integer that the automated grading script uses in labelling results.
When running the tool manually, it doesn't matter what you use for that value, but you must use something.
[5] If Python is properly installed on your Windows installation, you can run the supplied script this way:
Z:\J1.GISParser\soln > gradeJ1.py -all wmcquain.J1.1.zip J1tools.zip
On CentOS, I installed Python as follows:
#1043 wmcquain: ~> su
Password:
[root@localhost wmcquain]# yum install python39
. . .
I then ran the script this way:
#1066 wmcquain: testing> python3.9 gradeJ1.py -all wmcquain.J1.1.zip J1tools.zip
I could make the script executable, without invoking Python directly, by adding the following lines at the beginning of
the script (but that causes problems if I update to a new version of Python):
#! /usr/bin/python3.9
#