Persistent Data
Eric McCreath
Overview
In this lecture we will:
Consider different approaches for storing a program's
information:
using Serializable,
Bespoke text formats,
XML,
JSON, and
consider the option of using a database.
Ways of storing data
There are many ways in which your program can store information.
These range in complexity from a simple text file to making use of
a database like MySQL. As you select the approach for your
program you should consider what the important factors are in your
context. Factors include:
required libraries,
standard formats,
storage efficiency/transaction latency,
program environment configuration,
type of data to store,
concurrency issues such as atomicity, durability,
robustness of the format, and
extensibility of the format.
Serializable
Java enables you to mark a class as "Serializable" (by having it
implement the java.io.Serializable interface). To save objects of
this class you can then use an ObjectOutputStream along with the
writeObject method to turn the fields of the object into a blob of
binary data.
By using ObjectInputStream along with readObject you can load
the blob of binary data and recover the object with the fields
holding their previous values.
If we stored an object which held the name and age of a person, in
this case Hugh age 10, then Java would store a file like:
$ od -c data.ser
0000000 254 355 \0 005 s r \0 027 P e r s i s t D
0000020 a t a S e r i a l i z a b l e 8
0000040 264 261 003 345 360 < 303 002 \0 002 I \0 003 a g e
0000060 L \0 004 n a m e t \0 022 L j a v a /
0000100 l a n g / S t r i n g ; x p \0 \0
0000120 \0 \n t \0 004 H u g h
0000131
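The save and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the lecture's own code: the class name PersonSer and the helper methods are hypothetical, and the blob is kept in an in-memory byte array rather than a file such as data.ser (a FileOutputStream and FileInputStream could be wrapped in the same way).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration: holds a person's name and age.
class PersonSer implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    PersonSer(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class SerializableSketch {
    // Turn the fields of the object into a blob of binary data.
    static byte[] save(PersonSer person) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(person);
        }
        return bytes.toByteArray();
    }

    // Recover the object, with its previous field values, from the blob.
    static PersonSer load(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (PersonSer) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PersonSer loaded = load(save(new PersonSer("Hugh", 10)));
        System.out.println(loaded.name + " " + loaded.age);
    }
}
```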
Serializable
Advantages include: very simple to implement, space efficient in
storing objects, robust, and fast.
Disadvantages include: it is not a standard format that will
generally work with other languages, and it is fragile to changes in
the fields of the classes you store (if you change your class you
may no longer be able to load all your old data).
Bespoke
You may also just come up with your own bespoke format. To do
this you simply write a generator and parser for this format and use
standard file IO to read and write the file.
The format could either be binary or text based. Using a binary
approach generally would produce more space efficient storage,
although, it would be more involved.
Great care is needed to make such approaches robust. Generally
this will involve escaping characters. Also the formats can be less
amenable to change and extension than approaches like XML and
JSON.
Comma separated values (CSV) is a de facto standard for storing
tabular data using a "comma" (or in DSV another token such as ":",
";", or tab) to divide the columns in the table. It is often used for
transferring data between applications (particularly spreadsheets).
Bespoke
So a simple approach for storing a person's name and age may be
to just use a single line with a colon separating the name and age.
So in our example we would store:
Hugh:10
Note the above would work fine until Hugh changes his name to
"Hugh:21" and starts to apply for a driver's licence!
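One way to make the colon-separated format robust against names that contain a colon is to escape the separator. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, a backslash is chosen as the escape character, and newlines inside names are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class ColonFormatSketch {
    // Escape the backslash itself and the separator, so a name may contain ':'.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace(":", "\\:");
    }

    // Generator: one record per line, fields separated by ':'.
    static String format(String name, int age) {
        return escape(name) + ":" + age;
    }

    // Parser: split on unescaped ':' and undo the escaping as we go.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // escaped character, taken literally
            } else if (c == ':') {
                fields.add(current.toString());   // end of a field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = format("Hugh:21", 10);
        System.out.println(line);        // Hugh\:21:10
        System.out.println(parse(line)); // [Hugh:21, 10]
    }
}
```

With escaping in place, the name "Hugh:21" survives a round trip and the age is still read as 10.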
XML
Extensible Markup Language (XML) is a standard approach for
storing information in a file. The information is stored in textual
form and is both human and machine readable. It is widely used
and many libraries are available. This makes documents in XML very
portable.
XML uses "tags" to describe "elements" of data. Within the XML
document, elements can be nested forming a tree like structure.
Below is an example of an XML document that stores a person's
name and age (the element names are illustrative).
<person>
<name>Hugh</name>
<age>10</age>
</person>
XML
The standard JDK comes with libraries that enable you to read and
write XML documents.
The simplest way of doing this is to use the DOM interface. This
will parse the entire XML document into a tree structure which you
can traverse and obtain the desired information from.
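As a rough sketch of the DOM approach, the code below parses a small document held in a string and pulls out two values. The element names person, name, and age are assumptions made for this example, as are the class and method names; the parser classes are the ones shipped with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSketch {
    // Parse the whole document into a tree, then look up the elements we want.
    static String describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        // Assumed element names: <name> and <age> nested inside the root element.
        String name = root.getElementsByTagName("name").item(0).getTextContent();
        int age = Integer.parseInt(
                root.getElementsByTagName("age").item(0).getTextContent().trim());
        return name + " is " + age;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person><name>Hugh</name><age>10</age></person>";
        System.out.println(describe(xml)); // Hugh is 10
    }
}
```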
In situations that involve very big XML documents you can use the
SAX interface. This does not store the entire document tree;
rather, as it reads the document it calls methods on a handler
that you provide.
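The SAX style can be sketched as below: the parser reads through the document and calls back into a handler, here one that simply records each element name as it starts. The class and method names and the sample document are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Collect element names in document order without building a tree.
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName); // called by the parser for every opening tag
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person><name>Hugh</name><age>10</age></person>";
        System.out.println(elementNames(xml)); // [person, name, age]
    }
}
```

Because nothing is retained beyond what the handler chooses to keep, memory use stays small even for very large documents.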
JSON
JavaScript Object Notation (JSON), like XML, is an open
standard format that is widely used. It was designed to send data
between a web client and a server; however, the format has been
embraced for more general storage of data. The primary focus of
JSON is storing attribute-value pairs, and it will generally produce
smaller and more readable documents than XML.
So the name and age of a person could be stored in JSON as:
{"age":10,"name":"Hugh"}
There is no inbuilt Java library for JSON; however, there are a
number of open-source libraries available (our lab machines have
json-simple.jar installed).
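To show what producing such a document involves, here is a small hand-written sketch that uses no library at all. The class and method names are hypothetical, it only writes (it does not parse), and in a real program a library such as json-simple would normally be the better choice.

```java
class JsonSketch {
    // Quote a string for JSON, escaping quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters are written as a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Write a person as a JSON object with "age" and "name" attributes.
    static String personToJson(String name, int age) {
        return "{\"age\":" + age + ",\"name\":" + quote(name) + "}";
    }

    public static void main(String[] args) {
        System.out.println(personToJson("Hugh", 10)); // {"age":10,"name":"Hugh"}
    }
}
```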
Database
If the data you wish to store is accessible by multiple users or
multiple instances of the same application then it may be worth
storing the information in a database. This could be done using:
a lightweight approach like SQLite, which stores data in a single
local file without creating or connecting to a server process.
a client-server approach whereby your Java program connects to
a database such as MySQL or PostgreSQL. Java has a standard
API for doing this (the JDBC API).
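A rough sketch of using JDBC is given below. Several things here are assumptions rather than part of the lecture: the table name person, its columns, the class and method names, and the connection URL jdbc:sqlite:people.db, which only works if a SQLite JDBC driver is on the classpath (none ships with the JDK). The java.sql calls themselves are standard.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class JdbcSketch {
    // Hypothetical schema for illustration.
    static final String CREATE_SQL =
            "CREATE TABLE IF NOT EXISTS person (name TEXT, age INTEGER)";
    static final String INSERT_SQL = "INSERT INTO person (name, age) VALUES (?, ?)";
    static final String SELECT_SQL = "SELECT name, age FROM person";

    // Store one person, then print every row in the table.
    static void saveAndList(String url, String name, int age) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url)) {
            try (Statement create = conn.createStatement()) {
                create.executeUpdate(CREATE_SQL);
            }
            // Parameters are bound rather than concatenated into the SQL text.
            try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
                insert.setString(1, name);
                insert.setInt(2, age);
                insert.executeUpdate();
            }
            try (Statement query = conn.createStatement();
                 ResultSet rows = query.executeQuery(SELECT_SQL)) {
                while (rows.next()) {
                    System.out.println(rows.getString("name") + " " + rows.getInt("age"));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumed URL; requires a SQLite JDBC driver to be available at run time.
        saveAndList("jdbc:sqlite:people.db", "Hugh", 10);
    }
}
```

Switching to a client-server database would mainly mean changing the URL (and supplying the matching driver and credentials); the rest of the code uses the same API.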
Example Exam Questions
What are some advantages in storing information in either XML
or JSON compared to storing information using the Serializable
approach?
List some differences and similarities between XML and JSON.
Modify the code PersistDataXML.java such that it can load and
save a list of people rather than just a single person as it currently
does.
The code is available at:
git@gitlab.cecs.anu.edu.au:comp2100/lecture-example-code.git