SAMPLE ONE Sample exam solutions We have provided some sample solutions to help you study. These are just examples of correct solutions. There may (will) be other solutions that would also get full marks. Question 1: Data Representation 1. Convert 253 into its hexadecimal representation. Remember that hexadecimal is base 16 Step 1: Divide by 16 253 / 16 = 15 remainder 13 Step 2: Write the remainder in hexadecimal 13 (dec) = D (hex) (You will have to memorise the hex 0-F lookup table) Step 3: Divide result by 16 The result (15) is less than 16 so 15/16 = 0 remainder 15 Step 4: Write the remainder in hexadecimal 15 (dec) = F Step 5: Combine the hex digits in reverse to get the final answer 253 (dec) = FD (hex) You must show working to get full marks. 2. Convert 3.625 into its binary representation. Its easiest to do the integer and fractional components separately. So, start by converting 3 to binary. Remember that binary is base 2 Step 1: Divide by 2 3/2 = 1 remainder 1 Step 2: Divide the results by 2 1/2 = 0 remainder 1 Step 3: Write the remainders in reverse to get the final answer 3 (dec) = 11 (bin) Now convert the fractional component 0.625 to binary using the same process in reverse. INFO1903: Informatics (Adv) 1 SAMPLE ONE INFO1903: Informatics (Adv) Step 1: Multiply by 2 0.625 x 2 = 1.25 Step 2: Keep integer component (1) and continue with the rest 0.25 x 2 = 0.5 Step 3: Keep integer component (0) and continue with the rest 0.5 x 2 = 1 Step 4: Construct final answer from integer components 0.625 (dec) = 0.101 (bin) So putting these together we get 3.625 (dec) = 11.101 (bin). 3. Why is hexadecimal a useful notation in computer science? Obviously at a fundamental level all electronic data is in binary. However, there are various other represen- tations that come in useful for different purposes. Hexadecimal means that a byte can always be represented by 2 hex digits (there are 265 possible byte values). Since each hexadecimal digit represents four binary digits (from 0000 to 1111), it is easy to convert between binary and hex. Question 2: Unix tools The file marks.txt contains rows, each of which has an identifier for a student, followed by three numeric values (representing the practical work score, the exam score, and the final mark). For example, one row might be afek5 32 40 72 1. Write a command to output those rows in which the exam score is less than 5 below the practical score. gawk ’$3 < $2 - 5’ marks.txt 2. Which of the following regular expressions could be used to find the rows where the student identifier consists of one or more lower case letters followed by a single digit (a) egrep ’[a-z]*[0-9]’ marks.txt (b) egrep ’[a-z]+[0-9]’ marks.txt (c) egrep ’ˆ[a-z]+[0-9] ’ marks.txt (d) egrep ’[a-z]+[0-9]+’ marks.txt The correct answer is (c). The regexp in (b) will also pick out cases like a student with identifier 7abc97, since this contains a substring abc9 with one or more lower case letters followed by a single digit. This example also will be allowed by the other patterns. Only (c) includes the checks that the lower cases letters come at the start of the line, and the single digit is followed by a space so it is at the end of the identifier. 3. Fred Foolish has been asked to write a command to produce an order-of-merit list, that is, it should sort the file based on the final mark, so the student with the highest mark comes first. Fred suggests the following sort -k 4r marks.txt > merit.txt Explain to Fred why this command will not work correctly in all cases. Your answer should include an example file where the output will be incorrect. The problem here is that this command sorts the values as strings, rather than using the numerical value in ordering them. A simple example is where the final marks are 7 and 13; the ordering as strings has 13 less than 7. Instead Fred should use sort -k 4rn Question 3: Spreadsheets A spreadsheet contains a rectangular array of cells, in rows 4 to 49 and in columns B to H, each representing the results of a measurement of a quantity under different conditions. Write a formula to put in cell A51, that gives the average value of all the measurements. =AVERAGE(B4:H49) James Curran and Tara Murphy 2 SAMPLE ONE INFO1903: Informatics (Adv) Question 4: Python Write a function called is prime which takes a single integer argument and returns True if the integer is a prime number, and False otherwise. A prime number is a number with only two distinct divisors, 1 and itself. Note: 1 is therefore not a prime number. 1 def is_prime(n): 2 if n <= 0: 3 return False 4 elif n == 1: 5 # 1 is not prime 6 return False 7 elif n == 2: 8 # 2 is the only even prime 9 return True 10 elif n % 2 == 0: 11 # all other evens are not prime 12 return False 13 else: 14 # check whether any numbers between 3 and 15 # sqrt(n) are divisors of n. If so, n is 16 # not prime 17 for i in range(3, int(n**0.5) + 1, 2): 18 if n % i == 0: 19 return False 20 # otherwise n is prime 21 return True Note that in the exam you do not need to include comments (unless specifically asked to). I have included them here to help explain the solution. James Curran and Tara Murphy 3 SAMPLE ONE INFO1903: Informatics (Adv) Question 5: Python The original question just asked for a function to convert from Python variable names (with underscores between words) to Java variable names with capitalisation to distinguish words. This is a little bit undefined for a typical exam question, so you may have interpreted it a bit differently. Our solution assumes that variables starting with underscores are meant to be private, and so keeps the underscore in this case only: 1 def camel_case(python): 2 # keep underscores at the beginning of private variables 3 java = '' 4 if python.startswith('_'): 5 java = '_' 6 python = python[1:] 7 8 new_word = False 9 for c in python: 10 if c == '_': 11 # skip underscore characters 12 new_word = True 13 continue 14 if new_word: 15 # make a new word start with an uppercase character 16 java += c.upper() 17 else: 18 java += c 19 new_word = False 20 return java James Curran and Tara Murphy 4 SAMPLE ONE INFO1903: Informatics (Adv) In the labs, I challenged people to come up with a more complete solution including documentation (using Python doc strings) and test cases (using Python doctests), which also handled possible error cases, e.g. rais- ing a TypeError for non-string arguments, and a ValueError if the Python string passed in was empty. You could be fancier than my solution below but this is the level I would expect in the exams: 1 def camel_case(python): 2 """camel_case(string) -> string 3 4 The camel_case function takes a single argument, a string containing 5 the name of a Python variable, and it returns a string containing a 6 Java camel case equivalent name. 7 8 >>> camel_case('get_value') 9 'getValue' 10 >>> camel_case('this_is_a_test') 11 'thisIsATest' 12 >>> camel_case('_private') 13 '_private' 14 >>> camel_case('_') 15 '_' 16 >>> camel_case(3) 17 Traceback (most recent call last): 18 ... 19 TypeError: camel_case expects a single string argument 20 """ 21 22 if type(python) != str: 23 raise TypeError("camel_case expects a single string argument") 24 if not python: 25 raise ValueError("the string argument must contain a valid Python variable") 26 27 # keep underscores at the beginning of private variables 28 java = '' 29 if python.startswith('_'): 30 java = '_' 31 python = python[1:] 32 33 new_word = False 34 for c in python: 35 if c == '_': 36 # skip underscore characters 37 new_word = True 38 continue 39 if new_word: 40 java += c.upper() 41 else: 42 java += c 43 new_word = False 44 return java 45 46 if __name__ == '__main__': 47 import doctest 48 doctest.testmod() James Curran and Tara Murphy 5 SAMPLE ONE INFO1903: Informatics (Adv) Question 6: Databases You are developing a database system for a movie rental store where they need to track which customers have borrowed which movie. For each customer the owner wants to keep track of their membership number, name, address, and phone number; for each movie, there is a unique movie id (to allow for multiple copies of a movie), title and a rating (G, PG etc). When a movie is rented they want to know that it is unavailable and when it is due back. Each customer may rent zero or more movies. Each movie may be borrowed by only one customer. The schema for the database has the following tables: customer(membership id, name, address, phone) movie(movie id, title, rating) rent(movie id, membership id, due) Part 1 Write the SQL create table statements required to create the tables for this database. Explain your choice of attribute data types. 1 CREATE TABLE customer ( 2 membership_id MEDIUMINT UNSIGNED, 3 name VARCHAR(40), 4 address TEXT, 5 phone VARCHAR(15), 6 PRIMARY KEY (member_id) 7 ); In this table the name, address and phone number all make sense as text or string fields. I have made the member id a MEDIUMINT which allows numbers up to 16 777 215 (see the table at the end of Lab 9). If you were running a larger movie rental company you might have more members than that, so INT would also be reasonable. SMALLINT only allows numbers up to 65 535, which may be too small depending on the size of the company. 1 CREATE TABLE movie ( 2 movie_id INT UNSIGNED, 3 title VARCHAR(100), 4 rating VARCHAR(10), 5 PRIMARY KEY (movie_id) 6 ); In this table, the title and rating can both be character strings. Choose a length suitable for holding the type of information you are likely to have. We probably have a lot of movies in the company, so I have used an INT to be safe. An UNSIGNED INT lets you have even more unique numbers, since you know you do not need negative numbers for this field. 1 CREATE TABLE rent ( 2 membership_id MEDIUMINT UNSIGNED, 3 movie_id INT UNSIGNED, 4 borrowed DATE, 5 due DATE, 6 PRIMARY KEY (membership_id, movie_id, borrowed) 7 ); In this table we keep the membership id and movie id the same as before. The borrowed and due fields are both dates. The primary key needs to cover the customer, movie and date of borrowing to ensure uniqueness. Part 2 Write SQL queries to answer the following questions: 1. Print the names and addresses of all of the customers in the database. 1 SELECT name, address 2 FROM customer James Curran and Tara Murphy 6 SAMPLE ONE INFO1903: Informatics (Adv) 2. Print the names of all of the movies in the database (removing duplicates) that are rated G or PG. 1 SELECT DISTINCT title 2 FROM movie 3 WHERE rating = "PG" OR rating = "G" 3. Print the names of all of the customers who currently renting movies. 1 SELECT DISTINCT name 2 FROM customer 3 INNER JOIN rent 4 ON customer.membership_id = rent.membership_id 4. Print the name and phone number of all of the customers who have one or movies overdue. You may use the SQL function NOW() to get the current date; 1 SELECT DISTINCT name, phone_number 2 FROM customer 3 INNER JOIN rent 4 ON customer.membership_id = rent.membership_id 5 WHERE rent.due < NOW() 5. Print the names of all of the customers who have borrowed more than one movie that is due on the 23/6/2007 and print how many they borrowed that are due on that day. 1 SELECT name, COUNT(*) FROM customer 2 INNER JOIN rent 3 ON customer.membership_id = rent.membership_id 4 WHERE rent.due = '2007-06-23' 5 GROUP BY rent.membership_id 6 HAVING COUNT(*) > 1 Question 7: Data Management You are given a large table of stock market data to manage. Discuss 2 advantages and disadvantages of storing it using each of the following technologies: Note: it is ok to give your answers to short written responses in point form, but you must use complete sentences and clear English. The number of marks allocated to a question will give you a hint as to how much you should write. • csv file/s Some advantages of using csv are: 1. csv is a format that is easy to work with with a wide range of applications and tools. For example, it is easy to process using Unix shell scripts, or Python programs, but it can also be loaded into Excel if required. This would be useful if a range of people with different IT backgrounds needed to use the data. 2. As a plain text format, csv is efficient in terms of storage size — there is no real overhead in a csv file. This would be useful if the data has to be transmitted quickly. Some disadvantages of using csv are: 1. csv offers no or little protection against user error. Users could accidentally modify the content of the file, and it would be difficult to track or discover their changes. It is difficult to restrict user access to data stored in a plain text format. 2. csv requires custom tools to access and analyse the data. You would have to write specific program- s/scripts for each type of analysis you wanted to do, even for relatively simple things like extracting subsets of the data. This could be a problem if the people who have to work with the data do not have a strong IT background. James Curran and Tara Murphy 7 SAMPLE ONE INFO1903: Informatics (Adv) • JSON document/s Some advantages of using JSON are: 1. JSON is a very widely-used format. This means it would be easy to load the data into other appli- cations, and share the data with other companies or collaborators if required. There are many off the shelf programs that can read and display JSON files. 2. JSON is relatively human readable, so it would be possible for non-IT experts to understand the format of the data by looking at the files directly. Some disadvantages of using JSON are: 1. Although it is a text format, JSON can be quite verbose. Hence you are increasing the storage size (over, say, csv) without any of the advantages of using, say, a DBMS. 2. Like csv, JSON has the disadvantage that it would be difficult to control user access to particular content within the files. For example, if you wanted to have some users to have permission to look at some of the stockmarket data, but not other parts, that would require custom software to be written. • a spreadsheet such as Microsoft Excel Some advantages of using spreadsheets are: 1. Storing the data in a spreadsheet such as Excel allows users to do powerful analysis easily, using a WYSIWYG interface. This means that people without a programming background can easily work with the data. 2. It is likely that the users would want to plot the stockmarket data. Using Excel they could do this easily without needing any programming skills. Some disadvantages of using spreadsheets are: 1. It can be hard to ensure data integrity in spreadsheets. It is quite easy for a user to accidentally change values, and difficult to set up a rigorous testing environment to find these problems. This could be critical in a stockmarket application. 2. You are constrained to working in the spreadsheet environment. This may involve purchasing expen- sive (and proprietary) software. It might be more difficult to get into the guts of the data and do exactly what you want. • a DBMS such as MySQL Some advantages of using a DBMS are: 1. They offer good support for security and data integrity — you can restrict access to certain users, and to subsets of the data. This would be very important for stock market applications. You can cut down on problems such as data duplication using primary and foreign keys. 2. A DBMS allows the users to run arbitrary queries in a predefined language — SQL. This makes data access relatively straightforward with less need for custom software. Some disadvantages of using a DBMS are: 1. There is significant overhead in setting up a DBMS in comparison to storing the data as plain text. This requires some expertise which may not be available in the company. You would have to consider whether the advantages offered by a DBMS were worth the startup cost. 2. You may need to purchase expensive (and proprietary) software, and pay for a database administrator to help set up or maintain your system. James Curran and Tara Murphy 8