Lab 4 | CS243 | Spring 2022 CS243 Home Lab 4 Getting Started Introduction Part A Part B Part C Part D Submission CS243: Program Analysis and Optimization Spring 2022 Lab 4: Interprocedural Analysis with BDDBDDB In this homework, you will interpret the results of and extend some interprocedural program analyses using the BDDBDDB framework. This is an individual assignment. This assignment consists of exercises and problems. Exercises are intended to help you learn BDDBDDB and you do not need to submit anything for the exercises. For the problems, you need to submit the results. Details of what should be submitted is explained in the problem description. This lab was released April 29, and is due May 10. Getting started For this project we recommend using the CS myth machines. JDK 1.5 is REQUIRED! The shell script setup.sh will setup the Java 1.5 environment on Myth. Do source setup.sh to set it up. If you want to run Java 1.5 locally, please set the JAVA_HOME environment variable to your Java 1.5 path. Go to the lab4/ directory and type make. This should compile AddressBook.java, which is the program that we will be analyzing throughout this homework. Take a few minutes to familiarize yourself with the contents of AddressBook.java. If you read carefully, you will notice that it contains a few bugs, but that is what we are going to try to detect by writing some simple analyses in Datalog! Execute: python3 run_bddbddb_new.py AddressBook pacs.dtl This will run the context-sensitive pointer analysis (written in Datalog as pacs.dtl) on the Java program AddressBook, which you've just compiled. There may be some warning messages printed to the terminal. This is normal, so don't worry about it. As long as it prints Done at the end, then it probably ran successfully. If the analysis ran successfully, there will now be a results/ sub-directory containing various .bdd, .map, .tuples and other files. (Don't worry about what these files mean for now.) Execute: python3 print_bddbddb_results_new.py AddressBook results/hP.tuples This will print out all of the hP points-to relations from the current run in a (reasonably) human-readable text format. You'll soon find out what hP means, so don't worry about it for now. (Actually, it doesn't print out all of the relations; it only prints the ones relevant to AddressBook. There are many more relations generated by Java Standard Library code that this tiny program imports.) If the last step successfully outputs lines of text to your terminal, then you are all set and ready to begin this homework! If you've gotten stuck or can't get any output, please post your questions to the newsgroup. Introduction to program analysis using BDDBDDB BDDBDDB is an implementation of the Datalog logic programming language that allows programmers to specify program analyses in a compact declarative manner. Let's briefly run through how a Java program proceeds through an analysis using BDDBDDB when you execute this script:
python3 run_bddbddb_new.py
The Joeq front-end (a program called joeq.Main.GenRelations) extracts components (i.e., variables, types) from the .class bytecode file supplied on the command line and encodes them into a BDD (binary decision diagram) representation, stored in the results/*.bdd files. The solver executes the specified Datalog file on the BDD representation and outputs tuples files in results/*.tuples. If you view those files, they are simply tuples of numbers. The results/*.map files map between these numbers and human-readable source-level names. Then when you execute this script:
python3 print_bddbddb_results_new.py
it will print out a set of tuples to the command-line. We encourage you to take your own Java programs and experiment with running analyses on them. The simplest way to get started is to compile your programs in the same directory as our scripts. Let us know if you have problems setting up the infrastructure. Note that BDDBDDB is sensitive to Java memory limits and if you are getting OutOfMemory errors with larger test programs you should edit run_bddbddb_new.py to increase these limits (the -mx command line argument). Basic Datalog syntax Let's use pacs.dtl (context-sensitive pointer analysis) as an example. We will step through parts of the file in turn: First, the header includes various implementation-specific declarations, such as what directory to output the results of the analysis (e.g., results/) and what file to use to refer to the domains of interest (e.g., results/fielddomains.pa). We've annotated that file with plenty of comments, so hopefully those will be useful:
.basedir "results"
.include "fielddomains.pa"
# Here are the abbreviations for domains used in the relations:
# - V: variables
# - H: heap objects, named by invocation sites of object creation methods
# - T: types
# - F: fields
# - I: invocation sites
# - Z: numbers for parameters
# - N: names
# - M: methods
# - C: classes
The next section declares all of the relations:
###### Relations
# All input relations are extracted by Joeq front-end
# variable 'v' points to heap object 'h'
vP0 (v:V0, h:H0) input
# ha.field may point to hb
hP0 (ha:H0, field:F0, hb:H1) input
# statement: dest = source (includes passing of arguments and return values)
A (dest:V0, source:V1) input
# statement: base.field = src
S (base:V0, field:F0, src:V1) input
# statement: dest = base.field
L (base:V0, field:F0, dest:V1) input
# actual parameters for invocation sites:
# 'actualparam' is passed as parameter number 'num' at 'invoke'
actual (invoke:I0, num:Z0, actualparam:V1) input
# formal parameters for methods:
# the formal parameter number 'num' of 'method' is represented as 'formalparam'
formal (method:M0, num:Z0, formalparam:V0) input
# Method 'method' contains an invocation site 'invoke' with virtual
# method name 'name'
mI (method:M0, invoke:I0, name:N0) input
# Statically-bound method invocation edges:
# invocation site 'invoke' calls method 'target'
IE0 (invoke:I0, target:M0) input
# variable 'var' has type 'type'
vT (var:V0, type:T0) input
# heap object 'heap' has type 'type'
hT (heap:H0, type:T1) input
# type T1 is assignable to type T0 (based on sub-classing relationship)
aT (type:T0, type:T1) input
# It is type-safe to assign heap object 'h' to variable 'v'
vPfilter (v:V0, h:H0)
# virtual method dispatch info:
# method 'method' is the target of dispatching method name 'name' on type 'type'
cha (type:T1, name:N0, method:M0) input
# 'v' is a local variable in method 'm'
mV(m : M0, v : V0) input
# 'v' is the return value of method 'method'
Mret (method:M0, v:V1) input
# 'v' is return value of invocation site 'invoke'
Iret (invoke:I0, v:V0) input
IEnum (invoke:I0, target:M0, ccaller:VC1, ccallee:VC0) input
#Mthr (method:M0, v:V1) input
#Ithr (invoke:I0, v:V0) input
#IE1 (invoke:I0, target:M0)
# Context-sensitive output relations (cannot print as outputtuples
# because there are astronomical numbers of contexts)
IEcs (ccaller:VC0, invoke:I0, ccallee:VC1, target:M0) output
cA (vcdest:VC0, dest:V0, vcsrc:VC1, source:V1) output
cvP (vc:VC0, v:V0, h:H0) output
# These relations are output as tuples that can be
# printed by print_bddbddb_results_new.py
# field 'ha.field' of heap object 'ha' may point to heap object 'hb'
hP (ha:H0, field:F0, hb:H1) outputtuples
# variable 'v' may point to heap object 'h'
vP (v:V0, h:H0) outputtuples
# invocation site 'invoke' may call method 'target'
IE (invoke:I0, target:M0) outputtuples
### Insert your new relations below here:
Each relation contains one or more terms, each of which may take on values from a particular domain. input (extensional database) relations are constructed from the BDD representation that the Joeq front-end builds from the source program. output and outputtuples (intensional database) relations are inferred by running the Datalog program. Any relations marked as outputtuples will have their tuples dumped to results/*.tuples files, which can later be printed in human-readable format. Each relation is like a 'declaration' in an imperative programming language. For example, the hP relation which you will be mainly dealing with in this assignment is shown below:
# field 'ha.field' of heap object 'ha' may point to heap object 'hb'
hP (ha:H0, field:F0, hb:H1) outputtuples
The name of the relation is listed, along with its terms: ha, field, and hb. Each term can take on values drawn from a domain. Here, the H stands for the domain of heap-allocated objects, and the F stands for the domain of field names (ignore the numbers like H0 and H1 for now). Because outputtuples is specified, when the analysis is run, the tuples will be written to a file called results/hP.tuples. Recall that to print out the hP relations, the command we used was:
python3 print_bddbddb_results_new.py AddressBook results/hP.tuples
Now you know where that tuples file comes from. The meaning of each relation is documented briefly in the comments above each line. If necessary, please refer to Chapter 3 of John Whaley's Ph.D. thesis (the creator of BDDBDDB) for more detailed definitions. The final section consists of the rules which tell Datalog how to infer new relations from the input relations:
###### Rules
cvP(_,v,h) :- vP0(v,h).
cA(_,v1,_,v2) :- A(v1,v2).
IEcs(vc2,i,vc1,m) :- IE0(i,m), IEnum(i,m,vc2,vc1).
vPfilter(v,h) :- vT(v,tv), aT(tv,th), hT(h,th).
cA(vc1,v1,vc2,v2) :- formal(m,z,v1), IEcs(vc2,i,vc1,m), actual(i,z,v2).
cA(vc2,v2,vc1,v1) :- Mret(m,v1), IEcs(vc2,i,vc1,m), Iret(i,v2).
cvP(vc1,v1,h) :- cA(vc1,v1,vc2,v2), cvP(vc2,v2,h), vPfilter(v1,h).
hP(h1,f,h2) :- S(v1,f,v2), cvP(vc1,v1,h1), cvP(vc1,v2,h2).
cvP(vc1,v2,h2) :- L(v1,f,v2), cvP(vc1,v1,h1), hP(h1,f,h2), vPfilter(v2,h2). split
vP(v, h) :- cvP(_, v, h).
IE(i, m) :- IEcs (_,i,_,m).
### Insert your new rules below here:
The relation on the lefthand side of each rule is true if all relations on the righthand side are true (the two sides are separated by the :- symbol). Relations on the righthand side are separated by commas, and each line must end with a period. The special _ symbol represents don't care, so any term can be substituted in its place. For example, this rule:
cvP(_,v,h) :- vP0(v,h).
means that in all possible contexts, cvP is true for a given v and h if vP0(v,h) is true. References BDDBDDB homepage Datalog examples from the official BDDBDDB website PLDI 2004 research paper PLDI 2006 PowerPoint tutorial Chapter 3 of John Whaley's Ph.D. thesis Part A: Interpreting the results of the pointer analysis Let's now study the output from running: python3 print_bddbddb_results_new.py AddressBook results/hP.tuples The hP relation specifies that a particular field in a particular heap-allocated object may point to another heap-allocated object. Let's look at one relation in particular, which you should find in your output:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1
The || string separates each line into components. There are three components in this particular relation: AddressBook.java:63 Concrete: MaleRecord @ 6 Record/name AddressBook.java:37 Concrete: String @ 1 The first represents a heap-allocated object, in this case a MaleRecord object allocated in line 63 of AddressBook.java (look at the corresponding place in the source code to convince yourself of what line performed the allocation). The second represents the name field of the Record class. The third represents a heap-allocated object, in this case a String object allocated at line 37 ("Joe"). What this line says is: The field name of the MaleRecord object allocated on line 63 of AddressBook.java can point to the String "Joe" allocated on line 37. Go ahead and inspect the code manually to make sure that this relation makes sense to you. For some points-to relations the first component is the value global(null). This is used for static fields. Exercise 1 Copy the output of running python3 print_bddbddb_results_new.py AddressBook results/hP.tuples into a text file named exercise1.txt. There should be 26 total lines, each representing one hP relation. For every line or group of lines, explain what the points-to relation intuitively means in terms of what someone inspecting the source code can understand. For example, for the line:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1
an appropriate answer would be:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1
- Line 63 ('Record joeRecord = new MaleRecord(joe, joeEmail);') sets
joeRecord.name to point to the String "Joe" allocated on line 37.
In this exercise, we want you to build up some intuition about what the results of a pointer analysis means. Exercise 2 There is a bug in AddressBook.java:main() where it forgets to add some Record object into the address book. Which Record object is it, and more importantly, by looking at the output of the pointer analysis that you created in Exercise 1, what relations in particular could you consult in order to corroborate your answer? In other words, if the program had added that record into the address book, how would the hP relations look different? (You can try to fix the bug yourself, re-compile, and re-run the analysis to see for sure.) (Note: Apologies if this problem seems like a contrived example, which it is, but the principle is that for real programs of considerable size, it's infeasible to answer questions like this one by inspecting the source code. However, the pointer analysis we use can scale up to hundreds of thousands of lines of code.) Part B: Building a simple alias analysis The goal of a pointer analysis is to answer questions like: For an object X, what objects can the fields of X possibly point to? The goal of an alias analysis is to answer questions like: For objects X and Y with fields fX and fY, respectively, can X.fX and Y.fY possibly point to the same object Z? Problem 1 By doing Exercises 1 and 2, you will likely notice that the alias analysis results can be extracted from the pointer analysis results. In this problem, you will write a simple alias analysis in Datalog atop of the existing pointer analysis. First, copy pacs.dtl into a new file called alias.dtl Next, add a new relation in alias.dtl called hALIAS with the following declaration:
### Insert your new relations below here:
# Can h1.field1 and h2.field2 point to the same object?
hALIAS (h1:H0, field1:F0, h2:H1, field2:F1) outputtuples
Add a new rule (or multiple rules if you'd like) such that the hALIAS relation returns all of the pairs h1.field1 and h2.field2 which can point to the same object. Then you can test your new analysis by running it:
python3 run_bddbddb_new.py AddressBook alias.dtl
and printing out the hALIAS relations to your terminal:
python3 print_bddbddb_results_new.py AddressBook results/hALIAS.tuples
When you are satisfied that your analysis has met the specs, copy your alias.dtl file into the submit/ sub-directory. Also, copy the results of printing out the hALIAS relations into a file called submit/problem1.txt. At the top of that file, write a paragraph describing what parts of this output you found to be useful and what parts you found to be not useful (i.e., noisy output). Don't stress about details on this part; we just want you to build up some intuition about interpreting the results of your own analysis. To sanity-check your results, your analysis should be able to report that joeRecord.name and joeAltRecord.name are aliased. Problem 2 Now you will refine the alias analysis you created in Problem 1 so that it produces less output (actually, so that it produces more relevant output for one specific debugging task). You want to be able to use your refined analysis to detect the following bug in AddressBook.java:
66. Record maryRecord = new FemaleRecord(mary, maryEmail);
67. Record janeRecord = new FemaleRecord(jane, maryEmail);
janeRecord should have been constructed with janeEmail rather than maryEmail (this might be due to a copy-and-paste error, for instance). Due to this bug, maryRecord.emailAddr and janeRecord.emailAddr are aliased, when in fact they should not be. Copy alias.dtl into alias_refined.dtl and modify the rules in alias_refined.dtl such that aliasing relationships are only reported when: fields in different heap-allocated objects the heap-allocated objects share the same type For instance, we don't care that maryRecord.emailAddr aliases maryRecord.emailAddr (because they belong to the same heap object, and in fact, they are the same field), nor that the static String field joe refers to the same string object as joeRecord.name (because the base objects have different types). The expected hALIAS result from running alias_refined.dtl should be:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:69 Concrete: MaleRecord @ 34 || Record/name
AddressBook.java:69 Concrete: MaleRecord @ 34 || Record/name || AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name
AddressBook.java:66 Concrete: FemaleRecord @ 20 || Record/emailAddr || AddressBook.java:67 Concrete: FemaleRecord @ 27 || Record/emailAddr
AddressBook.java:67 Concrete: FemaleRecord @ 27 || Record/emailAddr || AddressBook.java:66 Concrete: FemaleRecord @ 20 || Record/emailAddr
which says that joeRecord.name and joeAltRecord.name are aliased and that maryRecord.emailAddr and janeRecord.emailAddr are aliased. There should be no other output. Hints: != is the inequality operator in Datalog. hT is the relation that maps a heap object to its type. Remember to copy alias_refined.dtl into submit/ when you are done. Part C: Field-sensitive versus field-insensitive pointer analysis The pointer analysis we've been studying thus far is field sensitive, which means that it makes a distinction between different fields within an object. Field sensitivity is easy to achieve in Java because it is type-safe and pointer arithmetic is not allowed. In contrast, pointer analyses for C are usually field insensitive, due to the difficulties of distinguishing between field accesses in a sound manner (since pointer arithmetic can be used to access arbitrary fields in a type-unsafe way). Problem 3 Your task is to modify the rules in pacs.dtl to simulate a field insensitive analysis. First copy pacs.dtl to a new file named pacs_field_insensitive.dtl. Then add a printsize directive to the declaration of the hP relation:
hP (ha:H0, field:F0, hb:H1) outputtuples printsize
Now run pacs_field_insensitive.dtl, and when it finishes, it should print out how many total tuples are in the hP relation:
Saving results: SIZE OF hP: 105.
There is a lot of other output, but look for the Saving results: line. There should be 105 tuples in hP. (Actually, depending on your machine type, there may be 110 tuples, so don't worry about the exact number, as long as it's in this ballpark.) Note that this includes tuples from code in the Java Standard Library, which are not printed out by print_bddbddb_results_new.py. Now modify the declaration of hP to remove the field term:
hP (ha:H0, hb:H1) outputtuples printsize
Finally, modify the rules involving hP to make it field insensitive. That is, hP is true whenever any field of ha may point to hb. (Hint: Remember that the _ term means don't care.) Copy pacs_field_insensitive.dtl into submit/ when you are done. Copy the output of running python3 print_bddbddb_results_new.py AddressBook results/hP.tuples on the results of your field-insensitive analysis into a text file named submit/problem3.txt. In that same file, record the size of the hP relation. Did it increase or decrease from its original value of 105? Why? How does your set of tuples differ from the originals in exercise1.txt and why are they different? Part D: Context-sensitive versus context-insensitive pointer analysis As you've learned in class, a context-insensitive analysis can be much faster to compute than a context-sensitive analysis, but it is often less precise. In this problem, you will compare the results from context-sensitive and context-insensitive analyses. Problem 4 pa.dtl contains a context-insensitive pointer analysis. Run it using:
python3 run_bddbddb_new.py AddressBook pa.dtl
and print out the resulting hP results using:
python3 print_bddbddb_results_new.py AddressBook results/hP.tuples
Copy the results into submit/problem4.txt and, in that same file, write a comparison of these results with the context-sensitive pointer analysis results in Exercise 1. In particular, point out where the context-insensitive analysis leads to less precise results, and how the precision is lost (by inspecting what occurs in AddressBook.java). Consult Chapter 12 of the Dragon book for hints. Submission To submit your assignment, please follow the following steps : 1. In lab4 directory, type make submission, it will check if the submit/ folder contains the required files and generates a file submission.tar.gz
2. Upload submission.tar.gz to Canvas.
Your submit directory should contain the following files:
alias.dtl
problem1.txt
alias_refined.dtl
pacs_field_insensitive.dtl
problem3.txt
problem4.txt
If you discover a bug after submitting (and before the due date), simply submit again. We'll take the latest version. © Stanford 2022 | Built using the Bootstrap framework by your friendly TA