Java编程辅导|程式编写|网络家教|代写软件|代做程序--Lab 4 | CS243

The Joeq front-end (a program called joeq.Main.GenRelations) extracts components (i.e., variables, types) from the .class bytecode file supplied on the command line and encodes them into a BDD (binary decision diagram) representation, stored in the results/*.bdd files. The solver executes the specified Datalog file on the BDD representation and outputs tuples files in results/*.tuples. If you view those files, they are simply tuples of numbers. The results/*.map files map between these numbers and human-readable source-level names. Then when you execute this script: python3 print_bddbddb_results_new.py

it will print out a set of tuples to the command-line. We encourage you to take your own Java programs and experiment with running analyses on them. The simplest way to get started is to compile your programs in the same directory as our scripts. Let us know if you have problems setting up the infrastructure. Note that BDDBDDB is sensitive to Java memory limits and if you are getting OutOfMemory errors with larger test programs you should edit run_bddbddb_new.py to increase these limits (the -mx command line argument). Basic Datalog syntax Let's use pacs.dtl (context-sensitive pointer analysis) as an example. We will step through parts of the file in turn: First, the header includes various implementation-specific declarations, such as what directory to output the results of the analysis (e.g., results/) and what file to use to refer to the domains of interest (e.g., results/fielddomains.pa). We've annotated that file with plenty of comments, so hopefully those will be useful: .basedir "results" .include "fielddomains.pa" # Here are the abbreviations for domains used in the relations: # - V: variables # - H: heap objects, named by invocation sites of object creation methods # - T: types # - F: fields # - I: invocation sites # - Z: numbers for parameters # - N: names # - M: methods # - C: classes The next section declares all of the relations: ###### Relations # All input relations are extracted by Joeq front-end # variable 'v' points to heap object 'h' vP0 (v:V0, h:H0) input # ha.field may point to hb hP0 (ha:H0, field:F0, hb:H1) input # statement: dest = source (includes passing of arguments and return values) A (dest:V0, source:V1) input # statement: base.field = src S (base:V0, field:F0, src:V1) input # statement: dest = base.field L (base:V0, field:F0, dest:V1) input # actual parameters for invocation sites: # 'actualparam' is passed as parameter number 'num' at 'invoke' actual (invoke:I0, num:Z0, actualparam:V1) input # formal parameters for methods: # the formal parameter number 'num' of 'method' is represented as 'formalparam' formal (method:M0, num:Z0, formalparam:V0) input # Method 'method' contains an invocation site 'invoke' with virtual # method name 'name' mI (method:M0, invoke:I0, name:N0) input # Statically-bound method invocation edges: # invocation site 'invoke' calls method 'target' IE0 (invoke:I0, target:M0) input # variable 'var' has type 'type' vT (var:V0, type:T0) input # heap object 'heap' has type 'type' hT (heap:H0, type:T1) input # type T1 is assignable to type T0 (based on sub-classing relationship) aT (type:T0, type:T1) input # It is type-safe to assign heap object 'h' to variable 'v' vPfilter (v:V0, h:H0) # virtual method dispatch info: # method 'method' is the target of dispatching method name 'name' on type 'type' cha (type:T1, name:N0, method:M0) input # 'v' is a local variable in method 'm' mV(m : M0, v : V0) input # 'v' is the return value of method 'method' Mret (method:M0, v:V1) input # 'v' is return value of invocation site 'invoke' Iret (invoke:I0, v:V0) input IEnum (invoke:I0, target:M0, ccaller:VC1, ccallee:VC0) input #Mthr (method:M0, v:V1) input #Ithr (invoke:I0, v:V0) input #IE1 (invoke:I0, target:M0) # Context-sensitive output relations (cannot print as outputtuples # because there are astronomical numbers of contexts) IEcs (ccaller:VC0, invoke:I0, ccallee:VC1, target:M0) output cA (vcdest:VC0, dest:V0, vcsrc:VC1, source:V1) output cvP (vc:VC0, v:V0, h:H0) output # These relations are output as tuples that can be # printed by print_bddbddb_results_new.py # field 'ha.field' of heap object 'ha' may point to heap object 'hb' hP (ha:H0, field:F0, hb:H1) outputtuples # variable 'v' may point to heap object 'h' vP (v:V0, h:H0) outputtuples # invocation site 'invoke' may call method 'target' IE (invoke:I0, target:M0) outputtuples ### Insert your new relations below here: Each relation contains one or more terms, each of which may take on values from a particular domain. input (extensional database) relations are constructed from the BDD representation that the Joeq front-end builds from the source program. output and outputtuples (intensional database) relations are inferred by running the Datalog program. Any relations marked as outputtuples will have their tuples dumped to results/*.tuples files, which can later be printed in human-readable format. Each relation is like a 'declaration' in an imperative programming language. For example, the hP relation which you will be mainly dealing with in this assignment is shown below: # field 'ha.field' of heap object 'ha' may point to heap object 'hb' hP (ha:H0, field:F0, hb:H1) outputtuples The name of the relation is listed, along with its terms: ha, field, and hb. Each term can take on values drawn from a domain. Here, the H stands for the domain of heap-allocated objects, and the F stands for the domain of field names (ignore the numbers like H0 and H1 for now). Because outputtuples is specified, when the analysis is run, the tuples will be written to a file called results/hP.tuples. Recall that to print out the hP relations, the command we used was: python3 print_bddbddb_results_new.py AddressBook results/hP.tuples Now you know where that tuples file comes from. The meaning of each relation is documented briefly in the comments above each line. If necessary, please refer to Chapter 3 of John Whaley's Ph.D. thesis (the creator of BDDBDDB) for more detailed definitions. The final section consists of the rules which tell Datalog how to infer new relations from the input relations: ###### Rules cvP(_,v,h) :- vP0(v,h). cA(_,v1,_,v2) :- A(v1,v2). IEcs(vc2,i,vc1,m) :- IE0(i,m), IEnum(i,m,vc2,vc1). vPfilter(v,h) :- vT(v,tv), aT(tv,th), hT(h,th). cA(vc1,v1,vc2,v2) :- formal(m,z,v1), IEcs(vc2,i,vc1,m), actual(i,z,v2). cA(vc2,v2,vc1,v1) :- Mret(m,v1), IEcs(vc2,i,vc1,m), Iret(i,v2). cvP(vc1,v1,h) :- cA(vc1,v1,vc2,v2), cvP(vc2,v2,h), vPfilter(v1,h). hP(h1,f,h2) :- S(v1,f,v2), cvP(vc1,v1,h1), cvP(vc1,v2,h2). cvP(vc1,v2,h2) :- L(v1,f,v2), cvP(vc1,v1,h1), hP(h1,f,h2), vPfilter(v2,h2). split vP(v, h) :- cvP(_, v, h). IE(i, m) :- IEcs (_,i,_,m). ### Insert your new rules below here: The relation on the lefthand side of each rule is true if all relations on the righthand side are true (the two sides are separated by the :- symbol). Relations on the righthand side are separated by commas, and each line must end with a period. The special _ symbol represents don't care, so any term can be substituted in its place. For example, this rule: cvP(_,v,h) :- vP0(v,h). means that in all possible contexts, cvP is true for a given v and h if vP0(v,h) is true. References BDDBDDB homepage Datalog examples from the official BDDBDDB website PLDI 2004 research paper PLDI 2006 PowerPoint tutorial Chapter 3 of John Whaley's Ph.D. thesis Part A: Interpreting the results of the pointer analysis Let's now study the output from running: python3 print_bddbddb_results_new.py AddressBook results/hP.tuples The hP relation specifies that a particular field in a particular heap-allocated object may point to another heap-allocated object. Let's look at one relation in particular, which you should find in your output: AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1 The || string separates each line into components. There are three components in this particular relation: AddressBook.java:63 Concrete: MaleRecord @ 6 Record/name AddressBook.java:37 Concrete: String @ 1 The first represents a heap-allocated object, in this case a MaleRecord object allocated in line 63 of AddressBook.java (look at the corresponding place in the source code to convince yourself of what line performed the allocation). The second represents the name field of the Record class. The third represents a heap-allocated object, in this case a String object allocated at line 37 ("Joe"). What this line says is: The field name of the MaleRecord object allocated on line 63 of AddressBook.java can point to the String "Joe" allocated on line 37. Go ahead and inspect the code manually to make sure that this relation makes sense to you. For some points-to relations the first component is the value global(null). This is used for static fields. Exercise 1 Copy the output of running python3 print_bddbddb_results_new.py AddressBook results/hP.tuples into a text file named exercise1.txt. There should be 26 total lines, each representing one hP relation. For every line or group of lines, explain what the points-to relation intuitively means in terms of what someone inspecting the source code can understand. For example, for the line: AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1 an appropriate answer would be: AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1 - Line 63 ('Record joeRecord = new MaleRecord(joe, joeEmail);') sets joeRecord.name to point to the String "Joe" allocated on line 37. In this exercise, we want you to build up some intuition about what the results of a pointer analysis means. Exercise 2 There is a bug in AddressBook.java:main() where it forgets to add some Record object into the address book. Which Record object is it, and more importantly, by looking at the output of the pointer analysis that you created in Exercise 1, what relations in particular could you consult in order to corroborate your answer? In other words, if the program had added that record into the address book, how would the hP relations look different? (You can try to fix the bug yourself, re-compile, and re-run the analysis to see for sure.) (Note: Apologies if this problem seems like a contrived example, which it is, but the principle is that for real programs of considerable size, it's infeasible to answer questions like this one by inspecting the source code. However, the pointer analysis we use can scale up to hundreds of thousands of lines of code.) Part B: Building a simple alias analysis The goal of a pointer analysis is to answer questions like: For an object X, what objects can the fields of X possibly point to? The goal of an alias analysis is to answer questions like: For objects X and Y with fields fX and fY, respectively, can X.fX and Y.fY possibly point to the same object Z? Problem 1 By doing Exercises 1 and 2, you will likely notice that the alias analysis results can be extracted from the pointer analysis results. In this problem, you will write a simple alias analysis in Datalog atop of the existing pointer analysis. First, copy pacs.dtl into a new file called alias.dtl Next, add a new relation in alias.dtl called hALIAS with the following declaration: ### Insert your new relations below here: # Can h1.field1 and h2.field2 point to the same object? hALIAS (h1:H0, field1:F0, h2:H1, field2:F1) outputtuples Add a new rule (or multiple rules if you'd like) such that the hALIAS relation returns all of the pairs h1.field1 and h2.field2 which can point to the same object. Then you can test your new analysis by running it: python3 run_bddbddb_new.py AddressBook alias.dtl and printing out the hALIAS relations to your terminal: python3 print_bddbddb_results_new.py AddressBook results/hALIAS.tuples When you are satisfied that your analysis has met the specs, copy your alias.dtl file into the submit/ sub-directory. Also, copy the results of printing out the hALIAS relations into a file called submit/problem1.txt. At the top of that file, write a paragraph describing what parts of this output you found to be useful and what parts you found to be not useful (i.e., noisy output). Don't stress about details on this part; we just want you to build up some intuition about interpreting the results of your own analysis. To sanity-check your results, your analysis should be able to report that joeRecord.name and joeAltRecord.name are aliased. Problem 2 Now you will refine the alias analysis you created in Problem 1 so that it produces less output (actually, so that it produces more relevant output for one specific debugging task). You want to be able to use your refined analysis to detect the following bug in AddressBook.java: 66. Record maryRecord = new FemaleRecord(mary, maryEmail); 67. Record janeRecord = new FemaleRecord(jane, maryEmail); janeRecord should have been constructed with janeEmail rather than maryEmail (this might be due to a copy-and-paste error, for instance). Due to this bug, maryRecord.emailAddr and janeRecord.emailAddr are aliased, when in fact they should not be. Copy alias.dtl into alias_refined.dtl and modify the rules in alias_refined.dtl such that aliasing relationships are only reported when: fields in different heap-allocated objects the heap-allocated objects share the same type For instance, we don't care that maryRecord.emailAddr aliases maryRecord.emailAddr (because they belong to the same heap object, and in fact, they are the same field), nor that the static String field joe refers to the same string object as joeRecord.name (because the base objects have different types). The expected hALIAS result from running alias_refined.dtl should be: AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:69 Concrete: MaleRecord @ 34 || Record/name AddressBook.java:69 Concrete: MaleRecord @ 34 || Record/name || AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name AddressBook.java:66 Concrete: FemaleRecord @ 20 || Record/emailAddr || AddressBook.java:67 Concrete: FemaleRecord @ 27 || Record/emailAddr AddressBook.java:67 Concrete: FemaleRecord @ 27 || Record/emailAddr || AddressBook.java:66 Concrete: FemaleRecord @ 20 || Record/emailAddr which says that joeRecord.name and joeAltRecord.name are aliased and that maryRecord.emailAddr and janeRecord.emailAddr are aliased. There should be no other output. Hints: != is the inequality operator in Datalog. hT is the relation that maps a heap object to its type. Remember to copy alias_refined.dtl into submit/ when you are done. Part C: Field-sensitive versus field-insensitive pointer analysis The pointer analysis we've been studying thus far is field sensitive, which means that it makes a distinction between different fields within an object. Field sensitivity is easy to achieve in Java because it is type-safe and pointer arithmetic is not allowed. In contrast, pointer analyses for C are usually field insensitive, due to the difficulties of distinguishing between field accesses in a sound manner (since pointer arithmetic can be used to access arbitrary fields in a type-unsafe way). Problem 3 Your task is to modify the rules in pacs.dtl to simulate a field insensitive analysis. First copy pacs.dtl to a new file named pacs_field_insensitive.dtl. Then add a printsize directive to the declaration of the hP relation: hP (ha:H0, field:F0, hb:H1) outputtuples printsize Now run pacs_field_insensitive.dtl, and when it finishes, it should print out how many total tuples are in the hP relation: Saving results: SIZE OF hP: 105. There is a lot of other output, but look for the Saving results: line. There should be 105 tuples in hP. (Actually, depending on your machine type, there may be 110 tuples, so don't worry about the exact number, as long as it's in this ballpark.) Note that this includes tuples from code in the Java Standard Library, which are not printed out by print_bddbddb_results_new.py. Now modify the declaration of hP to remove the field term: hP (ha:H0, hb:H1) outputtuples printsize Finally, modify the rules involving hP to make it field insensitive. That is, hP is true whenever any field of ha may point to hb. (Hint: Remember that the _ term means don't care.) Copy pacs_field_insensitive.dtl into submit/ when you are done. Copy the output of running python3 print_bddbddb_results_new.py AddressBook results/hP.tuples on the results of your field-insensitive analysis into a text file named submit/problem3.txt. In that same file, record the size of the hP relation. Did it increase or decrease from its original value of 105? Why? How does your set of tuples differ from the originals in exercise1.txt and why are they different? Part D: Context-sensitive versus context-insensitive pointer analysis As you've learned in class, a context-insensitive analysis can be much faster to compute than a context-sensitive analysis, but it is often less precise. In this problem, you will compare the results from context-sensitive and context-insensitive analyses. Problem 4 pa.dtl contains a context-insensitive pointer analysis. Run it using: python3 run_bddbddb_new.py AddressBook pa.dtl and print out the resulting hP results using: python3 print_bddbddb_results_new.py AddressBook results/hP.tuples Copy the results into submit/problem4.txt and, in that same file, write a comparison of these results with the context-sensitive pointer analysis results in Exercise 1. In particular, point out where the context-insensitive analysis leads to less precise results, and how the precision is lost (by inspecting what occurs in AddressBook.java). Consult Chapter 12 of the Dragon book for hints. Submission To submit your assignment, please follow the following steps : 1. In lab4 directory, type make submission, it will check if the submit/ folder contains the required files and generates a file submission.tar.gz 2. Upload submission.tar.gz to Canvas. Your submit directory should contain the following files: alias.dtl problem1.txt alias_refined.dtl pacs_field_insensitive.dtl problem3.txt problem4.txt If you discover a bug after submitting (and before the due date), simply submit again. We'll take the latest version. © Stanford 2022 | Built using the Bootstrap framework by your friendly TA

Java程序辅导