Static Single Assignment for Decompilation - CORE CORE Search Search Services Access to raw data API Dataset FastSync Content discovery Recommender Discovery Managing content Repository dashboard Support FAQs About About CORE Blog Contact us Static Single Assignment for Decompilation By Michael Van Emmerik Abstract Static Single Assignment enables the efficient implementation of many important decompiler components, including expression propagation, preservation analysis, type analysis, and the analysis of indirect jumps and calls. Source code is an essential part of all software development. It is so valuable that when it is not available, it can be worthwhile deriving it from the executable form of computer programs through the process of decompilation. There are many applications for decompiled source code, including inspections for malware, bugs, and vulnerabilities; interoperability; and the maintenance of an application that has some or all source code missing. Existing machine code decompilers, in contrast to existing decompilers for Java and similar platforms, have significant deficiencies. These include poor recovery of parameters and returns, poor handling of indirect jumps and calls, and poor to nonexistent type analysis. It is shown that use of the Static Single Assignment form (SSA form) enables many of these deficiencies to be overcome. SSA enables or assists with: data How analysis, particularly expression propagation; the identification of parameters and returns, without assuming ABI compliance; preservation analysis (whether a location is preserved across a call), which is needed for analysing parameters and return locations; type analysis, implemented as a sparse data flow problem; and the analysis of indirect jumps and calls. Expression propagation is a key element of a decompiler, since it allows long sequences of individual instruction semantics to be combined into more complex, high level statements. Parameters, returns, and types are features of high level languages that do not appear explicitly in machine code programs, hence their recovery is important for readability and the ability to recompile the generated code. In addition, type analysis is either absent from existing machine code decompilers, or is limited to a relatively simple propagation of types from library function calls. The analysis of indirect jumps and calls is important for finding all code in a machine code program, and enables the translation of important high level program elements such as switch statements, assigned gotos, virtual function calls, and calls through function pointers. Because of these challenges, machine code decompilers are the most interesting case. Existing machine code decompilers are weak at identifying parameters and returns, particularly where parameters are passed in registers, or the calling convention is non standard. A general analysis of parameters and returns is demonstrated, using new devices such as Collectors. These analyses become more complex in the presence of recursion. The elimination of redundant parameters and returns are shown to be global analyses, implying that for a general decompiler, procedures can not be finalised until all other procedures are analysed. Full type analysis is discussed, where the semantics of individual instructions, as well as information from library calls, contribute to the solution. A sparse, iterative, data How based approach is compared with the more common constraint based approach. The former requires special functions to handle the multiple constraints that result from overloaded operators such as addition and subtraction. Special problems arise with aggregate types (arrays and structures), and address taking of variables. Indirect branch instructions are often handled at instruction decode time. Delaying analysis until the program is represented in SSA form allows more powerful techniques such as expression propagation to be used. This results in a simpler, more general analysis, at the cost of having to throwaway some results and restart some analyses. It is shown that this technique easily extends to handling Fortran assigned gotos, which can not be effectively analysed at decode time. The analysis of indirect call instructions has the potential for enabling the recovery of object oriented virtual function calls. Many of the techniques presented in this thesis have been verified with the Boomerang open source decompiler. The goal of extending the state of the art of machine code decompilation has been achieved. There are of course still some areas left for future work. The most promising areas for future research have been identified as range analysis and alias analysis Publisher: University of Queensland, School of Information Technology and Electrical Engineering Year: 2007 OAI identifier: oai:espace.library.uq.edu.au:UQ:158682 Provided by: University of Queensland eSpace Download PDF: Sorry, we are unable to provide the full text but you may find it at the following location(s): https://espace.library.uq.edu.... (external link) https://espace.library.uq.edu.... (external link) https://espace.library.uq.edu.... (external link) Suggested articles To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request. Useful links Blog Services About CORE Contact us Cookies Privacy notice Writing about CORE? Discover our research outputs and cite our work. CORE is a not-for-profit service delivered by the Open University and Jisc.