A new algorithm for identifying loops in decompilation

  • Wei Tao
  • Mao Jian*
  • Zou Wei
  • Chen Yu
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Loop identification is an essential step of control flow analysis in decompilation. The classical algorithm for identifying loops is Tarjan's interval-finding algorithm, which is restricted to reducible graphs. Havlak presents an extension of Tarjan's algorithm that handles irreducible graphs by constructing a loop-nesting forest for an arbitrary flow graph. There is evidence that the running time of this algorithm is quadratic in the worst case, not almost linear as claimed. Ramalingam presents an improved algorithm with low time complexity on arbitrary graphs, but it does not perform well on "real" control flow graphs (CFGs). We present a novel algorithm for identifying loops in arbitrary CFGs. Building on a more detailed exploration of the properties of loops and depth-first search (DFS), this algorithm traverses a CFG only once using DFS and collects all needed information on the fly. It runs in approximately linear time and does not use any complicated data structures such as the Interval/Derived Sequence of Graphs (DSG) or UNION-FIND sets. To analyze the complexity of the algorithm, we introduce a new concept called the unstructuredness coefficient, which describes the unstructuredness of CFGs, and we find that the unstructuredness coefficients of CFGs from real executables are usually small (<1.5). This "low-unstructuredness" property distinguishes these CFGs from general single-root connected directed graphs, and it helps explain why existing algorithms do not perform well on real-world cases. The new algorithm has been applied to 11526 CFGs in 6 typical binary executables on both Linux and Windows platforms. Experimental results validate our theoretical analysis and show that our algorithm runs 2-5 times faster than the Havlak-Tarjan algorithm and 2-8 times faster than the Ramalingam-Havlak-Tarjan algorithm.
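The abstract describes the new algorithm only at a high level. It states that there is a single DFS traversal, that information is collected on the fly, and that no interval/DSG or UNION-FIND structures are used. The Python sketch below shows one way such a pass can work, under stated assumptions.

Each node records its depth on the current DFS path, with 0 meaning the node is off the path. Each node's innermost loop header is then updated as edges are classified:

- tree edges,
- back edges to a node still on the path,
- edges into already-explored loops,
- re-entries into a loop whose header is no longer on the path.

This is an illustrative reconstruction, not the paper's published pseudocode. Several things here are hypothetical:

- the names `identify_loops`, `tag_lhead`, `iloop_header`, and `LoopInfo`;
- the graph representation, a dict mapping every node to its successor list;
- the exact edge-case breakdown.

The sketch does not compute the unstructuredness coefficient.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class LoopInfo:
    # Hypothetical result container for this sketch.
    iloop_header: Dict[str, Optional[str]]  # innermost loop header per node (None = in no loop)
    headers: Set[str] = field(default_factory=set)  # nodes identified as loop headers
    irreducible: Set[str] = field(default_factory=set)  # headers of loops that have re-entries
    reentries: List[Tuple[str, str]] = field(default_factory=list)  # re-entry edges (src, dst)


def identify_loops(succ: Dict[str, List[str]], entry: str) -> LoopInfo:
    """Identify loops in one DFS pass. Assumes every node appears as a key of `succ`."""
    info = LoopInfo(iloop_header={n: None for n in succ})
    hdr = info.iloop_header
    pos: Dict[str, int] = {n: 0 for n in succ}  # depth on current DFS path; 0 = off the path
    visited: Set[str] = set()

    def tag_lhead(b: str, h: Optional[str]) -> None:
        # Insert header h into b's chain of enclosing headers, keeping the chain
        # ordered from innermost (deepest on the DFS path) to outermost.
        if h is None or b == h:
            return
        cur1, cur2 = b, h
        while hdr[cur1] is not None:
            ih = hdr[cur1]
            if ih == cur2:
                return  # h is already in the chain
            if pos[ih] < pos[cur2]:
                # cur2 is nested inside ih: splice cur2 in, then keep placing ih.
                hdr[cur1] = cur2
                cur1, cur2 = cur2, ih
            else:
                cur1 = ih
        hdr[cur1] = cur2

    def trav(b0: str, depth: int) -> Optional[str]:
        visited.add(b0)
        pos[b0] = depth
        for b in succ[b0]:
            if b not in visited:
                # Tree edge: b's innermost header (if not b0 itself) also encloses b0.
                tag_lhead(b0, trav(b, depth + 1))
            elif pos[b] > 0:
                # Back edge to a node on the current DFS path: b heads a loop.
                info.headers.add(b)
                tag_lhead(b0, b)
            elif hdr[b] is None:
                pass  # b was fully explored and lies in no loop: nothing to propagate
            else:
                h = hdr[b]
                if pos[h] > 0:
                    # b lies in a loop whose header is still on the DFS path.
                    tag_lhead(b0, h)
                else:
                    # Re-entry: h's loop is entered without passing through h.
                    info.reentries.append((b0, b))
                    info.irreducible.add(h)
                    while hdr[h] is not None:
                        h = hdr[h]
                        if pos[h] > 0:
                            tag_lhead(b0, h)
                            break
                        info.irreducible.add(h)
        pos[b0] = 0  # b0 leaves the DFS path
        return hdr[b0]

    trav(entry, 1)
    return info
```

Consider the two-entry cycle A→B, A→C, B→C, C→B with entry A. On this graph the sketch makes B the header of C, records (A, C) as a re-entry edge, and marks B's loop as irreducible. The recursion mirrors the single DFS pass for readability. Very deep CFGs would need an explicit stack instead, because of Python's recursion limit.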

Original language: English
Title of host publication: Static Analysis - 14th International Symposium, SAS 2007, Proceedings
Pages: 170-183
Number of pages: 14
State: Published - 2007
Externally published: Yes
Event: 14th International Static Analysis Symposium, SAS 2007 - Kongens Lyngby, Denmark
Duration: 22 Aug 2007 to 24 Aug 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4634 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Static Analysis Symposium, SAS 2007
Country/Territory: Denmark
City: Kongens Lyngby
Period: 22/08/07 to 24/08/07

Keywords

  • Control flow analysis
  • Decompilation
  • Loop identifying
  • Unstructuredness coefficient
