Bypassing the guards. Debug PHP code packed by SourceGuardian


Almost everyone knows what PHP is. Fewer people know that you can write nearly full-fledged applications in it and protect them with the SourceGuardian packer, for which no free decompiler exists. Today we will walk through a practical way to quickly debug such bytecode under Windows without parsing its structure in detail. No special tools are required either: just sleight of hand and a little magic.

This time, the program that fell into our hands is implemented as a local web application. It works like this: type localhost into the browser's address bar under Windows, and the program's interface appears right there. The whole thing runs on a local Apache server bundled with the program. Of course, the application has a number of licensing restrictions: a trial period, functions that do not work until a license is purchased, and so on. We will try to get past all of them.

Typically, such restrictions are implemented by a set of PHP scripts, which are plain text files. But not in our case. All the scripts that make up our application are a jumble of letters, numbers and special characters, though on closer inspection they all start with the same header:

<?php if(!function_exists('sg_load')){$__v=phpversion();$__x=explode('.',$__v);$__v2=$__x[0].'.'.(int)$__x[1];$__u=strtolower(substr(php_uname(),0,3));$__ts=(@constant('PHP_ZTS') || @constant('ZEND_THREAD_SAFE')?'ts':'');$__f=$__f0='ixed.'.$__v2.$__ts.'.'.$__u;$__ff=$__ff0='ixed.'.$__v2.'.'.(int)$__x[2].$__ts.'.'.$__u;$__ed=@ini_get('extension_dir');$__e=$__e0=@realpath($__ed);$__dl=function_exists('dl') && function_exists('file_exists') && @ini_get('enable_dl') && !@ini_get('safe_mode');if($__dl && $__e && version_compare($__v,'5.2.5','<') && function_exists('getcwd') && function_exists('dirname')){$__d=$__d0=getcwd();if(@$__d[1]==':') {$__d=str_replace('\\','/',substr($__d,2));$__e=str_replace('\\','/',substr($__e,2));}$__e.=($__h=str_repeat('/..',substr_count($__e,'/')));$__f='/ixed/'.$__f0;$__ff='/ixed/'.$__ff0;while(!file_exists($__e.$__d.$__ff) && !file_exists($__e.$__d.$__f) && strlen($__d)>1){$__d=dirname($__d);}if(file_exists($__e.$__d.$__ff)) dl($__h.$__d.$__ff); else if(file_exists($__e.$__d.$__f)) dl($__h.$__d.$__f);}if(!function_exists('sg_load') && $__dl && $__e0){if(file_exists($__e0.'/'.$__ff0)) dl($__ff0); else if(file_exists($__e0.'/'.$__f0)) dl($__f0);}if(!function_exists('sg_load')){$__ixedurl=''.urlencode($__v).'&php_ts='.($__ts?'1':'0').'&php_is='.@constant('PHP_INT_SIZE').'&os_s='.urlencode(php_uname('s')).'&os_r='.urlencode(php_uname('r')).'&os_m='.urlencode(php_uname('m'));$__sapi=php_sapi_name();if(!$__e0) $__e0=$__ed;if(function_exists('php_ini_loaded_file')) $__ini=php_ini_loaded_file(); else $__ini='php.ini';if((substr($__sapi,0,3)=='cgi')||($__sapi=='cli')||($__sapi=='embed')){$__msg="\nPHP script '".__FILE__."' is protected by SourceGuardian and requires a SourceGuardian loader '".$__f0."' to be installed.

The jumble of numbers and letters is the string argument of the sg_load function. The obfuscation is obvious, and to identify it you don't even need the wonderful Phpid utility from Manhunter: the code itself names the protector, SourceGuardian. Oddly enough, an approximate description of this obfuscator's encoding format can be found online. The sg_load argument is a string that starts with 16 hexadecimal characters, followed by base64-encoded binary data.
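The layout just described (16 hexadecimal characters, then base64) can be split with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the use of the standard base64 alphabet with '=' padding is an assumption, since the text does not say which variant is used.

```python
import base64
import string

HEX_PREFIX_LEN = 16  # length of the hexadecimal prefix, as stated in the text


def split_sg_load_argument(argument: str):
    """Split an sg_load argument into its hex prefix and decoded binary body.

    Hypothetical helper for illustration; not part of any SourceGuardian tool.
    """
    prefix = argument[:HEX_PREFIX_LEN]
    encoded = argument[HEX_PREFIX_LEN:]
    if len(prefix) != HEX_PREFIX_LEN or any(c not in string.hexdigits for c in prefix):
        raise ValueError("argument does not start with 16 hexadecimal characters")
    # ASSUMPTION: standard base64 alphabet with '=' padding.
    return prefix, base64.b64decode(encoded)
```

The returned bytes are the raw material for the binary format discussed next.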


First comes the checksum, calculated character by character from the PHP opening tag up to the eighth character of the sg_load argument. This checksum is then compared with the next eight characters of the argument to verify the integrity of the fallback code. The checksum algorithm is a 32-bit version of the BSD checksum. The base64-encoded data is then decoded to reveal a binary format with four data types.
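For reference, the classic BSD checksum rotates a 16-bit accumulator right by one bit and adds the next byte. The sketch below widens that to 32 bits in the most direct way. Whether SourceGuardian's variant matches it bit for bit is an assumption; the description above does not give the details.

```python
def bsd_checksum32(data: bytes) -> int:
    """BSD-style checksum with a 32-bit accumulator.

    ASSUMPTION: a straight widening of the classic 16-bit algorithm
    (rotate right by one bit, then add the byte). The exact variant
    used by the protector is not specified in the text.
    """
    checksum = 0
    for byte in data:
        # Rotate the 32-bit accumulator right by one bit.
        checksum = ((checksum >> 1) | ((checksum & 1) << 31)) & 0xFFFFFFFF
        # Add the next byte, wrapping at 32 bits.
        checksum = (checksum + byte) & 0xFFFFFFFF
    return checksum
```

A result would typically be formatted as eight hexadecimal digits, for example with format(value, "08X"), before comparing it with a field of the argument.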

The char type is used to represent single characters, small numbers, and boolean values. Integers are stored in 32-bit little-endian format (int type), and strings can be either zero-terminated (zstr type) or length-prefixed (lstr type). Parsing starts with the first header, which contains the version number and the security settings used when the file is bound to a specific IP address or hostname. The first byte of each data block in the first header determines the purpose of the bytes that follow, until byte 0xFF is found. For example, a value of 0x2 indicates that execution is limited to a hostname, and a value of 0x4 indicates that the length of the second header follows.
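The four data types map naturally onto a small cursor-style reader. The following is a minimal sketch, not a reimplementation of the real parser: the class name is invented, integers are read as unsigned, and the lstr length prefix is assumed to be a single char, which the description does not confirm.

```python
import struct


class Reader:
    """Sequential reader for the char / int / zstr / lstr types (illustrative)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def char(self) -> int:
        value = self.data[self.pos]
        self.pos += 1
        return value

    def int32(self) -> int:
        # 32-bit little-endian integer; read as unsigned (assumption).
        (value,) = struct.unpack_from("<I", self.data, self.pos)
        self.pos += 4
        return value

    def zstr(self) -> bytes:
        # Zero-terminated string: read up to, and skip, the NUL byte.
        end = self.data.index(b"\x00", self.pos)
        value = self.data[self.pos:end]
        self.pos = end + 1
        return value

    def lstr(self) -> bytes:
        # ASSUMPTION: the length prefix is one char wide.
        length = self.char()
        value = self.data[self.pos:self.pos + length]
        self.pos += length
        return value
```

Walking the first header would then be a loop that reads a char tag, stops at 0xFF, and reads the following fields according to the tag.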

Once the offset of the encrypted second header is calculated from the first header, it is decrypted using the Blowfish block cipher in CBC mode. Decryption can use both built-in keys and external ones, for example from license files or obtained over the network. Without them, the code cannot be decrypted.

Each successfully decrypted block contains three integers followed by the actual data. The first integer is the checksum calculated over the plain data. The second integer contains the length of the decrypted data, and the third is the size of the data after decompression. Checksums are calculated using the previously mentioned 32-bit BSD checksum. If the first integer matches the calculated checksum, the decryption was successful.
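The success test for a decrypted block can be sketched as follows. The function and constant names are illustrative. The sketch assumes that the payload starts immediately after the three integers and that the checksum covers exactly the bytes counted by the second integer; the checksum routine is passed in as a parameter, and toy_checksum is only a stand-in for the demonstration.

```python
import struct
from typing import Callable, Optional, Tuple

# checksum, stored length, size after decompression (three little-endian ints)
HEADER = struct.Struct("<III")


def check_block(block: bytes,
                checksum: Callable[[bytes], int]) -> Optional[Tuple[bytes, int]]:
    """Return (payload, unpacked_size) if the block validates, else None."""
    if len(block) < HEADER.size:
        return None
    stored_sum, stored_len, unpacked_size = HEADER.unpack_from(block)
    # ASSUMPTION: the payload directly follows the header.
    payload = block[HEADER.size:HEADER.size + stored_len]
    if len(payload) != stored_len or checksum(payload) != stored_sum:
        return None  # wrong key or damaged data
    return payload, unpacked_size


def toy_checksum(data: bytes) -> int:
    """Stand-in checksum for the demo only; not the real algorithm."""
    return sum(data) & 0xFFFFFFFF
```

Because Blowfish works on 8-byte blocks, a decrypted buffer may carry padding after the payload, which is why the payload is cut to the stored length before checking.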

After decryption, the data is decompressed using a Lempel-Ziv algorithm. SourceGuardian uses the lzo1x implementation, and lzss for files encoded with an older version of SourceGuardian. Both the second header and the PHP data blocks are compressed and encrypted with this technique. As with the first header, the parser iterates over the data and extracts the values of the second header. It contains information about environment restrictions, including the name of the license holder, the license number, the date the file was created, and the expiration date of the file.

The second header is followed by the PHP data. SourceGuardian can store multiple versions of it for compatibility with different PHP versions, one data block per version. Each block consists of two integers, indicating the compatible PHP version and the size of the encrypted data, followed by the actual PHP data. If a block compatible with the current PHP version is found, it is decrypted. All this information is, of course, very useful for writing your own unpacker, but the amount of work needed to implement it is disheartening. Our goal, as usual, is to solve the problem in the simplest possible way with the tools at hand.

To get there, you can dump the already decrypted and unpacked threaded code directly from the PHP virtual machine. Such methods are widely used and described online. At least three options are usually recommended:

Opcache, since PHP 7.1

php -d opcache.opt_debug_level=0x10000 test.php

phpdbg, since PHP 5.6

phpdbg -p* test.php

vld, third-party extension

php -d vld.active=1 test.php
The result is a dump of the compiled program as threaded-code instructions, from which the logic can be restored. However, we will not use this method. First, it requires Linux, or else tinkering with building the utilities for Windows. Second, even with a listing in hand, we cannot analyze the script's logic directly at runtime, especially inside a running program. And the most important argument against it: the creators of SourceGuardian are no amateurs and have guarded against this possibility, as the Internet again tells us. Smart people have, of course, found a way around this too, but it is quite labor-intensive.

Let's start by simply loading the PHP program into our familiar x64dbg debugger, however strange that sounds. How can you debug PHP code with a Windows debugger, you ask? Easily. To understand how, let's first figure out what PHP is.

PHP is a platform-independent scripting language that is parsed and executed by an interpreter. The PHP interpreter is written in C and can be compiled for any platform. Unlike C code, PHP code is not compiled into an executable file. Instead, at runtime, the PHP application code is compiled to bytecode by the Zend Engine. Once compiled, its instructions (opcodes) are executed by the Zend Engine's own virtual machine, which has a virtual processor and its own instruction set. These instructions are higher-level than ordinary machine code and are not executed directly by the CPU. Instead, the virtual machine provides a handler for each instruction that decodes the virtual machine instruction and runs native CPU code for it.

As you can see, the instruction interpreter can be loaded into the debugger and used to trace the threaded code. In our case, it is built into the Apache HTTP Server, in the httpd.exe module. You can't just run it from the debugger. Moreover, you cannot even attach to its process directly: it runs in the background and does not appear in x64dbg's list of processes available for attaching. But there is a little trick: if you open the x64dbg preferences and, on the last, usually hidden "Misc" tab, enable "Set x64dbg as Just In Time Debugger", you can debug even hidden processes. To do so, right-click Apache HTTP Server in the Task Manager window and select "Debug".

So, we are in the very heart of the Zend virtual machine. On the "Symbols" tab, in the list of loaded modules, we see php7ts.dll, which contains this machine, and the ixed.7.2ts module, which implements the SourceGuardian protection extension. The relevant place in the protection module looks like this:

00007FFCBBE35B20 | jne ixed.7.2ts.7FFCBBE35B92 |
00007FFCBBE35B22 | cmp qword ptr ss:[rsp+1C0],0 |
00007FFCBBE35B2B | je ixed.7.2ts.7FFCBBE35B92 |
00007FFCBBE35B2D | call qword ptr ds:[<&zend_get_executed_ |
00007FFCBBE35B33 | mov rcx,qword ptr ss:[rsp+1C0] |
00007FFCBBE35B3B | mov qword ptr ds:[rcx+10],rax |
00007FFCBBE35B3F | call qword ptr ds:[<&tsrm_get_ls_cache> |
00007FFCBBE35B45 | mov ecx,dword ptr ds:[7FFCBBE4C95C] |
00007FFCBBE35B4B | dec ecx |
00007FFCBBE35B4D | movsxd rcx,ecx |
00007FFCBBE35B50 | mov rax,qword ptr ds:[rax] |
00007FFCBBE35B53 | mov rax,qword ptr ds:[rax+rcx*8] |
00007FFCBBE35B57 | mov dword ptr ds:[rax+18],1 |
00007FFCBBE35B5E | mov rdx,qword ptr ss:[rsp+13E0] |
00007FFCBBE35B66 | mov rcx,qword ptr ss:[rsp+1C0] |
00007FFCBBE35B6E | call qword ptr ds:[<&zend_execute>] |
00007FFCBBE35B74 | mov rcx,qword ptr ss:[rsp+1C0] |
00007FFCBBE35B7C | call qword ptr ds:[<&destroy_op_array>] |
00007FFCBBE35B82 | mov rcx,qword ptr ss:[rsp+1C0] |
00007FFCBBE35B8A | call qword ptr ds:[<&[email protected]@8>] |
00007FFCBBE35B90 | jmp ixed.7.2ts.7FFCBBE35BA7 |
00007FFCBBE35B92 | mov rax,qword ptr ss:[rsp+13E0] |

Here, the code from the argument string, already decrypted and unpacked, has been carefully converted into a sequence of instructions and passed to the interpreter, implemented by the zend_execute function.

For convenience, let's start by compiling a list of instructions for our virtual machine. A search turns up many versions of this list that differ from each other to a greater or lesser degree, but we need our own, matching our specific build. To get it, we open the IDA disassembler and look for the zend_get_opcode_name function in php7ts.dll. The function is simple: it returns the name for a given Zend opcode, taking it from the following array of strings (I will not give the entire list; anyone can easily generate it themselves):
.rdata:00000001806F2160 off_1806F2160 dq offset aZendNop ; DATA XREF: zend_get_opcode_name+3^o
.rdata:00000001806F2160 ; "ZEND_NOP"
.rdata:00000001806F2168 dq offset aZendAdd ; "ZEND_ADD"
.rdata:00000001806F2170 dq offset aZendSub ; "ZEND_SUB"
.rdata:00000001806F2178 dq offset aZendMul ; "ZEND_MUL"
.rdata:00000001806F2180 dq offset aZendDiv ; "ZEND_DIV"
.rdata:00000001806F2188 dq offset aZendMod ; "ZEND_MOD"
.rdata:00000001806F2190 dq offset aZendSl ; "ZEND_SL"
.rdata:00000001806F2198 dq offset aZendSr ; "ZEND_SR"
.rdata:00000001806F21A0 dq offset aZendConcat ; "ZEND_CONCAT"
.rdata:00000001806F21A8 dq offset aZendBwOr ; "ZEND_BW_OR"
.rdata:00000001806F21B0 dq offset aZendBwAnd ; "ZEND_BW_AND"
.rdata:00000001806F21B8 dq offset aZendBwXor ; "ZEND_BW_XOR"
.rdata:00000001806F21C0 dq offset aZendBwNot ; "ZEND_BW_NOT"
.rdata:00000001806F21C8 dq offset aZendBoolNot ; "ZEND_BOOL_NOT"

This list differs from the versions found online, but now we know which instruction is executing at any given moment. So that you don't relax, here is a fly in the ointment: some obfuscators replace the addresses of instruction handlers with their own, so when decoding an instruction you need to stay alert and not rely entirely on the opcode. At a minimum, make sure the handler's address falls within the php7ts.dll address space.
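Both points can be illustrated with a little arithmetic. In the IDA listing, the name table is an array of 8-byte pointers starting at 0x1806F2160, so the opcode number is simply the slot index. The sketch below uses that base address and the names from the listing; the helper names and the module base and size in the usage note are hypothetical.

```python
TABLE_BASE = 0x1806F2160   # address of the first slot in the IDA listing
POINTER_SIZE = 8           # 64-bit pointers

# Names exactly as they appear in the excerpt of the table.
OPCODE_NAMES = [
    "ZEND_NOP", "ZEND_ADD", "ZEND_SUB", "ZEND_MUL", "ZEND_DIV", "ZEND_MOD",
    "ZEND_SL", "ZEND_SR", "ZEND_CONCAT", "ZEND_BW_OR", "ZEND_BW_AND",
    "ZEND_BW_XOR", "ZEND_BW_NOT", "ZEND_BOOL_NOT",
]


def opcode_from_slot(slot_address: int, table_base: int = TABLE_BASE) -> int:
    """Convert the address of a table slot into the opcode number."""
    offset = slot_address - table_base
    if offset < 0 or offset % POINTER_SIZE:
        raise ValueError("address is not a slot of the name table")
    return offset // POINTER_SIZE


def handler_in_module(handler: int, module_base: int, module_size: int) -> bool:
    """True if a handler address lies inside the given module image."""
    return module_base <= handler < module_base + module_size
```

For example, the slot at 0x1806F21A0 gives opcode 8, which is ZEND_CONCAT. For the range check, the module base and size would be read from the debugger's module list; a handler outside that range deserves a closer look.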

So, we have a list of instructions and can start debugging. To avoid dealing with how the interpreter is hooked up, we set a breakpoint directly in the main loop of the threaded-code interpreter, inside the execute_ex function:

00007FFCB6236280 | 48:895C24 08 | mov qword ptr ss:[rsp+8],rbx | execute_ex
00007FFCB6236285 | 57 | push rdi |
00007FFCB6236286 | 48:83EC 20 | sub rsp,20 |
00007FFCB623628A | 6548:8B0425 58000000 | mov rax,qword ptr gs:[58] |
00007FFCB6236293 | 48:8BD9 | mov rbx,rcx |
00007FFCB6236296 | 8B15 A4758200 | mov edx,dword ptr ds:[7FFCB6A5D840] |
00007FFCB623629C | B9 18000000 | mov ecx,18 |
00007FFCB62362A1 | 48:8B3CD0 | mov rdi,qword ptr ds:[rax+rdx*8] |
00007FFCB62362A5 | 8B05 2DAE8200 | mov eax,dword ptr ds:[<executor_globals |
00007FFCB62362AB | 48:03F9 | add rdi,rcx |
00007FFCB62362AE | FFC8 | dec eax |
00007FFCB62362B0 | 4C:63C0 | movsxd r8,eax |
00007FFCB62362B3 | 48:8B07 | mov rax,qword ptr ds:[rdi] |
00007FFCB62362B6 | 48:8B10 | mov rdx,qword ptr ds:[rax] |
00007FFCB62362B9 | 4A:8B04C2 | mov rax,qword ptr ds:[rdx+r8*8] |
00007FFCB62362BD | 80B8 12020000 00 | cmp byte ptr ds:[rax+212],0 |
00007FFCB62362C4 | 75 50 | jne php7ts.7FFCB6236316 |
00007FFCB62362C6 | 6666:0F1F8400 00000000 | nop word ptr ds:[rax+rax],ax |
00007FFCB62362D0 | 48:8B03 | mov rax,qword ptr ds:[rbx] |
00007FFCB62362D3 | 48:8BCB | mov rcx,rbx |
00007FFCB62362D6 | 48:8B00 | mov rax,qword ptr ds:[rax] |
00007FFCB62362D9 | FF15 791A4E00 | call qword ptr ds:[7FFCB6717D58] |
00007FFCB62362DF | 85C0 | test eax,eax |
00007FFCB62362E1 | 74 ED | je php7ts.7FFCB62362D0 |
00007FFCB62362E3 | 0F8E D1894000 | jle php7ts.7FFCB663ECBA |
00007FFCB62362E9 | 8B05 E9AD8200 | mov eax,dword ptr ds:[<executor_globals |
00007FFCB62362EF | 4C:8B07 | mov r8,qword ptr ds:[rdi]

As this fragment shows, the interpreter in this build is very simple. On entry, the rcx register holds a pointer, immediately copied into rbx, to a structure of the following form:

struct _zend_execute_data {
    const zend_op *opline;
    zend_execute_data *call;
    zval *return_value;
    zend_function *func;
    zval This; /* this + call_info + num_args */
    zend_execute_data *prev_execute_data;
    zend_array *symbol_table;
    void **run_time_cache;
#if ZEND_EX_USE_LITERALS
    zval *literals;
#endif
};

In this structure, we are interested in opline, the program counter, and in literals, the pointer to the array of constants. During the execution loop, the pointer to the structure is kept in the rbx register. The current opline is read through it into rax, then the address of the current instruction's handler is read from the opline into rax, and the call is made through it. A pointer to the _zend_execute_data structure is passed to the handler in the rcx register as its parameter.
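The loop in the listing boils down to "fetch the handler of the current opline, call it with the execute data, repeat while it returns zero". A toy model in Python makes the shape easier to see. All names here are invented for illustration, and the three handlers are not real Zend instructions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Op:
    handler: Callable[["ExecuteData"], int]  # like zend_op.handler
    operand: object = None


@dataclass
class ExecuteData:
    ops: List[Op]
    opline: int = 0                           # program counter
    output: List[object] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


def handle_echo(ex: ExecuteData) -> int:
    ex.output.append(ex.ops[ex.opline].operand)
    ex.opline += 1        # each handler advances the program counter itself
    return 0              # zero: keep looping


def handle_jmp(ex: ExecuteData) -> int:
    ex.opline = ex.ops[ex.opline].operand
    return 0


def handle_return(ex: ExecuteData) -> int:
    return -1             # non-zero: leave the loop


def execute_ex(ex: ExecuteData) -> None:
    while True:
        op = ex.ops[ex.opline]
        # A breakpoint on the call in the real loop sees exactly this moment.
        ex.trace.append(op.handler.__name__)
        if op.handler(ex) != 0:
            return


def run_demo() -> ExecuteData:
    program = [Op(handle_echo, "a"), Op(handle_jmp, 3),
               Op(handle_echo, "skipped"), Op(handle_echo, "b"),
               Op(handle_return)]
    ex = ExecuteData(program)
    execute_ex(ex)
    return ex
```

The trace list plays the role of the debugger log: one entry per dispatched instruction, recorded at the single call site.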

There is one more useful, little-documented point. The zend_execute_data structure has a func field (in my case at offset 0x18). It actually points to the parent zend_op_array structure (the layout may change depending on the PHP version).

struct _zend_op_array {
    zend_uchar type;
    zend_uchar arg_flags[3];
    uint32_t fn_flags;
    zend_string *function_name; /* Function name */
    zend_class_entry *scope;
    zend_function *prototype;
    uint32_t num_args;
    uint32_t required_num_args;
    zend_arg_info *arg_info;
    HashTable *attributes;
    int cache_size;
    int last_var; /* number of compiled variables */
    uint32_t T; /* number of temporary variables */
    uint32_t last; /* number of opcodes */
    zend_op *opcodes; /* pointer to our zend_op array */
    ZEND_MAP_PTR_DEF(void **, run_time_cache);
    ZEND_MAP_PTR_DEF(HashTable *, static_variables_ptr);
    HashTable *static_variables;
    zend_string **vars; /* names of CV variables */
    uint32_t *refcount;
    int last_live_range;
    int last_try_catch;
    zend_live_range *live_range;
    zend_try_catch_element *try_catch_array;
    zend_string *filename; /* Full path with the file name of the active script */
    uint32_t line_start; /* First line number */
    uint32_t line_end; /* Last line number */
    zend_string *doc_comment;
    int last_literal; /* number of constants */
    uint32_t num_dynamic_func_defs;
    zval *literals;
    /* ... */
};

String constants in the literals array are stored as zend_string structures:

struct _zend_string {
    zend_refcounted_h gc;
    zend_ulong h; /* hash value */
    size_t len;
    char val[1];
};

The structure is extremely simple: its first 8 bytes hold the refcounted header, introduced relatively recently for the garbage collector and of no interest to us. The same goes for the next 8 bytes, a 64-bit hash. Then comes the string itself, preceded by a 64-bit length: in our case, it is function_exists.

With constants sorted out, what about variables? Here things are much tougher. All kinds of variables, both compiled and temporary, are stored in the stack frame right after the zend_execute_data structure:
| zend_execute_data |
| VAR[0] = ARG[1] | arguments
| ... |
| VAR[num_args-1] = ARG[N] |
| VAR[num_args] = CV[num_args] | remaining CVs
| ... |
| VAR[last_var-1] = CV[last_var-1] |
| VAR[last_var] = TMP[0] | TMP/VARs
| ... |
| VAR[last_var+T-1] = TMP[T] |
| ARG[N+1] (extra_args) | extra arguments
| ... |
The function arguments come first, then the compiled variables, then the temporary ones. It is easy to see that the operands that refer to variables in an instruction are offsets from the start of the stack frame. The trouble is that this area, laid out as 16-byte zval structures, contains no indication of which variable each slot belongs to. A rough idea can be obtained by recalling that the parent zend_op_array structure has a vars field (marked in the figure), which points to an array of pointers to variable names.

Of course, this only applies to compiled CV variables (type 0x10), because temporary variables of types 2 and 4 have no names. Accordingly, the empirical way to get a variable's name from an instruction operand is this: take the offset (for the ZEND_ASSIGN instruction shown in the figure, the first operand is 0x50), subtract the size of the zend_execute_data structure (in our case, also 0x50), and divide the result by the size of the zval structure (0x10). The resulting number is the index into the vars table of names; following the corresponding pointer, we get the name of compiled variable number zero, "__v". Likewise, operand 0x60 of the next ZEND_ASSIGN instruction is the variable with index 1 in this table, named "__x", and so on.
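The arithmetic from this paragraph fits in a few lines. The two sizes are the values observed in this particular build (0x50 and 0x10) and may differ in others; the function names and the two-entry vars table are illustrative.

```python
EXECUTE_DATA_SIZE = 0x50  # size of zend_execute_data in this build
ZVAL_SIZE = 0x10          # size of one zval slot


def cv_index(operand_offset: int) -> int:
    """Turn a CV operand (offset from the frame start) into a vars index."""
    relative = operand_offset - EXECUTE_DATA_SIZE
    if relative < 0 or relative % ZVAL_SIZE:
        raise ValueError("operand does not point at a zval slot")
    return relative // ZVAL_SIZE


def cv_name(operand_offset: int, vars_table) -> str:
    """Look up the variable name; vars_table stands in for op_array->vars."""
    return vars_table[cv_index(operand_offset)]
```

With a vars table of ["__v", "__x"], operand 0x50 resolves to "__v" and operand 0x60 to "__x", matching the two ZEND_ASSIGN instructions discussed above.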

So, adding their parameters to the instructions, we get a listing like this:
ZEND_INIT_FCALL "function_exists"
ZEND_SEND_VAL "sg_load"
ZEND_JMPZ !1,->2C80
ZEND_INIT_FCALL "phpversion"
ZEND_FETCH_DIM_R $__x,0,~6
ZEND_CONCAT ~6,".",~7
Let's see how this expression looks in the original script:
<?php if(!function_exists('sg_load')){$__v=phpversion();$__x=explode('.',$__v);$__v2=$__x[0].'.'...

Bingo! Through titanic effort, we have almost restored the logic of a small part of an already known line of code. Still, we did it by hand, without any dumpers, using only the x64dbg debugger, and along the way got an idea of how the PHP virtual machine works from the inside.

What next? Stepping through code we already know is rather boring, so we temporarily disable the breakpoint on execute_ex and set one just before the call to the decrypted code inside the protection module. As soon as it triggers, we disable this breakpoint and re-enable the one inside the virtual machine. Now we are at the very beginning of the decrypted fragment and can see all its instructions, constants and variables. You can dump everything to disk and write a parser, or slowly but surely step through until the condition you need is checked; it all depends on your perseverance and the task at hand.

Here is some advice on optimizing the rest of the process. The method is simple and works for any virtual machine: we are really only interested in the points where the algorithm branches. Let's set breakpoints exactly there, that is, on the handlers of the conditional jump instructions ZEND_JMPZ, ZEND_JMPNZ, ZEND_JMPZNZ, ZEND_JMPZ_EX, ZEND_JMPNZ_EX and ZEND_CASE. In fact, it is not necessary to stop at these points; logging each pass through them is enough. It is also worth logging the names of the functions being called, since only a few instructions set them up: ZEND_INIT_FCALL_BY_NAME, ZEND_INIT_FCALL, ZEND_INIT_NS_FCALL_BY_NAME, ZEND_INIT_METHOD_CALL and ZEND_INIT_STATIC_METHOD_CALL.
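If the raw trace is exported from the debugger log as pairs of opcode name and detail, it can be reduced to just these two groups of instructions with a small filter. This is a sketch of one way to post-process such a log: the record format and the function names are assumptions, while the opcode names are the ones listed above.

```python
BRANCH_OPCODES = frozenset({
    "ZEND_JMPZ", "ZEND_JMPNZ", "ZEND_JMPZNZ",
    "ZEND_JMPZ_EX", "ZEND_JMPNZ_EX", "ZEND_CASE",
})

CALL_OPCODES = frozenset({
    "ZEND_INIT_FCALL_BY_NAME", "ZEND_INIT_FCALL", "ZEND_INIT_NS_FCALL_BY_NAME",
    "ZEND_INIT_METHOD_CALL", "ZEND_INIT_STATIC_METHOD_CALL",
})


def classify_opcode(name: str):
    """Return 'branch', 'call', or None for instructions we do not log."""
    if name in BRANCH_OPCODES:
        return "branch"
    if name in CALL_OPCODES:
        return "call"
    return None


def condense_trace(trace):
    """Keep only branch and call-setup records.

    ASSUMPTION: trace is an iterable of (opcode_name, detail) pairs,
    for example parsed from a debugger log.
    """
    kept = []
    for name, detail in trace:
        kind = classify_opcode(name)
        if kind is not None:
            kept.append((kind, name, detail))
    return kept
```

Comparing two condensed traces, one from a licensed run and one from a trial run if both are available, quickly narrows the search to the branch where they diverge.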

The resulting traces can be analyzed to find places to patch on the fly. Since the virtual machine does not compile the code but executes it step by step, a patcher can be hooked directly at the entry to execute_ex. Perhaps this article will prompt you to write your own universal dumper, or even a full unpacker for encoded PHP. After all, the method described above is suitable for unpacking and debugging scripts protected not only by SourceGuardian but also by other similar protection systems.

There are many such systems, among them AROHA PHPencoder, BCompiler, ByteRun Protector for PHP, ByteScrambler, CNCrypto, CodeCanyon PHP Encoder, CodeLock, CodeTangler ... and this is only a small part of the list. Moreover, for most of them, including SourceGuardian and the notorious ionCube, there are no public decoders, only paid online services. A look at the price lists of such services shows what knowledge is worth in material terms.