Polymorphic Games
=================

part 1 -- conditional execution w/o jxx
---------------------------------------

Let's consider some C-to-asm transformations. Let's assume a is eax, b is ebx,
c is ecx, d is edx, and "condition" is the result of some binary comparison,
i.e. a single bit, 0 or 1.

example 1
---------

First, we want to know how the following thing looks in assembly:

        c = condition ? -1 : 0;

This looks like:

        ; CF <-- condition
        sbb     ecx, ecx

or

        ; ecx.high_bit <-- condition
        sar     ecx, 31

example 2
---------

Let's try it in a more realistic situation:

        c = a < b ? -1 : 0;

or, in a simpler form:

        if (a < b) c = -1; else c = 0;

So we have:

        cmp     eax, ebx
        sbb     ecx, ecx

or

        mov     ecx, eax
        sub     ecx, ebx
        sar     ecx, 31

example 3
---------

A bit more complex code, a simple a = MIN(a,b) function:

        if (a > b) a = b;

which is equivalent to

        a = a > b ? b : a;

which is equivalent to

        a = a + ((a > b ? -1 : 0) & (b - a));

which is equivalent to

        a += ((b - a) < 0 ? -1 : 0) & (b - a);

which (using examples 1 and 2) results in:

        sub     ebx, eax
        sbb     ecx, ecx
        and     ecx, ebx
        add     eax, ecx

example 4
---------

Absolute value, the abs() function:

        a = abs(a);
        if (a < 0) a = -a;
        a = a < 0 ? -a : a;

But since the NEG operation is the same as NOT+INC (we consider typical x86
asm), we can do the following:

        a = (a ^ (a < 0 ? -1 : 0)) - (a < 0 ? -1 : 0);

and in assembly it looks like:

        mov     edx, eax
        sar     edx, 31
        xor     eax, edx
        sub     eax, edx

example 5
---------

Some code in C:

        if (a != 0) a = b; else a = c;
        a = a ? b : c;
        a = ((a != 0 ? -1 : 0) & b) | ((a != 0 ? 0 : -1) & c);

And so it looks like this:

        cmp     eax, 1
        sbb     eax, eax
        and     ecx, eax
        xor     eax, -1
        and     eax, ebx
        or      eax, ecx

As you can see, many basic operations can be encoded without conditional
jumps, i.e. if we have some condition, we can often avoid generating a Jxx
instruction. For example, if (c) a >>= b; can be converted to
a >>= (c ? b : 0);, which is example 5 plus a shr, and so on.

But what if we want to call some subroutine? Okay, here is a solution:

        func_addr = condition ? real_func : fake_func;
        func_addr(arg1, arg2, ...);

        int fake_func() { return 0; }

And what if we want to initialize some variable? Then let's do the following:

        var_addr = condition ? &real_var : &fake_var;
        *var_addr = ...;

So the following situation

        if (condition)
        {
            statement1;
            statement2;
            statementN;
        }

can be converted into

        if (condition) statement1;
        if (condition) statement2;
        if (condition) statementN;

and each such line of code can be encoded using the techniques shown in the
examples above.

The only thing we can't get rid of is the direct jmp used in loops, like
while() or for(;;). However, we can expand conditional execution to the whole
program, which will then look like:

        while (1)
        {
            if (c) line1;
            if (c) line2;
            if (c) c = ...;
            if (c) line3;
            if (c) line4;
        }

and then we need only one jmp in the whole program.

As you can see, there can be many conditions, like if (c) if (d) if (e) lineN
for three nested loops, or we can write the program in state-machine style,
like

        if (c == 1) line1;
        if (c >= 2 && c <= 3) line2;
        if (c == 4) line3;

and so on, and all these programs will be encoded w/o jmps.

A program written with minimal Jxx usage has the following properties:

        1. its logic is harder to understand;
        2. it is easier to permute.

Okay, now let's imagine that you want to generate different, unique programs
from the same C source. Then you write a kind of template, and you code some
script to preprocess this template into C.
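Before moving on to the template macros, here is a minimal C sanity check of
the identities from examples 3-5 (the function names are mine and purely
illustrative). Like the assembly above and the H0() macro below, it assumes
32-bit two's-complement integers and an arithmetic right shift of signed
values:

        #include <stdio.h>
        #include <stdint.h>

        /* example 3: a += ((b - a) < 0 ? -1 : 0) & (b - a) */
        static int32_t branchless_min(int32_t a, int32_t b)
        {
            int32_t d = b - a;          /* may overflow for extreme inputs */
            int32_t m = d >> 31;        /* -1 if b < a, else 0 */
            return a + (m & d);
        }

        /* example 4: a = (a ^ mask) - mask, where mask is the sign of a */
        static int32_t branchless_abs(int32_t a)
        {
            int32_t m = a >> 31;        /* -1 if a < 0, else 0 */
            return (a ^ m) - m;
        }

        /* example 5: a = a ? b : c, built from two complementary masks */
        static int32_t branchless_select(int32_t a, int32_t b, int32_t c)
        {
            int32_t m = -(int32_t)(a != 0);   /* -1 if a != 0, else 0 */
            return (m & b) | (~m & c);
        }

        int main(void)
        {
            printf("%d %d %d\n",
                   branchless_min(3, -7),        /* -7 */
                   branchless_abs(-42),          /* 42 */
                   branchless_select(0, 1, 2));  /*  2 */
            return 0;
        }

Compiled with optimizations, each of these functions should come out as a
handful of mov/sub/sar/and/or/neg instructions with no Jxx, which is easy to
check by disassembling the object file.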
In such a template, you can replace some macros with the following macro
definitions, choosing among them randomly:

#define H0(x)     (((signed)(x)) >> (sizeof((signed)(x))*8-1))
#define H1(a,b)   H0((a)-(b))

#define MIN1(a,b) ((a)+(H1(b,a) & ((b)-(a))))
#define MIN2(a,b) ((a)-(H1(b,a) & ((a)-(b))))
#define MIN3(a,b) ((b)-(H1(a,b) & ((b)-(a))))
#define MIN4(a,b) ((b)+(H1(a,b) & ((a)-(b))))
//#define MIN5(a,b) ((a)<(b)?(a):(b))
//#define MIN6(a,b) ((a)>(b)?(b):(a))
//#define MIN7(a,b) ((b)>(a)?(a):(b))
//#define MIN8(a,b) ((b)<(a)?(b):(a))

#define MAX1(a,b) ((a)+(H1(a,b) & ((b)-(a))))
#define MAX2(a,b) ((a)-(H1(a,b) & ((a)-(b))))
#define MAX3(a,b) ((b)-(H1(b,a) & ((b)-(a))))
#define MAX4(a,b) ((b)+(H1(b,a) & ((a)-(b))))
//#define MAX5(a,b) ((a)<(b)?(b):(a))
//#define MAX6(a,b) ((a)>(b)?(a):(b))
//#define MAX7(a,b) ((b)>(a)?(b):(a))
//#define MAX8(a,b) ((b)<(a)?(a):(b))

#define ABS1(a)   (((a)^H0(a))-H0(a))
//#define ABS2(a) ((a)>0?(a):-(a))
//#define ABS3(a) ((a)>=0?(a):-(a))
//#define ABS4(a) ((a)<0?-(a):(a))
//#define ABS5(a) ((a)<=0?-(a):(a))

All these macros should generate different code, and the uncommented ones
should generate code w/o jmps.

part 2 -- generating code
-------------------------

In some situations, complex polymorphic decryptors are detected using the
set-of-instructions technique. This technique can be defined as follows:

        if (some piece of code consists of some known set of instructions)
        {
            return true, or emulate these instructions;
        }

When the analyzing algorithm encounters an instruction which is outside the
defined set, it returns false.

A possible way of counteracting such algorithms is Mistfall-like injection of
the decryptor into the program's code (for effective entrypoint hiding), plus
splitting the poly decryptor into small parts which are linked indirectly
(i.e. located in different places of the code section), with the order of
their execution pre-calculated by tracing the program.

When you are generating some code, you can change its instruction statistics
in the following ways:

example 6
---------

        ; at generation time we already know the result of comparing r1 with r2
                cmp     r1, r2
                jxx     l2
        l1:     ; never really executed, but can be emulated
                {random part of a random .code segment}
        l2:     ; can be either executed or emulated
                {part of our code}

example 7
---------

        ; at generation time we already know that (r1 % (N+1)) == 2
        ; (N+1 is a power of two)
                and     r1, N
                jmp     dword ptr [offset table1 + r1*4]
                ...
        table1: dd      l0
                dd      l1
                dd      l2
                ...
                dd      lN
        l0:     ; never really executed, but can be emulated
                {random part of a random .code segment}
        l1:     ; never really executed, but can be emulated
                {random part of a random .code segment}
        l2:     ; can be either executed or emulated
                {part of our code}
        lN:
                ...

As you can see, such constructions (examples 6/7) allow you to produce code
where

        1. the overall instruction statistics are very close to standard
           statistics;
        2. full and correct emulation is required to determine the real set
           of instructions used.

Axiom: if some algorithm analyzes each conditional jmp by choosing only one
variant of the execution flow (the one where only some defined set of
instructions is used), it will produce false alarms.

However, it should be noted that statistical and other more sophisticated
detection algorithms are used only when all simpler ways are impossible.
The only way to write near-to-undetectable code is to constantly check
detection algorithms and fix all the bugs which resulted in detection.
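To make example 7 concrete outside of assembly, here is a small hypothetical
C sketch of the same idea (the table, the handler names and the fixed index
are all illustrative): a dispatch table in which only one slot is ever taken
at run time, while the remaining slots point at filler code. A scanner that
statically follows every table entry has to treat the filler as reachable,
so the instruction set it sees differs from the one that really executes;
only full emulation tells them apart.

        #include <stdio.h>
        #include <stdint.h>

        #define N 3   /* table has N+1 entries; N+1 is a power of two */

        /* never reached at run time -- they only skew the visible mix */
        static void filler0(void) { volatile int x = 123; x *= 7;  }
        static void filler1(void) { volatile int x = 456; x ^= 99; }
        static void filler3(void) { volatile int x = 789; x >>= 2; }

        /* the only handler that is really executed */
        static void real_part(void) { puts("part of our code"); }

        static void (* const table1[N + 1])(void) =
            { filler0, filler1, real_part, filler3 };

        int main(void)
        {
            /* at generation time we already know that (r1 & N) == 2 */
            uint32_t r1 = 0xCAFE0002u;
            table1[r1 & N]();   /* like jmp dword ptr [table1 + r1*4] */
            return 0;
        }

Scattering such handlers across the code section instead of keeping them
adjacent would correspond to the indirectly linked, split-up decryptor parts
described above.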