Opcode Frequency Statistics

Here are the results of the PE_STAT program, which calculates the frequencies of opcode usage in PE EXE/DLL files (x86 32-bit code).

Executable files were parsed into instructions using the MISTFALL 1.01 engine. This means that only about half of the executable files on the analyzed hard disk were processed; the rest were filtered out because of the engine's restrictions, to guarantee that only real opcodes (not data) were counted.
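The counting step itself can be sketched as follows. This is only an illustration and not the PE_STAT source: it assumes the disassembler has already split the code into instructions, the function name opcode_frequencies and the sample bytes are hypothetical, and the truncating percentage is an assumption about how the % column was produced.

```python
from collections import Counter

def opcode_frequencies(instructions):
    """Count how often each first byte occurs in a list of instructions.

    `instructions` is an iterable of bytes objects, one per instruction.
    Splitting code into instructions is the disassembler's job and is
    not shown here.
    """
    counts = Counter(ins[0] for ins in instructions if ins)
    total = sum(counts.values())
    # Integer division truncates, so 0.9% is reported as 0%
    # (assumed here; the original tool may round differently).
    return [(op, n, n * 100 // total) for op, n in counts.most_common()]

# Hypothetical sample:
# mov eax,[ebp+8]; push eax; call rel32; mov ecx,eax; retn
sample = [bytes.fromhex(h)
          for h in ("8b4508", "50", "e800000000", "8bc8", "c3")]

for op, n, pct in opcode_frequencies(sample):
    print(f"{op:02X}  {n:<10} {pct}%")
```

Note that the first byte is counted even when it is a prefix, which matches the table below, where 66, 67, F3 and the segment overrides appear as separate rows.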

The results show that only a few opcodes account for most of the usage, while most of the others are used very seldom or not at all.

This set of unused opcodes can now be used to improve the quality of parsing executable files into instructions, i.e. to distinguish between code and data.

These "unused" opcodes have low frequency values here, for example 1-1000. The non-zero frequencies can be explained by imperfect disassembly.
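One possible way to apply this idea is sketched below. It is an assumption-laden example, not the method of any particular engine: the frequencies are a handful of rows copied from the table in this text, the limit of 1000 comes from the range mentioned above, and the names rare_opcodes, looks_like_data and the 5% ratio are hypothetical choices made for illustration.

```python
# A few rows copied from the table in this text (opcode -> frequency).
FREQ = {
    0x8B: 6588971, 0xE8: 2509099, 0x50: 1423289, 0xC3: 518251,
    0x9A: 19, 0xD5: 8, 0x6C: 4, 0x27: 3, 0x3F: 1,
}

RARE_LIMIT = 1000  # upper bound of the "unused" range mentioned above

def rare_opcodes(freq, limit=RARE_LIMIT):
    """Return the set of opcodes whose frequency does not exceed limit."""
    return {op for op, n in freq.items() if n <= limit}

def looks_like_data(first_bytes, rare, max_rare_ratio=0.05):
    """Guess whether a trial disassembly actually ran over data.

    first_bytes: the first byte of each instruction in the trial.
    max_rare_ratio: arbitrary illustrative threshold (an assumption).
    """
    if not first_bytes:
        return False
    hits = sum(1 for b in first_bytes if b in rare)
    return hits / len(first_bytes) > max_rare_ratio

rare = rare_opcodes(FREQ)
print(looks_like_data([0x8B, 0x50, 0xE8, 0xC3], rare))  # common opcodes only
print(looks_like_data([0x27, 0x3F, 0x6C, 0x8B], rare))  # mostly rare opcodes
```

In practice the threshold would have to be tuned, since legitimate code occasionally contains rare opcodes too.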

NOTE: The frequencies of the first opcodes in the following table may not be applicable in some cases, because RTL code is present in almost all of the analyzed files.

Total files processed:   ~1700
Total opcodes processed: ~41000000

op frequency    %    ---mostly used as:---
8B  6588971    15%     mov modr/m
FF  2736426     6%     push modr/m
E8  2509099     6%     call
83  2240885     5%     cmp/add modr/m (including add esp, xx after call)
89  2045133     4%     mov modr/m
8D  1573296     3%     lea modr/m
50  1423289     3%     push eax
74  1269798     3%     jz
6A  1064820     2%     push xx
85  1001107     2%     test r,r
0F  939376      2%     0F xx
56  882376      2%     push esi
75  845429      2%     jnz
33  781974      1%     xor r,r
53  740703      1%     push ebx
66  738157      1%     operand-size modifier prefix (-->16-bit)
EB  734922      1%     jmp xx
68  705038      1%     push imm32
57  679402      1%     push edi
C7  639613      1%     mov modr/m, imm
E9  616969      1%     jmp
C3  518251      1%     retn
5E  515151      1%     pop esi
3B  503023      1%     cmp r,r
55  467792      1%     push ebp
51  465043      1%     push ecx
59  454977      1%     pop ecx (after call)
C2  423134      1%     retn n
5B  388365      0%     pop ebx
5F  378583      0%     pop edi
B8  361314      0%     mov eax, c
5D  357410      0%     pop ebp
52  303136      0%     push edx
81  242215      0%
03  241530      0%
8A  219404      0%
39  214276      0%
64  208496      0%
80  201614      0%
C6  201273      0%
C1  190927      0%
A1  177274      0%
2B  173151      0%
F6  166445      0%
C9  146955      0%
F7  135687      0%
88  125771      0%
F3  103929      0%
A5  101174      0%
7C  99153       0%
B9  83369       0%
84  79363       0%
D9  73042       0%
72  68183       0%
40  67919       0%
7E  67620       0%
A3  67196       0%
48  66434       0%
7D  66015       0%
76  62449       0%
58  59073       0%
3D  55021       0%
BF  52910       0%
BE  52893       0%
DD  50018       0%
1B  47972       0%
73  45725       0%
01  43249       0%
D1  41029       0%
23  40509       0%
7F  40459       0%
BB  40403       0%
BA  40217       0%
AB  39515       0%
46  39420       0%
0B  39209       0%
77  34632       0%
25  34612       0%
D8  33722       0%
43  33402       0%
3C  29601       0%
05  28960       0%
47  28381       0%
A4  27395       0%
49  27157       0%
5A  27086       0%
99  24504       0%
DB  24317       0%
F2  23894       0%
AE  23725       0%
41  21745       0%
A8  20662       0%
42  20038       0%
DC  19108       0%
B0  18301       0%
3A  17726       0%     ...
A9  17323       0%     test eax, c
4A  16252       0%     dec edx
24  16162       0%     and al, nn
6B  15040       0%     imul modr/m, imm8
DF  14601       0%     fpu
38  14428       0%     cmp modr/m (8-bit)
4E  13731       0%     dec esi
4F  12994       0%     dec edi
D3  12952       0%     shift modr/m, cl
29  12266       0%     sub modr/m
4B  11811       0%     dec ebx
DE  11689       0%     fpu
B2  11646       0%     mov dl, nn
A6  10319       0%     cmpsb
69  9156        0%     imul modr/m, c
32  8539        0%     xor modr/m (8-bit)
AA  8469        0%     stosb
FE  8463        0%
2D  8450        0%     sub eax, c
79  8017        0%     jns
0C  7954        0%     or al, nn
09  7362        0%     or modr/m
BD  6953        0%     mov ebp, c
21  6680        0%     and modr/m
9E  6556        0%     sahf
0A  6409        0%     or modr/m (8-bit)
0D  6277        0%     or eax, c
31  5936        0%     xor modr/m
9B  4925        0%     fwait
A0  4764        0%     mov al, [addr]
90  4757        0%     nop
13  4490        0%     adc modr/m
B3  4484        0%     mov bl, nn
2C  4093        0%     sub al, nn
45  4083        0%     inc ebp
FC  3769        0%     cld
78  3744        0%     js xx
87  3329        0%     xchg modr/m
B1  3247        0%     mov cl, nn
A2  3034        0%     mov [addr], al
67  2995        0%     address-modifier prefix (-->16-bit)
A7  2809        0%     cmpsd
54  2754        0%     push esp
C0  2723        0%     shift modr/m, nn
04  2649        0%     add al, nn
8F  2287        0%     pop modr/m
02  2268        0%     add modr/m (8-bit)
4D  2177        0%   * dec ebp
C8  2108        0%   * enter
E3  1787        0%   * jecxz xx
22  1762        0%     and modr/m (8-bit)
08  1704        0%     or modr/m (8-bit)
AC  1665        0%   * lodsb
20  1643        0%     and modr/m (8-bit)
2A  1563        0%     sub modr/m (8-bit)
DA  1325        0%     fpu
92  1288        0%   * xchg edx, eax
F0  1106        0%     lock
D0  1092        0%     shift, 1
D2  1057        0%     shift, cl
00  988         0%     add modr/m (8-bit)
CC  985         0%   * int3
9C  908         0%   * pushfd
9D  883         0%   * popfd
F8  872         0%   * clc
11  857         0%   * adc modr/m
1A  847         0%   * sbb modr/m (8-bit)
E2  730         0%   * loop xx
86  707         0%     xchg modr/m (8-bit)
F9  652         0%   * stc
30  615         0%   * xor modr/m (8-bit)
7A  562         0%     jp xx
FD  540         0%   * std
91  535         0%   * xchg ecx, eax
B5  512         0%   * mov ch, nn
19  456         0%   * sbb modr/m
34  425         0%   * xor al, nn
B4  393         0%   * mov ah, nn
2E  391         0%   * cs:
28  386         0%   * sub modr/m (8-bit)
CD  362         0%   * int nn
35  281         0%   * xor eax, c
AF  279         0%   * scasd
B7  275         0%   * mov bh, nn
98  273         0%   * cwde
D7  271         0%     xlat
96  185         0%   * xchg esi, eax
F5  178         0%   * cmc
AD  176         0%   * lodsd
CB  168         0%   * retf
E6  158         0%     out port, al
7B  133         0%     jnp xx
44  120         0%     inc esp
B6  116         0%   * mov dh, nn
93  110         0%   * xchg ebx, eax
CA  104         0%     retf n
61  83          0%   * popad
60  75          0%   * pushad
65  72          0%   * gs:
8E  72          0%     mov sr, modr/m
26  71          0%   * es:
1C  68          0%   * sbb al, nn
97  60          0%   * xchg edi, eax
E4  60          0%     in al,port
4C  59          0%     dec esp
5C  56          0%     pop esp
8C  50          0%   * mov r,sr
EC  48          0%     in al,dx
EF  48          0%     out dx, eax
FA  45          0%     cli
1E  43          0%   * push ds
EE  41          0%     out dx,al
BC  40          0%     mov esp, c
10  39          0%     adc modr/m,r8
70  35          0%     jo xx
C4  35          0%     les
C5  34          0%     lds
E0  32          0%   * loopne xx
ED  32          0%     in eax,dx
14  31          0%   * adc al, nn
CE  29          0%     into
18  28          0%     sbb modr/m,r8
36  26          0%     ss:
63  25          0%     arpl
6E  22          0%     outsb
94  20          0%     xchg esp, eax
9F  20          0%     lahf
9A  19          0%   * call seg:offs
E1  19          0%   * loope xx
15  18          0%     adc eax, c
D4  17          0%   * aam nn
FB  17          0%     sti
95  16          0%   * xchg ebp, eax
1F  14          0%     pop ds
82  13          0%   * cmd byte modr/m, imm8
0E  12          0%     push cs
62  12          0%     bound
71  11          0%     jno
D6  10          0%   * setalc
12  9           0%   * adc modr/m (8-bit)
3E  9           0%     ds:
6F  8           0%     outsd
CF  8           0%   * iretd
D5  8           0%     aad nn
F4  8           0%     hlt
06  7           0%   * push es
37  6           0%     aaa
E5  6           0%     in eax, port
E7  5           0%     out port, eax
EA  5           0%   * jmp seg:offs
F1  5           0%     break
6C  4           0%     insb
6D  4           0%     insd
1D  3           0%   * sbb eax, c
27  3           0%     daa
2F  3           0%     das
16  2           0%   * push ss
17  2           0%   * pop ss
07  1           0%   * pop es
3F  1           0%     aas

Opcodes marked with (*) are sometimes used in viruses but, as you can see, are not used frequently in ordinary executables.
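For anyone who wants to process the table above mechanically, here is a minimal sketch of a row parser. It assumes the row layout used in this text (hex opcode, frequency, percentage, optional (*)-mark, optional description); the names ROW and parse_row are hypothetical.

```python
import re

# opcode, frequency, percent, optional "*", optional description
ROW = re.compile(r"^([0-9A-F]{2})\s+(\d+)\s+(\d+)%\s*(\*)?\s*(.*)$")

def parse_row(line):
    """Parse one table row; return None for lines that are not rows."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    op, freq, pct, star, desc = m.groups()
    return {
        "op": int(op, 16),
        "freq": int(freq),
        "pct": int(pct),
        "starred": star is not None,
        "desc": desc.strip(),
    }

rows = [
    "op frequency    %    ---mostly used as:---",
    "8B  6588971    15%     mov modr/m",
    "4D  2177        0%   * dec ebp",
    "81  242215      0%",
]
parsed = [r for r in map(parse_row, rows) if r is not None]
starred = [f"{r['op']:02X}" for r in parsed if r["starred"]]
print(starred)
```

The header line is skipped because it does not start with a two-digit hex opcode followed by a number.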