I’m still following the Assembly Primer for Hackers from Vivek Ramachandran of SecurityTube in preparation for Penetration Testing with BackTrack. In this review I’ll cover data types and how to move bytes, numbers, pointers and strings between labels and registers.
Variables (labels that name locations holding data) are defined in the .data section of your assembly program. Here are some of the data types you'll commonly use.
Data types in assembly; photo credit to Vivek Ramachandran
```asm
# Demo program to show how to use Data types and MOVx instructions

.data
    HelloWorld:
        .ascii "Hello World!"
    ByteLocation:
        .byte 10
    Int32:
        .int 2
    Int16:
        .short 3
    Float:
        .float 10.23
    IntegerArray:
        .int 10,20,30,40,50

.bss
    .comm LargeBuffer, 10000

.text
    .globl _start

    _start:
        nop

        # Exit syscall to exit the program
        movl $1, %eax
        movl $0, %ebx
        int $0x80
```
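For readers coming from C, here is a rough C analogue of the demo program's data declarations. It is only a sketch that assumes the usual 32-bit x86 sizes for these GAS directives (.byte = 1 byte, .short = 2, .int = 4, .float = 4); C variables are not identical to assembler labels, and the C names simply mirror the labels above.

```c
#include <assert.h>
#include <stdint.h>

/* Rough C analogue of the demo program's .data section.
   Assumption: 32-bit x86 GAS sizes (.byte=1, .short=2, .int=4, .float=4). */
char     HelloWorld[12]  = "Hello World!";        /* .ascii: 12 bytes, no null terminator */
uint8_t  ByteLocation    = 10;                    /* .byte  */
int32_t  Int32           = 2;                     /* .int   */
int16_t  Int16           = 3;                     /* .short */
float    Float           = 10.23f;                /* .float */
int32_t  IntegerArray[5] = {10, 20, 30, 40, 50};  /* .int with 5 values */

/* Like .comm LargeBuffer, 10000 in .bss: zero-initialized storage */
char     LargeBuffer[10000];
```

Note that .ascii does not append a null terminator, which is why the C array is exactly 12 bytes long.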
Moving numbers in assembly
Introduction to mov
The mov instruction comes in a family of sized variants. By appending b, w or l you choose to move 8 bits (a byte), 16 bits (a word) or 32 bits (a long) of data. To demonstrate these operations, we'll use the labels from the demo program above.
Moving a byte into a register
movb $0, %al
This will move the integer 0 into the lower 8 bits of the EAX register.
Moving a word into a register
movw $10, %ax
This will move the integer 10 into the lower 16 bits of the EAX register.
Moving a long into a register
movl $20, %eax
This will move the integer 20 into the 32-bit EAX register.
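To see what the size suffix does to the rest of EAX, here is a small C model. It assumes 32-bit x86 behaviour, where writing AL or AX replaces only the low 8 or 16 bits and leaves the remaining bits of EAX untouched; the helper names movb_al, movw_ax and movl_eax are hypothetical, and a uint32_t stands in for the register.

```c
#include <assert.h>
#include <stdint.h>

/* A uint32_t standing in for the 32-bit EAX register. */
static uint32_t eax;

/* movb $imm, %al : replace only the low 8 bits (AL) */
static void movb_al(uint8_t imm)   { eax = (eax & 0xFFFFFF00u) | imm; }

/* movw $imm, %ax : replace only the low 16 bits (AX) */
static void movw_ax(uint16_t imm)  { eax = (eax & 0xFFFF0000u) | imm; }

/* movl $imm, %eax : replace all 32 bits */
static void movl_eax(uint32_t imm) { eax = imm; }
```

For example, if EAX held 0x12345678, movb $0, %al would leave 0x12345600, and movw $10, %ax would then leave 0x1234000A.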
Moving a word into a label
movw $50, Int16
This will move the integer 50 into the 16-bit memory location named by the Int16 label.
Moving a label into a register
movl Int32, %eax
This will move the contents of the Int32 label into the 32-bit EAX register.
Moving a register into a label
movb %al, ByteLocation
This will move the contents of the 8-bit AL register into the byte stored at the ByteLocation label.
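The three label moves above map fairly directly onto C assignments. In this sketch, globals stand in for the labels and a local variable stands in for EAX; demo_label_moves is a hypothetical name, and the types assume the sizes from the demo program.

```c
#include <assert.h>
#include <stdint.h>

/* Globals standing in for the labels from the demo program. */
static uint8_t ByteLocation = 10;
static int16_t Int16 = 3;
static int32_t Int32 = 2;

/* Returns the value that ends up in "EAX". */
static int32_t demo_label_moves(void)
{
    int32_t eax;                  /* local standing in for the EAX register */
    Int16 = 50;                   /* movw $50, Int16                         */
    eax = Int32;                  /* movl Int32, %eax : copy the contents    */
    ByteLocation = (uint8_t)eax;  /* movb %al, ByteLocation : low byte only  */
    return eax;
}
```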
Accessing memory locations (using pointers)
In C we have the concept of pointers. A pointer is simply a variable that points to a location in memory. Typically that memory location holds some data that is important to us and that’s why we’re keeping a pointer to it so we can access the data later. This same concept can be achieved in assembly.
Moving a label’s memory address into a register (creating a pointer)
movl $Int32, %eax
This will move the memory address of the Int32 label into the EAX register. In effect, EAX is now a pointer to the data held at Int32. Notice that we use movl because memory addresses are 4 bytes (32 bits) on 32-bit x86. Also notice that to get a label's address, rather than its contents, you prefix the label with the $ character.
Dereferencing a pointer (accessing the contents of a memory address)
Moving a long into a dereferenced location
movl $9, (%eax)
This will move the integer 9 into the memory location whose address is held in EAX. In other words, if this were C, %eax would be considered a pointer and (%eax) would be the way we dereference that pointer to change the contents of the location it points to. The equivalent in C would look something like this:
```c
int Int32 = 2;
int *eax;
eax = &Int32;
*eax = 9;
```
The only difference in the C example is that we had to declare eax as an int pointer before we could store the address of Int32 in it. In assembly we can copy the address of Int32 directly into the EAX register, without needing an additional variable. Line 4 of this C example (*eax = 9;) is the equivalent of the assembly instruction shown above.
So to clarify one more time, EAX does not change at all in this example; EAX still points to the same location! However, the data at that location has changed. So if EAX contains the location of the Int32 label, then Int32 now contains 9. So it’s Int32 that has changed, not EAX.
Notice that we use the parentheses to access the memory location stored in the register (dereference the pointer).
Moving a dereferenced value into a register
movl (%eax), %ebx
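This will copy the 32-bit value stored at the memory address held in EAX into the EBX register. Continuing the example above, EBX now contains 9, the current value of Int32. Note that EBX is not a pointer: it holds a copy of the data, so changing EBX afterwards would not affect Int32. Once again, the parentheses around the register name dereference the address it holds.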
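Putting the pointer steps together, a C sketch of movl $Int32, %eax; movl $9, (%eax); movl (%eax), %ebx might look like this. The function demo_pointer is hypothetical; it returns the value that ends up in EBX and hands back the pointer held in EAX so the two can be compared.

```c
#include <assert.h>
#include <stdint.h>

static int32_t Int32 = 2;  /* stands in for the Int32 label */

static int32_t demo_pointer(int32_t **eax_out)
{
    int32_t *eax = &Int32;  /* movl $Int32, %eax : EAX holds an address        */
    *eax = 9;               /* movl $9, (%eax)   : write through the pointer   */
    int32_t ebx = *eax;     /* movl (%eax), %ebx : EBX gets a copy of the value */
    *eax_out = eax;         /* EAX itself still points at Int32                 */
    return ebx;
}
```

After the call, EBX holds the plain value 9 while EAX still holds the address of Int32.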
Moving strings in assembly
I can imagine that reading this you might be thinking, "hey, strings are just bytes of data, so why can't I just move them using the same instructions I just learned?" And the answer to that question is: you can! The problem is that strings are often much larger. A string might be 1 byte, 5 bytes, or 100 bytes, and none of the mov instructions discussed above move more than 4 bytes at a time. So let's look at the string operations that make copying larger runs of data less painful.
A key difference between the standard mov operations and the string operations (movs, stos and lods) is the number of operands. With mov, you specify the source and destination as 2 explicit operands. With movs, the source and destination addresses are instead placed in the ESI and EDI registers respectively. lods reads from the address in ESI into EAX (or AL/AX), and stos writes EAX (or AL/AX) to the address in EDI. This will become clearer with some examples.
The DF flag
DF stands for direction flag. It is a flag stored in the CPU's EFLAGS register that determines whether the string operations increment or decrement the addresses in ESI and EDI after each operation. When DF is 0 (cleared) the addresses are incremented; when DF is 1 (set) they are decremented. In our examples the DF flag will always be cleared.
The usefulness of the DF flag will make more sense in the examples.
Clearing the DF flag
cld
DF is set to 0. Addresses are incremented where applicable.
Setting the DF flag
std
DF is set to 1. Addresses are decremented where applicable.
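The bookkeeping that DF controls can be written as a one-line rule. This is a sketch assuming 32-bit addresses; step_address is a hypothetical helper, not an instruction.

```c
#include <assert.h>
#include <stdint.h>

/* How a string instruction updates ESI/EDI after each element.
   size is the element width in bytes: 1 for the b forms, 2 for w, 4 for l.
   df == 0 (after cld): the address moves forward.
   df == 1 (after std): the address moves backward. */
static uint32_t step_address(uint32_t addr, unsigned size, int df)
{
    assert(size == 1 || size == 2 || size == 4);
    return df ? addr - size : addr + size;
}
```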
In the example below, the following variables have been defined:
```asm
.data
    HelloWorldString:
        .asciz "Hello World of Assembly!"

.bss
    .lcomm Destination, 100
```
movs: Moving a string from one memory location to another memory location
* source: %esi; should contain the memory address where the data to be copied resides. The data at this address is not modified, but the address stored in %esi is incremented or decremented (by the number of bytes moved) according to the DF flag.
* destination: %edi; should contain the memory address the data will be copied to. After copying, the address stored in %edi is incremented or decremented (by the number of bytes moved) according to the DF flag.
movsb: move a single byte
movsw: move 2 bytes
movsl: move 4 bytes
```asm
movl $HelloWorldString, %esi
movl $Destination, %edi
movsb
movsw
movsl
```
In this example, we first move the address of HelloWorldString into the ESI register (the source string). Then we move the address of Destination into EDI (the destination buffer).
When movsb is called, it tells the CPU to move 1 byte from the source to the destination, so the 'H' is copied to the first byte of Destination. However, that is not the only thing that happens during this operation. As noted above, the addresses stored in %esi and %edi are also incremented or decremented according to the DF flag. Since the DF flag is cleared, both %esi and %edi are incremented by 1 byte.
But why is this useful? Well, what it means is that the next string operation to be called will start copying from the 2nd byte of the source string instead of the first byte. In other words, rather than copying the ‘H’ a second time, we’ll start by copying the ‘e’ in the HelloWorldString instead. This is what makes the movs series of operations far more useful than the mov operations when dealing with strings.
So, as you might imagine, when calling movsw the next 2 bytes are copied and Destination now holds “Hel”. And finally the movsl operation copies 4 bytes into Destination, which makes it “Hello W”.
The addresses held in both %esi and %edi have now each been incremented by 7 bytes (1 + 2 + 4), so the final values are:
```
%esi:             $HelloWorldString+7
%edi:             $Destination+7
HelloWorldString: "Hello World of Assembly!"
Destination:      "Hello W"
```
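To make the ESI/EDI bookkeeping concrete, here is a C simulation of the three movs variants. It is a model, not what the CPU literally does: struct cpu and movs are hypothetical names, and pointers stand in for the addresses held in ESI and EDI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated CPU state for the string instructions (names mirror the registers). */
struct cpu {
    const unsigned char *esi;  /* source address      */
    unsigned char *edi;        /* destination address */
    int df;                    /* direction flag      */
};

/* movsb/movsw/movsl with size 1/2/4: copy 'size' bytes from [ESI] to [EDI],
   then move both addresses forward (DF = 0) or backward (DF = 1) by 'size'. */
static void movs(struct cpu *c, size_t size)
{
    memcpy(c->edi, c->esi, size);
    if (c->df) { c->esi -= size; c->edi -= size; }
    else       { c->esi += size; c->edi += size; }
}
```

Starting from the "Hello World of Assembly!" string with DF cleared, calling movs(&c, 1), movs(&c, 2) and movs(&c, 4) leaves "Hello W" in the destination and both pointers 7 bytes further along, matching the values above.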
lods: Moving a string from a memory location into the EAX register
* source: %esi; should contain the memory address where the data to be copied resides. The data at this address is not modified, but the address stored in %esi is incremented or decremented (by the number of bytes loaded) according to the DF flag.
* destination: %eax; the data is copied directly into the register, NOT to any memory address held in the register. lodsb overwrites only AL, lodsw only AX, and lodsl all of EAX. No incrementing or decrementing happens here because the destination is a register, not a memory location.
lodsb: move a single byte
lodsw: move 2 bytes
lodsl: move 4 bytes
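The lods instructions are easier to picture with a C model. It assumes x86's little-endian byte order and shows that lodsb and lodsw only replace AL or AX; the helper lods and the globals eax, esi and df are hypothetical stand-ins for the instruction and the CPU state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU state. */
static uint32_t eax;
static const uint8_t *esi;
static int df;

/* lodsb/lodsw/lodsl with size 1/2/4: read 'size' bytes at ESI into AL/AX/EAX,
   keep the untouched upper bits of EAX, then step ESI according to DF. */
static void lods(unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    uint32_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint32_t)esi[i] << (8 * i);   /* little-endian load */
    uint32_t mask = (size == 4) ? 0xFFFFFFFFu : ((1u << (8 * size)) - 1u);
    eax = (eax & ~mask) | value;                /* replace AL, AX or all of EAX */
    esi = df ? esi - size : esi + size;
}
```

For example, with ESI pointing at "Hello" and EAX = 0xAABBCCDD, lods(1) leaves EAX = 0xAABBCC48 ('H' is 0x48), and a following lods(2) loads "el" so EAX becomes 0xAABB6C65.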
stos: Moving a string from the EAX register to a memory location
* source: %eax; the contents of the register itself are copied, NOT the contents of any memory address held in the register. stosb stores AL, stosw stores AX, and stosl stores all of EAX. No incrementing or decrementing happens here because the source is a register, not a memory location.
* destination: %edi; should contain the memory address the data will be copied to. After copying, the address stored in %edi is incremented or decremented (by the number of bytes stored) according to the DF flag.
stosb: move a single byte
stosw: move 2 bytes
stosl: move 4 bytes
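Likewise, here is a C model of stos. Under the same little-endian assumption, it writes the low 1, 2 or 4 bytes of EAX to the address in EDI and then steps EDI; stos, eax, edi and df are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPU state. */
static uint32_t eax;
static uint8_t *edi;
static int df;

/* stosb/stosw/stosl with size 1/2/4: write AL/AX/EAX (low byte first) to [EDI],
   then step EDI according to DF. EAX itself is not modified. */
static void stos(unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    for (unsigned i = 0; i < size; i++)
        edi[i] = (uint8_t)(eax >> (8 * i));
    edi = df ? edi - size : edi + size;
}
```

With EAX = 0x41424344, stos(1) writes the single byte 0x44 ('D'), and stos(4) then writes 0x44 0x43 0x42 0x41 ("DCBA") and leaves EDI 5 bytes past the start of the buffer.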
rep: Repeating an operation so you can move strings more easily
rep movsb
Prefixing movsb with rep repeats it, decrementing the ECX register after each iteration, until ECX equals 0. So if you want to copy a string in its entirety, you can follow this pseudo-code:
* set ESI to the memory address of the source string
* set EDI to the memory address of the destination buffer
* set ECX to the length of the source string
* clear the DF flag so ESI and EDI will be incremented after each movsb
* execute rep movsb
```asm
movl $HelloWorldString, %esi
movl $Destination, %edi
movl $25, %ecx    # HelloWorldString contains 24 characters + a null terminator
cld
rep movsb
```
Here movsb is executed 25 times (the initial value of ECX). Because each movsb advances both ESI and EDI (DF is cleared), you don't have to update the addresses yourself. So at the end of the example, the values are:
```
%esi:             $HelloWorldString+25
%edi:             $Destination+25
%ecx:             0
DF:               0
HelloWorldString: "Hello World of Assembly!"
Destination:      "Hello World of Assembly!"
```
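The rep loop itself can be sketched in C as follows. This models only the cld case used above; struct rep_state and rep_movsb are hypothetical names, and rep_movsb copies one byte per iteration just as the pseudo-code describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the registers used by "cld; rep movsb". */
struct rep_state {
    const uint8_t *esi;  /* source address      */
    uint8_t *edi;        /* destination address */
    uint32_t ecx;        /* remaining count     */
};

/* rep movsb with DF = 0: rep checks ECX before each iteration,
   copies one byte, advances ESI and EDI, and decrements ECX. */
static void rep_movsb(struct rep_state *s)
{
    while (s->ecx != 0) {
        *s->edi++ = *s->esi++;
        s->ecx--;
    }
}
```

Running it with ECX = 25 on "Hello World of Assembly!" copies the whole string, including its null terminator, and ends with ECX = 0 and both pointers 25 bytes further along.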
More to Come
I hope you enjoyed reviewing data types and mov operations. Stay tuned for more assembly tips!