Assembly Language Programming

  1. Registers
    1. General purpose AX, BX, CX, DX

    1. these are 16 bit registers
    2. can also be used as 2 × 8 bit registers e.g. the most significant byte of CX is CH and the least significant byte is CL
    3. these registers are mostly interchangeable but not always e.g. in the loop instruction, the count must be in CX
    1. Pointer and index regs SP, BP, SI, DI, IP

    1. SP stack pointer - commonly used for subroutine calls and last-in-first-out (LIFO) temporary storage
    2. BP base pointer - array access
    3. SI source index - string operations
    4. DI destination index - string operations
    5. IP instruction pointer - contains address of next instruction to be executed
    1. Flag register

    1. holds the CPU status flags, which are affected by logic and arithmetic operations
    2. C carry - carry after addition or borrow after subtraction

    1. Z zero - indicates if the result of an arithmetic or logical instruction is 0
    2. S sign - indicates if result was +ve or -ve
    3. O overflow - when signed numbers are used, indicates an overflow e.g. for an 8 bit add, 7fh (127) + 1 = 80h which is -128 if we are using signed arithmetic
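
The difference between the carry and overflow flags can be seen with two 8 bit additions. This is only an illustrative fragment (not a complete program); the flag values in the comments follow from the flag definitions above.

```asm
        mov al,7fh      ; 127, the largest signed 8 bit value
        add al,1        ; al = 80h: O=1 (signed overflow), S=1, Z=0, C=0

        mov al,0ffh     ; 255 unsigned, or -1 signed
        add al,1        ; al = 00h: C=1 (unsigned carry out), Z=1, S=0, O=0
```

The first addition is wrong only if the operands are treated as signed, the second only if they are treated as unsigned.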
    1. Segment registers CS, DS, SS, ES

    1. these behave differently in "real" and "protected" mode - we will only deal with "real" mode
    2. required so we can access >64K of memory using only 16 bit logical addresses

    1. CS code segment

    1. DS data segment, SS stack segment, ES extra segment

    1. different instructions default to using particular segments

    SEGMENT    REGISTER
    CS         IP
    SS         SP or BP
    DS         BX, SI, DI, 16 bit address
    ES         DI for string instructions
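
In real mode the 20 bit physical address is formed as segment × 16 + offset, which is how 16 bit registers reach more than 64K. A minimal sketch of that calculation for DS:SI follows; the procedure name PhysAddr is hypothetical and the routine is only meant to illustrate the arithmetic.

```asm
; Returns the physical address of DS:SI in DX:AX (DX holds the top 4 bits).
; Destroys CX. PhysAddr is a hypothetical name used for illustration.
PhysAddr PROC NEAR
        mov ax,ds       ; ax = segment
        mov dx,ax       ; keep a copy for the high bits
        mov cl,4
        shl ax,cl       ; ax = low 16 bits of segment * 16
        mov cl,12
        shr dx,cl       ; dx = top 4 bits of segment * 16
        add ax,si       ; add the 16 bit offset
        adc dx,0        ; carry from the low word goes into the high word
        ret
PhysAddr ENDP
```

For example, with DS=1234h and SI=0010h the result is DX:AX = 0001h:2350h, i.e. physical address 12350h.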

  1. Addressing Modes
    1. Summary (also see Figure 2-2 in text which shows how these work)

    1. Register mov ax,bx
    2. Immediate mov bl,3ah
    3. Direct mov [1234h],ax
    4. Register indirect mov [bx],ax
    5. Base plus index mov [bx+si],ax
    6. Register relative mov [bx+4],ax
    7. Base relative plus index mov array[bx+si],ax
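
As a rough illustration of the memory addressing modes listed above, the fragment below reads the four elements of a small word array, one mode per element. The array name table and its contents are made up for the example, and DS is assumed to already point at the data segment.

```asm
.DATA
table   dw 10, 20, 30, 40       ; hypothetical word array (2 bytes per element)
.CODE
        ; assumes ds already points at the data segment
        mov bx,offset table
        mov ax,[bx]             ; register indirect:         ax = 10
        mov ax,[bx+2]           ; register relative:         ax = 20
        mov si,4
        mov ax,[bx+si]          ; base plus index:           ax = 30
        mov bx,2
        mov ax,table[bx+si]     ; base relative plus index:  ax = 40
```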
  1. Data movement
    1. mov - transfers between registers and/or memory

    1. the addressing mode summary above gives examples of the different ways of moving data
    1. push/pop - stack operations

    1. these implement a last-in-first-out (LIFO) stack.

    1. push ax
         i. writes ax to (sp-2)
         ii. sp ← sp - 2
    2. pop ax
         i. puts data at sp into ax
         ii. sp ← sp + 2

    1. when you push a 16 bit number onto the stack, Intel machines are little-endian, so the least significant byte goes to the lower address e.g. suppose AX=1234h and we "push ax"

Address     Data before     Data after
SS:003ah    xx  ← sp        xx
SS:0039h    yy              12
SS:0038h    zz              34  ← sp
SS:0037h
    2. stack operations are often used to save registers e.g. after this routine ax,bx,cx will be unchanged
      routine: push ax
               push bx
               push cx
               < code which uses ax,bx,cx >
               pop cx           ; restore in the reverse order of the pushes
               pop bx
               pop ax
               ret

    4. they are also used for subroutines (call/ret): the stack stores the return address during the call, so when the ret is executed, execution resumes at the instruction after the call
    1. lea - load effective address

    1. loads a register with the address of the data
    2. lea si,data

    1. lea dx,[x][si]+100h
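
A short sketch of what lea computes, compared with a normal memory access. The array name values is hypothetical, and DS is assumed to point at the data segment.

```asm
.DATA
values  db 1, 2, 3, 4           ; hypothetical byte array
.CODE
        ; assumes ds already points at the data segment
        lea si,values           ; si = offset of values (same result as mov si,offset values)
        mov bx,2
        lea di,[si+bx]          ; di = offset of values + 2; memory is not read
        mov al,[di]             ; now read memory: al = 3
```

Unlike mov with offset, lea can work out an address that depends on registers at run time.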
    1. xchg - exchange (swaps the src and dst)

    1. xchg ax,bx ; (tmp ← ax, ax ← bx, bx ← tmp)
    1. in/out - i/o instructions

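    1. these read from or write to a hardware port
    2. a port number can be given as an immediate operand only if it is 0ffh or below; larger port numbers such as 300h must be placed in dx
    3. mov dx,300h
       in al,dx ; input a byte from i/o port 300h
    4. out dx,ax ; write ax to i/o port 300h (with dx = 300h)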
  1. Arithmetic and logic instructions
    1. add/sub

    1. add cx,1 ; cx ← cx + 1
    2. sub cx,[si+2] ; cx ← cx - (contents of address si+2)
    3. add byte ptr [di],3 ; add 3 to contents of byte at di

    1. the flags described earlier are affected by arithmetic and logical operations
    1. inc/dec - increment/decrement

    1. dec cx ; cx ← cx - 1
    2. we often use these for loops e.g. to loop 5 times with ax = 5,4,3,2,1 in the loop
          mov ax,5      ; load ax register with the count
      l1: call print    ; ax must not be destroyed in this call
          dec ax        ; update counter
          jnz l1        ; jump if not zero

    4. this is equivalent to the following C code

      for (a = 5; a != 0; a--)
        print();

    5. the "loop" instruction does the same thing in a single, smaller instruction (it is faster on the 8086, though not on the 80486 and later). However, you must use cx

          mov cx,5
      l1: call print    ; assume cx not destroyed
          loop l1

    1. cmp - compare

    1. this is a subtraction which discards the result, so the destination register is not changed. It is used only for setting the flag bits

cmp cl,10
jl l2 ; jumps if cl < 10 as signed numbers (cl not affected)

    1. mul/div - multiplication/division

    1. mul cl ; ax ← al * cl (unsigned)
    2. imul cl ; ax ← al * cl (signed)
    3. mul cx ; dx:ax ← ax * cx (unsigned)
    4. imul word ptr [si] ; dx:ax ← ax * [si] (signed)
    5. div cl ; ax / cl (quotient in al, remainder in ah)
    6. idiv si ; dx:ax / si (quotient in ax, remainder in dx)
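
Because a 16 bit divide uses dx:ax as the dividend, dx has to be set up before the division. The fragment below is a small worked example; the numbers are arbitrary, and cwd (sign-extend ax into dx:ax) is an instruction not covered in the list above.

```asm
        ; unsigned: 1000 / 7
        mov ax,1000
        xor dx,dx       ; clear the high word, so dx:ax = 1000
        mov bx,7
        div bx          ; ax = 142 (quotient), dx = 6 (remainder)

        ; signed: -1000 / 7
        mov ax,-1000
        cwd             ; sign-extend ax into dx:ax
        idiv bx         ; ax = -142 (quotient), dx = -6 (remainder)
```

Forgetting to initialise dx is a common cause of wrong results or a divide error interrupt, since the quotient must fit in 16 bits.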
    1. Logical operations AND, OR, XOR, TEST, NOT, NEG

    1. operate as for other arithmetic operations
    2. the fastest way to clear a register (better than mov ax,0) is "xor ax,ax"
    3. NOT inverts all the bits
    4. NEG changes the sign of a twos-complement number
    5. TEST is an AND which doesn’t change the destination register - we can use it to check if a certain bit is set or not

test cx,4h

jnz bitset

    1. shl,sal,shr,sar,rol,rcl,ror,rcr - shift and rotate operations

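    1. these are mostly used for arithmetic, since shifting left multiplies by powers of two and shifting right divides by powers of two

mov ax,bx
shl ax,2 ; ax = 4 * bx (an immediate shift count other than 1 requires an 80186 or later)
add ax,bx ; ax = 5 * bx
shl ax,1 ; now ax = 10 * bx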

  1. String instructions
    1. lods, stos, movs, ins, outs

    1. these allow repetitive operations to be done in a single command
    2. the direction flag (D) is used to indicate autoincrement (D=0) or autodecrement (D=1)

    1. during string instructions, DI always uses ES. SI defaults to DS but this can be changed with a segment override
    2. lods - loads al or ax with a byte or word

    1. movs - move string (useful for copying)

    1. ins and outs are used to input and output strings through an i/o port (dx holds the port address)


    1. The advantage of string instructions is that we can combine them with "rep", which repeats the instruction cx times

les di, buffer ; loads es:di from the far pointer stored at buffer
mov cx,5       ; number of words to store (words are faster than bytes)
mov ax,0       ; value to store
cld            ; autoincrement mode
rep stosw      ; does "stosw" cx(=5) times
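
The same pattern works for copying with movs. The following is a minimal sketch of a complete program that copies a 5 byte string; the names src, dst and srclen are made up for the example, and both buffers are assumed to live in the same data segment, so DS and ES are set to the same value.

```asm
.MODEL SMALL
.STACK 100h
.DATA
src     db "Hello"              ; hypothetical source string
srclen  equ $ - src             ; its length in bytes
dst     db srclen dup (?)       ; hypothetical destination buffer
.CODE
start:  mov ax,@data
        mov ds,ax               ; ds:si will address src
        mov es,ax               ; es:di will address dst (same segment here)
        mov si,offset src
        mov di,offset dst
        mov cx,srclen           ; number of bytes to copy
        cld                     ; autoincrement si and di
        rep movsb               ; copy cx bytes from ds:si to es:di
        mov ax,4c00h            ; return to DOS
        int 21h
END start
```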

    1. scas and cmps - string scan and string compare
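
As an illustration of scas, the fragment below searches a string for a byte using repne, which repeats while the bytes are not equal. The names text, textlen, notfound and done are hypothetical, and DS and ES are assumed to already point at the data segment.

```asm
.DATA
text    db "Hello"              ; hypothetical string to search
textlen equ $ - text
.CODE
        ; assumes ds and es already point at the data segment
        mov di,offset text
        mov cx,textlen          ; maximum number of bytes to examine
        mov al,'o'              ; byte to look for
        cld                     ; autoincrement di
        repne scasb             ; compare al with es:[di] until equal or cx = 0
        jne notfound            ; Z flag clear: no byte matched
        dec di                  ; di was advanced past the match, so step back
        jmp done                ; here es:di points at the matching byte
notfound:
        ; handle the "not found" case here
done:
```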
  1. Program control instructions
    1. jumps

    1. these are used to implement loops, ifs, gotos etc
    2. unconditional jump

    1. we also need conditional jumps to implement loops, ifs etc

    1. different types of jumps

    1. if our code segment is smaller than 64K, we will not need far jumps and do not need to modify cs. Similarly, if our data segment is <64K, we do not need to change ds.
    2. for "call" there are near and far
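
A sketch of how an if/else might be built from cmp, a conditional jump and an unconditional jump. The labels notless and done are hypothetical; the C statement in the comment is only there to show the intent.

```asm
; if (ax < bx) cx = 1; else cx = 0;    (ax and bx treated as unsigned)
        cmp ax,bx
        jae notless     ; skip the "then" part if ax >= bx
        mov cx,1        ; "then" part
        jmp done        ; unconditional jump over the "else" part
notless:
        mov cx,0        ; "else" part
done:
```

If ax and bx hold signed numbers, jge would be used in place of jae.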
    1. loop instructions

    1. we have already seen the loop command - there are also conditional loops
    1. set instructions (80386-80486)

    1. e.g. setc mem ; sets mem to 1 if carry set, otherwise 0
  1. PC I/O function calls (see Appendix A)
    1. DOS int 21h

    1. used to perform I/O to keyboard, disk, com ports, set date etc
    2. to use you just load the registers as specified and do an int 21h

mov ah,08h ; DOS function 08h: read a character without echo
int 21h ; returns with char in al

    1. the "int" performs a software interrupt which you can think of as a funny subroutine call (we will talk about interrupts in detail later)
    2. int only takes 2 bytes whereas a far call takes 5 bytes
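
DOS calls can be chained by moving a result into the register the next function expects. The fragment below is a small sketch that reads a key and then prints it, using DOS functions 08h and 02h.

```asm
        mov ah,08h      ; DOS function 08h: wait for a key, no echo
        int 21h         ; character returned in al
        mov dl,al       ; function 02h expects the character in dl
        mov ah,02h      ; DOS function 02h: write character to standard output
        int 21h
```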
    1. BIOS int 10h-17h

    1. the bios is sometimes used to perform more specific lower level i/o
  1. Using the assembler (tasm - which is mostly compatible with masm)
    1. Structure of a line of assembly language

    1. An assembly language instruction has 4 fields (optional ones are in brackets)

[label:] mnemonic [operands] [;comment]

    1. the label gives us a way to refer to an address (try to choose meaningful names for your labels)
    2. the mnemonic is an 80x86 instruction or some assembler directive
    3. operands are those arguments of the instruction or directive
    4. comments make a program much easier to read
    1. Sample program (asm1.asm)

.MODEL SMALL
.STACK 200h
.DATA

Message db "This was printed using function 9 "
        db "of the DOS interrupt 21h.$"

.CODE
START:
        mov ax,seg Message    ;moves SEGMENT of `Message' to AX
        mov ds,ax             ;moves ax into ds (ds=ax)
                              ;you cannot do this -> mov ds,seg Message

        mov dx,offset Message ;move OFFSET of `Message' to DX
        mov ah,9              ;DOS Function 9, int 21h prints a string that
        int 21h               ;terminates with a "$". Requires FAR pointer to
                              ;what is to be printed in DS:DX

        mov ax,4c00h          ;Returns control to DOS
        int 21h               ;MUST be here! Program will crash without it!

END START


    1. .MODEL SMALL - tells the assembler that our code<64K and our data<64K (together they can be 128K)
    2. .STACK 200h - defines stack segment to be 200h in size (strange things can happen if stack overflows)
    3. .DATA - this indicates the start of the data segment
    4. Message - this is a label for the data so we can reference it later. It gets translated to an address by the assembler
    5. db - this places the bytes of the message in consecutive addresses in memory, starting at Message, i.e. it allocates and initialises memory
    6. .CODE - start of code segment
    7. START: - a label marking the first instruction to be executed (data doesn't get executed)
    8. END START - marks the end of the source file and names START as the entry point of the program
    1. Assembler directives

    1. these give instructions to the assembler but don’t generate machine code
    2. org - sets the beginning of the offset address

    1. db - define byte

    1. equ - is used to define a constant

bufz equ 100
buf1 db bufz dup (?)
buf2 db bufz dup (?)

    1. Macros

pushregs macro
        push ax
        push bx
        push cx
        push dx
        push si
        push di
        endm

xx      macro reg
        local l1
        push dx
l1:     …
        jmp l1
        endm

    1. an example that shows all of this (produced using "tasm /l ex1")

     2 0000                           .MODEL SMALL
     3 0000                           .STACK 200h
     4       =0064                    bufz    equ 100
     5
     6                                pushregs macro
     7                                        push ax
     8                                        push bx
     9                                        push cx
    10                                        push dx
    11                                        push si
    12                                        push di
    13                                        endm
    14                                ; macro with parameters
    15                                addmac  macro x,y,res
    16                                        push ax     ; saves ax
    17                                        mov ax,x
    18                                        add ax,y
    19                                        mov res,ax
    20                                        pop ax
    21                                        endm
    22
    23 0000                           .DATA
    24                                        org 200h
    25 0200  A0 0A ?? 0A 48 65 6C+            db 0a0h, 1010b, ?, 10, "Hello", 0
    26       6C 6F 00
    27 020A  0001 0002 0003                   dw 1, 2, 3
    28 0210  64*(??)                  buf1    db bufz dup (?)
    29 0274  64*(??)                  buf2    db bufz dup (?)
    30
    31 02D8                           .CODE
    32
    33 0000                           START:  pushregs
1   34 0000  50                               push ax
1   35 0001  53                               push bx
1   36 0002  51                               push cx
1   37 0003  52                               push dx
1   38 0004  56                               push si
1   39 0005  57                               push di
    40                                        addmac 2,4,bx
1   41 0006  50                               push ax     ; saves ax
1   42 0007  B8 0002                          mov ax,2
1   43 000A  05 0004                          add ax,4
1   44 000D  8B D8                            mov bx,ax
1   45 000F  58                               pop ax
    46                                END START


    1. proc

; this is a procedure to print a block on the screen using
; registers to pass parameters (cursor position of where to
; print it and colour).

.model tiny
.code
org 100h

Start:
        mov dh,4        ; row to print character on
        mov dl,5        ; column to print character on
        mov al,254      ; ascii value of block to display
        mov bl,4        ; colour to display character

        call PrintChar  ; print our character
        mov ax,4C00h    ; terminate program
        int 21h

PrintChar PROC NEAR
        push bx         ; save registers to be destroyed
        push cx

        xor bh,bh       ; clear bh - video page 0
        mov ah,2        ; function 2 - move cursor
        int 10h         ; row and col are already in dx

        mov ah,9        ; function 09h write char & attrib
        mov cx,1        ; display it once
        int 10h         ; call bios service (bh is still 0, bl holds the colour)

        pop cx          ; restore registers in the reverse order of the pushes
        pop bx
        ret             ; return to where it was called
PrintChar ENDP

end Start


    1. Passing parameters to procedures

    1. the previous example passed parameters through registers
    2. passing through memory

 

.model tiny
.code
org 100h

Start:
        mov Row,4       ; row to print character
        mov Col,5       ; column to print character on
        mov Char,254    ; ascii value of block to display
        mov Colour,4    ; colour to display character

        call PrintChar  ; print our character
        mov ax,4C00h    ; terminate program
        int 21h

PrintChar PROC NEAR
        push ax cx bx   ; save registers to be destroyed
        xor bh,bh       ; clear bh - video page 0
        mov ah,2        ; function 2 - move cursor
        mov dh,Row
        mov dl,Col
        int 10h         ; call Bios service
        mov al,Char
        mov bl,Colour
        xor bh,bh       ; display page - 0
        mov ah,9        ; function 09h write char & attrib
        mov cx,1        ; display it once
        int 10h         ; call bios service
        pop bx cx ax    ; restore registers
        ret             ; return to where it was called
PrintChar ENDP

; variables to store data
Row     db ?
Col     db ?
Colour  db ?
Char    db ?

end Start

    1. Passing through the stack

.model tiny
.code
org 100h

Start:
        mov dh,4        ; row to print string on
        mov dl,5        ; column to print string on
        mov al,254      ; ascii value of block to display
        mov bl,4        ; colour to display character
        push dx ax bx   ; put parameters onto the stack
        call PrintString ; print our string
        pop bx ax dx    ; restore registers
        mov ax,4C00h    ; terminate program
        int 21h

PrintString PROC NEAR
        push bp         ; save bp
        mov bp,sp       ; put sp into bp
        push cx         ; save registers to be destroyed
        xor bh,bh       ; clear bh - video page 0
        mov ah,2        ; function 2 - move cursor
        mov dx,[bp+8]   ; restore dx
        int 10h         ; call bios service
        mov ax,[bp+6]   ; character
        mov bx,[bp+4]   ; attribute
        xor bh,bh       ; display page - 0
        mov ah,9        ; function 09h write char & attrib
        mov cx,1        ; display it once
        int 10h         ; call bios service
        pop cx          ; restore registers
        pop bp
        ret             ; return to where it was called
PrintString ENDP

end Start

    1. Linking

    1. in C we can compile files separately and then link them together

    1. in the same way, in assembler we can assemble files separately and then link them

    1. extrn - much like "extern" in C, this tells the assembler that a variable or procedure is defined in a different file

    1. public - in the file where you actually define a procedure or variable, you need to declare it as public

public var1
.data
var1 dw ? ; need an extrn var1:word to use this in the other file
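
A minimal two-file sketch of how public and extrn fit together. The file names file1.asm and file2.asm and the initial value 1234h are hypothetical; each file is assembled on its own (tasm file1, tasm file2) and the two object files are then linked (tlink file2 file1).

```asm
; ---- file1.asm (hypothetical name): defines var1 ----
.MODEL SMALL
public var1                     ; make var1 visible to the linker
.DATA
var1    dw 1234h
END

; ---- file2.asm (hypothetical name): uses var1 ----
.MODEL SMALL
.STACK 100h
.DATA
extrn var1:word                 ; defined in another file, size is a word
.CODE
start:  mov ax,@data
        mov ds,ax
        mov bx,var1             ; address filled in by the linker
        mov ax,4c00h            ; return to DOS
        int 21h
END start
```

The extrn is placed inside .DATA so that the assembler associates var1 with the data segment.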

    1. Unsigned numbers (revisited)

    1. recall that we use the carry flag to detect errors in unsigned arithmetic (and the overflow flag for signed arithmetic)
    2. thus for branching, we use ja, jae, jb, jbe, jc etc for unsigned comparisons (the ones that don't rely on the S or O flags) and jl, jle, jg, jge etc for signed comparisons; jz and jnz work for both
    3. suppose we want to do multiprecision arithmetic e.g. 548fb9963ce7h + 3fcd4fa23b8dh, which does not fit in a 16 bit register. We add the least significant words with add, then add each higher pair of words with add with carry (adc), which computes DST=DST+SRC+C, so a carry out of the previous addition gets added to the next word
    4. similarly for subtraction, we use subtract with borrow (sbb) which computes DST=DST - SRC - C
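
The multiprecision addition above can be sketched as a complete program, 16 bits at a time. The variable names num1 and num2 are made up for the example; each number is stored least significant word first, and the sum replaces num1.

```asm
.MODEL SMALL
.STACK 100h
.DATA
num1    dw 3ce7h, 0b996h, 548fh ; 548fb9963ce7h, least significant word first
num2    dw 3b8dh, 4fa2h, 3fcdh  ; 3fcd4fa23b8dh
.CODE
start:  mov ax,@data
        mov ds,ax
        mov ax,num2
        add num1,ax             ; low words:    3ce7h + 3b8dh     = 7874h, C=0
        mov ax,num2+2           ; mov does not change the flags
        adc num1+2,ax           ; middle words: b996h + 4fa2h + 0 = 0938h, C=1
        mov ax,num2+4
        adc num1+4,ax           ; high words:   548fh + 3fcdh + 1 = 945dh
        mov ax,4c00h            ; num1 now holds 945d09387874h
        int 21h
END start
```

This relies on mov leaving the carry flag untouched between the add and the following adc instructions.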
  1. Interrupts
    1. Introduction

    1. we use interrupts to stop executing a program for a while and then return to exactly where we left off with the flags unchanged
    2. the interrupt vectors reside in a fixed table in memory so they are not linked with the program like subroutines
    3. we might use them, for example
    1. interrupts

    1. interrupt sequence:
    1. when we return from an interrupt service routine using "iret" the reverse is done and all flags, CS and IP are restored
    2. address for interrupt n is 0000:4*n since each interrupt vector is a far address and takes 4 bytes (32 bits)
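
As a sketch of the 0000:4*n layout, the fragment below reads the vector for interrupt 21h directly from the table in real mode. Each vector stores the handler's offset in the low word and its segment in the high word.

```asm
        xor ax,ax
        mov es,ax               ; es = 0000h, the segment of the vector table
        mov bx,4*21h            ; offset of the vector for interrupt 21h
        mov ax,es:[bx]          ; low word of the vector: handler offset (IP)
        mov dx,es:[bx+2]        ; high word of the vector: handler segment (CS)
        ; dx:ax now holds the far address of the int 21h handler
```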

    1. I bit - interrupt enable bit

    1. T bit is the trace bit

    1. the INTO instruction does a type 4 interrupt IF the overflow flag is set