# CENG 3420 Computer Organization and Design

# **Lecture 05: Arithmetic and Logic Unit**

Bei Yu



香港中文大學 The Chinese University of Hong Kong

CEG3420 L05.1 Spring 2016

# **Abstract Implementation View**



CEG3420 L05.2 Spring 2016

#### **Arithmetic**

- Where we've been
  - Abstractions
    - Instruction Set Architecture (ISA)
    - Assembly and machine language
- What's up ahead
  - Implementing the ALU architecture



CEG3420 L05.3 Spring 2016

#### **Review: VHDL**

- Supports design, documentation, simulation & verification, and synthesis of hardware
- Allows integrated design at behavioral & structural levels



CEG3420 L05.4 Spring 2016

#### **Review: VHDL**

- Basic structure
  - Design entity-architecture descriptions
  - Time-based execution (discrete event simulation) model



CEG3420 L05.5 Spring 2016

# **Review: Entity-Architecture Features**

- Entity defines externally visible characteristics
  - Ports: channels of communication
    - signal names for inputs, outputs, clocks, control
  - Generic parameters: define class of components
    - timing characteristics, size (fan-in), fan-out
- □ Architecture defines the internal behavior or structure of the circuit
  - Declaration of internal signals
  - Description of behavior
    - collection of Concurrent Signal Assignment (CSA) statements (indicated by <=); can also model temporal behavior with the delay annotation
    - one or more processes containing CSAs and (sequential)
       variable assignment statements (indicated by :=)
  - Description of structure
    - interconnections of components; underlying behavioral models of each component must be specified

CEG3420 L05.6 Spring 2016

# **ALU VHDL Representation**

```
entity ALU is
  port(A, B: in std logic vector (31 downto 0);
          m: in std logic vector (3 downto 0);
          result: out std logic vector (31 downto 0);
          zero: out std logic;
          ovf: out std logic)
end ALU;
architecture process behavior of ALU is
begin
   ALU: process(A, B, m)
   begin
       result := A + B;
   end process ALU;
end process behavior;
```

CEG3420 L05.7 Spring 2016

## **Machine Number Representation**

- Bits are just bits (have no inherent meaning)
  - conventions define the relationships between bits and numbers
- □ Binary numbers (base 2) integers

  0000 -> 0001 -> 0010 -> 0011 -> 0100 -> 0101 -> . . . .
  - in decimal from 0 to 2<sup>n</sup>-1 for n bits
- Of course, it gets more complicated
  - storage locations (e.g., register file words) are finite, so have to worry about overflow (i.e., when the number is too big to fit into 32 bits)
  - have to be able to represent negative numbers, e.g., how do we specify -8 in

```
addi \$sp, \$sp, -8 \#\$sp = \$sp - 8
```

 in real systems have to provide for more than just integers, e.g., fractions and real numbers (and floating point) and alphanumeric (characters)

CEG3420 L05.8 Spring 2016

## **MIPS Representations**

□ 32-bit signed numbers (2's complement):

```
0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 \ 0000 = 0_{ten}
0000 0000 0000 0000 0000 0000 0001<sub>two</sub> = + 1<sub>ten</sub>
                                                           maxint
1111 \ 1111_{\text{two}} = + 2,147,483,647_{\text{ten}}
    1111 1111 1111 1111 1111
                0000 0000 0000 0000 0000_{\text{two}} = -2,147,483,648_{\text{ten}}
1000
    0000 0000
1000 0000 0000 0000 0000 0000 0001<sub>two</sub> = -2,147, 483,647_{ten}
                           0000 \ 0000 \ 0010_{\text{two}} = -2,147,483,646_{\text{ten}}
1000 0000 0000 0000 0000
                                                           minint
1111 1111 1111 1111 1111 1111 1111 1101_{two} = -3_{ten}
                                1111 \ 1110_{\text{two}} = - 2_{\text{ten}}
          1111
                1111 1111 1111
                1111 \ 1111 \ 1111 \ 1111 \ 1111_{two} = -1_{ten}
```

- What if the bit string represented addresses?
  - need operations that also deal with only positive (unsigned) integers

CEG3420 L05.9 Spring 2016

# **Two's Complement Operations**

- Negating a two's complement number complement all the bits and then add a 1
  - remember: "negate" and "invert" are quite different!
- Converting n-bit numbers into numbers with more than n bits:
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - sign extend copy the most significant bit (the sign bit) into the other bits

```
0010 -> 0000 0010
1010 -> 1111 1010
```

sign extension versus zero extend (1b vs. 1bu)

CEG3420 L05.10 Spring 2016

## **Design the MIPS Arithmetic Logic Unit (ALU)**

zero ovf

ALU

Must support the Arithmetic/Logic operations of the ISA

```
add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt
```

and, andi, nor, or, ori, xor, xori m (operation) beq, bne, slt, slti, sltiu, sltu

- With special handling for
  - sign extend addi, addiu, slti, sltiu
  - zero extend andi, ori, xori
  - Overflow detected add, addi, sub

CEG3420 L05.11 Spring 2016

# **MIPS Arithmetic and Logic Instructions**



| Type  | ор     | funct |
|-------|--------|-------|
| ADDI  | 001000 | XX    |
| ADDIU | 001001 | XX    |
| SLTI  | 001010 | XX    |
| SLTIU | 001011 | XX    |
| ANDI  | 001100 | XX    |
| ORI   | 001101 | XX    |
| XORI  | 001110 | XX    |
| LUI   | 001111 | XX    |

| Type | ор     | funct  |
|------|--------|--------|
| ADD  | 000000 | 100000 |
| ADDU | 000000 | 100001 |
| SUB  | 000000 | 100010 |
| SUBU | 000000 | 100011 |
| AND  | 000000 | 100100 |
| OR   | 000000 | 100101 |
| XOR  | 000000 | 100110 |
| NOR  | 000000 | 100111 |

| Type | ор     | funct  |
|------|--------|--------|
|      | 000000 | 101000 |
|      | 000000 | 101001 |
| SLT  | 000000 | 101010 |
| SLTU | 000000 | 101011 |
|      | 000000 | 101100 |

CEG3420 L05.12 Spring 2016

## **Design Trick: Divide & Conquer**

- Break the problem into simpler problems, solve them and glue together the solution
- Example: assume the immediates have been taken care of before the ALU
  - now down to 10 operations
  - can encode in 4 bits



| 0 | add  |
|---|------|
| 1 | addu |
| 2 | sub  |
| 3 | subu |
| 4 | and  |
| 5 | or   |
| 6 | xor  |
| 7 | nor  |
| а | slt  |
| b | sltu |

CEG3420 L05.13 Spring 2016

## **Addition & Subtraction**

Just like in grade school (carry/borrow 1s)

$$\begin{array}{r} 0111 \\ + 0110 \\ \hline 1101 \end{array}$$

- Two's complement operations are easy
  - do subtraction by negating and then adding

- Overflow (result too large for finite computer word)
  - e.g., adding two n-bit numbers does not yield an n-bit number

## **Building a 1-bit Binary Adder**



| Α | В | carry_in | carry_out | S |
|---|---|----------|-----------|---|
| 0 | 0 | 0        | 0         | 0 |
| 0 | 0 | 1        | 0         | 1 |
| 0 | 1 | 0        | 0         | 1 |
| 0 | 1 | 1        | 1         | 0 |
| 1 | 0 | 0        | 0         | 1 |
| 1 | 0 | 1        | 1         | 0 |
| 1 | 1 | 0        | 1         | 0 |
| 1 | 1 | 1        | 1         | 1 |

- How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?

CEG3420 L05.15 Spring 2016

## **Building 32-bit Adder**



■ Just connect the carry-out of the least significant bit FA to the carry-in of the next least significant bit and connect . . .

- □ Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow and lots of glitching (so lots of energy consumption)

CEG3420 L05.16 Spring 2016

#### **Glitch**

- Glitch: invalid and unpredicted output that can be read by the next stage and result in a wrong action
- Example: Draw the propagation delay



CEG3420 L05.17 Spring 2016

# Glitch in RCA



| Α | В | carry_in | carry_out | S |
|---|---|----------|-----------|---|
| 0 | 0 | 0        | 0         | 0 |
| 0 | 0 | 1        | 0         | 1 |
| 0 | 1 | 0        | 0         | 1 |
| 0 | 1 | 1        | 1         | 0 |
| 1 | 0 | 0        | 0         | 1 |
| 1 | 0 | 1        | 1         | 0 |
| 1 | 1 | 0        | 1         | 0 |
| 1 | 1 | 1        | 1         | 1 |

CEG3420 L05.18 Spring 2016

## **But What about Performance?**

Critical path of n-bit ripple-carry adder is n\*CP



Design trick – throw hardware at it (Carry Lookahead)

CEG3420 L05.19 Spring 2016

# A 32-bit Ripple Carry Adder/Subtractor

- Remember 2's complement is just
  - complement all the bits

control  

$$(0=add,1=sub)$$
  $B_0$  if control = 0  
 $B_0$  if control = 1

 add a 1 in the least significant bit



CEG3420 L05.20 Spring 2016

## Minimal Implementation of a Full Adder

□ Gate library: inverters, 2-input nands, or-and-inverters

```
architecture concurrent behavior of full adder is
   signal t1, t2, t3, t4, t5: std logic;
begin
      t1 <= not A after 1 ns;
      t2 <= not cin after 1 ns;
      t4 <= not((A or cin) and B) after 2 ns;
      t3 \le not((t1 \text{ or } t2) \text{ and } (A \text{ or } cin)) \text{ after } 2 \text{ ns};
      t5 <= t3 nand B after 2 ns;
      S <= not((B or t3) and t5) after 2 ns;</pre>
      cout <= not((t1 or t2) and t4) after 2 ns;</pre>
end concurrent behavior;
```

[Optional] Can you create the equivalent schematic? Can you determine worst case delay (the worst case timing path through the circuit)?

CEG3420 L05.21 Spring 2016

## Tailoring the ALU to the MIPS ISA

- Also need to support the logic operations (and, nor, or, xor)
  - Bit wise operations (no carry operation involved)
  - Need a logic gate for each function and a mux to choose the output
- Also need to support the set-on-less-than instruction (slt)
  - Uses subtraction to determine if (a b) < 0 (implies a < b)
- Also need to support test for equality (bne, beq)
  - Again use subtraction: (a b) = 0 implies a = b
- Also need to add overflow detection hardware
  - overflow detection enabled only for add, addi, sub
- Immediates are sign extended outside the ALU with wiring (i.e., no logic needed)

CEG3420 L05.23 Spring 2016

# A Simple ALU Cell with Logic Op Support



CEG3420 L05.24 Spring 2016

# Modifying the ALU Cell for slt



CEG3420 L05.25 Spring 2016

# Modifying the ALU for slt

- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a positive result
  - tie the most significant sum bit (sign bit) to the low order less input



#### **Overflow Detection**

- Overflow occurs when the result is too large to represent in the number of bits allocated
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive gives a negative
  - or, subtract a positive from a negative gives a positive
- On your own: Prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB





CEG3420 L05.27 Spring 2016



#### **Overflow Detection and Effects**

- On overflow, an exception (interrupt) occurs
  - Control jumps to predefined address for exception
  - Interrupted address (address of instruction causing the overflow) is saved for possible resumption
- Don't always want to detect (interrupt on) overflow

CEG3420 L05.29 Spring 2016

## **New MIPS Instructions**

| Category                  | Instr                         | Op Code  | Example           | Meaning                                |
|---------------------------|-------------------------------|----------|-------------------|----------------------------------------|
| Arithmetic                | add unsigned                  | 0 and 21 | addu \$s1, \$s2,  | \$s3  \$s1 = \$s2 + \$s3               |
| (R & I                    | sub unsigned                  | 0 and 23 | subu \$s1, \$s2,  | \$s3                                   |
| format)                   | add<br>imm.unsigned           | 9        | addiu \$s1, \$s2, | 6 \$s1 = \$s2 + 6                      |
| Data<br>Transfer          | ld byte<br>unsigned           | 24       | lbu \$s1, 20(\$   | \$s2)                                  |
|                           | ld half unsigned              | 25       | lhu \$s1, 20(\$   | (\$s2)   \$s1 = Mem(\$s2+20)           |
| Cond.<br>Branch<br>(I & R | set on less than unsigned     | 0 and 2b | sltu \$s1, \$s2,  | \$s3 if (\$s2<\$s3) \$s1=1 else \$s1=0 |
| format)                   | set on less than imm unsigned | b        | sltiu \$s1, \$s2, | 6 if (\$s2<6) \$s1=1 else \$s1=0       |

- □ Sign extend addi, addiu, slti
- □ Zero extend andi, ori, xori
- Overflow detected add, addi, sub

CEG3420 L05.30 Spring 2016

## **Multiplication**

- More complicated than addition
  - Can be accomplished via shifting and adding

$$\begin{array}{c} 0010 & \text{(multiplicand)} \\ x\_1011 & \text{(multiplier)} \\ \hline 0010 & \text{(partial product)} \\ 0000 & \text{array)} \\ \hline 00010110 & \text{(product)} \\ \end{array}$$

- Double precision product produced
- More time and more area to compute

CEG3420 L05.31 Spring 2016

# Add and Right Shift Multiplier Hardware



CEG3420 L05.32 Spring 2016

# Add and Right Shift Multiplier Hardware



CEG3420 L05.33 Spring 2016

## **MIPS Multiply Instruction**

Multiply (mult and multu) produces a double precision product

mult \$s0, \$s1 # hi||lo = \$s0 \* \$s1

0 16 17 0 0 0x18

- Low-order word of the product is left in processor register
   lo and the high-order word is left in register
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
- Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders

CEG3420 L05.34 Spring 2016

#### **Division**

Division is just a bunch of quotient digit guesses and left shifts and subtracts



CEG3420 L05.35 Spring 2016

# **Example: Division**

Dividing 1001010 by 1000

CEG3420 L05.36 Spring 2016

#### **MIPS Divide Instruction**

Divide generates the reminder in hi and the quotient in lo

- Instructions mflo rd and mfhi rd are provided to move the quotient and reminder to (user accessible) registers in the register file
- As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

CEG3420 L05.37 Spring 2016

## **Shift Operations**

Shifts move all the bits in a word left or right

```
sll $t2, $s0, 8 #$t2 = $s0 << 8 bits srl $t2, $s0, 8 #$t2 = $s0 >> 8 bits sra $t2, $s0, 8 #$t2 = $s0 >> 8 bits op rs rt rd shamt funct
```

- Notice that a 5-bit shamt field is enough to shift a 32-bit value 2<sup>5</sup> − 1 or 31 bit positions
- Logical shifts fill with zeros, arithmetic left shifts fill with the sign bit
- The shift operation is implemented by hardware separate from the ALU
  - using a barrel shifter (which would takes lots of gates in discrete logic, but is pretty easy to implement in VLSI)

CEG3420 L05.38 Spring 2016

# **A Simple Shifter**



CEG3420 L05.39 Spring 2016

## Parallel Programmable Shifters



CEG3420 L05.40 Spring 2016

## **Logarithmic Shifter Structure**



CEG3420 L05.41 Spring 2016

# **Logarithmic Shifter Structure**



CEG3420 L05.42 Spring 2016

## Representing Big (and Small) Numbers

What if we want to encode the approx. age of the earth?

$$4,600,000,000$$
 or  $4.6 \times 10^9$ 

There is no way we can encode either of the above in a 32-bit integer.

- □ Floating point representation (-1)<sup>sign</sup> x F x 2<sup>E</sup>
  - Still have to fit everything in 32 bits (single precision)

| s E   | (exponent) | F (     | (fraction) |
|-------|------------|---------|------------|
| 1 bit | 8 bits     | 23 bits |            |

- The base (2, not 10) is hardwired in the design of the FPALU
- More bits in the fraction (F) or the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number)

CEG3420 L05.43 Spring 2016

# **Exception Events in Floating Point**

- Overflow (floating point) happens when a positive exponent becomes too large to fit in the exponent field
- Underflow (floating point) happens when a negative exponent becomes too large to fit in the exponent field
- □One way to reduce the chance of underflow or overflow is to offer another format that has a larger exponent field
  - Double precision takes two MIPS words



CEG3420 L05.44 Spring 2016

#### **IEEE 754 FP Standard**

- Most (all?) computers these days conform to the IEEE 754 floating point standard (-1)<sup>sign</sup> x (1+F) x 2<sup>E-bias</sup>
  - Formats for both single and double precision
  - F is stored in normalized format where the msb in F is 1 (so there is no need to store it!) called the hidden bit
  - To simplify sorting FP numbers, E comes before F in the word and E is represented in excess (biased) notation where the bias is -127 (-1023 for double precision) so the most negative is  $00000001 = 2^{1-127} = 2^{-126}$  and the most positive is  $11111110 = 2^{254-127} = 2^{+127}$
- Examples (in normalized format)

  - Largest+: 0 11111110 1.11111111111111111111111 = 2-2-23 x 2254-127

## Wrap-Up

- We can build an ALU to support the MIPS ISA
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
- Important points about hardware
  - all of the gates are always working (concurrent)
  - the speed of a gate is affected by the number of inputs to the gate (fan-in) and the number of gates that the output is connected to (fan-out)
  - the speed of a circuit is affected by the speed of and number of gates in series (on the "critical path" or the "number of levels of logic") and the length of wires interconnecting the gates
- Our primary focus is comprehension, however,
  - clever changes to organization can improve performance (similar to using better algorithms in software)

CEG3420 L05.46 Spring 2016