2/6/02 CSE 141 - Single Cycle Datapath

The Single Cycle Datapath
[Figure: overview of the datapath: PC, instruction memory (address in, instruction out), registers (register numbers in, data out), ALU, and data memory (address, data)]

Note: Some of the material in this lecture is COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED. Figures may be reproduced only for classroom or personal education use in conjunction with our text and only when the above line is included.

The Performance Big Picture
• Execution Time = Insts * CPI * Cycle Time
• Processor design (datapath and control) will determine:
  – Clock cycle time
  – Clock cycles per instruction
• Starting today: the single cycle processor
  – Execute an entire instruction in one clock cycle
  – Advantage: CPI = 1
  – Disadvantage: long cycle time

Processor Design
• We're ready to implement the MIPS “core”:
  – load-store instructions: lw, sw
  – reg-reg instructions: add, sub, and, or, slt
  – control flow instructions: beq
• First, we need to fetch an instruction into the processor:
  – the program counter (PC) supplies the instruction address
  – get the instruction from memory
[Figure: the PC drives the Address input of a single memory (Clk, Write Enable, 32-bit Data In, 32-bit DataOut); the instruction appears at DataOut]

That was too easy
• A problem: how will we do a load or store?
  – remember that memory has only 1 port
  – and we want to do everything in 1 cycle

Instruction & Data in same cycle?
• Solution: separate data and instruction memory
  – There will be only one DRAM memory: we want a stored program architecture (how else can you compile and then run a program?)
  – But we can have separate SRAM caches (we'll study caches later)
[Figure: the PC addresses an instruction cache, whose output is the instruction; a separate data cache serves loads and stores]

Instruction Fetch Unit
• Updating the PC for the next instruction:
  – Sequential code: PC <- PC + 4
  – Branch and jump: PC <- “something else” (we'll worry about these later)
[Figure: the PC supplies the read address of instruction memory and, together with the constant 4, the inputs of an adder that computes PC + 4]

The MIPS core subset
• R-type
  – add rd, rs, rt
  – sub, and, or, slt
• LOAD and STORE
  – lw rt, rs, imm
  – sw rt, rs, imm
• BRANCH
  – beq rs, rt, imm

R-type format:
    field   op      rs      rt      rd      shamt   funct
    bits    31–26   25–21   20–16   15–11   10–6    5–0
    width   6       5       5       5       5       6

Load/store format:
    field   op      rs      rt      immediate
    bits    31–26   25–21   20–16   15–0
    width   6       5       5       16

Branch format:
    field   op      rs      rt      displacement
    bits    31–26   25–21   20–16   15–0
    width   6       5       5       16

Steps for each class:
• R-type: 1. Read registers rs and rt  2. Feed them to the ALU  3. Update the register file
• Load/store: 1. Read register rs (and rt for store)  2. Feed rs and the immediate to the ALU  3. Move data between memory and a register
• Branch: 1. Read registers rs and rt  2. Feed them to the ALU to compare  3. Add the displacement to the PC; update the PC

Processor Design
• Generic implementation:
  – all instructions read some registers
  – all instructions use the ALU after reading registers
  – memory is accessed and registers are updated after the ALU
• Suggests a basic design:
[Figure: PC, instruction memory (address in, instruction out), registers (three register numbers, data), ALU, and data memory (address, data)]

Datapath for Reg-Reg Operations
• R[rd] <- R[rs] op R[rt]     Example: add rd, rs, rt
  – Ra, Rb, and Rw (read register 1, read register 2, write register) come from the rs, rt, and rd fields
  – The ALU operation signal depends on the op and funct fields
[Figure: R-type datapath: the instruction drives Read register 1, Read register 2, and Write register; Read data 1 and Read data 2 feed the ALU; the ALU result goes to Write data; control signals are RegWrite and the 3-bit ALU operation; the ALU also has a Zero output]

Datapath for Load Operations
• R[rt] <- Mem[R[rs] + SignExt[imm16]]     Example: lw rt, rs, imm16
[Figure: load datapath: the 16-bit immediate is sign-extended to 32 bits and fed to the ALU with Read data 1; the ALU result is the data memory address; the memory's Read data goes to the register file's Write data; control signals are MemRead, MemWrite, RegWrite, and the 3-bit ALU operation]

Datapath for Store Operations
• Mem[R[rs] + SignExt[imm16]] <- R[rt]     Example: sw rt, rs, imm16
[Figure: store datapath, built from the same components as the load datapath; Read data 2 (R[rt]) supplies the data memory's Write data]

Combining datapaths
• How do we allow different datapaths for different instructions?
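The next step is to combine these datapaths. First, though, here is a Python sketch of the field extraction and address arithmetic that the R-type, load, and store datapaths rely on. It is a minimal illustration, not the course's implementation. The helper names (`bits`, `sign_extend16`, `encode_r`, `encode_i`, `decode`, `effective_address`) are made up for this example. The register numbers and the base value in R[rs] are hypothetical. The opcode 0x23 (lw) and funct 0x20 (add) are the standard MIPS encodings. Bit positions follow the instruction formats above.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_LW = 0x00, 0x23   # MIPS opcodes: R-type (op = 0) and lw
FUNCT_ADD = 0x20               # funct field for add


def bits(word: int, hi: int, lo: int) -> int:
    """Return the field word<hi:lo>, both ends inclusive."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(imm16: int) -> int:
    """SignExt[imm16]: read a 16-bit field as a two's-complement integer."""
    return imm16 - (1 << 16) if imm16 & 0x8000 else imm16


def encode_r(funct: int, rd: int, rs: int, rt: int, shamt: int = 0) -> int:
    """Hypothetical helper: pack op | rs | rt | rd | shamt | funct (op = 0)."""
    return (OP_RTYPE << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op: int, rt: int, rs: int, imm: int) -> int:
    """Hypothetical helper: pack op | rs | rt | low 16 bits of imm."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode(word: int) -> dict:
    """Split a 32-bit word into the fields of the format diagrams."""
    return {
        "op": bits(word, 31, 26),
        "rs": bits(word, 25, 21),
        "rt": bits(word, 20, 16),
        "rd": bits(word, 15, 11),     # meaningful for R-type only
        "shamt": bits(word, 10, 6),   # meaningful for R-type only
        "funct": bits(word, 5, 0),    # meaningful for R-type only
        "imm16": bits(word, 15, 0),   # meaningful for lw, sw, beq
    }


def effective_address(regs: list, word: int) -> int:
    """Load/store address R[rs] + SignExt[imm16], wrapped to 32 bits."""
    f = decode(word)
    return (regs[f["rs"]] + sign_extend16(f["imm16"])) & MASK32


add_word = encode_r(FUNCT_ADD, rd=8, rs=9, rt=10)   # add with rd=8, rs=9, rt=10
lw_word = encode_i(OP_LW, rt=8, rs=29, imm=-4)      # lw with rt=8, rs=29, imm=-4

regs = [0] * 32
regs[29] = 0x1000                                   # hypothetical value of R[rs]

print(hex(add_word), decode(add_word))
print(hex(lw_word), hex(effective_address(regs, lw_word)))
```

The negative immediate shows why the sign extension matters. The 16-bit field 0xFFFC becomes -4, so the computed address lands just below R[rs].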
[Figure: the R-type datapath and the store datapath drawn side by side]

• Use a multiplexor!
[Figure: the R-type and store datapaths merged into one; a multiplexor controlled by ALUSrc selects the ALU's second input: Read data 2 or the sign-extended immediate]

Datapath for Branch Operations
• beq rs, rt, imm16
• We need to compare R[rs] and R[rt]

Branch format:
    field   op      rs      rt      immediate
    bits    31–26   25–21   20–16   15–0
    width   6       5       5       16

[Figure: branch datapath: Read data 1 and Read data 2 feed the ALU, whose Zero output goes to the branch control logic; the immediate is sign-extended (16 to 32 bits), shifted left 2, and added to PC + 4 (from the instruction datapath) to form the branch target; control signals are RegWrite and the 3-bit ALU operation]

Computing the Next Address
• The PC is a 32-bit byte address into the instruction memory:
  – Sequential operation: PC<31:0> = PC<31:0> + 4
  – Branch (taken): PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
• We don't need the 2 least-significant bits because:
  – the 32-bit PC is a byte address,
  – and all our instructions are 4 bytes (32 bits) long,
  – so the 2 LSBs of the 32-bit PC are always zeros.

All together: the single cycle datapath
[Figure: the complete single-cycle datapath. The PC feeds instruction memory and an adder for PC + 4. Instruction[25–21] and Instruction[20–16] drive Read register 1 and Read register 2. A mux (RegDst) picks Write register from Instruction[20–16] or Instruction[15–11]. Instruction[15–0] is sign-extended from 16 to 32 bits. A mux (ALUSrc) picks the ALU's second input. ALU control is driven by ALUOp and Instruction[5–0]. The ALU result addresses data memory (MemRead, MemWrite). A mux (MemtoReg) picks the register write data (RegWrite). A shift-left-2 adder forms the branch target, and a mux (PCSrc) selects the next PC]

The R-Format (e.g. add) Datapath
[Figure: the single-cycle datapath, traced for an R-format instruction]
Need ALUSrc=1, ALUOp=“add”, MemWrite=0, MemtoReg=0, RegDst=0, RegWrite=1, and PCSrc=1.

The Load Datapath
[Figure: the single-cycle datapath, to be traced for a load]
What control signals do we need for load?
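A functional model gives another view of the whole datapath. Here each call to `step` does the work of one clock cycle: fetch, register read, ALU, memory access, register write, and next-PC selection. The next PC is PC + 4, or PC + 4 + SignExt[imm16] * 4 for a taken beq. This is a hedged Python sketch, not a gate-level model. It skips the control signals and multiplexors and simply branches on the opcode. Several choices here are assumptions made only for illustration: memories are Python dicts keyed by byte address, unwritten memory reads as 0, and register 0 stays 0. The helper names, the sample program, and treating a beq to itself as "halt" are also made up for this example.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_LW, OP_SW, OP_BEQ = 0x00, 0x23, 0x2B, 0x04   # MIPS opcodes


def sext16(x: int) -> int:
    """SignExt[imm16]: read a 16-bit field as two's complement."""
    return x - 0x10000 if x & 0x8000 else x


def signed32(x: int) -> int:
    """Read a 32-bit register value as two's complement (used by slt)."""
    return x - (1 << 32) if x & 0x80000000 else x


# ALU operation selected by the funct field of an R-type instruction.
ALU = {
    0x20: lambda a, b: a + b,                           # add
    0x22: lambda a, b: a - b,                           # sub
    0x24: lambda a, b: a & b,                           # and
    0x25: lambda a, b: a | b,                           # or
    0x2A: lambda a, b: int(signed32(a) < signed32(b)),  # slt (signed compare)
}


def step(pc: int, regs: list, imem: dict, dmem: dict) -> int:
    """Do the work of one single-cycle clock and return the next PC."""
    word = imem[pc]                                    # instruction fetch
    op = word >> 26
    rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    imm = sext16(word & 0xFFFF)
    a, b = regs[rs], regs[rt]                          # register read
    pc_plus_4 = (pc + 4) & MASK32

    if op == OP_RTYPE:                                 # R[rd] <- R[rs] op R[rt]
        result = ALU[word & 0x3F](a, b)
        if rd != 0:                                    # register 0 stays 0
            regs[rd] = result & MASK32
    elif op == OP_LW:                                  # R[rt] <- Mem[R[rs] + SignExt[imm16]]
        if rt != 0:
            regs[rt] = dmem.get((a + imm) & MASK32, 0)
    elif op == OP_SW:                                  # Mem[R[rs] + SignExt[imm16]] <- R[rt]
        dmem[(a + imm) & MASK32] = b
    elif op == OP_BEQ:                                 # taken: PC + 4 + SignExt[imm16] * 4
        if a == b:
            return (pc_plus_4 + imm * 4) & MASK32
    else:
        raise ValueError(f"opcode {op:#x} is outside the core subset")
    return pc_plus_4


def enc_r(funct: int, rd: int, rs: int, rt: int) -> int:
    """Hypothetical helper: pack an R-type word (op = 0, shamt = 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def enc_i(op: int, rt: int, rs: int, imm: int) -> int:
    """Hypothetical helper: pack op | rs | rt | 16-bit immediate."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


program = [
    enc_i(OP_LW, rt=2, rs=1, imm=0),     # R2 <- Mem[R1 + 0]
    enc_i(OP_LW, rt=3, rs=1, imm=4),     # R3 <- Mem[R1 + 4]
    enc_r(0x20, rd=4, rs=2, rt=3),       # add: R4 <- R2 + R3
    enc_r(0x22, rd=6, rs=2, rt=3),       # sub: R6 <- R2 - R3 (negative result)
    enc_r(0x2A, rd=5, rs=6, rt=2),       # slt: R5 <- (R6 < R2), signed
    enc_i(OP_SW, rt=4, rs=1, imm=8),     # Mem[R1 + 8] <- R4
    enc_i(OP_BEQ, rt=0, rs=0, imm=-1),   # branch to itself, used here as "halt"
]
imem = {4 * k: w for k, w in enumerate(program)}
dmem = {0x100: 5, 0x104: 7}
regs = [0] * 32
regs[1] = 0x100                          # hypothetical base address

pc, cycles = 0, 0
while True:
    next_pc = step(pc, regs, imem, dmem)
    cycles += 1
    if next_pc == pc:                    # the self-branch: stop
        break
    pc = next_pc

print(f"cycles={cycles} R4={regs[4]} R5={regs[5]} R6={regs[6]:#010x} Mem[0x108]={dmem[0x108]}")
```

Every instruction finishes in one call, so the cycle count equals the instruction count (CPI = 1). What the model cannot show is the cost of that choice: the clock period must be long enough for the slowest path through the datapath.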
The Store Datapath
[Figure: the single-cycle datapath, to be traced for a store]

The beq Datapath
[Figure: the single-cycle datapath, to be traced for beq]

Key Points
• The CPU is just a collection of state and combinational logic
• We just designed a very rich processor, at least in terms of functionality
• Execution time = Insts * CPI * Cycle Time
  – where does the single-cycle machine fit in?

Computer of the Day
• The IBM 1620 (1959)
  – A 2nd generation computer: transistors and core storage (first generation machines used tubes and delay-based memory)
  – An example of creative architecture
  – About 2000 built; relatively inexpensive (< $1620/month rental)
• A decimal computer
  – 6 bits per digit or character: 4 bits, a flag (for +/- and end-of-word), and a check bit
  – Variable-length data: fields terminated by the flag
• Arithmetic by table lookup!
• Codenamed CADET: “Can’t Add, Doesn’t Even Try”
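The Key Points ask where the single-cycle machine fits in Execution time = Insts * CPI * Cycle Time. The Python sketch below plugs numbers into that formula. It is illustrative only. The instruction count, the cycle times, and the alternative design's CPI are hypothetical values, not measurements of any machine. The function name `execution_time_ns` is made up for this example.

```python
def execution_time_ns(insts: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution Time = Insts * CPI * Cycle Time (cycle time in nanoseconds)."""
    return insts * cpi * cycle_time_ns


INSTS = 1_000_000  # hypothetical instruction count

# Single-cycle machine: CPI = 1, but the clock period must be long enough
# for the slowest instruction; 8 ns is an assumed, illustrative value.
single_cycle_ns = execution_time_ns(INSTS, cpi=1.0, cycle_time_ns=8.0)

# A hypothetical alternative: shorter clock, more cycles per instruction.
alternative_ns = execution_time_ns(INSTS, cpi=4.0, cycle_time_ns=2.5)

print(f"single-cycle (CPI 1, 8 ns):  {single_cycle_ns / 1e6:.1f} ms")
print(f"alternative (CPI 4, 2.5 ns): {alternative_ns / 1e6:.1f} ms")
```

With these made-up numbers the single-cycle machine wins (8.0 ms versus 10.0 ms). A different ratio of cycle times could reverse that result, so CPI and cycle time have to be judged together rather than separately.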