CMOS VLSI Design: Verilog & MIPS

Slide 1: Introduction to CMOS VLSI Design
MIPS in Verilog, Lecture 1
Lecture by Peter Kogge, Fall 2009, 2010, University of Notre Dame
Using slides by Jay Brockman (Notre Dame, 2008) and David Harris (Harvey Mudd College)
http://www.cmosvlsi.com/coursematerials.html

Slide 2: MIPS Architecture
Example: subset of MIPS processor architecture
  – Drawn from Patterson & Hennessy
MIPS is a 32-bit architecture with 32 registers
  – Consider 8-bit subset using 8-bit datapath
  – Only implement 8 registers ($0 - $7)
  – $0 hardwired to 00000000
  – 8-bit program counter
David Harris has developed labs to implement
  – Uses Electric CAD tools
  – Illustrate the key concepts in VLSI design

Slide 3: Instruction Set

Slide 4: Instruction Encoding
32-bit instruction encoding
  – Requires four cycles to fetch on 8-bit datapath

format  example              encoding (field widths in bits)
R       add $rd, $ra, $rb    0 (6) | ra (5) | rb (5) | rd (5) | 0 (5) | funct (6)
I       beq $ra, $rb, imm    op (6) | ra (5) | rb (5) | imm (16)
J       j dest               op (6) | dest (26)

Slide 5: Fibonacci (C)
f_0 = 1; f_{-1} = -1
f_n = f_{n-1} + f_{n-2}
f = 1, 1, 2, 3, 5, 8, 13, …

Slide 6: Fibonacci (Assembly)
1st statement: n = 8
How do we translate this to assembly?

Slide 7: Fibonacci (Assembly)

Slide 8: Fibonacci (Binary)
1st statement: addi $3, $0, 8
How do we translate this to machine language?
  – Hint: use the instruction encodings below

format  example              encoding (field widths in bits)
R       add $rd, $ra, $rb    0 (6) | ra (5) | rb (5) | rd (5) | 0 (5) | funct (6)
I       beq $ra, $rb, imm    op (6) | ra (5) | rb (5) | imm (16)
J       j dest               op (6) | dest (26)

Slide 9: Fibonacci (Binary)
Machine language program

Slide 10: MIPS Microarchitecture
Multicycle μarchitecture from Patterson & Hennessy
[Figure: multicycle datapath and control. Blocks: PC, Memory (Address, Write data, MemData), Instruction register, Memory data register, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), A, B, ALU (ALU result, Zero), ALU control, ALUOut, Shift left 2, Control. Control outputs: IorD, MemRead, MemWrite, MemtoReg, PCWriteCond, PCWrite, IRWrite[3:0], ALUOp, ALUSrcB, ALUSrcA, RegDst, PCSource, RegWrite; also PCEn and ALUControl. Instruction fields: [31:26] (Op), [25:21], [20:16], [15:11], [15:0], [7:0], [5:0].]

Slide 11: Multicycle Controller
[Figure: state transition diagram with states 0-12, entered at state 0 on Reset. State outputs and transition conditions follow.]

Instruction fetch (states 0-3, one instruction byte per state):
  State 0: MemRead, ALUSrcA = 0, IorD = 0, IRWrite3, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  State 1: MemRead, ALUSrcA = 0, IorD = 0, IRWrite2, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  State 2: MemRead, ALUSrcA = 0, IorD = 0, IRWrite1, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  State 3: MemRead, ALUSrcA = 0, IorD = 0, IRWrite0, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00

Remaining states (numbered 4-12 in the figure):
  Instruction decode / register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
    – (Op = 'LB') or (Op = 'SB'): go to memory address computation
    – (Op = R-type): go to execution
    – (Op = 'BEQ'): go to branch completion
    – (Op = 'J'): go to jump completion
  Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    – (Op = 'LB'): go to memory access (read)
    – (Op = 'SB'): go to memory access (write)
  Memory access (read): MemRead, IorD = 1
  Write-back step: RegDst = 0, RegWrite, MemtoReg = 1
  Memory access (write): MemWrite, IorD = 1
  Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
  Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
  Jump completion: PCWrite, PCSource = 10

Slide 12: Logic Design
Start at top level
  – Hierarchically decompose MIPS into units
Top-level interface
[Figure: top-level interface. Blocks: crystal oscillator, 2-phase clock generator, MIPS processor, external memory. Signals: reset, ph1, ph2, memread, memwrite, and the 8-bit buses adr, writedata, memdata.]

Slide 13: Block Diagram
[Figure: block diagram of three units: controller, alucontrol, datapath. Signals: ph1, ph2, reset, memdata[7:0], writedata[7:0], adr[7:0], memread, memwrite, op[5:0], zero, pcen, regwrite, irwrite[3:0], memtoreg, iord, pcsource[1:0], alusrcb[1:0], alusrca, aluop[1:0], regdst, funct[5:0], alucontrol[2:0]. Shown next to the multicycle datapath figure from the MIPS Microarchitecture slide.]

Slide 14: Hierarchical Design
[Figure: design hierarchy. mips contains controller, alucontrol, and datapath. Lower-level cells shown: standard cell library, bitslice, zipper, alu, and2, flop, inv4x, mux2, mux4, ramslice, fulladder, nand2, nor2, or2, inv, tri.]

Slide 15: Physical Design
Floorplan
Standard cells
  – Place & route
Datapaths
  – Slice planning
Area estimation

Slide 16: MIPS Floorplan
[Figure: floorplan. Labeled blocks and dimensions follow.]

block            dimensions         area
mips             2700 λ x 1690 λ    4.6 Mλ²
datapath         2700 λ x 1050 λ    2.8 Mλ²
zipper           2700 λ x 250 λ
bitslice         2700 λ x 100 λ
wiring channel   30 tracks = 240 λ
control          1500 λ x 400 λ     0.6 Mλ²
alucontrol       200 λ x 100 λ      20 kλ²

Outer dimensions labeled in the figure: 3500 λ x 3500 λ and 5000 λ x 5000 λ, with 10 I/O pads on each of the four sides.

Slide 17: MIPS Layout
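
The dimensions on the MIPS Floorplan slide can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: the variable names are hypothetical, and two things are inferred from the numbers rather than stated on the slide, namely that the datapath is eight bitslices plus one zipper, and that the datapath, wiring channel, and control block stack vertically.

```python
# Cross-check of the floorplan numbers; all lengths are in lambda.
# The values are copied from the MIPS Floorplan slide. How they are
# combined below is an assumption inferred from the numbers.

WIDTH = 2700                      # datapath, bitslice, and zipper width
BITSLICE_HEIGHT = 100
ZIPPER_HEIGHT = 250
NUM_BITSLICES = 8                 # 8-bit datapath
CHANNEL_TRACKS = 30
CHANNEL_HEIGHT = 240
CONTROL_WIDTH, CONTROL_HEIGHT = 1500, 400
ALUCONTROL_WIDTH, ALUCONTROL_HEIGHT = 200, 100

# Assumption: datapath height = eight bitslices plus one zipper.
datapath_height = NUM_BITSLICES * BITSLICE_HEIGHT + ZIPPER_HEIGHT
datapath_area = WIDTH * datapath_height

# Assumption: datapath, wiring channel, and control are stacked vertically.
mips_height = datapath_height + CHANNEL_HEIGHT + CONTROL_HEIGHT
mips_area = WIDTH * mips_height

control_area = CONTROL_WIDTH * CONTROL_HEIGHT
alucontrol_area = ALUCONTROL_WIDTH * ALUCONTROL_HEIGHT
track_pitch = CHANNEL_HEIGHT / CHANNEL_TRACKS   # lambda per wiring track

print("datapath height:", datapath_height, "lambda")
print("datapath area:  ", datapath_area, "lambda^2")
print("mips height:    ", mips_height, "lambda")
print("mips area:      ", mips_area, "lambda^2")
print("control area:   ", control_area, "lambda^2")
print("alucontrol area:", alucontrol_area, "lambda^2")
print("track pitch:    ", track_pitch, "lambda")
```

Under these assumptions the computed heights match the 1050 λ and 1690 λ labels, and the areas round to the 2.8 Mλ², 4.6 Mλ², 0.6 Mλ², and 20 kλ² figures on the slide.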

Slide 18: Standard Cells
Uniform cell height
Uniform well height
M1 VDD and GND rails
M2 access to I/Os
Well / substrate taps
Exploits regularity

Slide 19: Synthesized Controller
Synthesize HDL into gate-level netlist
Place & route using standard cell library

Slide 20: MIPS Datapath
Multicycle μarchitecture from Patterson & Hennessy
[Figure: multicycle datapath and control, the same diagram as on the MIPS Microarchitecture slide.]

Slide 21: Slice Plans
Slice plan for bitslice
  – Cell ordering, dimensions, wiring tracks
  – Arrange cells for wiring locality

Slide 22: Pitch Matching
Synthesized controller area is mostly wires
  – Design is smaller if wires run through/over cells
  – Smaller = faster, lower power as well!
Design snap-together cells for datapaths and arrays
  – Plan wires into cells
  – Connect by abutment
    • Exploits locality
    • Takes lots of effort
[Figure: array of abutting cells labeled A (16 cells), B (4 cells), C (2 cells), and D (1 cell).]

Slide 23: MIPS Datapath
8-bit datapath built from 8 bitslices (regularity)
Zipper at top drives control signals to datapath

Slide 24: MIPS ALU
Arithmetic / Logic Unit is part of bitslice

Slide 25: Area Estimation
Need area estimates to make floorplan
  – Compare to another block you already designed
  – Or estimate from transistor counts
  – Budget room for large wiring tracks
  – Your mileage may vary!

Slide 26: Design Verification
Fabrication is slow & expensive
  – MOSIS 0.6 μm: $1000, 3 months
  – State of art: $1M, 1 month
Debugging chips is very hard
  – Limited visibility into operation
Prove design is right before building!
  – Logic simulation
  – Ckt. simulation / formal verification
  – Layout vs. schematic comparison
  – Design & electrical rule checks
Verification is > 50% of effort on most chips!
[Figure: design levels Specification, Architecture Design, Logic Design, Circuit Design, Physical Design, with "=" checks between adjacent levels labeled Function; Timing and Power are also labeled.]

Slide 27: Fabrication & Packaging
Tapeout final layout
Fabrication
  – 6, 8, 12” wafers
  – Optimized for throughput, not latency (10 weeks!)
  – Cut into individual dice
Packaging
  – Bond gold wires from die I/O pads to package

Slide 28: Testing
Test that chip operates
  – Design errors
  – Manufacturing errors
A single dust particle or wafer defect kills a die
  – Yields from 90% to < 10%
  – Depends on die size, maturity of process
  – Test each part before shipping to customer
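
The dependence of yield on die size and process maturity can be illustrated with a simple Poisson defect model, Y = exp(-A * D), where A is die area and D is defect density. This model is an assumption brought in for illustration and is not given on the slide. The function name and the areas and defect densities below are made-up values, chosen only to span the range from about 90% down to under 10% mentioned above.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dice with zero defects under a Poisson defect model.

    Assumption: defects land independently and uniformly on the wafer,
    and any single defect kills the die it lands on.
    """
    return math.exp(-die_area_cm2 * defects_per_cm2)


# Hypothetical cases: (description, die area in cm^2, defects per cm^2).
cases = [
    ("small die, mature process", 0.5, 0.2),
    ("large die, mature process", 3.0, 0.2),
    ("large die, immature process", 3.0, 1.0),
]

for description, area, density in cases:
    y = poisson_yield(area, density)
    print(f"{description}: yield = {y:.1%}")
```

In this sketch a larger die or a higher defect density always lowers the yield, which matches the qualitative statement on the slide. Real processes are usually described with more detailed models.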