May 9, USB 2.0 High Bandwidth Peripheral Design Challenges
Robert Shaw, Cypress Semiconductor
USB 2.0 in a Nutshell
- Runs 40x faster than USB 1.1
  - Low speed: 1.5 Mb/s
  - Full speed: 12 Mb/s
  - High speed: 480 Mb/s
- Fully supports existing USB devices
  - Forward compatible: plug existing 1.1 devices into new 2.0 hosts
  - Backward compatible: plug new 2.0 devices into existing 1.1 hosts
- Uses the same cables as USB 1.1
USB 2.0 Bandwidth
[Figure: SOF (start-of-frame) markers divide a 1 msec frame into microframes; microframes are shown carrying ISO, INT, BULK 512, and CTL transfers.]
Packet Sizes
[Table: packet size by transfer type (Control, Bulk, Interrupt, Isochronous) for USB 1.1 and USB 2.0. Surviving cell values: 8, 16, 32, 64; 1– .]
Bandwidth Example
- ATA hard drive, 7200 RPM, 2 MByte internal buffer
  - Transfer rate, interface: up to 100 MB/s
  - Transfer rate, media: up to 57 MB/s
  - Typical system transfer rate: 39 MB/s
- USB 2.0
  - 13 bulk packets per microframe max
  - 13 * 512 * 8 * 1000 = 53 MB/s
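The 53 MB/s figure follows from microframe arithmetic: a high-speed bus carries 8 microframes per 1 msec frame, so 8 * 1000 microframes per second, each holding at most 13 bulk packets of 512 bytes. A minimal sketch of that calculation, compared with the drive figures quoted on this slide; the function and variable names are illustrative and not from the slides:

```python
MICROFRAMES_PER_SECOND = 8 * 1000  # 8 microframes per 1 ms frame


def bulk_bytes_per_second(packets_per_microframe, packet_bytes=512):
    """Bulk payload rate in bytes/s for a given packet count per microframe."""
    return packets_per_microframe * packet_bytes * MICROFRAMES_PER_SECOND


usb_max = bulk_bytes_per_second(13)  # 53_248_000 bytes/s
drive_media = 57e6                   # media rate from the slide, bytes/s
drive_typical = 39e6                 # typical system rate from the slide, bytes/s

print(f"USB 2.0 bulk maximum: {usb_max / 1e6:.1f} MB/s")
print("USB is below the drive media rate:", usb_max < drive_media)
print("USB is above the typical system rate:", usb_max > drive_typical)
```

By this arithmetic the bus ceiling sits between the drive's media rate and its typical system rate, so both the USB side and the drive interface need to run near their limits.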
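Bandwidth Analysis: USB Hard Drive
[Figure: block diagram of the data path through the disk head, drive buffer, drive interface (IF), USB 2.0 controller, USB, and host. Rate labels in MB/s: head 57, USB 53, 39 sustained*.]
* 1.5 GHz P4 host, 7200 RPM ATA/100 drive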
Bandwidth Conclusions
- Both sides, USB and interface, must support high bandwidth
- USB
  - Large endpoint buffers
  - At least double buffering
- Interface
  - The internal processor should not touch 480 Mbit/sec data. Use the CPU for USB housekeeping and I/O
    - Optimize the data channel using specialized logic
  - Fast data transfers require fast control logic
    - Interface logic should be programmable
    - ATA, EPP, etc.
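The case for "at least double buffering" can be illustrated with a small timing model. This is a sketch under assumed, made-up timings (one time unit to fill a buffer from USB, one to drain it to the interface); `transfer_time` is a hypothetical helper, not part of any Cypress API:

```python
def transfer_time(packets, fill_time, drain_time, buffers):
    """Total time to move `packets` packets through `buffers` endpoint buffers.

    The USB side fills a buffer in `fill_time`; the interface side drains it
    in `drain_time`. A buffer cannot be refilled until it has been drained.
    """
    fill_end = []
    drain_end = []
    for k in range(packets):
        # Filling waits for the previous fill and for a free buffer.
        start = fill_end[k - 1] if k > 0 else 0.0
        if k >= buffers:
            start = max(start, drain_end[k - buffers])
        fill_end.append(start + fill_time)
        # Draining waits for this packet and for the previous drain to finish.
        drain_start = fill_end[k]
        if k > 0:
            drain_start = max(drain_start, drain_end[k - 1])
        drain_end.append(drain_start + drain_time)
    return drain_end[-1] if drain_end else 0.0


single = transfer_time(128, 1.0, 1.0, buffers=1)
double = transfer_time(128, 1.0, 1.0, buffers=2)
print(f"single buffer: {single}, double buffer: {double}")
```

With one buffer the two sides take turns, so the total is the sum of both times for every packet. With two buffers the fill and drain overlap, and the total approaches the time of the slower side alone.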
Single-Chip Solution: FX2
[Figure: block diagram. High-speed logic (clock extraction, serialize/deserialize, bit stuffing, NRZI, SYNC, EOP); low-level protocol (CRC, PID encode/decode, chirp; delivers words); token processor (EP0, PING, ACK/NAK/STALL/NYET, "Chapter 9"); endpoint FIFOs and control logic (16 endpoints); 48 MHz 8051 CPU with program and data RAM (downloaded code); GPIF data channel to the outside interface.]
USB BW: Endpoint Buffers
Data Transfer Speed Evolution
(a) Low to medium speed [Figure: blocks USB, endpoint FIFOs, microprocessor, outside world.]
(b) Faster [Figure: blocks USB, endpoint FIFOs, microprocessor, RAM/FIFO, DMA, interface FIFO, outside world.]
(c) Fastest [Figure: blocks USB, endpoint FIFOs, outside world, microprocessor, RAM/FIFO.]
Quantum FIFO
[Figure, two steps: 256x16 FIFO blocks between the USB side and the I/O side; label "1 clock".]
GPIF Control Structure
[Figure: state machine driven by a waveform descriptor (28 bytes define up to 7 programmable intervals). 6 CTL outputs, 6 RDY inputs, 9 address outputs. Other labels: RDY(FLG), RDY(CPU), 8051 register, EPnFLGSEL selecting the EF, FF, and PF flags of EP2, EP4, EP6, or EP8; transaction count = 64K.]
GPIF: UDMA Read Example (Data In)
[Figure: timing diagram of DMARQ, DMACK, STOP, HDMARDY, DSTROBE, and DATA across GPIF states N1 to N6 and a flow state; data words D1 to D4 followed by CRC; 17 ns timing label.]
Architectural Summary
- Don't let the CPU be a bottleneck
  - Use fast logic to do the transfers
- Some type of DMA is essential
  - Even better: "zero time" DMA transfers with programmable control signals
- GPIF = General Programmable Interface
Putting It All Together: ATAPI Throughput Analysis
[Figure: chain of mass storage device, FX2, and USB 2.0 host. Rate labels: 38 MB/s sustained, 100 MB/s, 96 MB/s, 53 MB/s, ~17 MB/s (Winbench 99 Disk Test).]
Host Data Transfer
- Data transfers are divided into 64 KByte blocks
- Host sends the read request for a block
  - Command Block Wrapper (CBW)
- Host sends 128 IN packet requests (data reads)
  - 128 * 512 = 64 KBytes
- Host requests status using an IN request
- Device provides termination status
  - Command Status Wrapper (CSW)
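The sequence above can be written out as a list of bus phases. This is a minimal sketch, assuming the CBW travels on a bulk OUT endpoint (as in the Bulk-Only Transport class) and ignoring handshakes and retries; `read_block_phases` is a hypothetical helper name:

```python
BULK_PACKET_BYTES = 512  # high-speed bulk packet size


def read_block_phases(block_bytes=64 * 1024):
    """List the (direction, payload) phases the host issues to read one block."""
    # Round up so a partial final packet still gets its own IN request.
    packets = -(-block_bytes // BULK_PACKET_BYTES)
    phases = [("OUT", "CBW")]                                 # command
    phases += [("IN", f"data {i}") for i in range(packets)]   # data reads
    phases.append(("IN", "CSW"))                              # status
    return phases


phases = read_block_phases()
data_phases = [p for p in phases if p[1].startswith("data")]
print(len(data_phases), "data packets,", len(phases), "phases in total")
```

Every 64 KByte block therefore pays for one command and one status phase on top of its 128 data packets, so the latency of those two phases directly limits sustained throughput.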
K Block Read Analysis
[Figure: two charts, (a) and (b), with categories Activity, Delay, CBW, Data, and CSW. Values shown: 43%, 56%, 4%, 17%, 79%.]
(a) Data Phase of Read
[Figure: bus trace with timing labels 8.8 us and 11.6 us typ. Annotation: No NAKs!]
(b) Read Command, CSW
[Figure: bus trace with timing label 662 us.]
USB Disk Drive Summary
- USB 2.0 is a significant improvement over 1.1
- Room for improvement
  - Increase the number of packets per microframe
    - Biggest improvement is in the data transfer stage
      - Now 5.5 bulk packets per microframe; the spec allows 13
  - Reduce latencies
    - Improvement in the CSW status phase
- USB 2.0 and FX2 have the headroom when the host bandwidth bottleneck is improved
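A rough model shows how the two improvement areas interact. It assumes, as a simplification not stated on the slides, that the 662 us measured for the read command and CSW is the only non-data time per 64 KByte block, and that a microframe lasts 125 us (8 per 1 ms frame). The function name is illustrative:

```python
MICROFRAME_US = 125.0      # 8 microframes per 1 ms frame
PACKET_BYTES = 512
BLOCK_BYTES = 64 * 1024


def block_read_rate(packets_per_microframe, overhead_us):
    """Estimate bytes/s for one block read and the data phase's share of the time."""
    packets = BLOCK_BYTES // PACKET_BYTES  # 128 packets per block
    data_us = packets / packets_per_microframe * MICROFRAME_US
    total_us = data_us + overhead_us
    return BLOCK_BYTES / (total_us * 1e-6), data_us / total_us


for ppm in (5.5, 13):
    rate, data_share = block_read_rate(ppm, overhead_us=662.0)
    print(f"{ppm} packets/microframe: {rate / 1e6:.1f} MB/s, "
          f"data phase {data_share:.0%} of block time")
```

At 5.5 packets per microframe this estimate lands near 18 MB/s, in the same range as the measured ~17 MB/s. Raising the packet count alone leaves the fixed command and status time as a growing share of each block, which is why the slide lists latency reduction separately.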
Conclusion
- Bandwidth will improve
  - USB controller programmability is important
- New ATA modes are possible
  - Many "disk-like" standards
    - Compact Flash, etc.
  - GPIF performance and flexibility are required to support them
- Other non-disk interfaces must be supported
  - EPP, PCMCIA, UTOPIA, etc.
  - Device programmability and GPIF flexibility