Embedded Memory Wrapper Generation for Multi-processor SoC Design
F. Gharsalli, S. Meftali, F. Rousseau, A.A. Jerraya
TIMA Laboratory, 46 avenue Felix Viallet, Grenoble Cedex, France

Memory for SoC
- SoC: a single chip
  - Heterogeneous components (CPU, IP, …)
  - Application-specific architecture
- Integration of standard memory IP
  - Adaptation of memory protocols to the specific network (N processors)
[Figure, two builds: DSP, CPU and IP on a communication network with SRAM and Flash memories; the block labelled GLUE in the first build is labelled Wrapper in the second]

Outline
- Introduction
  - Memory IP based design
  - Memory integration issues
- Architectural Models and Basic Concepts
- Memory Wrapper
  - Generic architecture
  - Automatic generation
- Experiments
- Conclusion

Memory IP based design
- Steadily increasing capacity
- Memory reuse based design to close the gap between capacity and productivity
MEMORY INTERFACE DESIGN IS A DOMINANT PROBLEM

Memory integration issues
- Complex system design
  - Heterogeneous components
    - Several logical ports and specific communication protocols
  - Standard memory components
    - Limited physical ports and standard access protocols
- Large memory design space exploration
  - Different memory characteristics (type, size, consumption)
- Multi-master SoC
  - Parallel accesses to the global memory

Memory integration issues
- Complex system design
  - PORT ADAPTATION is needed
- Large memory design space exploration
  - WRAPPER FLEXIBILITY is required
- Multi-master SoC
  - SOPHISTICATED SYNCHRONIZATION MECHANISMS are required

Related Work
- Port adaptation
  - CoWare
  - Polis
  - Cadence (VCC)
- Wrapper flexibility
  - Marie Curie
  - COSY
- Synchronization mechanisms
  - Fixed priority (PalmChip)
  - TDMA and Round-Robin (Sonics)
None of the existing strategies fully addresses the memory IP integration problems described above

Our Contributions
- Generic memory wrapper architecture
  - Port adaptation
  - Memory flexibility
  - Arbitration between parallel memory accesses
- Automatic generation of memory wrappers by assembling library components

Architectural models
- Virtual architecture model
  - Abstract modules (virtual modules)
  - Abstract channels
  - Implicit communication procedures
  - Wrapper specification but no implementation
- Micro-architecture model
  - Module implementations
  - Physical communication network
  - Explicit communication procedures
  - HW wrapper implementation and synthesis
[Figure: the virtual architecture (M1, M2 and MEMORY connected by channels) next to the micro-architecture (M1 module implementation with OS and wrapper, physical communication network, MEMORY)]

Basic concepts: virtual module
- Separation between behavior and communication interface
  - Memory access must be independent of the memory type
- Hiding the abstraction level of the memory description
  - Memory integration must be independent of these abstraction levels
    - Logical and physical accesses
To adapt these accesses, we use a wrapper
[Figure labels: memory IP, wrapper, virtual port, external port (logical port), internal port (physical memory port), Channel 1, Channel 2]

Memory wrapper architecture
- Generic wrapper architecture
  - Memory dependent part
    - Memory port adapter (MPA)
  - Communication dependent part
    - Channel adapter (CA)
  - Internal bus (IB)
    - Address, data and control
  - Arbiter
[Figure: memory wrapper between the communication network and the memory IP; the channels reach CA1, CA2 and CA3, which sit on the IB together with the arbiter and the MPA; the MPA connects to the memory IP over the memory bus]

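Note: the slide lists an arbiter but does not state its policy. The following is a minimal sketch of how one grant decision on the internal bus could look, assuming a round-robin policy over the channel adapters. The class name `RoundRobinArbiter` and its `grant` method are hypothetical and are not taken from the authors' wrapper library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arbiter for the internal bus (IB).
// Assumption: round-robin policy; the slides do not specify one.
class RoundRobinArbiter {
public:
    explicit RoundRobinArbiter(std::size_t channel_count)
        : channel_count_(channel_count), last_granted_(channel_count - 1) {
        assert(channel_count > 0);
    }

    // requests[i] is true when channel adapter i asks for the internal bus.
    // Returns the index of the granted channel adapter, or -1 if nobody asks.
    int grant(const std::vector<bool>& requests) {
        assert(requests.size() == channel_count_);
        // Start the search just after the adapter that was served last.
        for (std::size_t offset = 1; offset <= channel_count_; ++offset) {
            const std::size_t candidate = (last_granted_ + offset) % channel_count_;
            if (requests[candidate]) {
                last_granted_ = candidate;
                return static_cast<int>(candidate);
            }
        }
        return -1;
    }

private:
    std::size_t channel_count_;
    std::size_t last_granted_;
};
```

With three channel adapters where CA1 and CA2 request continuously, successive calls to `grant` alternate between indices 0 and 1, so neither adapter can starve the other.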
Flexibility of the memory architecture
- Flexible memory wrapper architecture for a large design space exploration
- Flexibility is ensured by generic and modular models
  - CA: customized with communication network specific parameters
  - MPA: customized with memory specific parameters
We change only the Memory Port Adapter part
[Figure: two memory wrappers with the same CA1, CA2, CA3, arbiter and IB on the communication network; one uses a single MPA and memory bus for a single port memory IP, the other uses MPA1 and MPA2 and two memory busses for a dual port memory IP]

Memory wrapper generation flow
- Wrapper generation
  - Input:
    - Memory IP library
    - Wrapper components library (CA, MPA)
    - Architectural parameters: number of ports, channels, protocols
  - Action:
    - Customizing the generic CA and MPA from the library using the architectural parameters
    - Instantiation of the customized CA and MPA
    - Interconnection to the rest of the system
  - Output:
    - Micro-architecture
[Figure: the virtual architecture annotated with parameters, the memory IP library and the CA/MPA library feed the wrapper generation step, which produces the micro-architecture]

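Note: the flow above can be pictured as a function from architectural parameters to a list of component instances. The sketch below is only an illustration of that idea, not the authors' generator: the struct and function names are hypothetical, the real flow customizes library components rather than producing name strings, and the rule "an arbiter is needed only when there are more channels than memory ports" is an assumption chosen to match the two experiments in this deck.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Architectural parameters annotated on the virtual architecture.
// Field names are hypothetical.
struct WrapperParameters {
    std::string memory_type;  // e.g. "SRAM" or "SDRAM"
    int port_count;           // physical memory ports
    int port_width;           // memory bus width in bits
    std::string access_mode;  // e.g. "Burst" or "R/W"
    int channel_count;        // logical channels coming from the network
};

// Flat description of the component instances in one generated wrapper.
struct WrapperDescription {
    std::vector<std::string> components;
    bool has_arbiter = false;
};

WrapperDescription generate_wrapper(const WrapperParameters& p) {
    assert(p.port_count > 0 && p.channel_count > 0);
    WrapperDescription w;
    // One channel adapter per logical channel.
    for (int c = 1; c <= p.channel_count; ++c) {
        w.components.push_back("CA" + std::to_string(c));
    }
    // One memory port adapter per physical memory port.
    for (int m = 1; m <= p.port_count; ++m) {
        w.components.push_back(p.memory_type + " MPA" + std::to_string(m));
    }
    // Assumed rule: arbitrate only when several channels share a port.
    w.has_arbiter = p.channel_count > p.port_count;
    if (w.has_arbiter) {
        w.components.push_back("arbiter");
    }
    return w;
}
```

Called with two ports and two channels, the sketch yields two CAs and two MPAs without an arbiter; with one port and two channels it yields two CAs, one MPA and an arbiter.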
Image Filtering Process
[Figure: input image and output image]

Experiments
- Low-level image processing for a digital camera
  - The initial specification is
    - Memory rich (2 Mbytes Flash, 2 Mbytes ROM, 256 Kbytes SRAM)
    - Processor poor (only one 8-bit RISC processor)
- Acceleration by adding another processor
  - We use 2 ARM7 processors
  - 1 global memory
  - Point-to-point communication network
- 2 experiments to demonstrate the memory flexibility ensured by the wrapper
  - Experiment 1: using a dual port SRAM
  - Experiment 2: using a single port SDRAM

Experiment 1: Dual port memory
[Figure: virtual architecture; module M1 (T1, T2) and module M2 (T3, T4) connected by logical channels to a dual port SRAM]

Extracted parameters
  Port number      2
  Port type        sc_lv
  Port width       32
  Access mode      Burst
  Channel number   2
  …                …

[Figure: micro-architecture; the Module 1 and Module 2 implementations (ARM7 ISS with CPU wrapper) connect to the memory wrapper, which contains CA1 and CA2 (AFIFO + BUFFER), internal buses IB1 (32) and IB2 (32) and two SRAM MPAs; the memory busses (32) connect the wrapper to the dual port SRAM]

Experiment 1: Dual port memory
- MPA services
  - Test
  - Address decoding
  - Access mode
    - Burst mode
      - Burst sequence (4 words)
  - Bank control
[Figure: Module 1 and Module 2 implementations (ARM7 ISS, CPU wrapper), CA1 and CA2 (AFIFO + BUFFER), IB1 (32), IB2 (32), two SRAM MPAs, memory busses (32), dual port SRAM]

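Note: the slide only names a burst sequence of 4 words. As a minimal sketch of what such an access mode could mean for the MPA, the code below transfers four consecutive words from a single start address. The word-addressed memory model, the absence of wrap-around and the function names are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Burst length taken from the slide ("4 words").
constexpr std::size_t kBurstLength = 4;

// Reads one sequential burst: one start address, four consecutive words.
// Assumption: word addressing, no wrap-around at block boundaries.
std::array<std::uint32_t, kBurstLength> burst_read(
        const std::vector<std::uint32_t>& memory, std::uint32_t start_word) {
    assert(start_word + kBurstLength <= memory.size());
    std::array<std::uint32_t, kBurstLength> words{};
    for (std::size_t i = 0; i < kBurstLength; ++i) {
        words[i] = memory[start_word + i];
    }
    return words;
}

// Writes one sequential burst of four words starting at start_word.
void burst_write(std::vector<std::uint32_t>& memory, std::uint32_t start_word,
                 const std::array<std::uint32_t, kBurstLength>& words) {
    assert(start_word + kBurstLength <= memory.size());
    for (std::size_t i = 0; i < kBurstLength; ++i) {
        memory[start_word + i] = words[i];
    }
}
```

The point of the sketch is that the requester supplies one address per four words, which is why a burst-capable memory can reduce the number of address phases seen on the internal bus.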
Experiment 2: Single port memory
[Figure: virtual architecture; module M1 (T1, T2) and module M2 (T3, T4) connected by logical channels to a single port SDRAM]

Extracted parameters
  Port number      1
  Port type        sc_lv
  Port width       16
  Access mode      R/W
  Channel number   2
  …                …

[Figure: micro-architecture; the Module 1 and Module 2 implementations (ARM7 ISS with CPU wrapper) connect to the memory wrapper, which contains CA1 and CA2 (AFIFO + BUFFER), one internal bus IB (32) with an arbiter and one SDRAM MPA; the memory bus (16) connects the wrapper to the single port SDRAM]

Experiment 2: Single port memory
- MPA services
  - Test
  - Address decoding
  - Access mode
    - Classic R/W mode
  - Bank control
  - Initialization
  - Refresh
  - Bit-width conversion
[Figure: Module 1 and Module 2 implementations (ARM7 ISS, CPU wrapper), CA1 and CA2 (AFIFO + BUFFER), IB (32) with arbiter, SDRAM MPA, memory bus (16), single port SDRAM]

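Note: in this experiment the internal bus is 32 bits wide while the memory bus is 16 bits wide, so the MPA has to convert between the two. The sketch below shows one way to do that conversion. Sending the low half first and the function names are assumptions; the slides do not describe the ordering or the implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Splits one 32-bit internal-bus word into two 16-bit memory-bus transfers.
// Assumption: the low half is transferred first.
std::pair<std::uint16_t, std::uint16_t> split_word(std::uint32_t word) {
    return std::make_pair(static_cast<std::uint16_t>(word & 0xFFFFu),
                          static_cast<std::uint16_t>(word >> 16));
}

// Rebuilds a 32-bit word from the two 16-bit halves read from the memory.
std::uint32_t join_halves(std::uint16_t low, std::uint16_t high) {
    return (static_cast<std::uint32_t>(high) << 16) | low;
}

// Number of memory-bus transfers needed for one internal-bus word.
int transfers_per_word(int ib_width, int memory_bus_width) {
    assert(ib_width > 0 && memory_bus_width > 0);
    return (ib_width + memory_bus_width - 1) / memory_bus_width;
}
```

With a 32-bit IB and a 16-bit memory bus, `transfers_per_word` gives 2, so every word exchanged with a channel adapter costs two memory accesses on the SDRAM side.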
Results
- SystemC code size for the memory wrapper
  - Experiment 1: 1438 lines
  - Experiment 2: 1335 lines
- Latency (without memory latency)
  - Write: 3 CPU cycles
  - Read: 7 CPU cycles (send/receive)
- Simulation results for a 387 x 222 image
  - Experiment 1: 2.05 million CPU cycles
  - Experiment 2: 2.97 million CPU cycles
Fast design exploration with different memories thanks to automatic memory wrapper generation

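Note: as a rough reading of the simulation figures, and assuming the reported totals cover the processing of the whole 387 x 222 image, the cycle counts can be normalized per pixel and compared. The inputs are the rounded values from the slide, so the results are approximate.

```latex
\begin{align*}
N_{\text{pixels}} &= 387 \times 222 = 85\,914 \\
\frac{2.05 \times 10^{6}}{85\,914} &\approx 23.9 \ \text{CPU cycles per pixel (dual port SRAM)} \\
\frac{2.97 \times 10^{6}}{85\,914} &\approx 34.6 \ \text{CPU cycles per pixel (single port SDRAM)} \\
\frac{2.97}{2.05} &\approx 1.45
\end{align*}
```

Under that assumption, the single port SDRAM configuration needs about 45% more simulated CPU cycles than the dual port SRAM configuration.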
Conclusion
- Systematic method to integrate memory IP into multi-processor SoC architectures at system level
- Generic memory wrapper architecture
  - Port adaptation
  - Flexibility of the memory architecture
  - Arbitration of parallel accesses
- Automatic memory wrapper generation by assembling library components
- Fast memory design exploration
- Application to low-level image processing

Perspectives
- Generalization of the IP wrapper architecture based on the generic wrapper model
- Use of more sophisticated communication networks, such as the AMBA bus and packet-switched networks
- Configurable memory test bench