Double buffer SDRAM Memory Controller Presented by: Yael Dresner Andre Steiner Instructed by: Michael Levilov Project Number: D0713.

Presentation transcript:

Double buffer SDRAM Memory Controller Presented by: Yael Dresner Andre Steiner Instructed by: Michael Levilov Project Number: D0713

Project Description
Implementation of a device that receives a video stream from a digital video camera, performs simple pixel processing, and transfers the result to a CPU through a double-buffer SDRAM memory.

Block Diagram
[Figure: block diagram. Blocks shown: Camera, FPGA, Pixel Processing Unit, SDRAM controller (write part), SDRAM controller (read part), Switch, SDRAM A, SDRAM B, FIFO, CPU. Legend: data bus, control signals.]

Camera Module
This module is implemented by a test-bench process that simulates a digital video camera.
[Figure: test bench. Signals shown: reset, clock (100 MHz), synch, pixel data, start pulse. Timing: each line lasts 1024 pixels * t clock period, with a 2 us interval between lines.]

Pixel Processing Module
[Figure: module diagram. Signals shown: reset, clock, synch, pixel data in; synch, processed pixel data out. Function: simple image processing.]

Write Controller Module
[Figure: module diagram with a Moore state machine and a data path. Signals shown: reset, clock, synch, start pulse, pixel data (input data), control, addresses, switch control lines.]

Write Controller Module

Refresh Algorithm
- One refresh cycle: 80 ns (8 clock periods).
- Time interval between 2 lines: 2 us, so up to 2 us / 80 ns = 25 refresh cycles fit in one interval.
- The calculation uses 20 refreshes per interval: refreshing all 4096 SDRAM lines (rows) takes 4096 / 20 = 204.8 time intervals.
- Refreshing the whole memory therefore takes 204.8 * (2 us + 1024 * 10 ns) = 2.5 msec.
- Each line must be refreshed every 64 msec, so this is well within the limit.
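The refresh arithmetic on this slide can be checked with a short calculation. This is only a sketch: the constant names are made up for illustration, and it assumes that one line period is 1024 pixel clocks at 10 ns plus the 2 us gap, and that 20 refreshes are issued per gap (the figure the slide divides by), even though 25 would fit.

```python
# Sketch of the refresh budget; all names here are illustrative, not from the design.
CLOCK_NS = 10            # 100 MHz clock period
REFRESH_CLOCKS = 8       # one refresh cycle = 8 clock periods (80 ns)
IDLE_NS = 2000           # 2 us gap between two camera lines
LINE_PIXELS = 1024       # pixels per camera line, one per clock
SDRAM_ROWS = 4096        # rows that must all be refreshed
REFRESHES_PER_GAP = 20   # assumption: value used in the slide's division
RETENTION_MS = 64        # every row must be refreshed within this time


def refresh_budget():
    """Return (max refreshes per gap, gaps needed, full refresh time in ms)."""
    refresh_ns = REFRESH_CLOCKS * CLOCK_NS
    max_per_gap = IDLE_NS // refresh_ns
    gaps_needed = SDRAM_ROWS / REFRESHES_PER_GAP
    line_period_ns = LINE_PIXELS * CLOCK_NS + IDLE_NS
    full_refresh_ms = gaps_needed * line_period_ns / 1e6
    return max_per_gap, gaps_needed, full_refresh_ms


if __name__ == "__main__":
    max_per_gap, gaps, total_ms = refresh_budget()
    print(f"refreshes that fit in one gap: {max_per_gap}")
    print(f"gaps needed for {SDRAM_ROWS} rows: {gaps}")
    print(f"full refresh time: {total_ms:.2f} ms (limit {RETENTION_MS} ms)")
```

Under these assumptions the full refresh finishes in about 2.5 msec, far below the 64 msec requirement.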

Read Controller Module
[Figure: module diagram with a Moore state machine and a data path. Signals shown: clock, reset, start pulse, control, addresses, switch control lines, wrreq to FIFO, wrusedw.]

Read Controller Module

Switching Procedure
[Figure: switching diagram, shown before and after a switch. A switching mux connects SDRAM A and SDRAM B to the data and control lines from the Write Controller and to the control lines from the Read Controller. An output mux selects pixels from A or pixels from B as the data to the FIFO.]
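The double-buffer idea behind this slide can be illustrated with a small behavioral model. It is a sketch only, not the project's hardware description: the class and method names are hypothetical, and it assumes that at any moment one SDRAM is owned by the write side and the other by the read side, with a switch swapping the two roles.

```python
# Behavioral sketch of ping-pong (double) buffering; names are hypothetical.
class DoubleBuffer:
    def __init__(self):
        # Two independent memories standing in for SDRAM A and SDRAM B.
        self.banks = {"A": [], "B": []}
        self.write_bank = "A"

    @property
    def read_bank(self):
        # The read side always owns whichever memory the write side does not.
        return "B" if self.write_bank == "A" else "A"

    def write_line(self, pixels):
        """Write controller side: store one camera line in the write memory."""
        self.banks[self.write_bank].append(list(pixels))

    def read_all(self):
        """Read controller side: drain every stored line from the read memory."""
        lines = self.banks[self.read_bank]
        self.banks[self.read_bank] = []
        return lines

    def switch(self):
        """Swap roles; in the design this happens while the camera is idle."""
        self.write_bank = self.read_bank


if __name__ == "__main__":
    buf = DoubleBuffer()
    buf.write_line([1, 2, 3])      # goes to A
    buf.switch()                   # A becomes readable, B writable
    buf.write_line([4, 5, 6])      # goes to B while A is being read
    print(buf.read_all())          # lines previously written to A
```

Because the two sides never touch the same memory between switches, reading and writing can proceed at the same time.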

CPU Module
This module is implemented by a test-bench process that simulates a CPU.
[Figure: test bench. Signals shown: Cpu_Clock (100 MHz), read request to FIFO, rdempty from FIFO, pixel data.]

Design considerations: CAMERA simulator
- A BMP file is loaded into a buffer and transferred cyclically. This simulates a stream of photo images, as produced by a digital video camera.
- ModelSim performs poorly as the source files get bigger, so we used a small BMP file (similar to a medium-quality video image).

Design considerations: CPU simulator
- Always reads (even if the burst size is bigger than the data stored in the FIFO). This achieves maximum throughput.
- Efficiency of 84%, equal to the camera's, achieved by specific burst lengths and delay time. This ensures the best performance at the same frequency as the camera's.

Design considerations: FIFO
- Dual-clocked; can store up to 4k (four lines). It acts as an adapter unit between the 2 clock domains. Given the small difference between the CPU clock and the camera clock, a FIFO of this size does not get full.
- Always kept full by the Read Controller. This ensures maximum availability of data for the CPU.

Design considerations: SDRAMs management
- One bank used, for simplicity.
- Full page mode: minimum clock cycles for both read and write access.
- Switching during intervals when the camera is idle: the simplest way not to lose data.

Design considerations: Controllers management
- The write controller refreshes the SDRAM lines during the camera's idle interval, so refreshing is done at no cost.

Testing Procedures
- Internal white-box testing for each module
- Black-box testing
- System performance testing

Black Box Testing
We check the system's correctness by comparing:
- Data from the camera, represented as a source BMP file
- Data received by the CPU, represented as a target BMP file
This comparison is done for different CPU clock rates, chosen to cover the ordinary cases and some special cases that affect the behavior of the internal design.
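A comparison of this kind can be sketched as a byte-for-byte check of the two files. The helper names below are hypothetical, and the sketch assumes that the reference file already reflects whatever the pixel processing stage does, so that matching output means identical bytes.

```python
# Sketch of a source-vs-target comparison; function names are hypothetical.
from pathlib import Path


def first_mismatch(source: bytes, target: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (s, t) in enumerate(zip(source, target)):
        if s != t:
            return offset
    if len(source) != len(target):
        # One stream is a strict prefix of the other.
        return min(len(source), len(target))
    return None


def compare_files(source_path: str, target_path: str):
    """Compare two files on disk, e.g. the source and target BMP files."""
    return first_mismatch(Path(source_path).read_bytes(),
                          Path(target_path).read_bytes())


if __name__ == "__main__":
    print(first_mismatch(b"\x01\x02\x03", b"\x01\x02\x03"))  # identical data
    print(first_mismatch(b"\x01\x02\x03", b"\x01\x02\x04"))  # differs at offset 2
```

Reporting the first mismatching offset, rather than only pass or fail, makes it easier to relate an error to a specific line and switching event.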

System Performance Testing
Test 1: Activity under normal circumstances
- Camera's clock rate: 100 MHz
- CPU's clock rate: 100 MHz or more
Expected results: Since the CPU reads as fast as the camera or faster, switching will occur after every line, and each SDRAM will contain one line.

System Performance Testing Test 1 : Activity under normal circumstances Actual Results:

System Performance Testing
Test 2: Activity under extreme circumstances
- Camera's clock rate: 100 MHz
- CPU's clock rate: 66.6 MHz
Expected results: Since the CPU reads slower than the camera, the FIFO gets full, the read controller stalls, and switching occurs less frequently. At each switch, more data has been written to the SDRAM than at the previous one.

System Performance Testing Test 2 : Activity under extreme circumstances Actual Results:

System Performance and Design Considerations: CPU clock
- When the CPU's clock is slower than the camera's, pixels accumulate in the system.
- The number of pixels that accumulate per line is constant.
- Example: CPU's clock = 99.5 MHz, so a clock cycle lasts about 10.05 ns. While the camera writes 1024 pixels, a CPU with the same efficiency as the camera's reads a total of 1010 pixels:
((1024 * 10 ns + 2000 ns) / 10.05 ns) * 0.83 = 1010
Here 1024 * 10 ns is the transfer time of 1024 pixels, 2000 ns is the 2 us idle time, 10.05 ns is the CPU's clock cycle, 0.83 is the CPU's efficiency, and 1010 is the total number of pixels the CPU reads while the camera sends a line of 1024 pixels.
- For every written line, the system accumulates 14 pixels.
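The per-line accumulation can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are illustrative, the CPU clock cycle is taken as 1 / 99.5 MHz (about 10.05 ns), one camera line period is 1024 * 10 ns plus the 2 us idle time, the efficiency is the 0.83 used on the slide, and fractional pixels are rounded down as the slide's figures suggest.

```python
# Sketch of the accumulation arithmetic; names and rounding are assumptions.
import math

LINE_PIXELS = 1024       # pixels the camera sends per line
CAMERA_CLOCK_NS = 10.0   # 100 MHz camera clock
IDLE_NS = 2000.0         # 2 us idle time between lines


def pixels_read_per_line(cpu_mhz: float, efficiency: float = 0.83) -> int:
    """Pixels the CPU reads during one camera line period (rounded down)."""
    line_period_ns = LINE_PIXELS * CAMERA_CLOCK_NS + IDLE_NS
    cpu_cycle_ns = 1000.0 / cpu_mhz
    return math.floor(line_period_ns / cpu_cycle_ns * efficiency)


def accumulated_per_line(cpu_mhz: float, efficiency: float = 0.83) -> int:
    """Pixels left behind in the system for every line the camera writes."""
    return LINE_PIXELS - pixels_read_per_line(cpu_mhz, efficiency)


if __name__ == "__main__":
    print(pixels_read_per_line(99.5))   # pixels read per line at 99.5 MHz
    print(accumulated_per_line(99.5))   # pixels accumulated per line
```

At 99.5 MHz this gives 1010 pixels read and 14 pixels accumulated per line, matching the slide.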

System Performance and Design Considerations
The accumulated pixels eventually fill the FIFO. Once the FIFO is full, the SDRAMs start to fill to higher line numbers. The following table shows how many pixels accumulate in the system for each line of 1024 pixels received from the camera.
[Table: columns are CPU's clock, accumulated bytes for each line received from the camera, and number of lines received until the FIFO gets full.]

System Performance and Design Considerations
Plotting these details in a graph shows that as the CPU's clock frequency gets closer to the camera's, the system's performance improves by several orders of magnitude. The following graph shows, for different CPU clock frequencies, the number of lines received by the system at the moment the FIFO became full. Once the FIFO is full, switching occurs less frequently and the system accumulates pixels in the SDRAM. Performance is therefore measured by how long it takes the FIFO to become full: the longer, the better.

System Performance and Design Considerations

FIFO size
Since the system accumulates a constant number of pixels for every line received from the camera (depending on the CPU's clock cycle), the time until the FIFO gets full is linearly proportional to its size. The following chart presents the performance under a constant CPU clock and different FIFO sizes. The X axis represents the size of the FIFO and the Y axis represents the number of lines that had been written when the FIFO became full. We can see that the time until the FIFO fills grows linearly with the FIFO's size.
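The linear relationship can be illustrated with a simple model. It is a deliberately rough sketch: the function name is hypothetical, FIFO sizes are counted in pixels, the FIFO is assumed to start empty, and the accumulation rate of 14 pixels per line is the 99.5 MHz example from the earlier slide rather than a measured value.

```python
# Rough model of FIFO fill time versus FIFO size; names are hypothetical.
import math


def lines_until_full(fifo_size: int, accumulated_per_line: int) -> int:
    """Camera line during which a FIFO of fifo_size pixels becomes full."""
    if accumulated_per_line <= 0:
        raise ValueError("the FIFO never fills if nothing accumulates")
    return math.ceil(fifo_size / accumulated_per_line)


if __name__ == "__main__":
    # Assumption: 14 pixels accumulate per line (the 99.5 MHz example).
    for size in (1024, 2048, 4096):
        print(size, lines_until_full(size, 14))
```

Doubling the FIFO size roughly doubles the number of lines received before it fills, which is the linear trend the chart describes.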

Alternatives
- Instead of two SDRAMs, we could have used a dual-clock FIFO, which would enable simultaneous reading and writing. Drawback: a FIFO is usually a lot smaller than an SDRAM.
- Instead of two SDRAMs, we could have used one and alternated between writing and reading. Drawbacks: the SDRAM's clock has to be faster than the camera's, and the design is less efficient.

Summary & Conclusions
- Careful consideration has to be given to correct scheduling between the different modules.
- Dividing each unit into separate modules, and each module into separate simple processes, simplifies both the implementation and the debugging.

Summary & Conclusions
The design permits the use of a camera and a CPU that work with different clocks. The data stream that passes without overflow will be longer when:
- The difference between the clock rates is smaller
- The SDRAMs and/or the FIFO are bigger
- The CPU reads longer bursts and has a shorter delay time
We enjoyed it, a great project!