
Presentation transcript:

Authors: G. Chiodini 1, B. Hall 2, S. Magni 3, D. Menasce 3, L. Uplegger 3, D. Zhang 2
1 I.N.F.N. Lecce, 2 FNAL, 3 I.N.F.N. Milano

Pomone is a general purpose, low-cost and portable read-out system based on the industry-standard PCI protocol. It has been developed in the context of the BTeV experiment at the Fermilab Collider and is meant to be used in the pixel test beam of Summer

Pomone has been designed to meet the requirements of a general Test Stand hardware for testing detectors both in a laboratory environment and at test beam facilities for the BTeV experiment at Fermilab. The current implementation focuses on the following components:

- A PCI Test Adapter (PTA) plug-in card, compliant with the PCI protocol
- A Programmable Mezzanine Card (PMC)
- An external data source subsystem: an FPIX read-out chip (ROC), the Fermilab Pixel readout chip
- A host computer

The PMC is intended to work in conjunction with a PTA card to serve as a flexible platform for building small DAQ systems for testing detectors and subsystems. The PMC is designed around the Xilinx Virtex II FPGA, which serves as an interface between the PTA card resources and the external subsystem/detector.
The PTA is a PCI-compliant card featuring:

- an Altera FPGA controlling all functions
- a PCI target interface (slave only)
- two SSRAM banks (1 Mb each)
- a daughter card interface for all links, using IEEE1386 Mezzanine connectors
- a JTAG interface to upload FPGA code
- a USB interface

The host computer acts as a data sink (the current implementation supports the Linux operating system).

Both PMC and PTA cards have been developed by the ESE group at Fermilab. For more information on each of these hardware components check the following site:

Data are produced by the external subsystem at a variable rate (depending upon beam and interaction rate), while the host computer receives them at its own rate (depending upon the CPU clock and the processor's current activity). The PTA has therefore been programmed to act as an intermediate buffer to hold events, thus allowing for a continuous sustained data rate. The PTA can receive data at up to 30 MHz and the computer can digest data at about 2 Mb/s. This holds for the current FPGA configuration, which does not allow for DMA transfers. These values are well within specification for the coming test beam.

Principle of operation of the read-out

The FPGA on the PTA is programmed to direct the incoming data flow to one of the two internal 1 Mb SSRAM memory banks (bank 1). When a bank reaches a user-defined limit (size or timeout), the data flow is switched to the other bank (bank 2) and an interrupt is generated for the host PC.
At that point data are flushed out from bank 1 to the host computer, while bank 2 continues to receive data from the external source. When bank 2 is full, another memory swap occurs, an interrupt is generated and the whole process iterates again.

[Figure (a): data flow from the PMC through the PTA FPGA to the host PC, with the interrupt line]

Two independent processes run on the host computer to manage the read-out, the producer and the consumer. The producer waits for interrupts: when one is received, it fetches data from the SSRAM bank that has reached the programmed limit and transfers them to a statically allocated memory on the host computer. The consumer continually checks for data available in the shared memory and transfers them to external storage. A block of data is ready to be transferred when the producer finally marks it as complete. It is in the consumer that event-building actually takes place.

[Figure (b): PTA card (FPGA, SSRAM bank 1, SSRAM bank 2) fed from the PMC; on the host PC, producer, statically allocated shared memory, consumer and mass storage]

There are thus three processors working together to pipeline data out of the detector: the PMC, the PTA and the host PC. Both PMC and PTA are initialized by commands issued by the producer from the host PC.

Four programs on the host PC cooperate in the read-out: besides the producer and the consumer, the logger centralizes messages received from both, while a GUI allows interaction with a user (the GUI is an optional component).

The read-out code has been designed according to the following guidelines:

- Code must be robust, in principle able to withstand changes of operating system environment, extensive refurbishing and additions of algorithms. Developers should be able to track down changes over time.
- Code must be highly modular, able to accommodate different detectors with different hardware and software specifications. The backbone should essentially provide virtual methods to allow for this diversity.
- Functionality of the code must be guaranteed also in environments with minimal resources (e.g. no X11 graphics available). The system should be able to perform even without a GUI for user interaction.
- The code should provide mechanisms to both initialize and read out the system.
- Since the incoming data rate is asynchronous with respect to the host computer clock, a data-rate compensation buffer must be provided to accommodate fluctuations in the sustained data rate.
- Components of the system should be loosely coupled: this allows for upgrades and changes of individual elements with little effort. It further ensures smoother deployment due to minimal cross dependencies.

To allow for robustness and modularity, the code has been implemented in C++, and the overall code management is under the supervision of CVS in a Fermilab-based, periodically backed-up repository. Particular care has been exercised in the object-oriented design in order to achieve an optimal decoupling of components.

Should X11 resources not be available at any given time during a run (e.g. from remote institutions with a bad network connection), users can still efficiently run the system by means of a command-line oriented interface. A more sophisticated GUI is provided for convenience, but it is not crucial to successfully operate the read-out.

In order to accommodate an intermediate data-rate compensation buffer, the system, as described before, is split into two main processes with a shared memory in the middle: the first process (called producer) gets hits out of the PTA card and places them in the shared memory, while the second (called consumer) continuously browses the shared memory to fetch completed blocks for the event-builder to assemble hits into events. Events are finally written by the consumer to external data storage.
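The role of the shared memory as a hand-off point between producer and consumer can be illustrated with a small model. The following is only a sketch under stated assumptions, not the Pomone code: `Hit`, `Block` and `BlockBuffer` are hypothetical names, the hit layout is invented, and an in-process ring of slots stands in for the statically allocated shared memory used by two separate processes (so no inter-process locking is shown).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical hit record; the real Pomone data format is not given in the text.
struct Hit {
    std::uint32_t timeStamp;
    std::uint32_t payload;
};

using Block = std::vector<Hit>;  // image of one drained SSRAM bank

// Fixed number of block slots, standing in for the statically allocated
// shared memory that sits between producer and consumer.
template <std::size_t Slots>
class BlockBuffer {
public:
    // Producer side: store one drained bank in the next free slot and mark
    // it complete. Returns false when every slot is still occupied.
    bool push(const Block& block) {
        if (count_ == Slots) return false;
        Slot& s = slots_[(head_ + count_) % Slots];
        s.hits = block;
        s.complete = true;  // only complete blocks are visible to the consumer
        ++count_;
        assert(count_ <= Slots);
        return true;
    }

    // Consumer side: take the oldest complete block, if there is one.
    bool pop(Block& out) {
        if (count_ == 0 || !slots_[head_].complete) return false;
        out = std::move(slots_[head_].hits);
        slots_[head_].hits.clear();
        slots_[head_].complete = false;
        head_ = (head_ + 1) % Slots;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    struct Slot {
        Block hits;
        bool complete = false;
    };
    std::array<Slot, Slots> slots_{};
    std::size_t head_ = 0;   // index of the oldest occupied slot
    std::size_t count_ = 0;  // number of occupied slots
};
```

In this model the producer would call `push()` once per drained SSRAM bank and the consumer would call `pop()` from its polling loop. A `false` return from `push()` corresponds to the compensation buffer being full, which is the condition the buffer size has to be chosen to avoid.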
Additional elements of the system are:

- a logger (centralized message logger), which receives messages from producer and consumer and writes them to a configurable output stream;
- a controller (a command-line user interface).

The producer, consumer and logger get user commands from a message-bus: users type commands at the controller prompt, which feeds them to a reserved message-bus that is constantly monitored by the above processes. A sophisticated GUI is provided to allow users to efficiently interact with the system. It drives the behaviour of producer, consumer and logger using the same message-bus mechanism as the controller.

Loose coupling among these components is thus accomplished by the use of intermediate buffers, shared memories statically allocated in the host computer and the message-bus. No direct transaction occurs between them. Once a public interface describing their behaviour has been defined, producer, consumer and logger are essentially independent programs.

[Figure: on the host PC, the detector (PMC + PTA) interrupts the producer, which fills the statically allocated data shared-memory read by the consumer and written to mass storage; logger and controller or GUI are connected through the statically allocated message-bus]

The end-user talks to the producer and to the consumer by means of commands sent to the message-bus, which they poll constantly to get orders. There is therefore no direct connection between the two processes.

Time evolution of the read-out

[Figure, repeated for each step: detector (PMC), FPGA, SSRAM 1, SSRAM 2, interrupt handler and interrupt reset, producer, shared memory, consumer, mass storage, with the data flow highlighted]

t0: Events begin to flow from the detector to the PTA.
This goes on until the first SSRAM bank exceeds a user-selectable threshold.

t1: An interrupt is raised by the FPGA to flag the memory-full status. Incoming data are then redirected to the second SSRAM bank and the producer can start to read out hits to the shared memory.

t2: The interrupt is reset. While new data feed SSRAM 2, SSRAM 1 is emptied by the producer and the shared memory receives data.

t3: As soon as SSRAM 2 becomes full, the system cycles again through the steps from t0 to t2: the producer shifts data from SSRAM 2 while SSRAM 1 gets fresh data.

t4: As soon as two blocks become available in the shared memory, the consumer fetches them and collects hits into events (event-building). Events are then transferred onto mass storage.

With this approach, the shared memory acts as a compensating buffer between an incoming data rate and an outgoing one. This architecture comes in very handy when events are defined as a 'collection of hits with the same time stamp'. In this case the dual-buffer approach becomes essential to develop an efficient event-builder, since the event-builder only needs to sort hits from at most two adjacent data blocks in the shared memory to be sure that no hit belonging to an event is lost.

See next page for an explanation of the principle of operation of a real test-bench where incoming data are categorized on a time-stamp basis rather than by position in the output buffer (this is the anticipated use-case with pixel detectors at the coming BTeV test beam).
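The idea that an event-builder only ever needs two adjacent blocks can be made concrete with a short sketch. This is an illustration under assumptions, not the Pomone event-builder: `Hit`, `Block`, `Event` and `EventBuilder` are hypothetical names, the hit layout is invented, time-stamp wrap-around is ignored, and the sketch relies on the property stated above that hits with the same time stamp span at most two consecutive blocks.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical hit record: a time stamp plus the detector it came from.
struct Hit {
    std::uint32_t timeStamp;
    std::uint32_t detector;
};

using Block = std::vector<Hit>;  // one block of the shared memory
using Event = std::vector<Hit>;  // all hits sharing one time stamp

class EventBuilder {
public:
    // Feed the next block. Time stamps first seen in the previous block are
    // complete once this block has been scanned, so they are returned now;
    // time stamps first seen in this block are kept for the next call.
    std::vector<Event> addBlock(const Block& block) {
        std::map<std::uint32_t, Event> fresh;
        for (const Hit& h : block) {
            auto it = pending_.find(h.timeStamp);
            if (it != pending_.end()) {
                it->second.push_back(h);          // continuation of an older event
            } else {
                fresh[h.timeStamp].push_back(h);  // event starting in this block
            }
        }
        std::vector<Event> done = drain(pending_);
        pending_ = std::move(fresh);
        return done;
    }

    // End of run: whatever is still pending can no longer grow.
    std::vector<Event> flush() { return drain(pending_); }

private:
    // Move all events out of the map, in ascending time-stamp order.
    static std::vector<Event> drain(std::map<std::uint32_t, Event>& m) {
        std::vector<Event> out;
        for (auto& kv : m) out.push_back(std::move(kv.second));
        m.clear();
        return out;
    }

    std::map<std::uint32_t, Event> pending_;  // events begun in the previous block
};
```

Each call to `addBlock()` holds only the previous block's unfinished time stamps plus the current block, so memory use stays bounded by two blocks regardless of the length of the run.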
Our actual test-stand consists of several detectors to be read out in a single data stream: an event is, in this case, defined as a collection of hits generated by different detectors at the same time (they are marked by the same time-stamp). In order to vastly improve the efficiency and speed of the event-builder stage, a mechanism has been provided to synchronize the swapping of the SSRAM banks in the PTA cards. The first bank reaching the limit raises an interrupt and the producer immediately issues a command to all other PTA cards to swap their SSRAM banks. In this way the consumer has to deal with blocks from the shared memory in which hits with the same time-stamp are spread out, at most, over two consecutive blocks (corresponding to a swap operation). Building an event thus becomes just a time-stamp reordering of at most two 1 Mb buffers, a task which imposes no particularly heavy burden on the read-out computer.

[Figure: complete test-stand in the beam. Detectors A to Z are read by ROCs A to Z; each PMC (FPGA, slots 1 and 2) feeds a PTA (FPGA, SSRAM 1, SSRAM 2); the PTAs sit on a PCI extender and their interrupt handlers notify the producer on the host computer, which also hosts the shared memory, the consumer, the GUI and the mass storage]

[Figure: contents of the shared memory as (time stamp, hit) columns, for Block 1 (from SSRAM 1 of each PTA), Block 2 (from SSRAM 2 of each PTA) and Block 3 (from SSRAM 1 of each PTA), with events labelled a to e]

In this artist's conception of the shared memory, hits are shown with a colour-coded representation. Hits reach the shared memory in a loosely sparsified mode: nonetheless, hits with the same time stamp are not too far from each other within a single block (a block is an image of the contents of all SSRAM #1 or #2 banks before a collective swap has occurred).
Therefore, in this example, events a, b and c are fully contained in block #1, while event d is split between block #1 and block #2 (because the transfer from PMC to PTA was not completed when a swap was issued by the first PTA reaching the limit). Event e and the following ones will be fully contained in block #2. The event builder will thus use blocks #1 and #2, then discard #1, use #2 and #3, discard #2 and so on.

The basic unit of read-out is a PTA card, the intermediate component between a PMC and a host computer. A complete read-out system consists of many replicas of this elementary unit (as shown in the box, where a complete setup is detailed). Both FPGAs are pre-programmed; in the case of the PTA, the code has been custom designed and implemented by our group using the Altera Quartus product.

[Figure: software components (producer, consumer, logger, graphical user interface) between the hardware data source, the PTA card and the external mass storage]

Software technologies used in the read-out

To ensure ease of maintenance, portability and a smooth evolution scheme, the system has been built upon the following collection of libraries (almost all Open Software licensed):

- WinDriver: a commercial device-driver builder by Jungo, upon which our own PCI device driver is built.
- Xerces: an XML parser. Our system configuration files are in XML syntax: methods are provided to parse, validate and transfer to memory initialization constants, geometrical or electrical parameters, etc.
- Qt: a library of classes to build complex GUIs.
- Root: a data analysis package.
- Nienet: a GPIB device driver.

The Pomone read-out, written in C++, is maintained on a centralized CVS source code repository at Fermilab.
It has been designed and tested to work on dual-processor workstations: in general the producer feeds the shared memory running on one processor, while the consumer fetches hits and executes event-building (which is a CPU-intensive task) on the other processor. Histograms for monitoring purposes are served through an IP socket: a histogram presenter client has been provided to allow users to monitor the DAQ activity from remote locations without placing additional burden on the DAQ CPU load.

Pomone has been provided with extensive on-line documentation. We use Doxygen to produce a browsable Reference Guide. We have configured the Doxygen parser to provide both a Reference Guide and a User Guide in a single document, available through the Web in both HTML and PDF formats. This has proven to be an extremely valuable tool to help collaborators develop the code. Since documentation is embedded in the source code as suitably formatted comment lines, code and documentation are guaranteed to stay in sync.

- Links to files, classes and compound elements of the Pomone Reference Guide.
- Extensive User Guide, with schematics and drawings.
- Doxygen produces nice graphical inheritance tree schematics, with hyperlinks to class definitions.
- Every device used by the read-out system is accurately described and referenced in the on-line guide (where appropriate, links are provided to the original web site with up-to-date documentation).
- The whole source code is suitably hyperlinked to allow easy and efficient browsing of the code.
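As an illustration of documentation embedded in the source as formatted comments, combined with the virtual-method backbone mentioned in the design guidelines, a class might be written as follows. This is a hypothetical sketch, not taken from the Pomone sources: `ReadoutSource` and `CounterSource` are invented names. Only the comment syntax (`/** ... */`, `///`, `\brief`, `\param`, `\p`, `\return`) is standard Doxygen.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

/**
 * \brief Abstract read-out source (hypothetical name, for illustration only).
 *
 * Each detector type would derive from this class and override the virtual
 * methods, so that the rest of the read-out never depends on a concrete
 * hardware flavour.
 */
class ReadoutSource {
public:
    virtual ~ReadoutSource() {}

    /**
     * \brief Prepare the source for data taking.
     * \return true if the initialization succeeded.
     */
    virtual bool initialize() = 0;

    /**
     * \brief Fetch up to \p maxWords words and append them to \p out.
     * \param out       Destination buffer; fetched words are appended.
     * \param maxWords  Upper limit on the number of words to fetch.
     * \return Number of words actually appended.
     */
    virtual std::size_t read(std::vector<std::uint32_t>& out,
                             std::size_t maxWords) = 0;
};

/// Software-only source producing a counter pattern, usable without hardware.
class CounterSource : public ReadoutSource {
public:
    CounterSource() : ready_(false), next_(0) {}

    bool initialize() override {
        ready_ = true;
        next_ = 0;
        return true;
    }

    std::size_t read(std::vector<std::uint32_t>& out,
                     std::size_t maxWords) override {
        if (!ready_) return 0;  // nothing is delivered before initialize()
        for (std::size_t i = 0; i < maxWords; ++i) out.push_back(next_++);
        return maxWords;
    }

private:
    bool ready_;          ///< Set by initialize().
    std::uint32_t next_;  ///< Next counter value to deliver.
};
```

Running Doxygen over a header like this would yield a reference page per class, including the inheritance relation between the two, without any separately maintained document.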
Snapshots of the Pomone GUI
Schematics of the PMC card
Schematics of the PTA card

The PTA has been programmed using the Quartus software to generate suitable code for the Altera FPGA.