Ontogenetic hardware
OK, so the Tom Thumb algorithm can self-replicate an arbitrary structure within an FPGA. But what kind of structures is it interesting to self-replicate?
Ontogenetic hardware
Embryonics = embryonic electronics: drawing inspiration from the growth processes of living organisms to design complex computing systems.
[Figure: three axes, Phylogeny (P) [Evolvability], Epigenesis (E) [Adaptability], Ontogeny (O) [Scalability]; combinations: PO hw, PE hw, OE hw, POE hw]
Bio-Inspired Approaches
Growth, self-organization, massive parallelism (multicellular systems).
Issues that growth can potentially address: complexity, scalability, fault tolerance.
Caenorhabditis elegans
11 December 1998
Caenorhabditis elegans
From S.F. Gilbert, Developmental Biology, Sinauer, 1991
Multicellular Organization 959 somatic cells
Cellular Differentiation Pharynx Intestine
Embryonics: How?
Iterative electronic circuit based on 3 features: multicellular organization, cellular division, cellular differentiation.
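The three features above can be sketched in a few lines. This is a minimal one-dimensional illustration, not the Embryonics hardware itself. It assumes that every cell stores the complete genome, computes its own coordinate from its west neighbour, and uses that coordinate to select the gene it expresses. The names `GENOME`, `Cell` and `grow` are hypothetical, and the modulo wrap-around (which makes a longer array hold several copies of the organism) is also an assumption.

```python
# Hypothetical genome: one gene per coordinate (names are illustrative only).
GENOME = ("gene_0", "gene_1", "gene_2", "gene_3")


class Cell:
    """A totipotent cell: it holds the whole genome, not just its own gene."""

    def __init__(self, genome):
        self.genome = genome
        self.x = None      # coordinate, unknown until differentiation
        self.gene = None   # gene expressed after differentiation

    def differentiate(self, west_neighbour):
        # Cellular differentiation: derive the coordinate from the neighbour,
        # then pick the gene at that coordinate (assumed scheme).
        self.x = 0 if west_neighbour is None else west_neighbour.x + 1
        self.gene = self.genome[self.x % len(self.genome)]
        return self.gene


def grow(n_cells, genome=GENOME):
    """Cellular division: each new cell receives a copy of the same genome."""
    cells = []
    for _ in range(n_cells):
        cell = Cell(genome)
        cell.differentiate(cells[-1] if cells else None)
        cells.append(cell)
    return cells  # multicellular organization: an array of differentiated cells
```

Calling `grow(4)` yields one organism of four differentiated cells. Under the wrap-around assumption, `grow(8)` yields two identical copies side by side.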
Embryonics Landscape
Population level (population = organisms)
Organismic level (organism = cells)
Cellular level (cell = molecules)
Molecular level (molecule = basic FPGA element)
StopWatch
Multicellular Organization
StopWatch
First step: design of a totipotent cell (stem cell) (of course, in practice it can be optimized)
StopWatch
Cellular Differentiation
Cloning
Self-Repair
BioWatch
The application can of course be anything… But then the size and structure of the cell will vary from application to application: we need programmable logic!
MUXTREE Molecule
The “molecular” layer of Embryonics is an FPGA.
Cellular Self-Replication
But if we use FPGAs, then we need to CREATE the array of cells in the first place, before differentiation can take place (self-replication).
Cellular Self-Replication
Self-replication will allow the same FPGA partial configuration to be duplicated as many times as needed.
Cellular Self-Repair
But self-replication, and custom FPGAs, can ALSO be used to improve the reliability of the system… within limits.
Operation of the Cell
Kill a Molecule
Recovered Molecule
Kill Again (Kill a Cell)
Recovered Cell
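The cell-level part of the kill/recover sequence above can be sketched as follows. The slides do not spell out the repair mechanism, so this is an assumed scheme: the array contains more physical cells than the organism needs, faulty cells are bypassed, and the coordinates (and hence the genes) shift onto the next healthy cell. The function `assign_genes` and its arguments are hypothetical. Molecule-level repair inside a cell is not modelled.

```python
def assign_genes(genes, alive):
    """Map the organism's genes onto the healthy physical cells, left to right.

    genes: genes the organism needs, in coordinate order.
    alive: one boolean per physical cell (False = killed / faulty).
    Returns one entry per physical cell: the gene it expresses, or None for a
    faulty or unused spare cell.
    """
    assignment = [None] * len(alive)
    x = 0  # logical coordinate of the next gene to place
    for position, healthy in enumerate(alive):
        if healthy and x < len(genes):
            assignment[position] = genes[x]
            x += 1
    if x < len(genes):
        # "... within limits": no spare cells are left to absorb the fault.
        raise RuntimeError("not enough healthy cells to rebuild the organism")
    return assignment
```

With three genes and four physical cells, killing the second cell moves the last two genes one position to the right. Killing a second cell exhausts the spares and raises an error, which is the limit the Self-Repair slide alludes to.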
Implementation - The BioWall
Example – Automatic Synthesis
Phenotype Layer: application-specific (parallel) functions
Mapping Layer: developmental algorithm
Genotype Layer: genetic code
Example – Automatic Synthesis
[Figure: Phenotype Layer, Mapping Layer, Genotype Layer; Totipotent Cell]
Example – Automatic Synthesis
[Figure: Totipotent Cell; Programmable Logic]
Example – Automatic Synthesis
[Figure: Programmable Logic; Cellular Array]
Applications
What kind of applications can take advantage of this kind of system? Complex "real-world" streaming applications, where:
computation is carried out sequentially;
the application can be represented by a DAG of computation nodes;
each node processes data locally, then forwards them to the next node in the graph.
[Figure: IN, computation nodes (×, +, ÷, ≠, FFT, DCT), OUT]
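A streaming application of this kind can be sketched as a small DAG. Each node computes on its local inputs and forwards the result to its successors. The node names and functions below are invented for illustration and do not come from the slides. `graphlib.TopologicalSorter` (Python 3.9+) is used only to obtain an evaluation order.

```python
from graphlib import TopologicalSorter

# Hypothetical computation nodes: each takes the list of its inputs.
NODES = {
    "scale": lambda inputs: inputs[0] * 2,
    "offset": lambda inputs: inputs[0] + 1,
    "sum": lambda inputs: inputs[0] + inputs[1],
}

# DAG description: node -> predecessors. "IN" stands for the input stream.
PREDECESSORS = {
    "scale": ["IN"],
    "offset": ["IN"],
    "sum": ["scale", "offset"],
}


def run_stream(stream, output_node="sum"):
    """Push every item of the stream through the DAG, one item at a time."""
    order = [n for n in TopologicalSorter(PREDECESSORS).static_order()
             if n != "IN"]
    for item in stream:
        values = {"IN": item}
        for node in order:
            # Local processing, then the result is available to successors.
            values[node] = NODES[node]([values[p] for p in PREDECESSORS[node]])
        yield values[output_node]
```

For example, `list(run_stream([1, 2, 3]))` evaluates `2*x + (x + 1)` for each item. On the array described in these slides each node would sit on its own processing element rather than in one loop.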
Example: JPEG
IN, READ, DCT, QNTZ, CMPR, WRT, OUT
Specialized MOVE functional units can be designed for each of these steps.
Programmable substrate
Problem: task or resource allocation, i.e. how do we map the graph nodes to the array? Specifically: dynamic allocation.
[Figure: IN, computation nodes (×, +, ÷, ≠, FFT, DCT) and Context on the array, OUT]
Self-Scaling Stream Processing
[Figure: Source, Funct A, Funct B, Funct C, Join, with additional copies of Funct A and Funct C]
SSSP
The MJPEG application consists of a four-stage computation pipeline. Each packet of data to be compressed consists of 192 bytes, corresponding to an 8x8 array of pixels using 24-bit colour. The maximum achievable rate (determined by the input rate) is 700 packets per second, roughly 1 Mbit/s. With a single pipeline, performance tops out at about 60 packets per second.
SSSP
When performance peaks, the average output rate is 675 packets per second (out of a maximum of 700): this technique multiplies the throughput by a factor of about 11 using 28 processors.
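The figures quoted on the two SSSP slides can be checked with a short calculation. All inputs are taken from the slides. The function name `sssp_figures` and the derived quantities (fraction of the input rate reached, packets per second per processor) are added here for illustration only.

```python
def sssp_figures():
    """Recompute the SSSP numbers from the values given on the slides."""
    pixels = 8 * 8                  # 8x8 block
    bits_per_pixel = 24             # 24-bit colour
    max_rate = 700                  # packets/s, set by the input rate
    single_pipeline_rate = 60       # packets/s with one pipeline
    peak_rate = 675                 # packets/s at peak
    processors = 28

    packet_bytes = pixels * bits_per_pixel // 8
    return {
        "packet_bytes": packet_bytes,
        "input_bits_per_second": packet_bytes * 8 * max_rate,
        "speedup": peak_rate / single_pipeline_rate,
        "fraction_of_max": peak_rate / max_rate,
        "packets_per_processor": peak_rate / processors,
    }
```

The packet size comes out at 192 bytes and the input rate at 1,075,200 bit/s, which matches "roughly 1 Mbit/s". The speed-up is 675 / 60 = 11.25, the "factor of 11" on the slide, and the peak rate is about 96% of the input rate.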