IBM Research Division
© 2007 IBM Corporation
July 22, 2008

The 50B Transistor Challenge
Mikko Lipasti
Department of Electrical and Computer Engineering
University of Wisconsin-Madison
IBM T.J. Watson Research Center
July 22 and 23, 2008
50B Transistors on a Chip?
History
 – 1997 IEEE Computer special issue on 1B transistors/chip; papers advocate:
     Single fast core – CMU, Michigan, Wisconsin
     IRAM – Berkeley
     RAW – MIT
     SMT – Washington
     Multicore – Stanford
11 years later, 50x more transistors
 – We still need faster cores: computation
     Fundamentally constrained by power
 – Will get more than one core: communication
     Need efficient interconnects and coherent caches
 – Will get lots of on-chip memory
     Need to think about new algorithms and new approaches to use it
(1) What Will We Do with 50B Transistors?
50B transistors/chip dramatically alters data centers
 – E.g., Nokia moving aggressively into services
 – Google, Yahoo, MSN each provision ~1M servers
 – Now provision for 10x the installed base (phone vs. PC)
 – Witness recent problems with iPhone/MobileMe
Impossible to anticipate applications
 – YouTube/Facebook/Flickr/Twitter
 – Unstructured real-world data
 – Organize, search, extract semantic knowledge, mashups, …
Existing and future server apps all benefit
(2) How Will We Design Chips with 50B Transistors?
Three things that processors need to be good at:
 – Computation
 – Communication
 – Storage/Memory
Focus on cost and nature of computation
Focus on cost of communication
Shift emphasis to memory
Cost of Computation
Less than 10% of energy spent on useful work
 – EPI (energy per instruction) overhead has gotten out of hand
 – Need to rethink operand delivery [ICCD’07], queues [ISLPED’07], caches, register files, control, …
Exploit program attributes
 – Solve hard problems via elimination
     Macro-ops: no single-cycle operations [MICRO’03, HPCA’06]
 – Do the hard parts with narrow values [JILP’07]
Eliminate redundancy, excessive pipelines
 – Clever clock gating [ISLPED’06, ICCD’07]
 – Remove renaming, register file, clocked scheduler, pipelines [submitted]
Goal: reduce EPI by 10x at fixed process technology and MIPS
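EPI is average power divided by instruction throughput, so at fixed MIPS a 10x cut in EPI is a 10x cut in power. The short sketch below works through that arithmetic. The 100 W and 10,000 MIPS baseline is a hypothetical operating point chosen only for illustration; the 10% useful-work fraction is the upper bound stated on the slide.

```python
def energy_per_instruction(power_watts: float, mips: float) -> float:
    """Average energy per instruction in joules: power / (instructions per second)."""
    return power_watts / (mips * 1e6)


def power_at(epi_joules: float, mips: float) -> float:
    """Power needed to sustain `mips` at a given EPI (inverse of the above)."""
    return epi_joules * mips * 1e6


# Hypothetical baseline core: 100 W at 10,000 MIPS (illustrative numbers only).
MIPS = 10_000.0
baseline_epi = energy_per_instruction(100.0, MIPS)   # 1e-8 J, i.e. 10 nJ
target_epi = baseline_epi / 10                        # the slide's 10x goal

# At fixed MIPS, power scales linearly with EPI.
target_power = power_at(target_epi, MIPS)

# "Less than 10% of energy spent on useful work" bounds the useful part of EPI.
useful_bound = 0.10 * baseline_epi

print(f"baseline EPI = {baseline_epi * 1e9:.1f} nJ")
print(f"target EPI   = {target_epi * 1e9:.1f} nJ")
print(f"target power = {target_power:.1f} W")
print(f"useful work  < {useful_bound * 1e9:.1f} nJ per instruction")
```

Under these assumptions the 10x target lands at the same 1 nJ as the useful-work bound, so in principle it is reachable by removing overhead alone rather than by shrinking the useful work itself.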
Cost of Communication
Reduce coherence overhead and speculation
 – Region coherence [ISCA’05, ASPLOS’06, HPCA’08]
Exploit locality of communication patterns
 – Switched circuits [CALetters’07, NOCS’08]
 – On-chip multicasting [ISCA’08]
 – Multicast coherence [submitted]
New technologies
 – Nanophotonic rings [HP Labs collaboration]
 – Massive bandwidth, speed-of-light latency
 – Lots of interesting problems to solve
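The core idea behind region coherence is to track coherence status for larger address regions, not just individual cache lines. A miss to a region that no other processor is caching can then skip the system-wide broadcast. The model below is a deliberately simplified sketch of that idea, not the protocols in the cited papers. The 1 KB region size, the two-state "private"/"shared" encoding, the `others_hold_region` flag standing in for the broadcast's snoop response, and all class and method names are hypothetical.

```python
class RegionTracker:
    """Per-processor table of region states (hypothetical, simplified)."""

    def __init__(self, region_bytes: int = 1024) -> None:
        self.region_bytes = region_bytes   # assumed region size
        self.state = {}                    # region number -> "private" | "shared"
        self.broadcasts = 0

    def region_of(self, addr: int) -> int:
        return addr // self.region_bytes

    def local_miss(self, addr: int, others_hold_region: bool) -> bool:
        """Handle a local cache miss; return True if a broadcast was needed.

        `others_hold_region` stands in for the snoop response that the
        broadcast would return (an assumption of this sketch)."""
        region = self.region_of(addr)
        if self.state.get(region) == "private":
            return False  # no other cache holds lines here: skip the broadcast
        self.broadcasts += 1
        self.state[region] = "shared" if others_hold_region else "private"
        return True

    def snooped_request(self, addr: int) -> None:
        """Another processor touched this region, so it is no longer private."""
        region = self.region_of(addr)
        if region in self.state:
            self.state[region] = "shared"


t = RegionTracker()
print(t.local_miss(0x1000, others_hold_region=False))  # True: first touch broadcasts
print(t.local_miss(0x1040, others_hold_region=False))  # False: same region, still private
t.snooped_request(0x1080)                               # another core touches the region
print(t.local_miss(0x10C0, others_hold_region=True))   # True: region is now shared
print(t.broadcasts)                                     # 2
```

The sketch omits evictions, region-state downgrades on the owner's side, and the richer state sets a real design would need. It only shows why coarse-grain tracking can filter broadcasts for private data.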
Emphasis on Memory
In future processes, memory will be easier than logic
 – Reliability, variability: well-known solutions (ECC, sparing)
 – Interesting new technologies (PCRAM, etc.)
 – Not caches: diminishing returns
Return to more regular, “memory-like” devices and logic?
 – Gate array, LUT, PLA
Majority of the 50B transistors must not be switching
 – Remembering is cheaper than computing
Revisit value locality/reuse/memoization?
 – New search algorithms: TCAM accelerator [ICCD’08]
Logic in memory, but not IRAM!
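A ternary CAM (TCAM) stores patterns in which each bit is 0, 1, or don't-care. It compares a search key against every stored entry at once, and when several entries match, a priority encoder typically reports the lowest-indexed one. The Python model below is a functional sketch of that behavior only. It makes no claim about the accelerator design in the cited ICCD’08 paper, and the class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


class TCAM:
    """Functional model of a ternary CAM (hypothetical sketch).

    Each entry is stored as (value, care): bits set in `care` must match,
    bits clear in `care` are don't-care ('X')."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.entries: List[Tuple[int, int]] = []

    def add(self, pattern: str) -> int:
        """Store a pattern over {'0', '1', 'X'}, MSB first; return its index."""
        if len(pattern) != self.width:
            raise ValueError("pattern width does not match TCAM width")
        value = care = 0
        for ch in pattern:
            value <<= 1
            care <<= 1
            if ch == "X":
                continue  # don't-care: leave the care bit clear
            if ch not in "01":
                raise ValueError(f"bad pattern character {ch!r}")
            care |= 1
            value |= int(ch)
        self.entries.append((value, care))
        return len(self.entries) - 1

    def search(self, key: int) -> Optional[int]:
        """Return the lowest-index matching entry, or None.

        Hardware compares every entry in parallel and a priority encoder
        picks the winner; this loop reproduces only the result."""
        for index, (value, care) in enumerate(self.entries):
            if ((key ^ value) & care) == 0:
                return index
        return None


tcam = TCAM(4)
tcam.add("10XX")   # most specific pattern first, so it wins ties
tcam.add("1XXX")
tcam.add("XXXX")   # catch-all
print(tcam.search(0b1011))  # 0
print(tcam.search(0b1100))  # 1
print(tcam.search(0b0001))  # 2
```

Ordering entries from most to least specific is how TCAMs are commonly used for longest-prefix matching in network routers. In hardware, the search is a single parallel lookup in the memory array, which is the kind of logic-in-memory the slide points to.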
Unstructured Real-World Data
The Internet is exploding with data
 – Text
 – Semantic knowledge
 – Photo, video, audio
It is all in digital form, but all we can do is view and copy it
Algorithms for analysis range from poor to nonexistent
 – Machine learning?
Why not learn from nature?
Brains
Human brain vs. von Neumann machine
 – Face recognition: < 500 ms
 – Neurons are slow: critical path is a handful of “gates”
 – Fundamentally different computational model
Made of shoddy, unreliable parts
 “…neurons are noisy, unreliable devices, … the nervous system averages over many cells to compensate for these shoddy components.” – Christof Koch
We can build it. We have the technology.
Dec. 3, 2007 MICRO-40 Panel: Computing Beyond Von Neumann
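The averaging argument in the Koch quote can be made quantitative. If each unit's output carries independent noise with standard deviation σ, the mean of N such units has standard deviation σ/√N. The simulation below checks this with Gaussian noise. The signal value, noise level, trial count, and unit counts are arbitrary illustration choices, not neuroscience data, and the function names are hypothetical.

```python
import random
import statistics


def noisy_unit(signal: float, sigma: float, rng: random.Random) -> float:
    """One unreliable unit: the true signal plus independent Gaussian noise."""
    return signal + rng.gauss(0.0, sigma)


def population_output(signal: float, sigma: float, n: int, rng: random.Random) -> float:
    """Average over n independent noisy units."""
    return sum(noisy_unit(signal, sigma, rng) for _ in range(n)) / n


def empirical_spread(n: int, trials: int = 2000, sigma: float = 1.0, seed: int = 1) -> float:
    """Standard deviation of the population output across many trials."""
    rng = random.Random(seed)
    outputs = [population_output(0.5, sigma, n, rng) for _ in range(trials)]
    return statistics.stdev(outputs)


for n in (1, 4, 16, 100):
    predicted = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1
    print(f"n={n:3d}  measured={empirical_spread(n):.3f}  predicted={predicted:.3f}")
```

The measured spread tracks σ/√N. A hundred "shoddy" units therefore behave like one unit that is roughly ten times less noisy, at the cost of a hundred times the hardware.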
Brains (2)
Human neocortex:
 – ~20B neurons, ~200T synapses
 – Structurally homogeneous
 – Hypothesis: runs a common algorithm
Apply Architecture 101?
 – Abstraction layers
 – Hierarchy and replication
 – Simulation/analysis/synthesis
 – Massively parallel, fault-tolerant hardware
Best news: no need for parallel programming
 – Train vs. program
 – Let’s Build Brains!
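"Train vs. program" can be illustrated at toy scale: in the code below, the same routine learns two different Boolean functions purely from examples. It uses the classic perceptron learning rule as a stand-in. It is not a model of the common neocortical algorithm hypothesized on the slide, and the function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[Tuple[int, int], int]  # ((x1, x2), target)


def predict(w: List[int], b: int, x: Tuple[int, int]) -> int:
    """Threshold unit: output 1 if the weighted sum exceeds zero."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train(samples: List[Sample], epochs: int = 20) -> Tuple[List[int], int]:
    """Perceptron rule: nudge the weights toward each misclassified example."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)
            w[0] += err * x[0]
            w[1] += err * x[1]
            b += err
    return w, b


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(x, x[0] & x[1]) for x in inputs]
OR = [(x, x[0] | x[1]) for x in inputs]

# Same code, different training data: the behavior comes from the examples.
for name, data in (("AND", AND), ("OR", OR)):
    w, b = train(data)
    print(name, [predict(w, b, x) for x in inputs])
```

Nothing in `train` mentions AND or OR. The learned weights alone determine the function. Because both functions are linearly separable, the perceptron rule is guaranteed to converge on them.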
Summary
Computation
 – Reduce cost (EPI) by 10x
 – New algorithms
Communication
 – Streamline coherence protocols, interconnects
 – Exploit new technologies
Storage/Memory
 – Reliability/variability
 – Logic in memory/new algorithms
Brain computing for unstructured real-world data
Questions?