CS 194 Research Results Paul Salzman Advisor: Professor Glenn Reinman Winter 2007 - Spring 2007.

Outline
• Motivation
• Initial Project: Polymorphism
  - Idea
  - Progress and Problems
• Final Project: Object Level Locality
  - Idea
  - Previous Work
  - Methodology
  - Results

Motivation: Interactive Entertainment
• Simulates virtual worlds with objects and characters that interact.
• Computational requirements grow with the increasing demand for realism.
• Interactions previously relied on predefined animations.
• Real-time physics engines are now used to calculate interactions dynamically.
• These systems can use every available performance boost to become more feasible.

Initial Project: Polymorphism
• Explore a currently proposed dedicated physics architecture, ParallAX.
• Explore how different facets of interactive entertainment can be applied to a dedicated architecture.

Initial Project: Progress
• Investigating SESC as a viable architectural simulator.
• Creating an architecture with heterogeneous cores in SESC.
• Hard-coding the ParallAX topology into SESC.

Initial Project: Difficulties
• Lack of up-to-date simulator documentation.
• A key out-of-the-box simulator feature, the network communication latency modeler, had been removed without mention.
• Simulator construction occupied far too much time.

Initial Project: Experience
• Working in a large, open-source code base with poor documentation.
• Academic and industry research projects do not always end successfully.
• Learning when to pursue a new idea.

Object Level Locality in Real-Time Physics Applications
• Idea: Objects in motion stay in motion.
• Can this lead to locality at the object level in physics simulation?
• If so, how can this be harnessed to speed up real-time physics simulation?

Value Prediction in Physics Simulation
• We will observe load values pertaining to physical objects in the simulator.
• Loads are long-latency instructions.
• Accurately predicting load values can increase instruction level parallelism and, in turn, performance.

Instruction Level Parallelism (ILP)
• Independent instructions can be executed simultaneously.
• Data dependencies prevent the processor from working ahead on a chain of dependent instructions.
• Predictions allow the processor to attempt useful work past data dependencies.

Previous Work
• Value prediction has been shown to be a viable option for performance enhancement.
• Various implementations of value predictors have been explored.
• Methods exist to improve the choice of which instructions to predict and how to predict with confidence.
• Correlations between high-level information and lower-level locality have been found.

Methodology
• Profile object info in a real-time physics simulator
• Observe locality in values associated with physical objects
• Construct predictors based on the locality information
• Observe the performance of these predictors

Open Dynamics Engine (ODE)
• Open source physics engine.
• Used commercially on the PC, Xbox 360, and other interactive entertainment (IE) platforms.
• Large code library; home in on hotspot functions using gprof.
• Use a complex benchmark to exercise the physics engine's functionality.

Benchmark
• Many entities in the environment.
• Collisions between multitudes of stacked boxes, rigid bodies, and rag-doll constrained humanoid objects.
• Used with permission from Dr. Thomas Yeh.

Profiling ODE
• gprof utility
• Place trace code in the hotspot function
  - Track each object's values
  - Associate values with the object's ID (address)
  - Maintain an index by the PC as well as by (PC XOR object ID)
  - Maintain chronological order
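The trace layout described on this slide can be modeled in a few lines. This is only a sketch in Python for illustration: the actual trace code sat inside ODE's hotspot functions and is not shown in the slides, so the names `TraceRecord`, `Trace`, `pc_key`, and `pc_xor_object_key` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """One observed load (hypothetical layout, not ODE's)."""
    seq: int        # chronological position in the trace
    pc: int         # address of the load instruction
    object_id: int  # address of the physics object the value belongs to
    value: float    # the loaded value


def pc_key(rec: TraceRecord) -> int:
    """Index by the load's PC alone."""
    return rec.pc


def pc_xor_object_key(rec: TraceRecord) -> int:
    """Index by PC XOR object ID, so each (load, object) pair gets its own key."""
    return rec.pc ^ rec.object_id


class Trace:
    """Append-only log; list order preserves chronological order."""

    def __init__(self) -> None:
        self.records: list[TraceRecord] = []

    def log(self, pc: int, object_id: int, value: float) -> None:
        self.records.append(TraceRecord(len(self.records), pc, object_id, value))
```

Two loads issued by the same instruction for two different objects share a `pc_key` but get different `pc_xor_object_key` values, which is what separates the two indexing methods.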

Observing and Analyzing Locality
• Parse the trace data, observing locality with respect to the two indexing methods.
• Check for adjacent values
• Check for stride values
• Check for trivial values (0, 1, -1)
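A rough sketch of how the three checks might be run over a trace follows. It assumes that "adjacent" means the same value as the previous access under the same index, and that "stride" means a repeated nonzero difference between consecutive values; the slides do not define either term precisely. The function and key names are hypothetical.

```python
from collections import defaultdict

TRIVIAL_VALUES = (0, 1, -1)


def by_pc(pc, object_id):
    return pc


def by_pc_xor_object(pc, object_id):
    return pc ^ object_id


def measure_locality(trace, key_fn):
    """Count locality events in a chronological list of (pc, object_id, value)."""
    history = defaultdict(list)  # key -> last two values seen under that key
    counts = {"total": 0, "adjacent": 0, "stride": 0, "trivial": 0}
    for pc, object_id, value in trace:
        prev = history[key_fn(pc, object_id)]
        counts["total"] += 1
        if value in TRIVIAL_VALUES:
            counts["trivial"] += 1
        if prev and value == prev[-1]:
            # Same value as the previous access under this key.
            counts["adjacent"] += 1
        elif len(prev) == 2:
            delta = prev[-1] - prev[-2]
            # Exact comparison; real floating-point traces may need a tolerance.
            if delta != 0 and value - prev[-1] == delta:
                counts["stride"] += 1
        history[key_fn(pc, object_id)] = (prev + [value])[-2:]
    return counts
```

Calling `measure_locality(trace, by_pc)` and `measure_locality(trace, by_pc_xor_object)` on the same trace gives the two views of locality that the slides compare.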

Constructing/Analyzing Predictors
• Use the locality data to construct pertinent predictors.
• Run the various predictors through the trace data to observe their performance.

ODE Hotspots
• 43%: dBoxBox - bounding-box collision function
• 18%: collideAABBs - general geometric collision function
• These two functions were chosen for profiling.
• The load instructions observed scale up to ~40% (rough value).

Locality Results (charts, two slides)

Locality Results
• Adjacent values appear very often.
• Stride values do not appear (intuitively consistent with the idea of a physical object).
• Trivial values appear in varying forms across the functions.
  - Different trivial values may appear for different physics engines.

Predictor Construction
• Adjacent values
  - Last Value Predictor: simple implementation
  - Finite Context Method (FCM) Predictor
    - Maintains a small table of previous values
    - Tracks occurrences of the table's values
    - Chooses the most likely candidate
• Trivial values
  - 0 and -1 predictors
  - Extremely simple implementation
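The three predictor families named on this slide might look roughly like the sketch below. The slides give no implementation details, so this is an illustration only: the FCM variant assumes the context is the last `order` values seen under the same key, in the style of context-based predictors from the value-prediction literature, and every class and method name is hypothetical. An unseen key predicts zero.

```python
from collections import Counter, defaultdict, deque


class LastValuePredictor:
    """Predicts that a key will produce the same value it produced last time."""

    def __init__(self):
        self.last = {}

    def predict(self, key):
        return self.last.get(key, 0)  # unseen key: guess zero

    def update(self, key, value):
        self.last[key] = value


class ConstantPredictor:
    """Trivial-value predictor: always guesses one constant (e.g. 0 or -1)."""

    def __init__(self, constant):
        self.constant = constant

    def predict(self, key):
        return self.constant

    def update(self, key, value):
        pass  # nothing to learn


class FCMPredictor:
    """Order-k finite context method sketch: most frequent follower of a context."""

    def __init__(self, order=2):
        self.context = defaultdict(lambda: deque(maxlen=order))
        self.followers = defaultdict(Counter)  # (key, context) -> value counts

    def predict(self, key):
        seen = self.followers.get((key, tuple(self.context[key])))
        if not seen:
            return 0  # unseen context: guess zero
        return seen.most_common(1)[0][0]

    def update(self, key, value):
        self.followers[(key, tuple(self.context[key]))][value] += 1
        self.context[key].append(value)
```

A predictor is driven by calling `predict(key)` before each load and `update(key, value)` once the real value is known. On a repeating sequence such as 1, 2, 3, 1, 2, 3 the order-2 FCM learns the pattern, while the last value predictor never hits.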

Predictor Construction (cont’d)
• The same two functions will be profiled.
• The predictors will be indexed in the same fashion as the locality data:
  - By PC
  - By (PC XOR object ID)
• No limit will be placed on the size of the predictor tables.
  - Avoids constructive and destructive aliasing.
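As a sketch of how a predictor might be scored under the two indexing schemes, the function below replays a trace through a last value table held in a plain dictionary. Because the dictionary is unbounded, every distinct key gets its own entry and no two keys alias. The names are hypothetical, and the zero guess for an unseen key is an assumed convention for this sketch.

```python
def by_pc(pc, object_id):
    return pc


def by_pc_xor_object(pc, object_id):
    return pc ^ object_id


def last_value_hit_rate(trace, key_fn):
    """Replay (pc, object_id, value) tuples in order and return the hit rate."""
    table = {}  # unbounded: one entry per distinct key, so no aliasing
    hits = 0
    total = 0
    for pc, object_id, value in trace:
        key = key_fn(pc, object_id)
        guess = table.get(key, 0)  # unseen key: guess zero
        if guess == value:
            hits += 1
        total += 1
        table[key] = value  # train after predicting
    return hits / total if total else 0.0
```

Running the same trace through `by_pc` and `by_pc_xor_object` shows the trade-off between the schemes: the XOR index keeps objects from disturbing each other's entries, but each entry has to be warmed up separately.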

Predictor Results

• FCM2 appears to be the most accurate predictor.
• Predictors indexed by (PC XOR object ID) behave exactly like zero-value predictors.
  - By convention, a predictor that has not seen a value will guess zero.
• As expected, trivial value predictors have hit rates equal to the frequency of their trivial values in the locality data.

Summary - Objects in Motion…
• Object level locality does appear in real-time physics simulators.
• This data can be leveraged by further research to increase ILP in IE architectures.
• Next step: pursue the connection between the high-level data and the architecture.

References
[1] Brad Calder, Glenn Reinman, and Dean Tullsen. Selective Value Prediction. In 26th International Symposium on Computer Architecture, May 1999.
[2] Mikko Lipasti, Christopher Wilkerson, and John Shen. Value Locality and Load Value Prediction. In Seventh International Conference on Architectural Support for Programming Languages and Operating Systems.
[3] Open Dynamics Engine.
[4] Yiannakis Sazeides and James E. Smith. The Predictability of Data Values. In 30th International Symposium on Microarchitecture, December 1997.
[5] Thomas Y. Yeh, Petros Faloutsos, and Glenn Reinman. Accelerating Real-Time Physics Simulation by Leveraging High-Level Information. UCLA CSD-TR, 2006.