Graphical Asymmetric Processing
  Prototype Presentation
  December 13, 2004
Team Organization
Problem Statement
  Computationally intensive environments underutilize graphics processing units (GPUs).
Background Information
  Discussed since 1996, but never implemented.
  GPU performance
    – Has multiplied at a rate of 2.8 times per year since 1993
    – Is expected to increase at this rate for another 5 years
  The more GPU performance increases, the more helpful our product becomes.
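To give a sense of scale for the growth rate quoted above, the following minimal sketch compounds the 2.8x yearly figure over the projected 5 years. It assumes the rate stays constant, which is the slide's expectation and not a measured result; the function name is illustrative only.

```python
# Compound the growth rate quoted on the slide (2.8x per year).
# Assumption: the rate stays constant for the whole period.
ANNUAL_GROWTH = 2.8


def projected_multiplier(years: int, annual_growth: float = ANNUAL_GROWTH) -> float:
    """Return the cumulative performance multiplier after `years` years."""
    return annual_growth ** years


if __name__ == "__main__":
    for year in range(1, 6):
        print(f"after {year} year(s): {projected_multiplier(year):.1f}x")
```

Under that assumption, five more years of growth would compound to roughly 172 times today's GPU performance.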
Solution: G.A.P.
  Create a usable, extendable, and maintainable API that leverages the unused computing power of graphics processors, resulting in increased performance of scientific, database, and other processor-intensive applications.
Solution: G.A.P.
  Utilizing existing hardware:
    – Improve computing power
    – Improve computing time
    – Improve computing responsiveness
Solution Implementation
  By:
    – Creating an SDK to utilize the GPU
    – Selling that SDK to NVIDIA
What is an SDK?
  S.D.K.: Software Development Kit
  A set of programs that allows software developers to create products that run on a particular platform or work with an API.
  Includes: manual, examples, libraries
  Other examples, both free and commercial:
    – Java, OS/2, AW, Windows, DirectX
Phase 1 Product Goals
  Demonstrate the amount of power in current GPUs
    – Also: the ability to utilize that power
  Secure funding to continue development
  Secure interested parties: universities and research labs
  Take first steps toward an NVIDIA partnership
Phase 1 Product Objectives
  Leverage the GPU for additional power
  Improve throughput on workstation machines
  Ease the programming difficulty of utilizing the GPU
  Maintain current program compatibility
  Preserve system stability
Product Risks & Mitigations
  Vendor support
    – NVIDIA sets aside $1 billion to use on acquisitions and R&D
  Writing the software
    – Time-intensive product
    – "Build first and optimize later"
Product Functional Diagram
Product Dataflow Diagram
Dataflow Diagram for Product
Prototype
Navier-Stokes Equations
  A full and general set of differential equations governing the motion of a fluid.
  The name is commonly used to refer to the incompressible form of the momentum equation.
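For reference, the incompressible form mentioned above is usually written as a momentum equation plus a divergence-free constraint. The slides do not define any symbols, so the notation here is an assumption: u is the velocity field, p the pressure, ρ the (constant) density, ν the kinematic viscosity, and f an external body force such as the jets used in the demonstration.

```latex
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u}
  &= -\frac{1}{\rho}\,\nabla p + \nu\,\nabla^{2}\mathbf{u} + \mathbf{f}, \\
\nabla \cdot \mathbf{u} &= 0 .
\end{aligned}
```

The first line is the momentum equation; the second enforces incompressibility.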
Navier-Stokes Equations
  Simulation of fluid-like behavior
    – An example of the applications used in computationally intensive environments
    – Several Old Dominion Ph.D. candidates' thesis topics focus on Navier-Stokes
  Serves as a baseline application to demonstrate the efficiency of the GPU over the CPU
    – Shows an average 60% gain in efficiency
Prototype Functional Diagram
Dataflow Diagram for Prototype
Demonstration
  Two versions of an executable
    – CPU vs. GPU
  Navier-Stokes on a vector field with four jets
    – The demonstration consists of firing the jets for different lengths of time and observing performance
    – Observe the CPU alone
    – Observe the GPU alone
    – Observe both simultaneously
On the CPU
With GAP on the GPU
Risks
  The main research issues include the quality of floating point
    – The numbers are single precision, not double.
  The GPU works best when work is batched, which requires a relatively parallel system
    – This is already a multithreading issue. Solutions exist in both programmer practice and compiler design.
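The floating-point risk can be illustrated on the CPU alone. The sketch below is not GAP code: it emulates single precision by round-tripping Python's double-precision floats through a 32-bit encoding, then compares a long running sum in both precisions. The helper names are illustrative.

```python
import struct


def to_single(x: float) -> float:
    """Round a double-precision float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step: float, count: int, single_precision: bool) -> float:
    """Add `step` to a running total `count` times.

    When `single_precision` is True, the total is rounded to 32 bits
    after every addition, emulating a single-precision accumulator.
    """
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_single(total)
    return total


if __name__ == "__main__":
    exact = 1000.0  # 0.1 added 10,000 times
    single_total = accumulate(0.1, 10_000, single_precision=True)
    double_total = accumulate(0.1, 10_000, single_precision=False)
    print("single-precision error:", abs(single_total - exact))
    print("double-precision error:", abs(double_total - exact))
```

The single-precision total drifts visibly from 1000 while the double-precision total stays within rounding noise, which is the kind of error a long-running simulation has to budget for.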
Risks Mitigated (Prototype)
  Floating-point quality:
    – Distributed the field densely enough that floating-point results were accurate.
  Batching:
    – Used a "Stream" operator that ensured a command batch was large enough before it flushed the results.
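The batching idea above can be sketched as a small queue that holds commands until enough have accumulated, then sends them in one call. This is a hypothetical illustration, not the prototype's "Stream" operator: the class name, the `min_batch_size` parameter, and the stand-in `fake_gpu` function are all assumptions made for the example.

```python
from typing import Callable, List


class CommandStream:
    """Hypothetical batching queue: collects commands and executes them in groups."""

    def __init__(self, execute_batch: Callable[[List[float]], List[float]],
                 min_batch_size: int = 4) -> None:
        self._execute_batch = execute_batch    # stand-in for a GPU submission call
        self._min_batch_size = min_batch_size  # smallest batch worth sending
        self._pending: List[float] = []
        self.results: List[float] = []

    def submit(self, command: float) -> None:
        """Queue a command; flush automatically once the batch is large enough."""
        self._pending.append(command)
        if len(self._pending) >= self._min_batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending, even if the batch is smaller than the minimum."""
        if self._pending:
            self.results.extend(self._execute_batch(self._pending))
            self._pending = []


batches_sent: List[int] = []


def fake_gpu(batch: List[float]) -> List[float]:
    """Stand-in for the device: records the batch size and squares each value."""
    batches_sent.append(len(batch))
    return [x * x for x in batch]


stream = CommandStream(fake_gpu, min_batch_size=4)
for value in range(10):
    stream.submit(value)
stream.flush()  # push the final, partial batch
```

Ten submissions reach the stand-in device as three calls instead of ten, and the explicit `flush` lets the caller force out a partial batch when results are needed immediately.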
Risk Mitigation (Product)
  Floating point
    – NVIDIA says its cards will include double precision upon demand
    – An NVIDIA partnership will expedite this.
  Batching
    – The Context system has an internal, self-optimizing queue, with a "flush" instruction for programmer flexibility.
Testing and Evaluation
  20 frames to 1 "real-world second"
    – Translates: speed on GPU
    – Faster than a "real-world second"! speed on CPU
Suitability
  What does this prove?
    – Gives the magnitude of the performance increase
    – Efficiency gain with no new hardware
    – "Real-world" problem solved
    – Standard interface any program could use
Degree of Completeness: Similarities
  Prototype
    – General access functions
    – "Context"-based input
    – Demonstrated performance gain
    – Utilizes the GPU for as much work as possible
  Release
    – General access functions
    – "Context"-based input
    – Demonstrated performance gain
    – Utilizes the GPU for as much work as possible
Degree of Completeness: Differences
  Prototype
    – Specific to the GF5 platform
    – Limited GAP commands
    – "All or nothing" GPU use
  Release
    – General platform
    – Wide array of GAP commands
    – Dynamic GPU use based on capabilities
Budget Reports
Phase I Funding
  Phase I SBIR
    – Completed at the end of Phase 0
Phase I Budget: Staff
Phase I Budget
Major Milestones: Phase I
  Organize Project Group
  Produce Project Descriptive Paper
  Develop Contracts
  Produce Budget White Paper
  Produce Project User Manual
  Develop Prototype
  Produce SBIR Phase II Proposal
  Produce Project Website
Phase I Schedule
Phase II Funding
  Phase II SBIR
    – Completed at the end of Phase I
Phase II Budget: Staff
  * 4 programmers needed
Patent Acquisition
Phase II Budget
  * Purchased in Phase 1
Major Milestones: Phase II
  Production
  Marketing
  Legal Negotiation
  Final Preproduction Alterations
Phase II Schedule
Phase III
  We plan to sell the product to NVIDIA at the end of Phase II.
  Doing so would relieve us of the responsibilities and risk factors that may arise on the market
    – While increasing the company's profit by over $6.5 million
Profit Margin/Break Even
  Immediate profit
  $70 million average profit for acquisitions
    – If we obtain 1/10 of the average
    – We would still make a $6.5 million gain
Profit Margin/Break Even
  Phase 1 Budget
  Phase 2 Budget
  Total
  GAP Acquisition by NVIDIA: $7,000,000
  Net Profit: $6,592,000
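The break-even figures can be checked with simple arithmetic. The sketch below assumes that net profit equals the acquisition price minus the combined Phase 1 and Phase 2 budgets; the individual budget amounts are not given here, so only their implied total is derived. Variable names are illustrative.

```python
# Figures taken from the slides.
AVERAGE_ACQUISITION = 70_000_000  # average acquisition figure, in dollars
FRACTION_DIVISOR = 10             # "1/10 of the average"
NET_PROFIT = 6_592_000            # stated net profit, in dollars

# Projected sale price: one tenth of the average acquisition.
sale_price = AVERAGE_ACQUISITION // FRACTION_DIVISOR

# Assumption: net profit = sale price - (Phase 1 budget + Phase 2 budget).
implied_total_budget = sale_price - NET_PROFIT

if __name__ == "__main__":
    print(f"projected sale price:  ${sale_price:,}")
    print(f"implied total budget:  ${implied_total_budget:,}")
```

Under that assumption, the stated $7,000,000 sale price and $6,592,000 net profit imply a combined two-phase budget of $408,000.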
Conclusion
  Through our prototype we have achieved "proof of concept".
  The overall efficiency gain obtained in computationally intensive environments demonstrates a need for GAP.
Graphical Asymmetric Processing