Advanced Computer Graphics (CS & SE 233.420)
Lecture 7

CREDITS
• Bill Mark, “NVIDIA Programmable Graphics Technology,” SIGGRAPH 2002 Course.
• David Kirk, “GPUs and CPUs: The Uneasy Alliance,” Panel Presentation, ACM Workshop on General Purpose Computing on Graphics Processors.
• David Kirk, “The Future: Programmable GPUs and Cinematic Computing,” NVIDIA Developer Documents, developer.nvidia.com/docs/io/4106/Technology_Directions.pdf – Doug James
• GPU history taken from The Cg Tutorial book
• www.nvidia.com
• Teaching Cg PowerPoint presentation provided by NVIDIA
• ACM Workshop on General Purpose Computing on Graphics Processors presentations, http://www.cs.unc.edu/Events/Conferences/GP2/program.shtml

Graphics Processing Units History

Processors
• The processor is the heart of any conventional computer
  – e.g., an Intel Pentium, a 64-bit AMD Opteron, or any of the many other brands and types of processors
• Most microprocessors are central processing units (CPUs)
  – Complete computation engines fabricated on a single chip
  – General purpose: they execute applications written in general-purpose languages such as C or Java
• Until a few years ago, graphics programmers had two options:
  – Use the CPU to process all the transformation and rasterization algorithms
  – Use very expensive specialized hardware

Graphics Processing Unit (GPU)
• Technically (NVIDIA's definition)
  – A GPU is a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second
• In practice
  – Computationally intensive transform and lighting calculations are now offloaded from the CPU onto the GPU

GPUs
• Design is specialized for graphics tasks
• Throughput:
  – Process tens of millions of vertices per second
  – Rasterize hundreds of millions of fragments per second
• Perform certain tasks much faster than CPUs:
  – Dot products
  – Vector-matrix calculations

Pre-GPU
• Graphics acceleration relied on:
  – Integrated graphics architectures
  – Specialized and expensive graphics hardware pipelines
    • Silicon Graphics
    • Evans & Sutherland
• These architectures introduced concepts that are still relevant in modern graphics
  – Vertex transformations, texture mapping

First Generation (pre-1998)
• NVIDIA TNT2, ATI Rage, 3dfx Voodoo3
• Capabilities
  – Rasterize pre-transformed triangles
  – Apply one or two textures
  – Implement the DirectX 6 feature set
  – When running 2D and 3D applications, completely relieve the CPU from updating individual pixels
• Limitations
  – Cannot transform 3D vertices; vertices are still transformed on the CPU
  – Limited set of math operations for combining textures

Second Generation (1999–2000)
• NVIDIA GeForce 256, GeForce2, ATI Radeon 7500, S3 Savage3D
• Capabilities
  – Offload 3D vertex transformation and lighting from the CPU
  – Fast vertex transformation now possible on a PC
  – Implement the OpenGL and DirectX 7 feature sets in hardware
  – Expanded math operations for combining textures and coloring pixels
    • Cube map textures
    • Signed math operations
• Limitations
  – Architecture is configurable, but not truly programmable
  – Still a limited set of math operations for combining textures

Third Generation (2001)
• NVIDIA’s GeForce3 and GeForce4 Ti, Microsoft’s Xbox, and ATI’s Radeon 8500
• Capabilities
  – Provide vertex programmability rather than merely offering more configurability
    • These GPUs let the application specify a sequence of instructions for processing vertices
  – DirectX 8 pixel shaders and various vendor-specific OpenGL extensions expose this generation’s fragment-level configurability
• Limitations
  – Pixel-level configurability is available, but these modes are not powerful enough to be considered truly programmable
  – Because these GPUs support vertex programmability but lack true pixel programmability, this generation is transitional

Fourth Generation (2002–2004)
• NVIDIA GeForce FX family with the CineFX architecture, and ATI’s Radeon 9700
• Capabilities
  – Provide both vertex-level and pixel-level programmability
    • This level of programmability opens up the possibility of offloading complex vertex transformation and pixel-shading operations from the CPU to the GPU
  – DirectX 9 and various OpenGL extensions expose the vertex-level and pixel-level programmability of these GPUs

Fifth Generation (2004– )
• NVIDIA GeForce 6 with the PCI Express bus architecture
• Capabilities
  – New features include GDDR3 memory
  – Compatibility with the new Pixel Shader 3.0 programming model
  – Microsoft DirectX 9.0 Shader Model 3.0 feature set
  – An on-chip video processing engine
    • Allows high-definition video and DVD playback
    • Supports MPEG encode and decode, and Windows Media Video 9
  – 8 operations per pixel in a 60th of a second
  – Support for OpenEXR, an open high-dynamic-range image format, with floating-point (fp16) texture filtering and blending
  – The GPU's loops and branches are programmable, and video-processing code can be written for the chip

Transistor Count of CPUs and GPUs
Year   Product                       Transistors
1995   Pentium® Pro Processor        5.5 million
1997   Pentium® II Processor         7.5 million
1998   Pentium® II Xeon Processor    7.5 million
1999   Celeron Processor             7.5 million
1999   Pentium® III Processor        9.5 million
1999   Pentium® III Xeon Processor   9.5 million
1999   Pentium® III E                28.1 million
2000   Pentium® 4                    42 million
2001   GeForce3                      57 million
2002   GeForce4 Ti                   63 million
2003   GeForce FX                    125 million
2004   GeForce 6                     220 million

PCI Express™ Bus Architecture
• The interface eliminates the bottlenecks caused by previous bus architectures
• Allows maximum system performance in a multi-GPU configuration
• Twice the bandwidth of the AGP 8X graphics bus
• 4 GB per second in both upstream and downstream data transfers
• Opens the door to a truly parallel graphics bus architecture

Programmable GPU Pipeline Evolution

Graphics Pipeline
[Figure: Vertices → Vertex Transformation → Transformed Vertices → Primitive Assembly and Rasterization (using Vertex Connectivity) → Fragments (with Pixel Positions) → Fragment Texturing and Coloring → Colored Fragments → Raster Operations → Pixel Updates]

GPU Programmable Pipeline
[Figure: Application (CPU) → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer (GPU), with Textures as an input; the vertex and fragment processors are the programmable components]

Fifth Generation: NVIDIA GeForce 6 Pipeline
[Figure: GeForce 6 pipeline stages. vertex: programmable vertex processing (fp32); setup/rasterizer: polygon setup, culling, rasterization; pixel: programmable per-pixel math (fp32); texture: per-pixel texture and fp16 blending; image: Z-buffer, fp16 blending, anti-aliasing, MRT; memory]

So what does this mean in terms of true graphics capability?

NVIDIA Dawn Demo
GeForce FX Demonstration Video
• Two key vertex shaders drive her motion:
  – Branching skeletal shader
    • The body mesh is driven by several different combinations of internal bones
  – Blend shape shader
    • Deforms her face based on control parameters
• Skin Shader
  – A complex combination of color maps, specular maps, and blood characteristic maps produces very realistic skin
  – Lighting subtleties are achieved with a series of cube maps for diffuse, specular, and "highlight" skin lighting
• Wing Shader
  – A translucent shader is used for the wings
  – It modifies both the color reflected off the wings and the amount of light passing through them, based on viewing and light angles
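A blend shape shader like Dawn's is a per-vertex weighted sum at its core. The last step of a typical vertex program is a 4x4 matrix transform. Both are the dot-product and vector-matrix work that GPUs are built to run in bulk. The Python sketch below emulates one such vertex program on the CPU so the arithmetic can be inspected. It assumes a generic morph-target formula, v = base + Σ wᵢ (targetᵢ − base), and is not NVIDIA's actual Dawn shader. The function names (blend_shape, transform, vertex_program) and the sample target shapes are hypothetical.

```python
from typing import Sequence

# Hypothetical helper types for this sketch.
Vec3 = tuple[float, float, float]
Vec4 = tuple[float, float, float, float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix


def blend_shape(base: Vec3, targets: Sequence[Vec3], weights: Sequence[float]) -> Vec3:
    """Morph one vertex: base + sum_i weights[i] * (targets[i] - base).

    Assumption: a generic morph-target blend, not NVIDIA's Dawn shader.
    """
    if len(targets) != len(weights):
        raise ValueError("need exactly one weight per blend target")
    out = list(base)
    for target, w in zip(targets, weights):
        for axis in range(3):
            out[axis] += w * (target[axis] - base[axis])
    return (out[0], out[1], out[2])


def transform(m: Mat4, p: Vec3) -> Vec4:
    """Multiply a row-major 4x4 matrix by the homogeneous point (x, y, z, 1).

    Each output component is one 4-component dot product.
    """
    v = (p[0], p[1], p[2], 1.0)
    x, y, z, w = (sum(row[j] * v[j] for j in range(4)) for row in m)
    return (x, y, z, w)


def vertex_program(base: Vec3, targets: Sequence[Vec3],
                   weights: Sequence[float], mvp: Mat4) -> Vec4:
    """Hypothetical vertex shader: deform the vertex, then move it to clip space."""
    return transform(mvp, blend_shape(base, targets, weights))


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    neutral = (0.0, 0.0, 0.0)
    smile = (1.0, 0.0, 0.0)   # hypothetical target shapes
    blink = (0.0, 2.0, 0.0)
    print(vertex_program(neutral, [smile, blink], [0.5, 0.25], identity))
    # prints (0.5, 0.5, 0.0, 1.0)
```

On a GPU, the same arithmetic would be written in Cg or the OpenGL Shading Language and executed once per vertex, with many vertices processed in parallel. The Python version trades that speed for readability.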
Video (Dawn.mpg) courtesy of NVIDIA and The Cg Tutorial CD, http://www.nzone.com/object/nzone_downloads_nvidia.html

NVIDIA Toys
GeForce FX 5800 Demonstration Video
• Cinematic Camera Effects
  – Pixel shaders simulate camera effects such as depth of field and full-scene blurring from an auto-focus lens
• Realistic Material Shaders
  – Special pixel shaders add realism to the toy models in the scene
  – A plastic shader for the tank, robot, and other plastic models
  – A painted-wood shader for the wood blocks
  – A brushed-metal shader for the flying saucer
Video courtesy of NVIDIA and The Cg Tutorial CD, http://www.nzone.com/object/nzone_downloads_nvidia.html

NVIDIA Time Machine
GeForce FX 5800 Demonstration Video
• Time-based Shaders
  – Each of the aging materials has a single pixel shader associated with it
  – The shaders use a variety of texture map inputs (color, bump, specular, reflection, surface reflectivity, and reveal maps) to produce a seamless transition of surface material effects over time
Video courtesy of NVIDIA and The Cg Tutorial CD, http://www.nzone.com/object/nzone_downloads_nvidia.html

NVIDIA Nalu
GeForce 6 Mascot
• Dense hair is simulated in real time and lit by a technique called "deep shadows," in which the topmost hairs glow brightly from exposure to the light while the lower hairs are darker
• Her skin is lit by light refracted through the water's surface, and her body and hair cast soft shadows on her as she swims
• Soft shafts of light filter down from the surface and are blocked by her silhouette, using render-to-texture capabilities
• Her high-resolution skin transitions into a highly detailed scale shader that has the same soft shadowing as the skin but adds a more noticeable bump map, iridescence, and bioluminescence
• The final render pass (19 in all) provides a soft glow that lets the bright light on her hair and skin bloom on the screen
• See the GeForce 6 Series demo collage at the web page below
Video courtesy of NVIDIA and The Cg Tutorial CD, http://www.nzone.com/object/nzone_downloads_nvidia.html

How Do You Program a GPU?

Need the Right Hardware
• Anything newer than a GeForce4
  – Know what you can do with the card
  – Make sure you have the right driver
• The bus architecture affects how fast you can move data from the CPU to the GPU
  – AGP 4x is more limiting than AGP 8x
  – PCI Express is quickly making AGP cards and motherboards obsolete
• NVIDIA is a leader in programmable GPU card development

Need a Programming Language or API
• Programming powerful hardware with assembly code is hard
• GeForce 6 supports programs more than 1,000 assembly instructions long
• Programmers need the benefits of a high-level language:
  – Easier programming
  – Easier code reuse
  – Easier debugging
• Available APIs and programming languages
  – OpenGL, the OpenGL Shading Language, Cg

Low-Level APIs
• Low-level APIs
  – Similar to assembly language
    • Close to hardware functionality
    • Input: vertex/fragment attributes
    • Output: new vertex/fragment attributes
    • Sequence of instructions operating on registers
    • Platform dependent
• Current low-level APIs
  – OpenGL extensions (OpenGL 2.0)
    • GL_ARB_vertex_program
    • GL_ARB_fragment_program
  – DirectX
    • Vertex Shader, Pixel Shader

Cg – The High-Level Language for Graphics
• Cg is an open-source, high-level shading language that makes graphics programming faster and easier
• Cg replaces assembly code with a C-like language and a compiler
• Cg is cross-API (OpenGL and DirectX) and cross-platform (Windows, Linux, and Mac OS)
• Cg is a key enabler of cinematic computing
• http://developer.nvidia.com/CgTutorial/

OpenGL 2.0 Shading Language
• A high-level procedural shading language for OpenGL
  – Part of the core OpenGL 2.0 specification
• Designed to allow application programmers to create shaders for programmable vertex and fragment processing
  – Allows developers to take total control over the most important stages of the graphics-processing pipeline
• Based on ANSI C
• http://www.opengl.org/documentation/oglsl.html

Books
• OpenGL® Shading Language, Randi J. Rost
• The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics, Randima Fernando and Mark J. Kilgard
• GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics, Randima Fernando (ed.)
• NVIDIA GPU Programming Guide, http://developer.nvidia.com/object/gpu_programming_guide.html

Project, Paper, and Presentation

Project
• Focus on the aim of your project
• The main point of these projects is to gain knowledge in a specific area of computer graphics that interests you
• It is up to you to make the project interesting
• Be practical

Paper
• Conference-style paper
  – The web page provides a link to the ACM/SIGGRAPH paper submission information
  – 6–8 pages
• Suggested format
  – Abstract (do this last)
    1. Introduction
    2. Background
    3. The Model
    4. Simulation Results (very dry)
       a. List graphics card, compilers, code (OpenGL 1.?, C/C++, CPU, GLUT libraries)
    5. Discussion (of results)
    6. Conclusion
    7. Acknowledgments
    8. References

Presentation
• 10 minutes (5 minutes for questions)
  – 10 slides max!
• Format
  – Tell ’em what you are going to say
    • Introduction
  – Say it
    • Background, Model, Simulation, Discussion
  – Tell ’em what you said
    • Conclusion
• Demonstration
  – Go over to the lab and demonstrate your code

Due Date
• Week of 13–17 June
  – Need 3 hours for presentations
    • Tue, 14 June, 2–5 pm
  – Need 3 hours for demonstrations
    • Thu, 16 June, 2–5 pm
• Papers/web pages due at the beginning of the demonstrations