New Pascal based Quadro GPUs

Mike McCarthy   February 5, 2017

NVidia announced a number of new professional graphics cards today, filling out their entire Quadro lineup with models based on their newest Pascal architecture. At the absolute top end is the new Quadro GP100, a PCIe card implementation of their supercomputer chip. It has similar 32bit (graphics) processing power to the existing Quadro P6000, but adds 16bit (AI) and 64bit (simulation) processing. It is intended to combine compute and visualization capabilities into a single solution. It has 16GB of new HBM2 (High Bandwidth Memory), and two cards can be paired together with NVLink at 80GB/sec to share a total of 32GB between them.

This powerhouse is followed by the existing P6000 and P5000, announced last July. The next addition to the lineup is the single-slot VR-Ready Quadro P4000. With 1792 CUDA cores running at 1200MHz, it should outperform a previous generation M5000 for less than half the price. Like its predecessor, the M4000, it has 8GB RAM and 4 DisplayPort connectors, and runs on a single 6pin power connector.  The new P2000 follows next with 1024 cores at 1076MHz and 5GB RAM, giving it similar performance to the K5000, which is nothing to scoff at. The P1000, P600, and P400 are all low profile cards with Mini-DisplayPort connectors.

All of these cards run on PCIe Gen3 x16, and use DisplayPort 1.4, which adds support for HDR and DSC.  They all support 4Kp60 output, with the higher end cards allowing 5K and 4Kp120 displays.  NVidia also continues to push high resolution display support forward, allowing up to 32 synchronized displays to be connected to a single system, provided you have enough slots for eight Quadro P4000 cards and two Quadro Sync II boards.  Under the hood, the P4000 appears to be based on the GP104 chip, and is similar to the consumer GTX1070, but scaled back to allow a single slot solution with a 105W TDP.  The P2000 has a lot of changes from its closest consumer equivalent, the GP106 based GTX1060, including a 160bit memory bus and 5GB RAM in a single slot 75W package.  NVidia also announced a number of Pascal based mobile Quadro GPUs last month, with the mobile P4000 having roughly comparable specifications to the desktop version, which is an improvement over their previous misleading naming approach.
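The 32-display figure is simple multiplication, and a small sketch makes it easy to size a smaller wall. This is only an illustration: the function name is hypothetical, the four outputs per card come from the P4000's four DisplayPort connectors, and the limit of four cards per sync board is an assumption inferred from the eight-cards-to-two-boards ratio above, not a figure from NVidia's documentation.

```python
import math

def sync_wall_hardware(displays, outputs_per_card=4, cards_per_sync_board=4):
    """Estimate how many cards and sync boards a display wall needs.

    outputs_per_card: 4 DisplayPort connectors on a P4000 (from the article).
    cards_per_sync_board: ASSUMED to be 4, inferred from the article's
    "eight cards and two sync boards" example.
    """
    cards = math.ceil(displays / outputs_per_card)
    boards = math.ceil(cards / cards_per_sync_board)
    return cards, boards

print(sync_wall_hardware(32))  # (8, 2)
```

Under the same assumptions, a 12 display wall would need three cards and a single sync board.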

But you can read the paper specs for the new cards elsewhere on the internet. More importantly, I have had the opportunity to test out some of these new cards over the last few weeks, to get a feel for how they operate in the real world. I was able to run tests and benchmarks with the P6000, P4000, and P2000 against my current M6000 for comparison. All of these tests were done on a top end Dell 7910 workstation, with a variety of display outputs, primarily using Premiere Pro, since I am a video editor after all.

I ran a full battery of benchmark tests on each of the cards using Premiere Pro 2017.  I measured both playback performance and encoding speed, monitoring CPU & GPU utilization, as well as power usage throughout the tests.  I had HD, 4K, and 6K source assets to pull from, and tested monitoring with an HD projector, a 4K LCD, and a 6K array of TVs.  I had assets that were RAW R3D files, compressed MOVs, and DPX sequences.  I wanted to see how each of the cards would perform at various levels of production quality, and measure the differences between them, to help editors and visual artists determine which option would best meet the needs of their individual workflow.

I started with the intuitive expectation that the P2000 would probably be sufficient for most HD work, but that a P4000 would be required to effectively handle 4K.  I also assumed that a top end card would be required to play back 6K files and split the image between my 3 Barco Escape formatted displays.  And I was totally wrong.

Except when using the higher end options within the Lumetri based color corrector, all of the cards were fully capable of every editing task I threw at them.  To be fair, the P6000 usually renders out files about 30% faster than the P2000, but that is a minimal difference compared to the difference in cost.  Even the P2000 was able to play back my uncompressed 6K assets onto my array of Barco Escape displays without issue.  It was only when I started making heavy color changes in Lumetri that I began to observe any performance differences at all.

Color correction is an inherently parallel, graphics related computing task, so this is where GPU processing really shines.  Premiere’s Lumetri color tools are based on Speedgrade’s original CUDA processing engine, and it can really harness the power of the higher end cards.  The P2000 can make basic corrections to 6K footage, but it is possible to max out the P6000 with HD footage if I adjust enough different parameters.  Fortunately most people aren’t looking for more stylized footage than “300” had, so in this case, my original assumptions seem to be accurate.  The P2000 can handle reasonable corrections to HD footage, the P4000 is probably a good choice for VR and 4K footage, while the P6000 is the right tool for the job if you plan to do a lot of heavy color tweaking or are working on massive frame sizes.

The other way I expected to be able to measure a difference between the cards was in playback while rendering in Media Encoder.  By default Media Encoder pauses exports during timeline playback, but this behavior can be disabled by reopening Premiere after queuing your encode.  Even with careful planning to avoid reading from the same disks that the encoder was accessing, I was unable to get significantly better playback performance from the P6000 compared to the P2000.  This says more about the software than it says about the cards.

The largest difference I was able to consistently measure across the board was power usage, with each card averaging about 30 watts more as I stepped up from the P2000 to the P4000 to the P6000.  But they are all far more efficient than the previous M6000, which frequently sucked up an extra 100 watts in the same tests.  While “watts” may not be a benchmark most editors worry too much about, among other things it does equate to money for electricity.  I estimate that swapping my M6000 for a new P6000 would save me 3kWh/day if I left my system on all the time for remote access, which would be about $300/year.  Lower wattage also means less cooling is needed, which results in quieter systems that can be kept closer to the editor without distracting from the creative process or interfering with audio editing.  It also allows these new cards to be installed in smaller systems with smaller power supplies, using up fewer power connectors.  My Z420 workstation only has one 6pin PCIe power plug, so the P4000 is the ideal GPU solution for that system.
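For anyone who wants to run the same estimate against their own power bill, here is a rough sketch of the arithmetic. The function names are hypothetical, and the article does not state an electricity rate, so the per-kWh price printed here is only what the 3kWh/day and $300/year figures imply when taken together.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

def daily_kwh(extra_watts, hours=HOURS_PER_DAY):
    """Energy used per day by a constant extra draw, in kWh."""
    return extra_watts * hours / 1000

def annual_cost(kwh_per_day, price_per_kwh):
    """Yearly cost of a daily energy amount at a given rate (an assumption)."""
    return kwh_per_day * DAYS_PER_YEAR * price_per_kwh

def implied_price(annual_dollars, kwh_per_day):
    """Rate per kWh implied by a yearly cost and a daily energy amount."""
    return annual_dollars / (kwh_per_day * DAYS_PER_YEAR)

# A steady 100 watt difference on a system that is left on all day
print(daily_kwh(100))                    # 2.4

# Rate implied by the 3kWh/day and $300/year estimate
print(round(implied_price(300, 3), 3))   # 0.274
```

Plugging a local rate into `annual_cost` gives a personal version of the savings figure. At rates well below roughly 27 cents per kWh, the yearly savings would be proportionally smaller.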

It appears that we have once again reached a point where hardware processing capabilities have surpassed the software's capacity to use them, at least within Premiere Pro.  This leads to the cards performing relatively similarly to one another in most of my tests, but true 3D applications might reveal much greater differences in their performance.  Further optimization of the CUDA implementation in Premiere Pro might also lead to better utilization of these higher end GPUs in the future.

For those who are interested, this is the exact breakdown of how much each of these Premiere based playback configurations taxed the various GPU options:

And this is the data from the encoding speed tests:

Regardless of the exact numbers, the new Pascal Quadro cards are more powerful and energy efficient across the board than their previous generation counterparts, and my productions and clients will benefit from their new features.