A First Look at AMD’s Threadripper Pro in Lenovo’s ThinkStation P620

AMD’s Threadripper line of processors has been available to High-End Desktop users since August 2017. They compete with Intel’s Core-X lineup, serving users who want higher performance, quad-channel memory, and more cores than gaming-focused systems offer. Over the last three years, numerous smaller vendors have sold Threadripper-based desktop systems as “workstations” because of their high performance. In July, AMD announced its Threadripper PRO lineup, which adds a number of professional features to the Threadripper family, making it more directly comparable to (but far more powerful and flexible than) Intel’s Xeon-W line of processors. These new features include double the overall memory bandwidth, with eight channels instead of four, twice as many PCIe 4.0 lanes at 128, and a set of enterprise-level security and system management features branded AMD PRO Security. Currently, AMD’s Threadripper PRO chips are only available in Lenovo workstations.

There are four chips in the Threadripper PRO lineup, with 12, 16, 32, or 64 cores. The top-end 3995WX, with a ridiculous 64 physical cores and 128 threads, is composed of eight Zen 2-based 7nm chiplets, each tied to memory, I/O, and the other chiplets over the Infinity Fabric interconnect. Each chiplet contains two quad-core clusters (CCXs) with 16MB of L3 cache apiece, for a total of 256MB of L3 cache and 32MB of L2 cache across the whole processor, which communicates with the rest of the system through the new sWRX8 socket. I received one of these mammoth CPUs inside one of Lenovo’s new ThinkStation P620 systems.
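For readers who want to check the arithmetic, here is a quick Python tally of how those chiplet and cluster counts add up to the totals above. It assumes Zen 2’s 512KB of private L2 cache per core, which the 32MB figure implies but does not state; the constant names are my own.

```python
# Back-of-the-envelope tally of the 3995WX core and cache totals.
# Assumption: 512KB of private L2 per Zen 2 core (implied by 32MB / 64 cores).
CHIPLETS = 8            # Zen 2 compute chiplets
CCX_PER_CHIPLET = 2     # quad-core clusters per chiplet
CORES_PER_CCX = 4
L3_PER_CCX_MB = 16
L2_PER_CORE_KB = 512    # assumed per-core L2 size
THREADS_PER_CORE = 2    # simultaneous multithreading


def totals() -> dict:
    """Multiply out clusters, cores, threads, and cache for the whole CPU."""
    ccx = CHIPLETS * CCX_PER_CHIPLET
    cores = ccx * CORES_PER_CCX
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l3_mb": ccx * L3_PER_CCX_MB,
        "l2_mb": cores * L2_PER_CORE_KB // 1024,
    }


if __name__ == "__main__":
    print(totals())
```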

The ThinkStation P620 is a workstation in the strictest sense of the word. It incorporates years of engineering from the rest of Lenovo’s ThinkStation lineup, including its Trusted Platform Module implementation and other system management tools. It is built around a custom motherboard designed for maximum power and reliability, housed in a well-designed case that borrows much from Lenovo’s existing Intel-based P520 and P720 models. The tool-less chassis has two external 5.25″ “Flex bays,” up to four 3.5″ internal hard drive bays, and two PCIe 4.0 M.2 slots. It also has six PCIe 4.0 slots, all fully wired: four x16 slots and two x8 slots. There are front and rear USB and audio ports, an internal USB 3 port, and integrated 10Gb Ethernet. The system I reviewed had a DVD-RW drive and media card reader installed in one Flex bay, leaving room for me to add a Blu-ray writer or other device if desired.

With six slots, I had ample room to install my three-slot beast of a GPU, the GeForce RTX 3090, as well as a Kona 5 for HDR output and a 40Gb Ethernet card for connecting to my storage server. There was no puzzle to figure out where each card should go, because all of the slots are wired at full bandwidth, directly to the CPU. (Other systems may wire an x16 slot at only x4, or activate certain slots only when certain CPUs are installed.) This is also the first time I have been able to use my Ampere GPU’s PCIe 4.0 interface for increased bandwidth to the rest of the system. My Kona card and 40GbE card are limited to PCIe 3.0 speeds, so it makes no difference for them.
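To put the PCIe 4.0 difference in numbers, this short Python sketch estimates one-direction link throughput from the per-lane transfer rate (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding both generations use. It ignores packet and protocol overhead, so treat the results as theoretical ceilings rather than measured throughput; the function names are illustrative.

```python
# Rough per-direction PCIe throughput, counting only line-encoding overhead.
# Assumption: packet/protocol overhead is ignored, so real numbers are lower.
GT_PER_S = {3: 8.0, 4: 16.0}   # transfer rate per lane, gigatransfers/s
ENCODING = 128 / 130           # 128b/130b encoding used by PCIe 3.0 and 4.0


def lane_gbytes_per_s(gen: int) -> float:
    """Usable GB/s per lane, one direction, for the given PCIe generation."""
    return GT_PER_S[gen] * ENCODING / 8  # 8 bits per byte


def link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Aggregate one-direction GB/s for a link (or pool) of `lanes` lanes."""
    return lane_gbytes_per_s(gen) * lanes


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"PCIe {gen}.0 x16: {link_gbytes_per_s(gen, 16):.2f} GB/s")
    print(f"128 PCIe 4.0 lanes: {link_gbytes_per_s(4, 128):.1f} GB/s")
```

The ratio is what matters here: a PCIe 4.0 x16 slot offers roughly twice the ceiling of a PCIe 3.0 x16 slot, which only pays off for cards like the 3090 that can use it.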

The removable 1000W power supply plugs directly into a slot on the motherboard, which has power connectors placed where they are needed, allowing you to customize the power connections to your PCIe cards and drives. Mine came configured with two 6+2-pin connectors and two plain 6-pin connectors, which supports Lenovo’s advertised configurations of two Quadro RTX 8000 or four Quadro RTX 4000 cards. In my case, I adapted the two 8-pin connectors to NVIDIA’s new 12-pin power connector on my GeForce RTX 3090, and used one of the 6-pin connectors to power my Kona 5 card. I imagine we might eventually see a cable that provides a 12-pin connector directly from the motherboard, as more Ampere cards with that connector make it to market.
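As a rough sanity check on that GPU hookup, this Python sketch sums nominal PCIe power ratings (75W from an x16 slot, 75W per 6-pin cable, 150W per 8-pin cable) and compares them to the RTX 3090 Founders Edition’s 350W rated board power. Those ratings are my assumptions from the PCIe specifications and NVIDIA’s published figure, not numbers from Lenovo, and real cards can briefly exceed their rating.

```python
# Nominal PCIe power budget for the RTX 3090 hookup described above.
# Assumptions: PCIe nominal ratings (x16 slot 75W, 6-pin 75W, 8-pin 150W)
# and the 350W rated board power of the RTX 3090 Founders Edition.
RATING_W = {"slot": 75, "6pin": 75, "8pin": 150}
RTX_3090_BOARD_POWER_W = 350


def budget_w(*sources: str) -> int:
    """Sum the nominal power available from the given sources."""
    return sum(RATING_W[s] for s in sources)


if __name__ == "__main__":
    # The 12-pin adapter draws from two 8-pin cables, plus the x16 slot itself.
    gpu = budget_w("slot", "8pin", "8pin")
    print(f"RTX 3090 budget: {gpu} W vs. {RTX_3090_BOARD_POWER_W} W rated")
    # Total nominal power on the four PCIe cables this unit shipped with.
    print(f"All four cables: {budget_w('8pin', '8pin', '6pin', '6pin')} W")
```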

My system came with 32GB of DDR4 ECC RAM on two sticks. I would not recommend that configuration, both because it is not enough to feed a 64-core CPU and because it offers only one quarter of the potential eight-channel memory bandwidth. 16GB is the smallest stick size Lenovo offers for this system, so populating all eight slots, one per channel, would give you 128GB, which is an appropriate amount for many users. My last two dual-socket systems, with many fewer total cores, were configured with 128GB. If you are using this system with a lower-core-count CPU, you could get away with less RAM, but you might want to consider sourcing eight smaller sticks to maximize bandwidth and value. The included 1TB SSD was a Samsung PM981, an OEM version of Samsung’s top PCIe 3.0 M.2 drive. This was a bit disappointing, as I had been looking forward to trying out my first PCIe 4.0 SSD. But admittedly, this SSD’s read and write speeds of about 3GB/s are more than I really need, even for 8K assets. I anticipate that these systems will ship with PCIe 4.0 SSDs at some point in the future.
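To show why two sticks leave so much bandwidth on the table, here is a back-of-the-envelope Python calculation of theoretical peak memory bandwidth by populated channel count. It assumes DDR4-3200 modules (the speed is an assumption, not something listed above) and one stick per channel; sustained bandwidth in practice is lower.

```python
# Theoretical peak DDR4 bandwidth versus number of populated channels.
# Assumption: DDR4-3200 modules, one 16GB stick per channel.
MT_PER_S = 3200          # mega-transfers per second
BYTES_PER_TRANSFER = 8   # each DDR4 channel is 64 bits wide
STICK_GB = 16


def peak_gb_per_s(channels: int, mt_per_s: int = MT_PER_S) -> float:
    """Peak GB/s across `channels` populated channels."""
    return channels * mt_per_s * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    for sticks in (2, 4, 8):
        print(f"{sticks} channels: {peak_gb_per_s(sticks):.1f} GB/s, "
              f"{sticks * STICK_GB} GB total")
```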

Turning the system on, I was greeted with a Lenovo logo and spinning dots as Windows loaded. That is much cleaner than most other systems, and preferable until something goes wrong, when all that POST output might be useful. Boot time with my configuration is about 60 seconds, which is similar to other workstations and servers I have tested. One of the first things I did was run the Cinebench R15 benchmark while waiting for my other apps to download and install. This system’s multi-core score of 9872 more than tripled my previous records, which were under 3200, from both my 3rd gen Ryzen tests last year and my older-generation dual-socket Xeon systems. And looking online, this CPU holds pretty much all of the records in various other benchmarking tools. I still use R15 because I can compare it to many previous tests I have run over the years. With the newer R23 version, which scores on a different scale, I got over 60,000, which far exceeds my previous system but is still below what some other Threadripper users are reporting online, probably due to the limited RAM in this unit.

Once I had the software I needed installed and configured, I did some real-world tests, exporting 8K RED files and other assets to HEVC in Premiere Pro and Adobe Media Encoder. Most of my exports completed in half the time of my previous tests, which I ran last month while reviewing the GeForce RTX 3090 in my older workstation. Further testing with my Quadro P6000 in this Threadripper workstation revealed some interesting GPU trends that I will explore in a separate article, but they can be summarized with the advice to “match your GPU to your CPU.” The 3090 is only marginally faster than an older-generation P6000 in my old system, but twice as fast at many of those tasks in this workstation, where the rest of the system can keep up with it.

My initial playback tests in Premiere Pro were a bit disappointing, with big performance hits when enabling High Quality Playback (for HDR) and when enabling Mercury Transmit to output through my Kona 5 card. While trying to figure out what was causing those unexpected results, I installed the Lenovo Performance Tuner utility, which has a profile optimized specifically for Premiere Pro. I have never before seen these system tuning and optimization apps make a noticeable difference in my system performance (I like to think that is because I am pretty good at tuning system settings myself). But in this case, once the Premiere Pro performance profile was activated, my playback greatly improved. I could see some of the changes it made, including setting the process affinity for Premiere threads to avoid the first 25% of my cores, which in this system is 16, freeing them up to keep the UI running smoothly. But I have not discovered what else it changes, and when I disable the profile, I still get good playback performance in Premiere. I plan to follow up with Lenovo’s team to learn more about what happened, but it is running smoothly now, so I recommend the Performance Tuner app to Adobe users.
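Exactly how Lenovo’s utility applies its affinity settings isn’t covered here, but the general idea can be sketched in Python: compute the set of logical processors a process may use, leaving the lowest-numbered ones free. The function name and the 25% default are hypothetical, and the example uses 64 to mirror the 16-of-64 figure above. It ignores SMT sibling layout and Windows processor groups, which split systems with more than 64 logical processors (this CPU exposes 128) into separate groups. A list like this could be passed to a setter such as psutil’s `Process.cpu_affinity()` on Linux or Windows.

```python
# Hypothetical sketch: build an affinity set that leaves the first 25% of
# logical processors free for the OS and UI threads. This is NOT Lenovo's
# implementation; it ignores SMT layout and Windows processor groups.


def affinity_excluding_first(n_logical: int, reserve_fraction: float = 0.25):
    """Return (allowed CPU indices, bitmask) with the lowest indices reserved."""
    reserved = int(n_logical * reserve_fraction)
    allowed = list(range(reserved, n_logical))
    mask = 0
    for cpu in allowed:
        mask |= 1 << cpu  # set one bit per allowed logical processor
    return allowed, mask


if __name__ == "__main__":
    cpus, mask = affinity_excluding_first(64)
    print(f"first allowed CPU: {cpus[0]}, count: {len(cpus)}, mask: {mask:#x}")
```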

For the record, my playback tests are all essentially torture tests: playing back RAW digital cinema files and other 8K assets, with effects applied, in various 8K sequences, at full resolution. So when I say I was having playback issues, that is because I test systems in such a way that if they can play back my sequences, they should be able to play back nearly anyone else’s. When I created these projects three years ago for my 8K workflow investigations, they could only be played back at half or quarter resolution. Last year, newer systems could play some of them smoothly at full resolution, and now this system can play them back over Mercury Transmit at maximum bit depth, which is important as workflows move toward HDR. It still drops frames on occasion, and most users would probably edit at half resolution for a smoother experience, still seeing their work at 4K, but this system is capable of full-resolution 8K editing in Premiere Pro.

I also did some After Effects render tests, in which this ThinkStation rendered 5K composites for my Grounds of Freedom animation series more than twice as fast as my previous system. That saves me about 3 hours per episode. What’s more, Adobe showed off multi-frame rendering tools for After Effects at MAX last year, and I am confident that once they are implemented in the shipping program, this 64-core system will really stand apart from the competition.

Blender is an open-source 3D modeling and animation program that I started using to benchmark NVIDIA GPUs, but it is also a good way to measure CPU performance. It can render using the CPU, CUDA, or OptiX, and with a good GPU, OptiX is usually faster than CUDA, which is much faster than CPU rendering. With this Threadripper PRO system, CPU rendering is on par with CUDA rendering, beating it in certain tests, though OptiX is still about twice as fast. That CPU result is a 6x improvement over CPU rendering on my older 16-core Xeon system, similar to my results in Cinebench R23.

One issue I did run into: after I connected my 8K monitor to the GeForce RTX 3090, the system had trouble booting. It works correctly if I connect the display after the system has successfully booted, or if I use a different GPU. But of course I want to use my highest-resolution display with my most powerful graphics card. For the time being, I work around the problem by putting the system to sleep at night, and it wakes up without issue. I am confident that, working with Lenovo, we can find a solution, since this isn’t the first time I have discovered compatibility issues with that 8K display, which is exactly why I test with it. It only affects a small number of users, but if you use or need an 8K display, check on this before purchasing a system. I will post an update once we find a solution.

REDCINE-X and RED Player are able to play back all of my RED files at full resolution in real time to my 8K monitor, once I selected CUDA processing instead of the default OpenCL. They played everything from 8K anamorphic assets down to older 4K R3Ds, even ones with dual image streams for HDRx, without breaking a sweat.

So who needs this system? The biggest benefit across the entire line is bandwidth. It has more PCIe bandwidth and more memory bandwidth, which helps when you are pushing large volumes of data around inside the system. That makes it a bigger deal with 8K assets than with 4K or UHD assets, and hardly relevant to HD workflows. Higher frame rates like 60p and 120p also require more bandwidth throughout the system. So anyone processing high-resolution video would benefit from the improvements in system architecture. Is the 64-core CPU worth the $10K or so it adds to the price? Only if you can fully utilize those cores, which, apart from people who would otherwise offload their processing to a supercomputer, is currently pretty much limited to animation and certain VFX apps, although After Effects may get there soon. At GTC in September, I watched a demo from David Helmly running on one of these with the 16-core 3955WX CPU option, which he claimed was the optimal choice for Premiere Pro. This makes sense, as it has a higher clock speed, and Premiere rarely threads enough tasks to occupy even a quarter of this 64-core CPU. So that processor would provide better value in this system for most editors, while still offering all of the bandwidth and other architectural benefits of the top-end chip I am using. I would definitely recommend one of these Threadripper systems over Intel’s similar options, if only because of the increased PCIe bandwidth, with more lanes at twice the speed thanks to PCIe 4.0. I have been an Intel user for many years, and when I tried a 3rd gen Ryzen system last year, I thought AMD was catching up to Intel’s dominance in the workstation space. Now I would say this Lenovo P620 is evidence that AMD has clearly surpassed Intel.
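To illustrate why resolution and frame rate drive bandwidth needs, this Python sketch computes uncompressed frame data rates for HD, UHD, and 8K at 24, 60, and 120 fps. It assumes 16 bytes per pixel (32-bit float RGBA) purely for illustration; real pipelines use many different formats and compression, so the ratios matter more than the absolute numbers. An 8K frame holds 4x the pixels of UHD and 16x those of HD.

```python
# Illustrative uncompressed data rates by resolution and frame rate.
# Assumption: 16 bytes per pixel (32-bit float RGBA); actual in-memory
# formats vary by application, codec, and processing stage.
RESOLUTIONS = {"HD": (1920, 1080), "UHD": (3840, 2160), "8K": (7680, 4320)}
BYTES_PER_PIXEL = 16


def gb_per_s(name: str, fps: float, bpp: int = BYTES_PER_PIXEL) -> float:
    """Uncompressed GB/s for one stream at the given resolution and fps."""
    width, height = RESOLUTIONS[name]
    return width * height * bpp * fps / 1e9


if __name__ == "__main__":
    for name in RESOLUTIONS:
        for fps in (24, 60, 120):
            print(f"{name} {fps}p: {gb_per_s(name, fps):6.2f} GB/s")
```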

When I tried to match the configuration Lenovo sent me with a custom order online, it came to $15,200 MSRP, on sale for $8,860 (though I am writing this during the week of Cyber Monday sales). Switching to the 16-core processor lowers the sale price to $3,260, which is very reasonable for a system this powerful. Keep in mind that I added my own GPU, output card, monitors, and storage solution, which are important components to budget for when setting up a complete editing solution. I usually add lots of components to workstations I buy, but I learned years ago that starting with an ISV-certified solution, based on a motherboard and chassis from a major system manufacturer, is usually well worth the added initial expense in terms of system reliability and support. This system is currently the only one I am aware of that allows professional users to take advantage of AMD’s Threadripper PRO architecture. But based on the performance and bandwidth it offers, I am sure many more will follow in the years to come. And even this one will only get better as AMD and Lenovo continue to develop their drivers and firmware, and as software vendors optimize their applications to further leverage what is currently a brand-new product with a lot of potential.
