Oddsparks: An Automation Adventure

UE5 crashes on 13th and 14th Gen Intel CPUs
By Mazian
Reference material, not a troubleshooting guide.
   
Preface
WARNING: These symptoms do not necessarily mean that your problem is the 13th and 14th Gen Intel CPU defect. I've seen reports of the same symptoms from AMD Ryzen CPU users, which are obviously not related to this problem.

So don't assume that you have a hardware problem just because you're seeing similar symptoms.

For my part, I can play lots of non-Unreal Engine games without tripping over this defect, and I was highly skeptical until I followed the rabbit hole far enough. My personal journey involved Oodle, which is bundled with UE5 and uses the CPU heavily for asset decompression, a math-intensive workload. When Oodle crashed, it gave me a link to an Intel announcement that included a root cause analysis. Following that trail led me back to my PC manufacturer, who recommended BIOS tweaks that resolved both the Oodle crash and the UE5 crashes.

I am sharing this in case others have a similar experience and need compelling evidence before they contact their PC manufacturer's support team (or motherboard support channels if you built the system yourself).
Summary resources
Intel Root Cause
https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Intel-Core-13th-and-14th-Gen-Desktop-Instability-Root-Cause/post/1633239

Related crash discussion on the official Oddsparks Discord server
https://discord.com/channels/601335542029746176/1333087982709903512
"Consistently getting exception_access_violation UE crashes to desktop" thread
MMT - Robert shared multiple Reddit threads (which didn't help me, but are supporting evidence).

"WhoCrashed" app
https://www.resplendence.com/whocrashed

My specs & related crashes
WhoCrashed reported

  • SYSTEM_SERVICE_EXCEPTION
  • VIDEO_DXGKRNL_LIVEDUMP
  • STATUS_SYSTEM_PROCESS_TERMINATED
  • VIDEO_TDR_ERROR

Note that WhoCrashed did not report application-level crashes, such as the EXCEPTION_ACCESS_VIOLATION that Oddsparks would throw, or the Oodle crash in the Unknown Worlds game that led me down this rabbit hole.

WhoCrashed described "SYSTEM_SERVICE_EXCEPTION" as "This indicates that an exception happened while executing a routine that transitions from non-privileged code to privileged code." It described "STATUS_SYSTEM_PROCESS_TERMINATED" as "This is possibly a software problem. This is likely a case of memory corruption. Memory corruption can be caused by a faulty driver, faulty RAM, overheating and more." (followed by links to articles on memory corruption and thermal issues).

Neither of those descriptions is accurate in my scenario. My PC manufacturer's support identified the first three errors in the list above as "memory related" and likely associated with this problem.
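
As a rough sketch, the triage described above can be written as a small lookup. The names REPORTED, MEMORY_RELATED and triage are hypothetical, chosen only for illustration. The classification encodes nothing more than what is stated in this guide (the first three codes were flagged as "memory related" by PC manufacturer support for this one machine), so treat it as a note-taking aid rather than a diagnostic rule.

```python
# Bug check names WhoCrashed reported on this machine, in the order listed above.
REPORTED = [
    "SYSTEM_SERVICE_EXCEPTION",
    "VIDEO_DXGKRNL_LIVEDUMP",
    "STATUS_SYSTEM_PROCESS_TERMINATED",
    "VIDEO_TDR_ERROR",
]

# Assumption taken from this guide only: support flagged the first three
# as "memory related". This is not a general rule for other hardware.
MEMORY_RELATED = set(REPORTED[:3])


def triage(bug_check: str) -> str:
    """Return a short note for a bug check name (case-insensitive)."""
    name = bug_check.strip().upper()
    if name in MEMORY_RELATED:
        return "flagged as memory related; worth raising with support"
    if name in REPORTED:
        return "seen on this machine, but not flagged as memory related"
    return "not covered by this guide"


if __name__ == "__main__":
    for name in REPORTED:
        print(f"{name}: {triage(name)}")
```

A name that does not appear in the list, such as an application-level EXCEPTION_ACCESS_VIOLATION, falls through to "not covered by this guide", which matches the point that WhoCrashed did not report those crashes.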

My specs

Intel Core i9-13900K
ASUS Prime Z790-P WiFi
NVIDIA GeForce RTX 3090

Other specs omitted as not relevant.
Summary of my fix
WARNING: I am not providing PC support for others and have no information beyond what is in this guide. I will ignore most comments: this guide is intended purely as a breadcrumb to start you on your journey towards your PC or motherboard support team, not as a channel for community support.

Along those lines, I'm intentionally omitting the specific fix for my hardware, as I don't expect many (or any) others to have my exact setup. However, if anyone writes in the comments that they have my exact hardware setup, I'll be happy to update this section with the specific settings my PC manufacturer support had me apply. (I didn't write them down, but I'm savvy enough to copy them from the BIOS if that becomes necessary.)

  • Verify BIOS is latest version for my motherboard (source: ASUS support website)
  • Verify crash indicators using WhoCrashed (source: resplendence website)
  • Verify Display drivers are latest (source: NVIDIA website)
  • Verify MEI and Chipset drivers are latest (source: ASUS support website)
  • Verify Windows OS is patched to relatively recent patch level (source: Windows Update)
  • Apply BIOS tweaks to sync CPU cores, limit RAM speed to the rated speed of my RAM sticks, and limit CPU power draw (source: PC manufacturer support team)
  • Re-apply MEI and Chipset drivers
  • System file integrity scan using SFC /SCANNOW (source: Windows command line)
  • Clean uninstall of display drivers (source: download and run DDU - Display Driver Uninstaller)
  • Clean install of latest NVIDIA display drivers (source: NVIDIA website)
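
For anyone who wants to track their own progress through a similar sequence, the steps above can be kept as an ordered checklist. This is a minimal sketch: the Step class and next_step helper are hypothetical names, the step text is paraphrased from the list above, and the source for the driver re-apply step is recorded as "not stated" because the list gives none. The order is the one that applied to this machine and may differ for other hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    action: str
    source: str


# Steps in the order listed in this guide (paraphrased).
STEPS = [
    Step("Verify BIOS is the latest version", "ASUS support website"),
    Step("Verify crash indicators using WhoCrashed", "resplendence website"),
    Step("Verify display drivers are latest", "NVIDIA website"),
    Step("Verify MEI and Chipset drivers are latest", "ASUS support website"),
    Step("Verify Windows is patched to a recent level", "Windows Update"),
    Step("Apply BIOS tweaks", "PC manufacturer support team"),
    Step("Re-apply MEI and Chipset drivers", "not stated"),
    Step("Run SFC /SCANNOW", "Windows command line"),
    Step("Clean uninstall of display drivers", "DDU"),
    Step("Clean install of latest display drivers", "NVIDIA website"),
]


def next_step(completed: int) -> Optional[Step]:
    """Return the next step after `completed` finished steps, or None when done."""
    if completed < 0:
        raise ValueError("completed must be zero or greater")
    return STEPS[completed] if completed < len(STEPS) else None


if __name__ == "__main__":
    for number, step in enumerate(STEPS, start=1):
        print(f"{number}. {step.action} (source: {step.source})")
```

Calling next_step(5) returns the BIOS tweak step, which is the one that requires input from your own PC or motherboard support team.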
Closing comments
It's important to understand that getting EXCEPTION_ACCESS_VIOLATION errors doesn't necessarily mean that the problem is your PC. One Early Access game I am playing generated a consistent error when I performed specific actions in a specific order, without generating any OS errors, even after I applied the above fix. When I changed the order of my actions, the error stopped happening.

Multiple systems and platforms interact here, which makes isolating a specific problem like this very difficult.

As far as I know, there are no tools available for consumers to assess whether the CPU is requesting too much power.

If you have a 13th or 14th Gen Intel CPU and similar problems, it may help to make sure that your BIOS version was released after Intel published this fix and that you're using the BIOS tweaks recommended by your PC manufacturer or motherboard manufacturer (depending on whether you bought your system pre-assembled or built it yourself).

I didn't include my tweaks because they only apply to my specific hardware configuration. If there is enough demand (a subjective call on my part), I might add my BIOS settings for comparison purposes. Regardless, you should definitely contact your available support channels.

Check with your PC manufacturer or parts supplier for extended warranty and terms
My PC manufacturer also informed me that my CPU hardware warranty had been extended because of this problem, and that if my CPU was determined to have been damaged by this defect, I could RMA it (I am still within my extended warranty period). However, they also told me that only a tiny percentage of affected users have had their CPUs damaged; the vast majority of their customers saw aberrant operation without physical CPU damage.

Based on the last 48 hours of testing, I don't think I was one of the customers who experienced CPU damage.