Satisfactory

Recovering Dedicated Server Saves with Orphaned Drone Data
By RhythmRelax24
How to fix drone data corruption that produces an EXCEPTION_ACCESS_VIOLATION error. This is one instance of a long-standing problem in the game itself, the null pointer crash, which has been reported for years in many parts of the game. It still affects drones in v1.1.1.4.
   
🛑 Save-Bricking Fatal Crash on Dedicated Server: Drone Data Corruption at Scale (v1.1.1.4)
The Problem (Quick Version)
After an unexpected server crash or shutdown, drones can become "orphaned": they still exist in the save file but no longer hold a valid reference to their home drone port. Orphaned drones do not function, generate warnings in the logs, and can prevent the server from loading the save.
Symptoms

In my case, I didn't notice the error until the game crashed and the server would not restart. Before that, however, I had noticed in-game problems with deleting and creating drones, and drones that wouldn't dock. In summary:

  • Log warnings: "Drone has no home station and will not function properly"
  • Save file integrity warnings
  • Non-functional drones (never docking)

Server Log Output

Here's the error:
LogDrones: Error: AFGDroneVehicle::PostLoadGame_Implementation - Drone "BP_DroneTransport_C_2144895073" has no home station and will not function properly. Save may have been edited.
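
If the log contains many of these errors, collecting the drone names by hand gets tedious. The following is a minimal C++ sketch, not part of the game or of SCIM, that pulls the unique drone names out of log text. The function name FindOrphanedDroneNames is made up for illustration, and the pattern assumes the log wording matches the line quoted above.

```cpp
#include <regex>
#include <set>
#include <string>

// Collects the unique drone object names reported as orphaned in a server log.
// Assumption: the wording matches the LogDrones error quoted in this guide;
// adjust the pattern if your log lines differ.
std::set<std::string> FindOrphanedDroneNames(const std::string& logText) {
    static const std::regex pattern(
        R"rx(Drone "([^"]+)" has no home station)rx");
    std::set<std::string> names;
    for (std::sregex_iterator it(logText.begin(), logText.end(), pattern), end;
         it != end; ++it) {
        names.insert((*it)[1].str());  // capture group 1 is the drone name
    }
    return names;
}
```

Pass it the full text of your server log. The size of the returned set tells you how many distinct drones are affected.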

The Fix
Tool needed: Satisfactory Calculator Interactive Map [satisfactory-calculator.com]

Steps

Quick fix:
  • Open the Satisfactory Calculator Interactive Map
  • Click or drag your save file to "Click/Drop Your Save Game Here"

Save file location (Windows 11) is at:

{C:}\Users\<username>\AppData\Local\FactoryGame\Saved\SaveGames\server

SCIM is not a perfect fix: editing a save with it can introduce unintended problems. After I used SCIM to delete only the orphaned drones, my server logs reported missing Mercer Shrines, a deleted storage box, and other issues.

Find orphaned drones

Look for drones not docked to any station and check for drones mid-flight. Delete the problematic drones:

  • Right-click the drone
  • Select "Delete Building"

Then download the corrected save file and rename it to match your latest autosave name.

Example: YourWorld_autosave_2.sav

Replace the corrupted save file and restart your server. Check logs.
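
Because an edited save can introduce new problems, it may be worth keeping a copy of the original file before overwriting it. The following is a minimal C++17 sketch of that idea, assuming the server is stopped while it runs. The function names MakeBackupPath and BackupAndReplaceSave and the ".bak" suffix are illustrative choices, not part of the game or of SCIM.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Builds the backup file name by appending ".bak" to the live save's name,
// e.g. YourWorld_autosave_2.sav -> YourWorld_autosave_2.sav.bak
fs::path MakeBackupPath(const fs::path& liveSave) {
    fs::path backup = liveSave;
    backup += ".bak";
    return backup;
}

// Copies the live save to its backup name, then overwrites the live save
// with the corrected file. Returns the backup path.
// Throws std::filesystem::filesystem_error if either copy fails.
fs::path BackupAndReplaceSave(const fs::path& correctedSave,
                              const fs::path& liveSave) {
    const fs::path backup = MakeBackupPath(liveSave);
    fs::copy_file(liveSave, backup, fs::copy_options::overwrite_existing);
    fs::copy_file(correctedSave, liveSave, fs::copy_options::overwrite_existing);
    return backup;
}
```

The backup is written first, so if the second copy fails the original save is still available under the ".bak" name.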

Technical Background
When servers crash during drone operations, the save file can capture drones in states where their home station references are broken or missing. The game loads these objects but they're non-functional. Manual save file editing is currently the only solution.

Longer Explanation
The Technical Problem: A Corrupted Drone Reference

My server logs trace the immediate crash to the drone logic: an EXCEPTION_ACCESS_VIOLATION on Windows 11. An access violation occurs in unmanaged or unsafe code when the code attempts to read or write memory that has not been allocated, or to which it does not have access. This usually happens because a pointer has a bad value. An access violation often indicates that several reads or writes have gone through bad pointers, and that memory might be corrupted. Thus, access violations almost always indicate serious programming errors.

What it means
A Drone object in my save file tried to execute a core function (like moving or docking), but its required link to its Home Port (`mHomeStation`) was invalid (a null pointer). The game code failed to safely check if the link was there before trying to use it, causing the entire program to be terminated by the operating system.

Why This is a Complex Problem for Developers (and Why It Matters to You)

This isn't about simple programming mistakes; it's likely about scale and timing:

  • The Root Cause: A Race Condition: My factory has dozens of drones, which is key. These crashes often start as a Race Condition—a bug where two processes conflict because they run simultaneously (e.g., the Autosave function starts writing data just as a drone is deleting/updating its port reference).
  • The Failure Point: With massive factories and countless objects running in parallel, the chances of the unexpected crash occurring at the exact millisecond a drone object is in this unsynchronized state skyrocket. The corrupted drone is then permanently saved with a broken link.
  • The Result: The next time the server loads, that single corrupted link causes a fatal, unhandled exception that stops the game before it can even draw the map.

This means we aren't just looking for a simple bug fix; we're dealing with a difficult, low-probability timing issue in the core game/engine serialization code that becomes a high-probability disaster on large saves.

FOR DEVELOPERS

The fix requires addressing both the runtime logic (handling deletion) and the loading logic (handling corruption).
Eliminate Dangling Pointers (The Structural Fix)
The core issue is a dangling pointer—a field pointing to a memory address that no longer contains the expected object. In Unreal Engine (UE), this can often be prevented by leveraging the engine's built-in memory management features.
Use UPROPERTY() on References

  • The mHomeStation field on the Drone object should be a pointer to the Drone Port (likely an AActor or UObject)
  • Crucially: This pointer must be marked with the UPROPERTY() macro
  • Why? When an actor that is referenced by a UPROPERTY field is destroyed, the Unreal Engine garbage collector can clear the reference, setting the pointer to nullptr (the exact behavior depends on the engine version and its garbage-collection settings). This prevents the dangling pointer and is the engine's primary defense against this type of crash

Use TWeakObjectPtr for Non-Critical References

  • If the reference should not prevent the Drone Port from being garbage collected, the Drone's reference should be a TWeakObjectPtr
  • Why? A TWeakObjectPtr does not keep the referenced object alive and is automatically set to an invalid state when the referenced object is destroyed, which can be safely checked using its IsValid() method
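
Unreal types cannot be compiled outside the engine, so the following sketch uses standard C++ std::weak_ptr as a rough analogue of the weak-reference idea: it does not keep its target alive, and it can be checked before use. The DronePort and Drone structs are hypothetical stand-ins, not the game's real classes, and std::weak_ptr is not how Unreal implements TWeakObjectPtr.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the game's classes.
struct DronePort {
    std::string name;
};

struct Drone {
    std::weak_ptr<DronePort> homeStation;  // does not keep the port alive

    // Returns the home port's name, or "<orphaned>" if the port is gone.
    std::string HomeName() const {
        if (auto port = homeStation.lock()) {  // empty once the port is destroyed
            return port->name;
        }
        return "<orphaned>";
    }
};
```

Once the last owning pointer to the port is released, lock() returns an empty pointer, so HomeName() takes the safe branch instead of dereferencing freed memory.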

Defensive Runtime and Loading Logic (The Code Fix)
Even with the correct pointer types, code must always assume a state of corruption is possible.
Safety Check Before Dereference (The NULL Check)

  • The fatal crash occurs because the code trusts the pointer before using it.
  • Fix: Every piece of Drone logic that relies on mHomeStation must include a null/validity check.

    // Current (crashes on an invalid pointer):
    mHomeStation->GetDockingCoordinates();

    // Fixed (handles an invalid pointer safely).
    // Unreal's global IsValid() checks for null and for objects marked for destruction.
    if (IsValid(mHomeStation))
    {
        // Execute core logic safely
        mHomeStation->GetDockingCoordinates();
    }
    else
    {
        // Execute recovery logic (see "Implement HandleOrphanedDrone() Recovery" below)
        HandleOrphanedDrone();
    }

Implement HandleOrphanedDrone() Recovery

The else block above must contain logic to safely clean up the orphaned Drone.
Recovery Actions
  1. Log a warning (which is already happening).
  2. Mark the Drone for deletion after the next game tick.
  3. As a temporary measure, force the Drone to despawn or return to the player inventory instead of crashing the entire server.
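
The recovery idea can be sketched outside the engine as a load-time pass that logs and drops drones whose home port no longer exists. This is a simplified illustration in standard C++, not the game's code: DroneRecord, the integer station ids, and PruneOrphanedDrones are all hypothetical, and the drones are removed immediately rather than deferred to the next tick.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified save record; the real save format is not shown here.
struct DroneRecord {
    std::string name;
    int homeStationId;  // id of the home port, or -1 if the reference is missing
};

// Logs and removes every drone whose home port id is not in stationIds.
// Returns the number of drones removed so loading can continue with the rest.
std::size_t PruneOrphanedDrones(std::vector<DroneRecord>& drones,
                                const std::unordered_set<int>& stationIds) {
    auto isOrphan = [&stationIds](const DroneRecord& d) {
        return stationIds.count(d.homeStationId) == 0;
    };
    for (const DroneRecord& d : drones) {
        if (isOrphan(d)) {
            std::cerr << "Warning: removing orphaned drone " << d.name << '\n';
        }
    }
    const auto firstOrphan = std::remove_if(drones.begin(), drones.end(), isOrphan);
    const auto removed =
        static_cast<std::size_t>(std::distance(firstOrphan, drones.end()));
    drones.erase(firstOrphan, drones.end());
    return removed;
}
```

The point of the design is that a broken reference costs one drone, not the whole save: the loader reports what it dropped and carries on.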

Addressing the Race Condition (The Concurrency Fix)
The race occurs during the multi-threaded interaction between gameplay and saving.
  • Atomic Transactions on Deletion: When a Drone Port is deleted, the entire process of finding all referencing Drones and updating their pointers must be treated as an atomic transaction relative to the save routine.
  • This usually means acquiring a write lock on the affected Drone objects (or the system manager that holds all references) before the deletion/update, and releasing it after the deletion/update is complete. This ensures the save thread cannot snapshot the system mid-update.
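
A minimal sketch of that locking pattern in standard C++ follows. It assumes a single manager object owns both ports and drones. DroneRegistry and all of its members are hypothetical names for illustration, and the game's actual save threading may be organized quite differently.

```cpp
#include <mutex>
#include <set>
#include <vector>

// Hypothetical manager that owns all ports and drones.
class DroneRegistry {
public:
    static constexpr int kNoPort = -1;  // marks a drone with no home port

    struct Drone {
        int id;
        int homePortId;
    };

    struct SaveSnapshot {
        std::set<int> ports;
        std::vector<Drone> drones;
    };

    void AddPort(int portId) {
        std::lock_guard<std::mutex> lock(mutex_);
        ports_.insert(portId);
    }

    void AddDrone(int droneId, int homePortId) {
        std::lock_guard<std::mutex> lock(mutex_);
        drones_.push_back(Drone{droneId, homePortId});
    }

    // Removing the port and clearing every reference to it happen under one
    // lock, so no snapshot can see a drone pointing at a deleted port.
    void DeletePort(int portId) {
        std::lock_guard<std::mutex> lock(mutex_);
        ports_.erase(portId);
        for (Drone& drone : drones_) {
            if (drone.homePortId == portId) {
                drone.homePortId = kNoPort;
            }
        }
    }

    // The save routine copies the state under the same lock.
    SaveSnapshot TakeSnapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return SaveSnapshot{ports_, drones_};
    }

private:
    mutable std::mutex mutex_;
    std::set<int> ports_;
    std::vector<Drone> drones_;
};
```

Every snapshot then satisfies one invariant: each drone's homePortId is either kNoPort or the id of a port that is present in the same snapshot.
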
Lessons Learned for Developers
The persistence of this bug offers several critical lessons about developing large-scale, persistent simulation games (see comments)
3 comments
RhythmRelax24  [author] Sep 30 @ 1:01am
Lesson 3: The Cost of an Unhandled Exception is the Highest. The Problem: An EXCEPTION_ACCESS_VIOLATION is an unhandled, low-level error that forces a hard crash. The Lesson: For critical, user-generated content like save files, developers must wrap dangerous operations in try/catch blocks or, more commonly in game engines, use validity checks to divert the error into a handled, non-fatal event (e.g., logging an error and deleting the object) that allows the rest of the game to continue loading. When a crash occurs on load, it bricks the entire save file, which is the most damaging user experience possible. The priority must always be: The game must load.
RhythmRelax24  [author] Sep 30 @ 1:01am
Lesson 2: Embrace Engine Features for Memory Management. The Problem: Relying on raw C++ pointers (AActor or UObject pointers without UPROPERTY) is dangerous in an engine like Unreal. The Lesson: Always use the engine's smart pointer and garbage collection mechanisms (UPROPERTY(), TWeakObjectPtr, etc.) for managed objects. These tools exist specifically to solve the dangling pointer problem and prevent corruption stemming from the multi-threaded nature of the engine.
RhythmRelax24  [author] Sep 30 @ 1:00am
Lesson 1: Defensive Programming is Mandatory, Not Optional. The Problem: The "happy path" assumption (that the pointer is always valid) is a weakness. In a complex, persistent simulation, external events (OS shutdown, power loss, server crash, mod corruption) will lead to invalid states in the saved file. The Lesson: Developers must prioritize data integrity checks on load. The first thing an object should do when deserialized is validate its state. If a required reference is broken, the object should self-correct or gracefully self-destruct, rather than throwing an unhandled exception that stops the whole world.