Skip to content

perf: Reduce cost of retrieving labels from Waypoint and PolygonTrigger by 90%#2761

Merged
xezon merged 2 commits into
TheSuperHackers:mainfrom
Mauller:fix-waypoint+trigger-label-overhead
Jun 18, 2026
Merged

perf: Reduce cost of retrieving labels from Waypoint and PolygonTrigger by 90%#2761
xezon merged 2 commits into
TheSuperHackers:mainfrom
Mauller:fix-waypoint+trigger-label-overhead

Conversation

@Mauller

@Mauller Mauller commented May 31, 2026

Copy link
Copy Markdown

This PR reduces the allocation overhead seen when waypoint path labels and polygontrigger names are retrieved.

In mod maps with a lot of path scripted ai, such as Cobal Rush, waypoint labels are retrieved and compared every frame.
This will occur for every AI unit that is pathing along the waypoint. Polygon triggers also have a similar issue where they can be repeatedly compared every frame per scripted unit.

The following shows the side effects of this change for waypoint path labels.

Before:
image

After:
image

In game the most significant observation was that the first wave in Cobalt Rush went from stuttering to smooth.
The average FPS also significantly increased when observing the first wave and the camera also stopped stuttering.

The above flame graphs were captured in a debug build, but the stuttering still occurs in a release build and is alleviated with the same fix.

@Mauller Mauller self-assigned this May 31, 2026
@Mauller Mauller added Performance Is a performance concern Fix Is fixing something, but is not user facing Gen Relates to Generals ZH Relates to Zero Hour labels May 31, 2026
@greptile-apps

greptile-apps Bot commented May 31, 2026

Copy link
Copy Markdown

Greptile Summary

This PR eliminates per-call heap allocation overhead for AsciiString getters on Waypoint (path labels) and PolygonTrigger (trigger name) by changing their return type from by-value to const AsciiString&. Call sites that previously received a copy now bind a const reference, avoiding string construction entirely. The change is applied symmetrically across both the Generals/ and GeneralsMD/ code trees.

  • getPathLabel1/2/3() and getTriggerName() in both header trees are updated to return const AsciiString& instead of AsciiString, eliminating a copy-construction on every call in per-frame AI pathing loops.
  • All updated call sites correctly bind to const AsciiString& locals whose lifetimes stay within the scope where the owning Waypoint/PolygonTrigger object is guaranteed alive.
  • GeneralsMD/Code/Tools/WorldBuilder/src/LayersList.cpp receives additional identical fixes for its four direct getTriggerName() call sites that were missed in the Generals/ tree (no parallel file exists there).

Confidence Score: 5/5

Safe to merge — all references are bound to members of objects that remain alive for the full duration of each reference's scope.

The change is a narrow, mechanical transformation: six getter methods across two header trees switch from returning AsciiString by value to returning const AsciiString&. Every updated call site either compares the reference inline or passes it to a method within the same enclosing block, so there is no risk of a dangling reference. The change compiles cleanly against all existing callers (a non-const lvalue reference binding would be a compile error, catching any misuse). No logic, data layout, or ownership model is altered.

No files require special attention. The symmetry between Generals/ and GeneralsMD/ is maintained, and the one GeneralsMD-only file (LayersList.cpp) has no Generals/ counterpart to worry about.

Important Files Changed

Filename Overview
Generals/Code/GameEngine/Include/GameLogic/TerrainLogic.h getPathLabel1/2/3() return types changed from AsciiString (by value) to const AsciiString& — correct and safe.
Generals/Code/GameEngine/Include/GameLogic/PolygonTrigger.h getTriggerName() return type changed to const AsciiString& — correct and safe.
Generals/Code/GameEngine/Source/GameLogic/Map/TerrainLogic.cpp getTriggerAreaByName now binds a const ref instead of copying; reference lifetime is safe within the loop iteration.
GeneralsMD/Code/GameEngine/Include/GameLogic/TerrainLogic.h Mirror of Generals change — getPathLabel1/2/3() updated to return const AsciiString&.
GeneralsMD/Code/GameEngine/Include/GameLogic/PolygonTrigger.h Mirror of Generals change — getTriggerName() updated to return const AsciiString&.
GeneralsMD/Code/GameEngine/Source/GameLogic/Map/TerrainLogic.cpp Mirror of Generals change — getTriggerAreaByName updated to use const ref local.
GeneralsMD/Code/Tools/WorldBuilder/src/LayersList.cpp Four getTriggerName() call sites updated to use const AsciiString& locals; all references are safely scoped within their enclosing block.
Generals/Code/Tools/WorldBuilder/src/WaypointOptions.cpp getTriggerName() call site updated to const ref; safe.
Generals/Code/Tools/WorldBuilder/src/WaterOptions.cpp getTriggerName() call site updated to const ref; safe.
GeneralsMD/Code/Tools/WorldBuilder/src/WaypointOptions.cpp Mirror of Generals WaypointOptions change; safe.
GeneralsMD/Code/Tools/WorldBuilder/src/WaterOptions.cpp Mirror of Generals WaterOptions change; safe.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[AI Unit per-frame pathing] --> B[isPurposeOfPath / getTriggerAreaByName]
    B --> C{Loop over Waypoints / PolygonTriggers}
    C -->|Before| D["AsciiString label = getPathLabel1()\n(heap alloc + copy per call)"]
    C -->|After| E["const AsciiString& label = getPathLabel1()\n(zero-copy reference)"]
    D --> F[String comparison]
    E --> F
    F -->|match| G[Return result]
    F -->|no match| C
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[AI Unit per-frame pathing] --> B[isPurposeOfPath / getTriggerAreaByName]
    B --> C{Loop over Waypoints / PolygonTriggers}
    C -->|Before| D["AsciiString label = getPathLabel1()\n(heap alloc + copy per call)"]
    C -->|After| E["const AsciiString& label = getPathLabel1()\n(zero-copy reference)"]
    D --> F[String comparison]
    E --> F
    F -->|match| G[Return result]
    F -->|no match| C
Loading

Reviews (3): Last reviewed commit: "chore: Make retrieved data const at Poly..." | Re-trigger Greptile

@xezon xezon left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this improve performance? Because of the global ref counter lock?

PolygonTrigger *getNext() {return m_nextPolygonTrigger;}
const PolygonTrigger *getNext() const {return m_nextPolygonTrigger;}
AsciiString getTriggerName() const {return m_triggerName;} ///< Gets the trigger name.
const AsciiString& getTriggerName() const {return m_triggerName;} ///< Gets the trigger name.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not allocate. But it does touch the string ref counter.

Callers should also use references then, for example:

PolygonTrigger *TerrainLogic::getTriggerAreaByName( AsciiString name )
{
	for (PolygonTrigger* pTrig = PolygonTrigger::getFirstPolygonTrigger(); pTrig; pTrig = pTrig->getNext()) {
		AsciiString trigName = pTrig->getTriggerName(); // <---- can take const ref then
		if (name == trigName)
			return pTrig;
	}
	return nullptr;
}

@Mauller

Mauller commented May 31, 2026

Copy link
Copy Markdown
Author

Why does this improve performance? Because of the global ref counter lock?

This is where the problem lies, a copy is passed into compareNoCase which causes an allocation.

Waypoint *TerrainLogic::getClosestWaypointOnPath( const Coord3D *pos, AsciiString label )
{
	Real distSqr = 0;
	Waypoint *pClosestWay = nullptr;
	if (label.isEmpty()) {
		DEBUG_LOG(("***Warning - asking for empty path label."));
		return nullptr;
	}

	for( Waypoint *way = m_waypointListHead; way; way = way->getNext() ) {
		Bool match = false;
		if (label.compareNoCase(way->getPathLabel1())==0) match = true;
		if (label.compareNoCase(way->getPathLabel2())==0) match = true;
		if (label.compareNoCase(way->getPathLabel3())==0) match = true;
		if (match) {
			Coord3D curPos = *way->getLocation();
			Real newDistSqr = (curPos.x-pos->x)*(curPos.x-pos->x) + (curPos.y-pos->y)*(curPos.y-pos->y);
			if (pClosestWay==nullptr) {
				pClosestWay = way;
				distSqr = newDistSqr;
			} else if (newDistSqr < distSqr) {
				pClosestWay = way;
				distSqr = newDistSqr;
			}
		}
	}

	return pClosestWay;
}

And this is done hundreds to thousands of times per frame if there are a lot of path scripted units in a scene.

It is a night and day difference in wave 1 of cobalt rush when a reference is passed from the function instead.

CompareNoCase already takes a reference as well.

@xezon

xezon commented May 31, 2026

Copy link
Copy Markdown

I still don't see it.

Can you explain how it allocates?

if (label.compareNoCase(way->getPathLabel1())==0) match = true;

@Mauller

Mauller commented May 31, 2026

Copy link
Copy Markdown
Author

I still don't see it.

Can you explain how it allocates?

if (label.compareNoCase(way->getPathLabel1())==0) match = true;

Acsii strings are being allocated and deleted, you can see it when comparing the flame graphs since in the non reference variant there are calls to delete ascii strings and to release buffers.

image

@Caball009

Copy link
Copy Markdown

FWIW make sure to measure / benchmark with release builds.

@Mauller

Mauller commented May 31, 2026

Copy link
Copy Markdown
Author

FWIW make sure to measure / benchmark with release builds.

i know, this does affect release too.

@Mauller

Mauller commented May 31, 2026

Copy link
Copy Markdown
Author

With the fix in place, i guess the performance also comes from the critical section is not being hit.

image

@xezon

xezon commented May 31, 2026

Copy link
Copy Markdown

Acsii strings are being allocated and deleted, you can see it when comparing the flame graphs since in the non reference variant there are calls to delete ascii strings and to release buffers.

It is not allocating/deallocating. The cost comes from the critical section, which was added in #799 last year.

Better replace the critical section with atomic counting.

image

@Mauller

Mauller commented Jun 1, 2026

Copy link
Copy Markdown
Author

Acsii strings are being allocated and deleted, you can see it when comparing the flame graphs since in the non reference variant there are calls to delete ascii strings and to release buffers.

It is not allocating/deallocating. The cost comes from the critical section, which was added in #799 last year.

Better replace the critical section with atomic counting.

There was always a critical section, even in the original code, they just called it a "fast critical section".
I just simplified the code to match the original Generals implementation.

There still needs to be a lock as it is protecting the allocated data and not just the reference counter.
I can see if a mutex per string helps, otherwise the reference to the string works perfectly fine and doesn't hit the critical section in the hot path.

@xezon

xezon commented Jun 1, 2026

Copy link
Copy Markdown

There was always a critical section, even in the original code, they just called it a "fast critical section". I just simplified the code to match the original Generals implementation.

There still needs to be a lock as it is protecting the allocated data and not just the reference counter. I can see if a mutex per string helps, otherwise the reference to the string works perfectly fine and doesn't hit the critical section in the hot path.

This is not quite right. EA swapped the Critical Section in AsciiString for Atomic Counting. I expect this was deliberate, to tackle the same kind of performance culprits that you are observing now.

inline AsciiString::AsciiString(const AsciiString& stringSrc) : m_data(stringSrc.m_data)
{
  // don't need this if we're using InterlockedIncrement
  // FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);
	if (m_data)
		// ++m_data->m_refCount;
    // yes, I know it's not a DWord but we're incrementing so we're safe
    InterlockedIncrement((long *)&m_data->m_refCount); // <----- Atomic counter
	validate();
}

inline void AsciiString::releaseBuffer()
{
  // FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);

	validate();
	if (m_data)
	{
    InterlockedDecrement((long *)&m_data->m_refCount); // <----- Atomic counter
		if (!m_data->m_refCount)
			freeBytes();
		m_data = 0;
	}
	validate();
}

Using atomic counter is the right thing to do.

@Mauller

Mauller commented Jun 1, 2026

Copy link
Copy Markdown
Author

There was always a critical section, even in the original code, they just called it a "fast critical section". I just simplified the code to match the original Generals implementation.
There still needs to be a lock as it is protecting the allocated data and not just the reference counter. I can see if a mutex per string helps, otherwise the reference to the string works perfectly fine and doesn't hit the critical section in the hot path.

This is not quite right. EA swapped the Critical Section in AsciiString for Atomic Counting. I expect this was deliberate, to tackle the same kind of performance culprits that you are observing now.

inline AsciiString::AsciiString(const AsciiString& stringSrc) : m_data(stringSrc.m_data)
{
  // don't need this if we're using InterlockedIncrement
  // FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);
	if (m_data)
		// ++m_data->m_refCount;
    // yes, I know it's not a DWord but we're incrementing so we're safe
    InterlockedIncrement((long *)&m_data->m_refCount); // <----- Atomic counter
	validate();
}

inline void AsciiString::releaseBuffer()
{
  // FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);

	validate();
	if (m_data)
	{
    InterlockedDecrement((long *)&m_data->m_refCount); // <----- Atomic counter
		if (!m_data->m_refCount)
			freeBytes();
		m_data = 0;
	}
	validate();
}

Using atomic counter is the right thing to do.

I was probably remembering the fast critical section bit that was not used, the thread safety is still a problem though which the atomic counters do not provide.

As far as i am aware ascii strings are used between threads in the audio system.

Might be wrong though, need to check.

The ref counter could always be placed within the Ascii/Unicode String instead of within the data that gets allocated to it.

Edit - actually the ref conter cannot be on the ascii/Unicode string object otherwise there could be out of sync copies of it.

@xezon

xezon commented Jun 1, 2026

Copy link
Copy Markdown

What is the concern regarding thread safety? The only write shared access point should be the counter, not the string data. The Atomic Counter is totally sufficient. The only problem currently is that VC6 only has 4 byte atomic counter, but we have a 2 byte counter here. I think 2 byte counter is available in assembler though.

@Mauller

Mauller commented Jun 1, 2026

Copy link
Copy Markdown
Author

What is the concern regarding thread safety? The only write shared access point should be the counter, not the string data. The Atomic Counter is totally sufficient. The only problem currently is that VC6 only has 4 byte atomic counter, but we have a 2 byte counter here. I think 2 byte counter is available in assembler though.

The reference counter is part of the allocated data object for the string, so if one thread releases the data as another one accesses it, then you have a race condition.

@xezon

xezon commented Jun 1, 2026

Copy link
Copy Markdown

The reference counter is part of the allocated data object for the string, so if one thread releases the data as another one accesses it, then you have a race condition.

This is impossible. Ref counter will not reach 0 when 2 threads hold a reference.

@Mauller

Mauller commented Jun 1, 2026

Copy link
Copy Markdown
Author

The reference counter is part of the allocated data object for the string, so if one thread releases the data as another one accesses it, then you have a race condition.

This is impossible. Ref counter will not reach 0 when 2 threads hold a reference.

One thread might be trying to gain a reference to the object just as another is releasing it, that's the situation i am considering.

@xezon

xezon commented Jun 1, 2026

Copy link
Copy Markdown

One thread might be trying to gain a reference to the object just as another is releasing it, that's the situation i am considering.

This is also impossible. If the unique owner is releasing it, there is will be no one else having any opportunity to take ownership. The Atomic counter approach is rock solid - do not worry.

@Mauller

Mauller commented Jun 1, 2026

Copy link
Copy Markdown
Author

One thread might be trying to gain a reference to the object just as another is releasing it, that's the situation i am considering.

This is also impossible. If the sole owner is releasing it, there is will be no one else having any opportunity to take take ownership. The Atomic counter approach is rock solid - do not worry.

How is it impossible?

if the original thread that owned the string deletes it just at the point that another thread goes to take a reference to it, you can have a race condition on checking if m_data exists. The original owning thread could clear the data and the new thread could just be behind it, pass the m_data check then fault on incrementing the counter as m_data is no longer valid.

Unless i am missing something, since the counter is part of the allocated data, i don't see how you cannot end up with a race condition in some circumstances.

@xezon

xezon commented Jun 1, 2026

Copy link
Copy Markdown

You are right that is a data race. The same principles as for std::shared_ptr apply. This is expected and okay.

@xezon xezon changed the title perf: Remove allocation overhead when retrieving labels for Waypoints() and PolygonTriggers() perf: Reduce cost of retrieving labels from Waypoint and PolygonTrigger by 90% Jun 1, 2026
@xezon

xezon commented Jun 1, 2026

Copy link
Copy Markdown

Mauller:

it's going from like 20% of cpu time down to ~5% with the atomic counters and down to 1-2% with references

@xezon xezon added the Major Severity: Minor < Major < Critical < Blocker label Jun 1, 2026

@xezon xezon left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also review all call sites of the touched functions and assign to const ref where suitable.

@Mauller

Mauller commented Jun 2, 2026

Copy link
Copy Markdown
Author

Please also review all call sites of the touched functions and assign to const ref where suitable.

Will do, it can still be worth changing the global locks for the ref counters as other call sites will have improved performance.

@Mauller

Mauller commented Jun 16, 2026

Copy link
Copy Markdown
Author

Please also review all call sites of the touched functions and assign to const ref where suitable.

I went through and made the data constant where possible, in some locations it would require more significant refactors, but it's mostly in the CRC functions so data is only being read at that point for the checksum.

Comment thread Generals/Code/GameEngine/Source/GameLogic/Map/TerrainLogic.cpp Outdated
@Mauller Mauller force-pushed the fix-waypoint+trigger-label-overhead branch from 9bec63b to b35c42a Compare June 16, 2026 20:27

@xezon xezon left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good and delicous

@xezon xezon merged commit f06f183 into TheSuperHackers:main Jun 18, 2026
17 checks passed
@xezon xezon deleted the fix-waypoint+trigger-label-overhead branch June 18, 2026 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Fix Is fixing something, but is not user facing Gen Relates to Generals Major Severity: Minor < Major < Critical < Blocker Performance Is a performance concern ZH Relates to Zero Hour

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants