perf: Reduce cost of retrieving labels from Waypoint and PolygonTrigger by 90% #2761
…() and PolygonTriggers()
| Filename | Overview |
|---|---|
| Generals/Code/GameEngine/Include/GameLogic/TerrainLogic.h | getPathLabel1/2/3() return types changed from AsciiString (by value) to const AsciiString& — correct and safe. |
| Generals/Code/GameEngine/Include/GameLogic/PolygonTrigger.h | getTriggerName() return type changed to const AsciiString& — correct and safe. |
| Generals/Code/GameEngine/Source/GameLogic/Map/TerrainLogic.cpp | getTriggerAreaByName now binds a const ref instead of copying; reference lifetime is safe within the loop iteration. |
| GeneralsMD/Code/GameEngine/Include/GameLogic/TerrainLogic.h | Mirror of Generals change — getPathLabel1/2/3() updated to return const AsciiString&. |
| GeneralsMD/Code/GameEngine/Include/GameLogic/PolygonTrigger.h | Mirror of Generals change — getTriggerName() updated to return const AsciiString&. |
| GeneralsMD/Code/GameEngine/Source/GameLogic/Map/TerrainLogic.cpp | Mirror of Generals change — getTriggerAreaByName updated to use const ref local. |
| GeneralsMD/Code/Tools/WorldBuilder/src/LayersList.cpp | Four getTriggerName() call sites updated to use const AsciiString& locals; all references are safely scoped within their enclosing block. |
| Generals/Code/Tools/WorldBuilder/src/WaypointOptions.cpp | getTriggerName() call site updated to const ref; safe. |
| Generals/Code/Tools/WorldBuilder/src/WaterOptions.cpp | getTriggerName() call site updated to const ref; safe. |
| GeneralsMD/Code/Tools/WorldBuilder/src/WaypointOptions.cpp | Mirror of Generals WaypointOptions change; safe. |
| GeneralsMD/Code/Tools/WorldBuilder/src/WaterOptions.cpp | Mirror of Generals WaterOptions change; safe. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[AI Unit per-frame pathing] --> B[isPurposeOfPath / getTriggerAreaByName]
B --> C{Loop over Waypoints / PolygonTriggers}
C -->|Before| D["AsciiString label = getPathLabel1()\n(heap alloc + copy per call)"]
C -->|After| E["const AsciiString& label = getPathLabel1()\n(zero-copy reference)"]
D --> F[String comparison]
E --> F
F -->|match| G[Return result]
F -->|no match| C
```
Reviews (3): Last reviewed commit: "chore: Make retrieved data const at Poly..."
xezon left a comment:
Why does this improve performance? Because of the global ref counter lock?
```diff
 PolygonTrigger *getNext() {return m_nextPolygonTrigger;}
 const PolygonTrigger *getNext() const {return m_nextPolygonTrigger;}
-AsciiString getTriggerName() const {return m_triggerName;} ///< Gets the trigger name.
+const AsciiString& getTriggerName() const {return m_triggerName;} ///< Gets the trigger name.
```
This does not allocate. But it does touch the string ref counter.
Callers should also use references then, for example:
```cpp
PolygonTrigger *TerrainLogic::getTriggerAreaByName( AsciiString name )
{
	for (PolygonTrigger* pTrig = PolygonTrigger::getFirstPolygonTrigger(); pTrig; pTrig = pTrig->getNext()) {
		AsciiString trigName = pTrig->getTriggerName(); // <---- can take const ref then
		if (name == trigName)
			return pTrig;
	}
	return nullptr;
}
```
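A minimal sketch of the call-site pattern suggested above, assuming a simplified, hypothetical PolygonTrigger list and std::string in place of AsciiString. `findTriggerByName` is an illustrative free-function stand-in for `TerrainLogic::getTriggerAreaByName`, not the engine's code. Binding the returned const reference to a local is safe here because the string is owned by the trigger node and outlives the loop iteration:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for PolygonTrigger: a singly linked list of named
// triggers. std::string replaces AsciiString purely for illustration.
class PolygonTrigger {
public:
    PolygonTrigger(const std::string& name, PolygonTrigger* next)
        : m_triggerName(name), m_nextPolygonTrigger(next) {}

    PolygonTrigger* getNext() { return m_nextPolygonTrigger; }

    // Return by const reference: no copy is made per call.
    const std::string& getTriggerName() const { return m_triggerName; }

private:
    std::string m_triggerName;
    PolygonTrigger* m_nextPolygonTrigger;
};

// Hypothetical lookup in the style suggested in the review: the name is bound
// to a const reference, which stays valid because pTrig owns the string.
PolygonTrigger* findTriggerByName(PolygonTrigger* first, const std::string& name)
{
    for (PolygonTrigger* pTrig = first; pTrig; pTrig = pTrig->getNext()) {
        const std::string& trigName = pTrig->getTriggerName(); // no copy
        if (name == trigName)
            return pTrig;
    }
    return nullptr;
}
```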
This is where the problem lies: a copy is passed into … This is done hundreds to thousands of times per frame if there are a lot of path-scripted units in a scene. It makes a night-and-day difference in wave 1 of Cobalt Rush when a reference is returned from the function instead. compareNoCase already takes a reference as well.
I still don't see it. Can you explain how it allocates?
```cpp
if (label.compareNoCase(way->getPathLabel1())==0) match = true;
```
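The copy in this expression is easy to miss. Even though compareNoCase takes its argument by const reference, a getter that returns by value must first produce a temporary; for a reference-counted string like AsciiString, that is one copy construction (counter increment) and one destruction (counter decrement) per call, which is where the reference-counter cost comes from. A minimal sketch with a hypothetical `CountedLabel` type that counts copy constructions in place of AsciiString (the exact counts assume C++17 guaranteed copy elision):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for AsciiString: counts copy constructions, which is
// where AsciiString would touch its shared reference counter.
struct CountedLabel {
    static int copies;
    std::string text;

    explicit CountedLabel(const char* s) : text(s) {}
    CountedLabel(const CountedLabel& other) : text(other.text) { ++copies; }

    // Takes a const reference, like compareNoCase in the thread.
    int compare(const CountedLabel& other) const { return text.compare(other.text); }
};
int CountedLabel::copies = 0;

// Hypothetical waypoint exposing the same label both ways.
struct Waypoint {
    CountedLabel m_pathLabel1;
    Waypoint() : m_pathLabel1("PathA") {}

    CountedLabel labelByValue() const { return m_pathLabel1; }      // copy per call
    const CountedLabel& labelByRef() const { return m_pathLabel1; } // no copy
};

// Same shape as: label.compareNoCase(way->getPathLabel1()) == 0
bool matchesByValue(const CountedLabel& label, const Waypoint& way) {
    return label.compare(way.labelByValue()) == 0; // materializes a temporary copy
}

bool matchesByRef(const CountedLabel& label, const Waypoint& way) {
    return label.compare(way.labelByRef()) == 0;   // binds to the member directly
}
```

Only the getter's return type differs between the two call sites; the comparison function is unchanged.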
FWIW, make sure to measure/benchmark with release builds.
I know; this does affect release builds too.
It is not allocating/deallocating. The cost comes from the critical section, which was added in #799 last year. Better to replace the critical section with atomic counting.
There was always a critical section, even in the original code; they just called it a "fast critical section". There still needs to be a lock, as it is protecting the allocated data and not just the reference counter.
This is not quite right. EA swapped the critical section in AsciiString for atomic counting. I expect this was deliberate, to tackle the same kind of performance culprits that you are observing now.
```cpp
inline AsciiString::AsciiString(const AsciiString& stringSrc) : m_data(stringSrc.m_data)
{
	// don't need this if we're using InterlockedIncrement
	// FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);
	if (m_data)
		// ++m_data->m_refCount;
		// yes, I know it's not a DWord but we're incrementing so we're safe
		InterlockedIncrement((long *)&m_data->m_refCount); // <----- Atomic counter
	validate();
}

inline void AsciiString::releaseBuffer()
{
	// FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);
	validate();
	if (m_data)
	{
		InterlockedDecrement((long *)&m_data->m_refCount); // <----- Atomic counter
		if (!m_data->m_refCount)
			freeBytes();
		m_data = 0;
	}
	validate();
}
```
Using an atomic counter is the right thing to do.
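A minimal sketch of atomic reference counting with std::atomic, assuming a hypothetical `SharedBuffer` in place of AsciiString's allocated data and a 2-byte counter like `m_refCount`. One detail worth hedging on: as quoted, releaseBuffer() performs the atomic decrement and then re-reads `m_refCount` with a plain load, so two threads releasing concurrently could, it seems, both observe 0 and both call freeBytes(). The sketch instead decides on the value returned by the decrement itself (InterlockedDecrement also returns the decremented value, so the same idea should carry over):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical shared buffer; the real AsciiString buffer layout differs.
struct SharedBuffer {
    std::atomic<std::uint16_t> refCount{1}; // 2-byte counter, as discussed in the thread
    static std::atomic<int> frees;          // counts frees for the demonstration
};
std::atomic<int> SharedBuffer::frees{0};

void addRef(SharedBuffer* data) {
    // The increment only needs atomicity, not ordering with other memory.
    data->refCount.fetch_add(1, std::memory_order_relaxed);
}

void release(SharedBuffer* data) {
    // Use the value returned by the atomic decrement instead of re-reading the
    // counter: only the thread that moves it from 1 to 0 frees the buffer.
    if (data->refCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        SharedBuffer::frees.fetch_add(1);
        delete data;
    }
}

// Usage: several threads copy and release the same buffer concurrently.
// Returns how many times the buffer was freed (expected: exactly once).
int runConcurrentCopies(int threadCount) {
    SharedBuffer::frees = 0;
    SharedBuffer* buf = new SharedBuffer;     // refCount starts at 1
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        addRef(buf);                          // take the thread's reference up front
        workers.emplace_back([buf] {
            for (int i = 0; i < 1000; ++i) { addRef(buf); release(buf); }
            release(buf);                     // drop the thread's reference
        });
    }
    release(buf);                             // drop the creator's reference
    for (std::thread& w : workers) w.join();
    return SharedBuffer::frees.load();
}
```

The relaxed increment paired with an acquire-release decrement is the usual pattern for reference counts. Note that a 16-bit counter wraps after 65,535 simultaneous references, so that limit would need to hold in practice.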
I was probably remembering the fast critical section bit that was not used. Thread safety is still a problem, though, and the atomic counters do not provide it. As far as I am aware, AsciiStrings are used across threads in the audio system. I might be wrong, though; I need to check. The ref counter could always be placed within the AsciiString/UnicodeString object instead of within the data that gets allocated to it. Edit: actually, the ref counter cannot live on the AsciiString/UnicodeString object, otherwise there could be out-of-sync copies of it.
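To illustrate the edit at the end of this comment: if each string object carried its own copy of the counter, copies made at different times would disagree about how many references exist, whereas a counter stored next to the shared data is seen identically by every handle. A single-threaded sketch with hypothetical handle types (not the engine's layout), using a plain int counter for clarity:

```cpp
#include <cassert>
#include <string>

// Hypothetical layout A: each handle stores its own copy of the counter.
struct HandleWithOwnCount {
    std::string* data;
    int refCount;
};

// Copying bumps the count only in the source and the new copy; handles made
// earlier keep their old, now stale, value.
HandleWithOwnCount copyOwn(HandleWithOwnCount& src) {
    ++src.refCount;
    return src;
}

// Hypothetical layout B: the counter lives next to the shared data, as in
// AsciiString (m_data->m_refCount).
struct SharedData {
    std::string text;
    int refCount;
};

struct HandleToShared {
    SharedData* data;
};

// Every handle points at the same counter, so all of them agree.
HandleToShared copyShared(const HandleToShared& src) {
    ++src.data->refCount;
    return src;
}
```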
What is the concern regarding thread safety? The only shared write access should be to the counter, not to the string data. The atomic counter is totally sufficient. The only problem currently is that VC6 only has a 4-byte atomic counter, but we have a 2-byte counter here. I think a 2-byte atomic counter is available in assembler, though.
The reference counter is part of the allocated data object for the string, so if one thread releases the data while another one accesses it, you have a race condition.
This is impossible. The ref counter will not reach 0 while two threads hold a reference.
One thread might be trying to gain a reference to the object just as another is releasing it; that's the situation I am considering.
This is also impossible. If the unique owner is releasing it, no one else will have any opportunity to take ownership. The atomic counter approach is rock solid; do not worry.
How is it impossible? If the original thread that owned the string deletes it just as another thread goes to take a reference to it, you can have a race condition on checking whether m_data exists. The original owning thread could clear the data, and the new thread could be just behind it, pass the m_data check, then fault on incrementing the counter because m_data is no longer valid. Unless I am missing something, since the counter is part of the allocated data, I don't see how you can avoid a race condition in some circumstances.
You are right, that is a data race. The same principles as for …
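The distinction reached in this exchange can be sketched as follows: atomic counting keeps separate handles that share one buffer safe, but copying from a handle object while another thread overwrites or destroys that same object is still a data race, so the handle itself needs a lock (or another synchronization scheme). For comparison, std::shared_ptr documents the same rule: different shared_ptr instances may be copied concurrently, but unsynchronized access to the same instance, where one access is non-const, is a data race. A minimal sketch using `std::shared_ptr<const std::string>` in place of AsciiString and a hypothetical `SharedLabelSlot` class that guards one shared handle with a mutex:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical holder for one shared handle that several threads read while
// another thread replaces it. The count inside std::shared_ptr is already
// atomic; the mutex protects the handle object (m_ptr) itself.
class SharedLabelSlot {
public:
    explicit SharedLabelSlot(std::string s)
        : m_ptr(std::make_shared<const std::string>(std::move(s))) {}

    // Copying m_ptr under the lock means the buffer cannot be released
    // between reading the pointer and incrementing its count.
    std::shared_ptr<const std::string> get() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_ptr;
    }

    void set(std::string s) {
        auto fresh = std::make_shared<const std::string>(std::move(s));
        std::lock_guard<std::mutex> lock(m_mutex);
        m_ptr.swap(fresh);
        // 'fresh' now holds the old buffer; it is released after the lock is
        // dropped, when 'fresh' goes out of scope.
    }

private:
    mutable std::mutex m_mutex;
    std::shared_ptr<const std::string> m_ptr;
};

// Usage: four readers copy the handle while one writer keeps replacing it.
// Returns true if every reader always saw a valid label.
bool runReadersAndWriter(SharedLabelSlot& slot) {
    std::atomic<bool> sawBadValue{false};
    std::thread writer([&slot] {
        for (int i = 0; i < 1000; ++i) slot.set(i % 2 ? "A" : "B");
    });
    std::vector<std::thread> readers;
    for (int r = 0; r < 4; ++r) {
        readers.emplace_back([&slot, &sawBadValue] {
            for (int i = 0; i < 1000; ++i) {
                std::shared_ptr<const std::string> label = slot.get();
                if (!label || (*label != "A" && *label != "B")) sawBadValue = true;
            }
        });
    }
    writer.join();
    for (std::thread& t : readers) t.join();
    return !sawBadValue;
}
```

A lock per shared handle is the simplest option; the buffers themselves still rely only on the atomic count.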
Mauller:
xezon left a comment:
Please also review all call sites of the touched functions and assign to const ref where suitable.
Will do. It could still be worth replacing the global locks for the ref counters, as other call sites would see improved performance too.
I went through and made the data const where possible. In some locations this would require more significant refactors, but those are mostly in the CRC functions, where the data is only read to compute the checksum.
Force-pushed from 9bec63b to b35c42a.



This PR reduces the overhead of copying AsciiString values when waypoint path labels and PolygonTrigger names are retrieved.
In mod maps with a lot of path-scripted AI, such as Cobalt Rush, waypoint labels are retrieved and compared every frame.
This occurs for every AI unit that is pathing along the waypoint path. Polygon triggers have a similar issue: their names can be compared repeatedly every frame for each scripted unit.
The following flame graphs show the effect of this change on waypoint path label retrieval.
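Before:
[Flame graph: waypoint path label retrieval before the change]
After:
[Flame graph: waypoint path label retrieval after the change]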
In game, the most significant observation was that the first wave in Cobalt Rush went from stuttering to smooth.
The average FPS also increased significantly while observing the first wave, and the camera stopped stuttering.
The above flame graphs were captured in a debug build, but the stuttering also occurs in a release build and is alleviated by the same fix.