Excerpt
Unity DOTS/ECS excels at processing huge amounts of data. This performance relies on reading data from contiguous chunks of memory, but what if entities depend on each other and you have to jump between chunks? What if all entities should follow a moving target, or should react to actions from another Entity? This post highlights how entities can interact with each other and what implications this has for performance.
Introduction
With Unity DOTS (Data-Oriented Technology Stack), most developers get their first contact with data-oriented programming. This is not just a new set of language features but a whole new way of thinking about code, and especially about the data that goes into a project.
There are lots and lots of tutorials, blog posts and Unite presentations on the topic, so I won’t explain everything from the start. I expect some basic knowledge of what Systems and Components are and how you use them in Unity ECS.
In DOTS you want to process as much data as possible from a contiguous piece of memory. For example, you have 3000 trees, each of which is an Entity, that is, a collection of Components. You iterate over the trees and assign a random color to their “leafColor” field. Already you have a super-fast way of making some kind of fever dream forest. All trees share the same Archetype (the same set of component types), so Unity makes sure they are packed nice and tight in memory for an efficient fetch by the CPU. Most ECS demos are set up this way to show performance and how many things they can do efficiently.
But what happens if you have dependencies? Elements in games are often intertwined with each other to give rich and complex experiences. Even simple game mechanics require a “data lookup” outside the Entity that is currently being processed. In our example we have a mage tower in our colourful forest, and it has a few guards that protect the tower. The tower controls the units, and when something happens to the tower, we want the units to react. We want the tower to be able to change the unit position/formation around the tower when the user clicks on the tower.
Meet our components:
public struct TowerComponent : IComponentData
{
    public TowerFormation Formation;
    public int UnitCount;
    public float Radius;
}

public struct UnitComponent : IComponentData
{
    public Entity ParentTower; // the tower this unit belongs to
    public float Speed;
    public int Index; // slot of this unit within the formation
}
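To make the relationship concrete, here is a minimal sketch of how a tower and its units might be created with these components. TowerSpawner, SpawnTowerWithUnits and the concrete values (radius, speed) are hypothetical and not taken from the original project. The sketch assumes the two structs above, a TowerFormation enum with a Circle value, and Unity's built-in Translation component.

```csharp
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

// Hypothetical helper, not part of the original project.
public static class TowerSpawner
{
    public static Entity SpawnTowerWithUnits(EntityManager entityManager, float3 position, int unitCount)
    {
        // The tower holds the formation settings for all of its units.
        Entity tower = entityManager.CreateEntity(typeof(TowerComponent), typeof(Translation));
        entityManager.SetComponentData(tower, new Translation { Value = position });
        entityManager.SetComponentData(tower, new TowerComponent
        {
            Formation = TowerFormation.Circle,
            UnitCount = unitCount,
            Radius = 5f // example value
        });

        for (int i = 0; i < unitCount; i++)
        {
            Entity unit = entityManager.CreateEntity(typeof(UnitComponent), typeof(Translation));
            entityManager.SetComponentData(unit, new Translation { Value = position });
            entityManager.SetComponentData(unit, new UnitComponent
            {
                ParentTower = tower, // the link each unit keeps to its tower
                Speed = 2f,          // example value
                Index = i
            });
        }

        return tower;
    }
}
```

Every unit stores the tower's Entity in ParentTower at creation time. All units end up in one Archetype (UnitComponent, Translation) and all towers in another (TowerComponent, Translation).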
Entity Look Up
How do the Tower and Unit communicate? The UnitComponent has a field ParentTower of type Entity. When the units are created, the field is filled with the tower of that unit. Fields of type Entity work similarly to regular references to other GameObjects. But Entity is a struct that just stores two integers (an index and a version) that identify the entity. Still, we can use this connection to look up the Tower data while processing the Units in our unit job:
// endSimulationEcbSystem is a field of the system that is set up elsewhere (not shown).
protected override JobHandle OnUpdate(JobHandle inputDeps)
{
    var ecb = endSimulationEcbSystem.CreateCommandBuffer().ToConcurrent();
    var time = Time.DeltaTime;

    // Lookups by Entity, used to read the parent tower's data.
    var towerComponentData = GetComponentDataFromEntity<TowerComponent>();
    var translationComponentData = GetComponentDataFromEntity<Translation>();

    var unitJob = Entities
        .WithName("UnitJob")
        .WithReadOnly(towerComponentData)
        .WithReadOnly(translationComponentData)
        .ForEach((Entity entity, int entityInQueryIndex, in Translation translation, in UnitComponent unit) =>
        {
            var newTrans = translation;

            // Jump out of the unit's memory to read the tower's data.
            var tower = towerComponentData[unit.ParentTower];
            var towerTranslation = translationComponentData[unit.ParentTower].Value;

            float3 target;
            float fractionInFormation = unit.Index / (float)tower.UnitCount;
            switch (tower.Formation)
            {
                case TowerFormation.Circle:
                    target = PointOnCircle(tower.Radius, fractionInFormation, towerTranslation);
                    break;
                case TowerFormation.Rect:
                    target = PointOnRect(towerTranslation, tower.Radius, fractionInFormation);
                    break;
                default:
                    target = newTrans.Value;
                    break;
            }

            // Move towards the target without overshooting.
            // <= also covers a zero delta, where normalize would return NaN.
            float3 delta = target - newTrans.Value;
            float frameSpeed = unit.Speed * time;
            if (math.length(delta) <= frameSpeed)
                newTrans.Value = target;
            else
                newTrans.Value += math.normalize(delta) * frameSpeed;

            // The write is deferred to the command buffer.
            ecb.SetComponent(entityInQueryIndex, entity, newTrans);
        })
        .Schedule(inputDeps);

    endSimulationEcbSystem.AddJobHandleForProducer(unitJob);
    return unitJob;
}
The key element here is the result of GetComponentDataFromEntity<TowerComponent>(). It returns a container that gives us random access to the TowerComponent of any Entity in the current World. Hey, an Entity is exactly what we are storing in the Unit, so it fits perfectly here. Like a Dictionary, we can look up the Tower for each Unit and get the required values with towerComponentData[unit.ParentTower]. This works and even runs in parallel jobs. Done!
But wait, we are missing something here. Didn’t we say at the beginning that the best way to use ECS is to only work on tightly packed data? Let us test this goal here. Unity packs all entities with the same Archetype into contiguous chunks of memory. This is how our data probably looks:
Unit => (UnitComponent, Translation), Tower => (TowerComponent, Translation)
When processing each Unit we are looking up the Tower it belongs to. So in fact we are jumping in memory quite often, always from the Unit to the Tower and back. For bigger data sets this costs performance at the CPU level, because the Tower data has to be fetched from RAM into the CPU cache again and again, pushing out Unit data that was already there.
Denormalization
One solution to the problem here is Denormalization. We copy the data that the Unit requires to position itself in the formation onto the Unit Entity. This gets rid of the Tower lookup because the data is tightly packed with the Unit. With this strategy we create some duplication of data and abandon our current state where the Tower is the single point of truth for its Units. The advantage is performance, but the disadvantage is that we always have to think about updating the copied data on the Units when changing the tower. And of course we are using up a bit more memory. Please meet UnitFormationComponent:
public struct UnitFormationComponent : IComponentData
{
    // Copied from the parent TowerComponent.
    public TowerFormation Formation;
    public int UnitCount;
    public float Radius;
    // Copied from the parent tower's Translation.
    public float3 Origin;
}
This data structure will live together with the UnitComponent in the tightly packed memory range of the Units (shown in orange in the image above).
With this data the Unit is independent of the Tower and can calculate its position itself. This changes our Unit position job:
protected override JobHandle OnUpdate(JobHandle inputDeps)
{
    var time = Time.DeltaTime;

    var unitJob = Entities
        .WithName("UnitDenormalizedJob")
        .ForEach((Entity entity, int entityInQueryIndex, ref Translation translation, in UnitComponent unit, in UnitFormationComponent unitFormation) =>
        {
            var newTrans = translation;

            // All formation data now comes from the unit's own components.
            float3 target;
            float fractionInFormation = unit.Index / (float)unitFormation.UnitCount;
            switch (unitFormation.Formation)
            {
                case TowerFormation.Circle:
                    target = PointOnCircle(unitFormation.Radius, fractionInFormation, unitFormation.Origin);
                    break;
                case TowerFormation.Rect:
                    target = PointOnRect(unitFormation.Origin, unitFormation.Radius, fractionInFormation);
                    break;
                default:
                    target = newTrans.Value;
                    break;
            }

            // Move towards the target without overshooting.
            // <= also covers a zero delta, where normalize would return NaN.
            float3 delta = target - newTrans.Value;
            float frameSpeed = unit.Speed * time;
            if (math.length(delta) <= frameSpeed)
                newTrans.Value = target;
            else
                newTrans.Value += math.normalize(delta) * frameSpeed;

            // Write directly to the component, no command buffer needed.
            translation = newTrans;
        })
        .Schedule(inputDeps);

    return unitJob;
}
You can see that we removed the usage of towerComponentData because we don’t need the parent tower anymore. This also allows another change. The first job read the tower's Translation through a lookup, so it could not safely write Translation in the same job and had to go through an EntityCommandBuffer. Without the lookup we can declare Translation as ref and write the data directly to the component. Writing the data directly gives us another performance boost.
As long as we can maintain a perfect sync between the data of the Tower and the UnitFormationComponent, everything is great. If the denormalized data gets out of sync with the Tower, we can run into serious bugs. So be careful to always maintain your denormalized data. It can also make sense to reverse the process and “normalize” your data back into the tower by removing the UnitFormationComponent until you are ready to optimize again.
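Keeping the copy in sync could be handled by a small dedicated system. The following is only a sketch under assumptions: UnitFormationSyncSystem is a hypothetical name, it is written in the same JobComponentSystem style as the jobs above, and it assumes every ParentTower still exists and has both a TowerComponent and a Translation.

```csharp
using Unity.Entities;
using Unity.Jobs;
using Unity.Transforms;

// Hypothetical system (not from the original project) that refreshes the
// denormalized copy on every Unit from its parent Tower.
public class UnitFormationSyncSystem : JobComponentSystem
{
    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        // Read-only lookups into the Tower data.
        var towerComponentData = GetComponentDataFromEntity<TowerComponent>(true);
        var translationComponentData = GetComponentDataFromEntity<Translation>(true);

        return Entities
            .WithName("UnitFormationSyncJob")
            .WithReadOnly(towerComponentData)
            .WithReadOnly(translationComponentData)
            .ForEach((ref UnitFormationComponent unitFormation, in UnitComponent unit) =>
            {
                // Assumption: ParentTower is valid and has both components.
                var tower = towerComponentData[unit.ParentTower];
                unitFormation.Formation = tower.Formation;
                unitFormation.UnitCount = tower.UnitCount;
                unitFormation.Radius = tower.Radius;
                unitFormation.Origin = translationComponentData[unit.ParentTower].Value;
            })
            .Schedule(inputDeps);
    }
}
```

This job performs the same Tower lookup that denormalization was meant to avoid, so running it every frame would give the gain right back. In a real project you would likely run it only when a tower actually changes, for example on the click that switches the formation, and make sure it runs before the unit position job.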
Performance
Setup:
- 2.6 GHz 6-Core Intel Core i7
- 16 GB RAM
- Burst compile ON
Scene with 1000 Towers that each have 27 Units => 28000 Entities (27000 Units plus the 1000 Towers):
- Parent lookup takes 4 ms on average on the CPU.
- Denormalized data takes 3 ms on average on the CPU, which is a 25% improvement!
This benchmark is by no means extensive, since it was run in the Editor, but it gives an impression of the effect that can be achieved.
Conclusion
We looked at how to handle relationships between Entities in a clean way and how to further optimize them, if required, using Denormalization.
This is a small concept in a whole new world, but I think it is a rather important one. You should always think ahead about how you are going to lay out your data, and keep relationships in mind when doing so.
Need a few towers? Source code is attached below.