Getting true real time performance in MARTe2

For the past two weeks I have been having trouble getting my MARTe2 application to be truly real time. We set a few boundary conditions in our project:

Jitter can’t exceed 0.4ms.
The application must run one cycle every 2ms.
The average execution time should be within 2.1ms per cycle (as close to 2ms as possible, actually).
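
To make these conditions concrete, here is a minimal sketch of how they could be checked against a list of logged cycle timestamps. It assumes the timestamps are in microseconds and takes jitter to mean the worst-case deviation of a cycle from the nominal 2ms period; both are my assumptions, and the function name and sample values are made up for illustration.

```python
from statistics import mean


def check_timing(timestamps_us, period_us=2000, max_jitter_us=400, max_mean_us=2100):
    """Check cycle timestamps (in microseconds) against the three conditions."""
    # A cycle time is the difference between two consecutive timestamps.
    cycles = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    # Assumption: jitter is the worst-case deviation from the nominal period.
    jitter = max(abs(c - period_us) for c in cycles)
    avg = mean(cycles)
    return {
        "mean_cycle_us": avg,
        "max_jitter_us": jitter,
        "jitter_ok": jitter <= max_jitter_us,
        "mean_ok": avg <= max_mean_us,
    }


if __name__ == "__main__":
    # Synthetic timestamps for illustration only, not measured data.
    print(check_timing([0, 2000, 4010, 5990, 8000]))
```

If your project defines jitter differently (peak-to-peak or standard deviation, for example), only the `jitter` line needs to change.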

This was generally a really challenging issue, and I do suggest using Slack when you run into issues with MARTe2. If you want to join the Slack group, you can do so at:

https://join.slack.com/t/efda-marte/shared_invite/zt-1bfxwx5ol-uhgBLaRwhfp4DSTzL3AGBA

Anyway, first off, we know from the MARTe2 documentation that it is recommended to use the following kernel parameters at boot:

https://vcis.f4e.europa.eu/marte2-docs/master/html/deploying/linux.html

isolcpus=1,2,3 intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll selinux=0 maxcpus=4

The document also mentions irqbalance, which typically distributes interrupts across CPUs. Most people use a script to manually move the interrupts at boot, but I’ve found it’s better to just use the irqaffinity kernel option and give it 0 (it takes a CPU list, much like isolcpus).

So let’s unpack these (only the ones I’ve found to be important):

isolcpus=1,2,3

Here we’re telling the kernel not to schedule any user threads onto CPUs 1, 2 and 3 (unless they are explicitly pinned there), where the CPU number is a zero-based index. Next:

maxcpus=4

This tells the kernel to only use the first 4 CPUs. This can be good in systems where you want multiple operating systems running on the same motherboard. One of my colleagues has in fact managed to get an RPi running some kind of bare-metal setup on another core, and I’ve heard of people on Stack Overflow having two server setups on one machine where they split the CPUs halfway.

Anyway, that’s me rambling on possibilities. What we have now is that the first core (CPU 0) is running our kernel and user threads. Note that kernel threads still run on the other cores, but mainly the minimal ones for memory management and so forth that our application needs.

Counting from one, core 1 is in use and cores 2, 3 and 4 are empty.

So now, when our MARTe2 application runs, we tell the system to execute it on the free cores using taskset --cpu-list 1,2,3 [command]. Next we have:

idle=poll

We need this because even when a core is idle we want our timer to keep running, otherwise our MARTe2 application might stall. I tried out the nohz option and saw exactly this behaviour, since MARTe2 polls the timer to tell what the time is and when to execute.

So finally, I also added:

irqaffinity=0

Here I have also moved all interrupts to the first core (CPU 0) without having to run a script after startup. This seemed to work very effectively.
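
After a reboot it is worth confirming that the parameters actually made it onto the kernel command line. The sketch below parses a command line string and reports which of the expected parameters are missing or different. The helper names and the sample string are mine, and it deliberately ignores quoted values, so treat it as a rough check rather than a complete parser.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict (bare flags map to None)."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, expected):
    """Return the expected parameters that are absent or set to a different value."""
    actual = parse_cmdline(cmdline)
    return {k: v for k, v in expected.items() if actual.get(k) != v}


# The parameters discussed above that I found to matter.
EXPECTED = {
    "isolcpus": "1,2,3",
    "idle": "poll",
    "maxcpus": "4",
    "irqaffinity": "0",
}

if __name__ == "__main__":
    # Sample string for illustration; on the target machine, read the text
    # of /proc/cmdline instead and pass it in.
    sample = "ro quiet isolcpus=1,2,3 idle=poll maxcpus=4"
    print(missing_params(sample, EXPECTED))
```

An empty result means everything in EXPECTED is present with the expected value.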

But what really did it was this misconception:

Your scheduler requires a Timing Source as we see here:

+Timings = {
            Class = TimingDataSource
        }

+Scheduler = {
        Class = GAMBareScheduler
        TimingDataSource = Timings
        MaxCycles = 15
    }

Now, tracking how real-time my threads actually were was a bit tricky at first. DeltaTime and Time always showed 2ms because my frequency was set to that, which seemed fishy. So we should log AbsoluteTime instead, as this is calculated by polling the counter within the CPU itself. It is a reliable hardware source of time: the time is worked out from the counter as number of ticks / frequency of ticks = time.
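
As a quick worked example of that ticks-to-time arithmetic, here is a small sketch. The 2.5 GHz counter frequency is purely hypothetical and the function names are mine; this is not MARTe2 code.

```python
def ticks_to_seconds(ticks, tick_frequency_hz):
    """time = number of ticks / frequency of ticks."""
    return ticks / tick_frequency_hz


def elapsed_us(start_ticks, end_ticks, tick_frequency_hz):
    """Elapsed time in microseconds between two counter readings."""
    return (end_ticks - start_ticks) * 1_000_000 / tick_frequency_hz


if __name__ == "__main__":
    freq = 2_500_000_000  # hypothetical 2.5 GHz counter
    # 5,000,000 ticks at 2.5 GHz is one 2ms cycle.
    print(ticks_to_seconds(5_000_000, freq))
    print(elapsed_us(10_000_000, 15_000_000, freq))
```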

So I use an IOGAM to move the AbsoluteTime value into a binary file writer:

+IO_LoggingsIO = {
            Class = IOGAM
            InputSignals = {
                AbsoluteTime = {
                    DataSource = Timer
                    Alias = AbsoluteTime
                    Type = uint64
                }
                DeltaTime = {
                    DataSource = Timer
                    Alias = DeltaTime
                    Type = uint64
                }
            }
            OutputSignals = {
                AbsoluteTime = {
                    DataSource = Loggings
                    Type = uint64
                }
                DeltaTime = {
                    DataSource = Loggings
                    Type = uint64
                }
            }
        }

+Loggings = {
            Class = FileDataSource::FileWriter
            NumberOfBuffers = 5000
            CPUMask = 0x4
            StackSize = 100000000
            Filename = "log_0.bin"
            Overwrite = "yes"
            FileFormat = "binary"
            StoreOnTrigger = 0
            Signals = {
                AbsoluteTime = {
                    Type = uint64
                }
                DeltaTime = {
                    Type = uint64
                }
            }
        }
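
To get the logged values back out for analysis, something like the sketch below can be used. It rests on assumptions you should check against the FileWriter documentation: that the binary file starts with a header whose size you pass in as header_size, and that each record after it is the two uint64 signals in the order declared above, little-endian. The function names and the synthetic data are mine.

```python
import struct

# Assumed record layout: AbsoluteTime then DeltaTime, both little-endian uint64.
RECORD = struct.Struct("<QQ")


def read_records(data, header_size):
    """Return a list of (absolute_time, delta_time) tuples from raw file bytes."""
    payload = data[header_size:]
    # Ignore a trailing partial record, e.g. if the file was cut off mid-write.
    usable = len(payload) - len(payload) % RECORD.size
    return list(RECORD.iter_unpack(payload[:usable]))


def cycle_times(records):
    """Differences between consecutive AbsoluteTime values."""
    abs_times = [r[0] for r in records]
    return [b - a for a, b in zip(abs_times, abs_times[1:])]


if __name__ == "__main__":
    # Synthetic bytes for illustration: an 8-byte placeholder header
    # followed by three records. Not real log data.
    fake = b"\x00" * 8 + b"".join(RECORD.pack(t, 2000) for t in (1000, 3000, 5003))
    print(cycle_times(read_records(fake, header_size=8)))
```

With a real file you would read it with open("log_0.bin", "rb") and pass the bytes in along with the correct header size.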

Okay, so now I’m tracking the time properly. I set the frequency in the traditional way:

+GAMDisplay = {
                Class = IOGAM
                InputSignals = {
                    Time = {
                        DataSource = Timer
                        Type = uint32
                        Frequency = 500
                    }
                }
                OutputSignals = {
                    DTime = {
                        DataSource = DDB0
                        Alias = DTime
                        Type = uint32
                    }
                }
            }

Okay, so it was the tweaking here that really showed the results in the end.

I do still have some kernel threads running on my cores, but don’t worry about that; these are minimal and serve to service our MARTe2 function calls. MARTe2, for instance, uses shared memory across all threads and all cores in use. Now, I haven’t shown you the Timer yet, but that’s for a good reason. Let’s briefly unpack what we have so far:

The Scheduler’s state 1 is running with CPUMask 0x2 (i.e. CPU 1, our free core 2).
The FileWriter is running with CPUMask 0x4 (i.e. CPU 2, our free core 3).
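
It is easy to mix up a CPU mask with a core number, so here is a small sketch for converting between the two. It assumes CPUMask is a plain bitmask where bit n selects zero-based CPU n; the helper names are mine.

```python
def cpus_in_mask(mask):
    """Return the zero-based CPU indices whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def mask_for_cpus(cpus):
    """Build an affinity mask from zero-based CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    print(cpus_in_mask(0x2))              # the mask used for the scheduler state
    print(cpus_in_mask(0x4))              # the mask used for the FileWriter
    print(hex(mask_for_cpus([1, 2, 3])))  # all three isolated CPUs
```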

Because the GAMBareScheduler basically uses a polling method, we should see close to 100% usage on that CPU; it just loops continuously, tracking the time and when to execute the state functions.

Well, I wasn’t getting the performance I needed, but finally I found that the problem was in my timer setup, so now let’s see it:

+Timer = {
            Class = LinuxTimer
            SleepNature = "Busy"
            ExecutionMode = "RealTimeThread"
            CPUMask = 0x2
            Signals = {
                Counter = {
                    Type = uint32
                }
                Time = {
                    Type = uint32
                }
            }
        }

By default, SleepNature makes the timer wait using an OS sleep. Setting it to "Busy" makes it busy-poll instead, and you can optionally set a SleepPercentage so that part of each period is still spent in an OS sleep (a percentage of the total period per loop, so in my case 50% would sleep for 1ms). Because of my extreme timing requirements, I’m not going to use SleepPercentage, just keep polling.

The real change is in the ExecutionMode. I set the CPU to the same one as my scheduler state so they run together, although they will anyway because in RealTimeThread mode the timer runs in the same thread as the scheduler. The difference is that in IndependentThread mode the timer uses an EventSem to trigger the scheduler’s execution, which has its own level of jitter associated with it. For the EventSem to work, the wake-up goes back through the OS scheduler and hence is unreliable. So if you want the best performance, run the timer in the same thread as your scheduler.

Tags:

jitter kernel latency linux marte2 real-time scheduler
