TNeoKernel
v1.04
|
Explanation of essential TNKernel problems as well as several examples of poor implementation.
As a result of the two above, the kernel is buggy. And even more, the kernel is really hard to maintain because of inconsistency, so when we add new features or change something, we are likely to add new bugs as well.
The most common example that happens across all TNKernel sources is code like the following:
If you have multiple return
statements or, even more, if you have to perform some action before return (tn_enable_interrupt()
in the example above), it's great job for goto
:
I understand there are a lot of people that don't agree with me on this (mostly because they religiously believe that goto
is unconditionally evil), but anyway I decided to explain it. And, let's go further:
While multiple goto
-s to single label are better than multiple return
statements, it becomes less useful as we get to something more complicated. Imagine we need to perform some checks before disabling interrupts, and perform some other checks after disabling them. Then, we have to create two labels, like that:
For each error handling, we should specify the label explicitly, and it's easy to mix labels up, especially if we add some new case to check in the future. So, I believe this approach is a superior:
Then, for each new error handling, we should just add new else if
block, and there's no need to care where to go if error happened. Let the compiler do the branching job for you. More, this code looks more compact.
Needless to say, I already found such bug in original TNKernel 2.7 code. The function tn_sys_tslice_ticks()
looks as follows:
If you look closely, you can see that if wrong params were given, TERR_WRONG_PARAM
is returned, and interrupts remain disabled. If we follow the one entry point, one exit point rule, this bug is much less likely to happen.
Original TNKernel 2.7 code has a lot of code duplication. So many similar things are done in several places just by copy-pasting the code.
tn_queue_send()
, tn_queue_send_polling()
and tn_queue_isend_polling()
), the implementation is just copy-pasted, there's no effort to generalize things.Transitions between task states are done, again, in inconsistent copy-pasting manner. When we need to move task from, say, RUNNABLE
state to the WAIT
state, it's not enough to just clear one flag and set another one: we also need to remove it from whatever run queue the task is contained, probably find next task to run, then set reason of waiting, probably add to wait queue, set up timeout if specified, etc. In original TNKernel 2.7, there's no general mechanism to do this.
Meanwhile, the correct way is to create three functions for each state:
And then, when we need to move task from one state to another, we typically should just call two functions: one for clearing current state, and one for settine a new one. It is consistent, and of course this approach is used in TNeoKernel.
As a result of the violation of the rule Don't repeat yourself, when we need to change something, we need to change it in several places. Needless to say, it is very error-prone practice, and of course there are bugs in original TNKernel because of that (refer to Bugs of TNKernel 2.7).
TNKernel uses architecture-depended macros like TN_CHECK_NON_INT_CONTEXT
. This macro checks the current context (task or ISR), and if it is ISR, it returns TERR_WRONG_PARAM
.
It isn't obvious to the reader of the code, but things like returning from function must be as obvious as possible.
It is better to invent some function that tests current context, and return the value explicitly:
TNKernel uses doubly-linked lists heavily, which is very good. I must admit that I really like the way data is organized in TNKernel. But, unfortunately, code that manages data is far from perfect, as I already mentioned.
So, let's get to the lists. I won't paste all the macros here, just make some overview. If we have a list, it's very common task to iterate through it. Typical snippet in TNKernel looks like this:
This code is neither easy to read nor elegant. It's much better to use special macro for that (actually, similar macros are used across the whole Linux kernel code) :
Much shorter and intuitive, isn't it? We even don't have to keep special curr_que
.
TNKernel 2.7 has several bugs, which are caught by detailed unit tests and fixed.
task_low
and high-priority one task_high
. They use mutex M1
with priority inheritance.task_low
locks M1
task_high
tries to lock mutex M1
and gets blocked -> priority of task_low
elevates to the priority of task_high
task_high
stops waiting for mutex by timeout -> priority of task_low
remains elevated. The same happens if task_high
is terminated by tn_task_terminate()
.task_low1
and task_low2
, and high-priority one task_high
. They use mutex M1
with priority inheritance.task_low1
locks M1
task_low2
tries to lock M1
and gets blockedtask_high
tries to lock M1
and gets blocked -> priority if task_low1
is elevatedtask_low1
unlocks M1
->task_low1
returns to base valuetask_low2
locks M1
because it's the next task in the mutex queuetask_low2
should be elevated, but it doesn't happen. Priority inversion is in effect.tn_mutex_delete()
: if mutex is not locked, TERR_ILUSE
is returned. Of course, task should be able to delete non-locked mutex;WAIT+SUSPEND
state, and mutex is deleted, TERR_NO_ERR
is returned after returning from SUSPEND
state, instead of TERR_DLT
. The same for queue deletion, semaphore deletion, event deletion.tn_sys_tslice_ticks()
: if wrong params are given, TERR_WRONG_PARAM
is returned and interrupts remain disabled.tn_queue_receive()
and tn_fmem_get()
: if timeout
is in effect, then TN_RC_TIMEOUT
is returned, but user-provided pointer is altered anyway (some garbage data is written there)TN_EVENT_ATTR_CLR
flag is set, and the task that is waiting for event is suspended, this flag TN_EVENT_ATTR_CLR
is ignored (pattern is not reset). I can't say this bug is "fixed" because TNeoKernel has event groups instead of events, and there is no TN_EVENT_ATTR_CLR
flag.Bugs with mutexes are the direct result of the inconsistency and copy-pasting the code, as well as lack of unit tests.