TNeo  v1.09
Why reimplement TNKernel

Explanation of essential TNKernel problems as well as several examples of poor implementation.

Essential problems of TNKernel

  • The most essential problem is that TNKernel is a very hastily-written project. Several concepts are just poorly thought out, others are poorly implemented: there is a lot of code duplication and inconsistency;
  • It is untested: there are no unit tests for the kernel, this is not acceptable for the project like real-time kernel;

As a result of the two above, the kernel is buggy. And even more, the kernel is really hard to maintain because of inconsistency, so when we add new features or change something, we are likely to add new bugs as well.

  • It is unsupported. I've written to the Yuri Tiomkin about troubles with MAKE_ALIG() macro as well as about bugs in the kernel, my messages were just ignored;
  • People are unable to contribute to the kernel. There is even no some kind of main repository: there's just an archive with source code published on the website. When someone wants to contribute, he or she has no way to do this but to write to Yuri, but, as I've already mentioned, messages like this are ignored. So, eventually people end up maintaining their own modification of TNKernel. Rest assured this will never happen with TNeo: even if some day I completely stop maintaining the kernel, I'll transfer ownership of the repository to someone who is interested in maintaining it.
  • Documentation is far from perfect and it lives separately of the project itself: latest kernel version at the moment is 2.7 (published at 2013), but latest documentation is for 2.3 (published at 2006).

Examples of poor implementation

One entry point, one exit point

The most common example that happens across all TNKernel sources is code like the following:

int my_function(void)
{
//-- do something
if (error()){
//-- do something
return ERROR;
}
//-- do something
return SUCCESS;
}

If you have multiple return statements or, even more, if you have to perform some action before return (tn_enable_interrupt() in the example above), it's great job for goto:

int my_function(void)
{
int rc = SUCCESS;
//-- do something
if (error()){
//-- do something
rc = ERROR;
goto out;
}
//-- do something
out:
return rc;
}

I understand there are a lot of people that don't agree with me on this (mostly because they religiously believe that goto is unconditionally evil), but anyway I decided to explain it. And, let's go further:

While multiple goto-s to single label are better than multiple return statements, it becomes less useful as we get to something more complicated. Imagine we need to perform some checks before disabling interrupts, and perform some other checks after disabling them. Then, we have to create two labels, like that:

int my_function(void)
{
int rc = SUCCESS;
if (error1()){
rc = ERROR1;
goto out;
}
if (error2()){
rc = ERROR2;
goto out_ei;
}
if (error3()){
rc = ERROR3;
goto out_ei;
}
//-- perform job
out_ei:
out:
return rc;
}

For each error handling, we should specify the label explicitly, and it's easy to mix labels up, especially if we add some new case to check in the future. So, I believe this approach is a superior:

int my_function(void)
{
int rc = SUCCESS;
if (error1()){
rc = ERROR1;
} else {
if (error2()){
rc = ERROR2;
} else if (error3()){
rc = ERROR3;
} else {
//-- perform job
}
}
return rc;
}

Then, for each new error handling, we should just add new else if block, and there's no need to care where to go if error happened. Let the compiler do the branching job for you. More, this code looks more compact.

Needless to say, I already found such bug in original TNKernel 2.7 code. The function tn_sys_tslice_ticks() looks as follows:

int tn_sys_tslice_ticks(int priority,int value)
{
TN_CHECK_NON_INT_CONTEXT
if(priority <= 0 || priority >= TN_NUM_PRIORITY-1 ||
value < 0 || value > MAX_TIME_SLICE)
tn_tslice_ticks[priority] = value;
return TERR_NO_ERR;
}

If you look closely, you can see that if wrong params were given, #TERR_WRONG_PARAM is returned, and interrupts remain disabled. If we follow the one entry point, one exit point rule, this bug is much less likely to happen.

Don't repeat yourself

Original TNKernel 2.7 code has a lot of code duplication. So many similar things are done in several places just by copy-pasting the code.

  • If we have similar functions (like, tn_queue_send(), tn_queue_send_polling() and tn_queue_isend_polling()), the implementation is just copy-pasted, there's no effort to generalize things.
  • Mutexes have complicated algorithms for task priorities. It is implemented in inconsistent, messy manner, which leads to bugs (refer to Bugs of TNKernel 2.7)
  • Transitions between task states are done, again, in inconsistent copy-pasting manner. When we need to move task from, say, RUNNABLE state to the WAIT state, it's not enough to just clear one flag and set another one: we also need to remove it from whatever run queue the task is contained, probably find next task to run, then set reason of waiting, probably add to wait queue, set up timeout if specified, etc. In original TNKernel 2.7, there's no general mechanism to do this.

    Meanwhile, the correct way is to create three functions for each state:

    • to set the state;
    • to clear the state;
    • to test if the state active.

    And then, when we need to move task from one state to another, we typically should just call two functions: one for clearing current state, and one for settine a new one. It is consistent, and of course this approach is used in TNeo.

As a result of the violation of the rule Don't repeat yourself, when we need to change something, we need to change it in several places. Needless to say, it is very error-prone practice, and of course there are bugs in original TNKernel because of that (refer to Bugs of TNKernel 2.7).

Macros that return from function

TNKernel uses architecture-depended macros like TN_CHECK_NON_INT_CONTEXT. This macro checks the current context (task or ISR), and if it is ISR, it returns TERR_WRONG_PARAM.

It isn't obvious to the reader of the code, but things like returning from function must be as obvious as possible.

It is better to invent some function that tests current context, and return the value explicitly:

enum TN_RCode my_function(void)
enum TN_RCode rc = TN_RC_OK;
// ...
goto out;
}
// ...
out:
return rc
}

Code for doubly-linked lists

TNKernel uses doubly-linked lists heavily, which is very good. I must admit that I really like the way data is organized in TNKernel. But, unfortunately, code that manages data is far from perfect, as I already mentioned.

So, let's get to the lists. I won't paste all the macros here, just make some overview. If we have a list, it's very common task to iterate through it. Typical snippet in TNKernel looks like this:

CDLL_QUEUE * curr_que;
TN_MUTEX * tmp_mutex;
curr_que = tn_curr_run_task->mutex_queue.next;
while(curr_que != &(tn_curr_run_task->mutex_queue))
{
tmp_mutex = get_mutex_by_mutex_queque(curr_que);
/* now, tmp_mutex points to the next object, so,
we can do something useful with it */
curr_que = curr_que->next;
}

This code is neither easy to read nor elegant. It's much better to use special macro for that (actually, similar macros are used across the whole Linux kernel code) :

TN_MUTEX * tmp_mutex;
tn_list_for_each_entry(tmp_mutex, &(tn_curr_run_task->mutex_queue), mutex_queue){
/* now, tmp_mutex points to the next object, so,
we can do something useful with it */
}

Much shorter and intuitive, isn't it? We even don't have to keep special curr_que.

Bugs of TNKernel 2.7

TNKernel 2.7 has several bugs, which are caught by detailed unit tests and fixed.

  • We have two tasks: low-priority one task_low and high-priority one task_high. They use mutex M1 with priority inheritance.
    • task_low locks M1
    • task_high tries to lock mutex M1 and gets blocked -> priority of task_low elevates to the priority of task_high
    • task_high stops waiting for mutex by timeout -> priority of task_low remains elevated. The same happens if task_high is terminated by tn_task_terminate().
  • We have three tasks: two low-priority tasks task_low1 and task_low2, and high-priority one task_high. They use mutex M1 with priority inheritance.
    • task_low1 locks M1
    • task_low2 tries to lock M1 and gets blocked
    • task_high tries to lock M1 and gets blocked -> priority if task_low1 is elevated
    • task_low1 unlocks M1 ->
      • priority of task_low1 returns to base value
      • task_low2 locks M1 because it's the next task in the mutex queue
      • now, priority of task_low2 should be elevated, but it doesn't happen. Priority inversion is in effect.
  • tn_mutex_delete() : if mutex is not locked, #TERR_ILUSE is returned. Of course, task should be able to delete non-locked mutex;
  • If task that waits for mutex is in WAIT+SUSPEND state, and mutex is deleted, #TERR_NO_ERR is returned after returning from SUSPEND state, instead of #TERR_DLT. The same for queue deletion, semaphore deletion, event deletion.
  • tn_sys_tslice_ticks() : if wrong params are given, #TERR_WRONG_PARAM is returned and interrupts remain disabled.
  • tn_queue_receive() and tn_fmem_get() : if timeout is in effect, then #TN_RC_TIMEOUT is returned, but user-provided pointer is altered anyway (some garbage data is written there)
  • Probably not a "bug", but an issue in the data queue: actual capacity of the buffer is less by 1 than user has specified and allocated
  • Event: if TN_EVENT_ATTR_CLR flag is set, and the task that is waiting for event is suspended, this flag TN_EVENT_ATTR_CLR is ignored (pattern is not reset). I can't say this bug is "fixed" because TNeo has event groups instead of events, and there is no TN_EVENT_ATTR_CLR flag.

Bugs with mutexes are the direct result of the inconsistency and copy-pasting the code, as well as lack of unit tests.

tn_disable_interrupt
#define tn_disable_interrupt
old TNKernel name of #TN_INT_DIS_SAVE
Definition: tn_oldsymbols.h:306
MAX_TIME_SLICE
#define MAX_TIME_SLICE
old TNKernel name for #TN_MAX_TIME_SLICE
Definition: tn_oldsymbols.h:335
TN_Task::mutex_queue
struct TN_ListItem mutex_queue
list of all mutexes that are locked by task
Definition: tn_tasks.h:360
TN_RC_OK
@ TN_RC_OK
Successful operation.
Definition: tn_common.h:84
TN_ListItem::next
struct TN_ListItem * next
pointer to next item
Definition: tn_list.h:73
TN_RC_WCONTEXT
@ TN_RC_WCONTEXT
Wrong context error: returned if function is called from non-acceptable context.
Definition: tn_common.h:106
tn_sys_tslice_ticks
#define tn_sys_tslice_ticks
old TNKernel name of #tn_sys_tslice_set
Definition: tn_oldsymbols.h:280
TN_RCode
TN_RCode
Result code returned by kernel services.
Definition: tn_common.h:81
TN_Mutex
Mutex.
Definition: tn_mutex.h:122
tn_is_task_context
_TN_STATIC_INLINE TN_BOOL tn_is_task_context(void)
Returns whether current system context is #TN_CONTEXT_TASK
Definition: tn_sys.h:533
TN_INTSAVE_DATA
#define TN_INTSAVE_DATA
Declares variable that is used by macros TN_INT_DIS_SAVE() and TN_INT_RESTORE() for storing status re...
Definition: tn_arch_example.h:143
tn_enable_interrupt
#define tn_enable_interrupt
old TNKernel name of #TN_INT_RESTORE
Definition: tn_oldsymbols.h:309
TERR_NO_ERR
#define TERR_NO_ERR
old TNKernel name of #TN_RC_OK
Definition: tn_oldsymbols.h:207
TERR_WRONG_PARAM
#define TERR_WRONG_PARAM
old TNKernel name of #TN_RC_WPARAM
Definition: tn_oldsymbols.h:222
TN_NUM_PRIORITY
#define TN_NUM_PRIORITY
old TNKernel name of #TN_PRIORITIES_CNT
Definition: tn_oldsymbols.h:323
TN_ListItem
Circular doubly linked list item, for internal kernel usage.
Definition: tn_list.h:63
TN_Task::priority
int priority
current task priority
Definition: tn_tasks.h:392