Essential problems of TNKernel

The most essential problem is that TNKernel is a very hastily-written project. Several concepts are just poorly thought out, others are poorly implemented: there is a lot of code duplication and inconsistency;
It is untested: there are no unit tests for the kernel, this is not acceptable for the project like real-time kernel;

As a result of the two above, the kernel is buggy. And even more, the kernel is really hard to maintain because of inconsistency, so when we add new features or change something, we are likely to add new bugs as well.

It is unsupported. I've written to the Yuri Tiomkin about troubles with MAKE_ALIG() macro as well as about bugs in the kernel, my messages were just ignored;
Documentation is far from perfect and it lives separately of the project itself: latest kernel version at the moment is 2.7 (published at 2013), but latest documentation is for 2.3 (published at 2006).

Examples of poor implementation

One entry point, one exit point

The most common example that happens across all TNKernel sources is code like the following:

int my_function(void)
{
   tn_disable_interrupt();
   //-- do something
   if (error()){
      //-- do something
      tn_enable_interrupt();
      return ERROR;
   }
   //-- do something
   tn_enable_interrupt();
   return SUCCESS;
}

If you have multiple return statements or, even more, if you have to perform some action before return (tn_enable_interrupt() in the example above), it's great job for goto:

int my_function(void)
{
   int rc = SUCCESS;
   tn_disable_interrupt();
   //-- do something
   if (error()){
      //-- do something
      rc = ERROR;
      goto out;
   }
   //-- do something
out:
   tn_enable_interrupt();
   return rc;
}

I understand there are a lot of people that don't agree with me on this (mostly because they religiously believe that goto is unconditionally evil), but anyway I decided to explain it. And, let's go further:

While multiple goto-s to single label are better than multiple return statements, it becomes less useful as we get to something more complicated. Imagine we need to perform some checks before disabling interrupts, and perform some other checks after disabling them. Then, we have to create two labels, like that:

int my_function(void)
{
   int rc = SUCCESS;
   if (error1()){
      rc = ERROR1;
      goto out;
   }
   tn_disable_interrupt();
   if (error2()){
      rc = ERROR2;
      goto out_ei;
   }
   if (error3()){
      rc = ERROR3;
      goto out_ei;
   }
   //-- perform job
out_ei:
   tn_enable_interrupt();
out:
   return rc;
}

For each error handling, we should specify the label explicitly, and it's easy to mix labels up, especially if we add some new case to check in the future. So, I believe this approach is a superior:

int my_function(void)
{
   int rc = SUCCESS;
   if (error1()){
      rc = ERROR1;
   } else {
      tn_disable_interrupt();
      if (error2()){
         rc = ERROR2;
      } else if (error3()){
         rc = ERROR3;
      } else {
         //-- perform job
      }
      tn_enable_interrupt();
   }
   return rc;
}

Then, for each new error handling, we should just add new else if block, and there's no need to care where to go if error happened. Let the compiler do the branching job for you. More, this code looks more compact.

Needless to say, I already found such bug in original TNKernel 2.7 code. The function tn_sys_tslice_ticks() looks as follows:

int tn_sys_tslice_ticks(int priority,int value)
{
   TN_INTSAVE_DATA
   TN_CHECK_NON_INT_CONTEXT
   tn_disable_interrupt();
   if(priority <= 0 || priority >= TN_NUM_PRIORITY-1 ||
                                value < 0 || value > MAX_TIME_SLICE)
      return TERR_WRONG_PARAM;
   tn_tslice_ticks[priority] = value;
   tn_enable_interrupt();
   return TERR_NO_ERR;
}

If you look closely, you can see that if wrong params were given, TERR_WRONG_PARAM is returned, and interrupts remain disabled. If we follow the one entry point, one exit point rule, this bug is much less likely to happen.

Don't repeat yourself

Original TNKernel 2.7 code has a lot of code duplication. So many similar things are done in several places just by copy-pasting the code.

If we have similar functions (like, tn_queue_send(), tn_queue_send_polling() and tn_queue_isend_polling()), the implementation is just copy-pasted, there's no effort to generalize things.
Mutexes have complicated algorithms for task priorities. It is implemented in inconsistent, messy manner, which leads to bugs (refer to Bugs of TNKernel 2.7)
Transitions between task states are done, again, in inconsistent copy-pasting manner. When we need to move task from, say, RUNNABLE state to the WAIT state, it's not enough to just clear one flag and set another one: we also need to remove it from whatever run queue the task is contained, probably find next task to run, then set reason of waiting, probably add to wait queue, set up timeout if specified, etc. In original TNKernel 2.7, there's no general mechanism to do this.

Meanwhile, the correct way is to create three functions for each state:
- to set the state;
- to clear the state;
- to test if the state active.
And then, when we need to move task from one state to another, we typically should just call two functions: one for clearing current state, and one for settine a new one. It is consistent, and of course this approach is used in TNeo.

As a result of the violation of the rule Don't repeat yourself, when we need to change something, we need to change it in several places. Needless to say, it is very error-prone practice, and of course there are bugs in original TNKernel because of that (refer to Bugs of TNKernel 2.7).

Macros that return from function

TNKernel uses architecture-depended macros like TN_CHECK_NON_INT_CONTEXT. This macro checks the current context (task or ISR), and if it is ISR, it returns TERR_WRONG_PARAM.

It isn't obvious to the reader of the code, but things like returning from function must be as obvious as possible.

It is better to invent some function that tests current context, and return the value explicitly:

enum TN_RCode my_function(void)
   enum TN_RCode rc = TN_RC_OK;
   // ...
   if (!tn_is_task_context()){
      rc = TN_RC_WCONTEXT;
      goto out;
   }
   // ...
out:
   return rc
}

Code for doubly-linked lists

TNKernel uses doubly-linked lists heavily, which is very good. I must admit that I really like the way data is organized in TNKernel. But, unfortunately, code that manages data is far from perfect, as I already mentioned.

So, let's get to the lists. I won't paste all the macros here, just make some overview. If we have a list, it's very common task to iterate through it. Typical snippet in TNKernel looks like this:

CDLL_QUEUE * curr_que;
TN_MUTEX * tmp_mutex;
curr_que = tn_curr_run_task->mutex_queue.next;
while(curr_que != &(tn_curr_run_task->mutex_queue))
{
   tmp_mutex = get_mutex_by_mutex_queque(curr_que);
   /* now, tmp_mutex points to the next object, so,
      we can do something useful with it */
   curr_que = curr_que->next;
}

This code is neither easy to read nor elegant. It's much better to use special macro for that (actually, similar macros are used across the whole Linux kernel code) :

TN_MUTEX * tmp_mutex;
tn_list_for_each_entry(tmp_mutex, &(tn_curr_run_task->mutex_queue), mutex_queue){
   /* now, tmp_mutex points to the next object, so,
      we can do something useful with it */
}

Much shorter and intuitive, isn't it? We even don't have to keep special curr_que.

Bugs of TNKernel 2.7

TNKernel 2.7 has several bugs, which are caught by detailed unit tests and fixed.

We have two tasks: low-priority one task_low and high-priority one task_high. They use mutex M1 with priority inheritance.
- task_low locks M1
- task_high tries to lock mutex M1 and gets blocked -> priority of task_low elevates to the priority of task_high
- task_high stops waiting for mutex by timeout -> priority of task_low remains elevated. The same happens if task_high is terminated by tn_task_terminate().
We have three tasks: two low-priority tasks task_low1 and task_low2, and high-priority one task_high. They use mutex M1 with priority inheritance.
- task_low1 locks M1
- task_low2 tries to lock M1 and gets blocked
- task_high tries to lock M1 and gets blocked -> priority if task_low1 is elevated
- task_low1 unlocks M1 ->
  - priority of task_low1 returns to base value
  - task_low2 locks M1 because it's the next task in the mutex queue
  - now, priority of task_low2 should be elevated, but it doesn't happen. Priority inversion is in effect.
tn_mutex_delete() : if mutex is not locked, TERR_ILUSE is returned. Of course, task should be able to delete non-locked mutex;
If task that waits for mutex is in WAIT+SUSPEND state, and mutex is deleted, TERR_NO_ERR is returned after returning from SUSPEND state, instead of TERR_DLT. The same for queue deletion, semaphore deletion, event deletion.
tn_sys_tslice_ticks() : if wrong params are given, TERR_WRONG_PARAM is returned and interrupts remain disabled.
tn_queue_receive() and tn_fmem_get() : if timeout is in effect, then TN_RC_TIMEOUT is returned, but user-provided pointer is altered anyway (some garbage data is written there)
Probably not a "bug", but an issue in the data queue: actual capacity of the buffer is less by 1 than user has specified and allocated
Event: if TN_EVENT_ATTR_CLR flag is set, and the task that is waiting for event is suspended, this flag TN_EVENT_ATTR_CLR is ignored (pattern is not reset). I can't say this bug is "fixed" because TNeo has event groups instead of events, and there is no TN_EVENT_ATTR_CLR flag.

Bugs with mutexes are the direct result of the inconsistency and copy-pasting the code, as well as lack of unit tests.

Table of Contents