
Re: How to write a simple driver in bare metal systems: volatile, memory barrier, critical sections and so on

From: david.brown@hesbynett.no (David Brown)
Newsgroups: comp.arch.embedded
Subject: Re: How to write a simple driver in bare metal systems: volatile,
memory barrier, critical sections and so on
Date: Sun, 24 Oct 2021 13:02:32 +0200
Organization: A noiseless patient Spider
Lines: 384
Message-ID: <sl3eg8$674$1@dont-email.me>
References: <skvcnd$5dv$1@dont-email.me> <sl1c3a$dfr$1@dont-email.me>
<sl1sh2$vq6$1@dont-email.me>

On 23/10/2021 22:49, pozz wrote:
> Il 23/10/2021 18:09, David Brown ha scritto:
>> On 23/10/2021 00:07, pozz wrote:
>>> Even though I have been writing software for embedded systems for more
>>> than 10 years, there is a topic that from time to time makes me think
>>> for hours and leaves me with many doubts.
>>
>> It's nice to see a thread like this here - the group needs such
>> discussions!
>>
>>>
>>> Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
>>> The software is bare metal, without any OS. The main pattern is the well
>>> known mainloop (background code) that is interrupted by ISR.
>>>
>>> Interrupts are used mainly for timing and for low-level drivers. For
>>> example, the UART reception ISR moves the last received char into a FIFO
>>> buffer, while the mainloop code pops new data from the FIFO.
>>>
>>>
>>> static struct {
>>>    unsigned char buf[RXBUF_SIZE];
>>>    uint8_t in;
>>>    uint8_t out;
>>> } rxfifo;
>>>
>>> /* ISR */
>>> void uart_rx_isr(void) {
>>>    unsigned char c = UART->DATA;
>>>    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
>>>    rxfifo.in++;
>>
>> Unless you are sure that RXBUF_SIZE is a power of two, this is going to
>> be quite slow on an AVR.  Modulo means division, and while division by a
>> constant is usually optimised to a multiplication by the compiler, you
>> still have a multiply, a shift, and some compensation for it all being
>> done as signed integer arithmetic.
>>
>> It's also wrong, for non-power of two sizes, since the wrapping of your
>> increment and your modulo RXBUF_SIZE get out of sync.
>
> Yes, RXBUF_SIZE is a power of two.
>

If your code relies on that, make sure the code will fail to compile if
it is not the case. Documentation is good, compile-time check is better:

static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1)) == 0, "Needs power of 2");

>
>
>> The usual choice is to track "head" and "tail", and use something like:
>>
>> void uart_rx_isr(void) {
>>    unsigned char c = UART->DATA;
>>    // Reset interrupt flag
>>    uint8_t next = rxfifo.tail;
>>    rxfifo.buf[next] = c;
>>    next++;
>>    if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
>>    rxfifo.tail = next;
>> }
>
> This isn't the point of this thread, anyway...
> You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
> approach is different.
>
> RXBUF_SIZE is a power of two, usually <=256. head and tail are uint8_t
> and *can* reach the maximum value of 255, even if RXBUF_SIZE is 128. All
> works well.
>

Yes, your approach will work - /if/ you have a power-of-two buffer size.
It has no noticeable efficiency advantages, merely an extra
inconvenient restriction and the possible confusion caused by doing
things in a different way from common idioms.

However, this is not the point of the thread - so I am happy to leave
that for now.
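For concreteness, the free-running-index scheme under discussion can be sketched in isolation like this (the function names are mine, not from the original post):

```c
#include <stdint.h>

#define RXBUF_SIZE 128  /* must be a power of two for this scheme */

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;   /* free-running write index, wraps at 256 */
    uint8_t out;  /* free-running read index, wraps at 256 */
} rxfifo;

/* Producer side (the ISR in the original post). */
static void fifo_put(unsigned char c) {
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
    rxfifo.in++;
}

/* Consumer side (the mainloop): returns -1 when the FIFO is empty. */
static int fifo_get(void) {
    if (rxfifo.out == rxfifo.in)
        return -1;
    unsigned char c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
    rxfifo.out++;
    return c;
}
```

This works because uint8_t wraps modulo 256 and 256 is a multiple of RXBUF_SIZE, so `index % RXBUF_SIZE` stays in step with the index across the 255 -> 0 wrap.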

> Suppose rxfifo.in = rxfifo.out = 127, so the FIFO is empty. When a new char
> is received, it is saved into rxfifo.buf[127 % 128 = 127] and rxfifo.in is
> increased to 128.
> Now the mainloop detects the new char (in != out), reads it from
> rxfifo.buf[127 % 128 = 127] and increases out, which becomes 128.
>
> The next byte will be saved into rxfifo.buf[rxfifo.in % 128 = 128 % 128
> = 0] and rxfifo.in will be 129. Again, the next byte will be saved to
> rxfifo.buf[rxfifo.in % 128 = 129 % 128 = 1] and rxfifo.in will be 130.
>
> When the mainloop tries to pop data from the fifo, it tests
>
>    rxfifo.in (130) != rxfifo.out (128)
>
> The test is true, so the code extracts chars from rxfifo.buf[out % 128],
> that is rxfifo.buf[0]... and so on.
>
> I hope that explanation is good.
>
>
>>
>>>    // Reset interrupt flag
>>> }
>>>
>>> /* Called regularly from mainloop code */
>>> int uart_task(void) {
>>>    int c = -1;
>>>    if (rxfifo.out != rxfifo.in) {
>>>      c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>>      rxfifo.out++;
>>>    }
>>>    return c;
>>> }
>>
>> int uart_task(void) {
>>    int c = -1;
>>    uint8_t next = rxfifo.head;
>>    if (next != rxfifo.tail) {
>>        c = rxfifo.buf[next];
>>        next++;
>>        if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
>>        rxfifo.head = next;
>>    }
>>    return c;
>> }
>>
>> These don't track buffer overflow at all - you need to call uart_task()
>> often enough to avoid that.
>
> Sure, with a big enough RXBUF_SIZE, buffer overflow should never
> happen. Anyway, if it does, the higher-level layers (protocol)
> should detect a corrupted packet.
>

You risk getting seriously out of sync if there is an overflow.
Normally, on an overflow there will be a dropped character or two (which
as you say, must be caught at a higher level). Here you could end up
going round your buffer an extra time and /gaining/ RXBUF_SIZE extra
characters.

Still, if you are sure that your functions are called fast enough so
that overflow is not a concern, then that's fine. Extra code to check
for a situation that can't occur is not helpful.
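If one did want such a check, a nice property of free-running indices is that the fill level is a single subtraction. A hypothetical guard (not in the original code) could look like:

```c
#include <stdbool.h>
#include <stdint.h>

#define RXBUF_SIZE 128  /* power of two, less than 256 */

/* Unread byte count: unsigned wraparound makes in - out correct even
 * across the 255 -> 0 wrap, as long as the producer never gets more
 * than 255 bytes ahead of the consumer. */
static inline uint8_t fifo_used(uint8_t in, uint8_t out) {
    return (uint8_t)(in - out);
}

/* More than RXBUF_SIZE unread bytes means old data has been
 * overwritten, i.e. the buffer overflowed. */
static inline bool fifo_overflowed(uint8_t in, uint8_t out) {
    return fifo_used(in, out) > RXBUF_SIZE;
}
```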

>
>> (I'm skipping volatiles so we don't get ahead of your point.)
>>
>>>
>>>
>>> From a 20-year-old article[1] by Nigel Jones, this seems a situation
>>> where volatile must be used for rxfifo.in, because it is modified by an
>>> ISR and used in the mainloop code.
>>>
>>
>> Certainly whenever data is shared between ISR's and mainloop code, or
>> different threads, then you need to think about how to make sure data is
>> synchronised and exchanged.  "volatile" is one method, atomics are
>> another, and memory barriers can be used.
>>
>>> I don't think so: rxfifo.in is read from memory only once in
>>> uart_task(), so there isn't the risk that the compiler optimises it badly.
>>
>> That is incorrect in two ways.  One - barring compiler bugs (which do
>> occur, but they are very rare compared to user bugs), there is no such
>> thing as "optimising badly".  If optimising changes the behaviour of the
>> code, other than its size and speed, the code is wrong.
>
> Yes, of course, but I don't think the absence of volatile for rxfifo.in,
> even though it can change in the ISR, could be a *real* problem with
> *modern and current* compilers.
>

Personally, I am not satisfied with "it's unlikely to be a problem in
practice" - I prefer "The language guarantees it is not a problem".
Remember, when you know the data needs to be read at this point, then
using a volatile read is free. Volatile does not make code less
efficient unless you use it incorrectly and force more accesses than are
necessary. So using volatile accesses for "rxfifo.in" here turns
"probably safe" into "certainly safe" without cost. What's not to like?

> The volatile attribute is needed to avoid compiler optimisations (which
> would be a bad thing, given the volatile nature of the variable), but in
> that code it's difficult to think of an optimisation, caused by the absence
> of volatile, that erroneously changes the behaviour... except reordering.
>
>
>> Two - it is a
>> very bad idea to imagine that having code inside a function somehow
>> "protects" it from re-ordering or other optimisation.
>
> I didn't say that; on the contrary, I was thinking exactly of reordering
> issues.
>
>
>> Functions can be inlined, outlined, cloned, and shuffled about.
>> Link-time optimisation, code in headers, C++ modules, and other
>> cross-unit optimisations are becoming more and more common.  So while it
>> might be true /today/ that the compiler has no alternative but to read
>> rxfifo.in once per call to uart_task(), you cannot assume that will be
>> the case with later compilers or with more advanced optimisation
>> techniques enabled.  It is safer, more portable, and more future-proof
>> to avoid such assumptions.
>
> Ok, you are talking about future scenarios. I don't think this could
> actually be a real problem today. Anyway, your observation makes sense.
>
>
>>
>>> Even if the ISR fires immediately after the if statement, this doesn't
>>> lead to a dangerous state: the just-received data will be processed at
>>> the next call to uart_task().
>>>
>>> So IMHO volatile isn't necessary here. And critical sections (i.e.
>>> disabling interrupts) aren't useful too.
>>>
>>> However, I'm thinking about memory barriers. Suppose the compiler
>>> reorders the instructions in uart_task() as follows:
>>>
>>>
>>>    c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>>    if (rxfifo.out != rxfifo.in) {
>>>      rxfifo.out++;
>>>      return c;
>>>    } else {
>>>      return -1;
>>>    }
>>>
>>>
>>> Here there's a big problem, because the compiler decided to read
>>> rxfifo.buf[] first and then test in and out for equality. If the ISR
>>> fires immediately after the data is moved to c (most probably an internal
>>> register), the condition in the if statement will be true and the
>>> register value is returned. However, the register value isn't correct.
>>
>> You are absolutely correct.
>>
>>>
>>> I don't think any modern C compiler reorders uart_task() in this way,
>>> but we can't be sure. The result doesn't change for the compiler, so it
>>> is allowed to do this kind of thing.
>>
>> It is not an unreasonable re-arrangement.  On processors with
>> out-of-order execution (which does not apply to the AVR or Cortex-M),
>> compilers will often push loads as early as they can in the instruction
>> stream so that they start the cache loading process as quickly as
>> possible.  (But note that on such "big" processors, much of this
>> discussion on volatile and memory barriers is not sufficient, especially
>> if there is more than one core.  You need atomics and fences, but that's
>> a story for another day.)
>>
>>>
>>> How can I fix this issue if I want to be extremely sure the compiler
>>> will not reorder this way? Applying volatile to rxfifo.in shouldn't help
>>> here, because the compiler is still allowed to reorder accesses to
>>> non-volatile variables[2].
>>>
>>
>> The important thing about "volatile" is that it is /accesses/ that are
>> volatile, not objects.  A volatile object is nothing more than an object
>> for which all accesses are volatile by default.  But you can use
>> volatile accesses on non-volatile objects.  This macro is your friend:
>>
>> #define volatileAccess(v) *((volatile typeof((v)) *) &(v))
>>
>> (Linux has the same macro, called ACCESS_ONCE.  It uses a gcc extension
>> - if you are using other compilers then you can make an uglier
>> equivalent using _Generic.  However, if you are using a C compiler that
>> supports C11, it is probably gcc or clang, and you can use the "typeof"
>> extension.)
>>
>> That macro will let you make a volatile read or write to an object
>> without requiring that /all/ accesses to it are volatile.
>
> This is a good point. The code in the ISR can't be interrupted, so there's
> no need for volatile accesses inside the ISR.
>

Correct. (Well, /almost/ correct - bigger microcontrollers have
multiple interrupt priorities. But it should be correct in this case,
as no other interrupt would be messing with the same variables anyway.)

>
>>> One solution is adding a memory barrier in this way:
>>>
>>>
>>> int uart_task(void) {
>>>    int c = -1;
>>>    if (rxfifo.out != rxfifo.in) {
>>>      memory_barrier();
>>>      c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>>      rxfifo.out++;
>>>    }
>>>    return c;
>>> }
>>>
>>
>> Note that you are forcing the compiler to read "out" twice here, as it
>> can't keep the value of "out" in a register across the memory barrier.
>
> Yes, you're right. A small penalty to avoid the problem of reordering.
>

But an unnecessary penalty.
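(For reference, memory_barrier() is presumably a pure compiler barrier here; the post doesn't show its definition, but on gcc and clang the conventional form is an empty asm with a memory clobber:)

```c
/* Compiler-only barrier: emits no instructions, but tells the compiler
 * that any memory may have been touched, so values of non-volatile
 * objects (like "out") cached in registers must be re-read after it. */
#define memory_barrier() __asm__ volatile("" ::: "memory")
```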

>
>> (And as I mentioned before, the compiler might be able to do larger
>> scale optimisation across compilation units or functions, and in that
>> way keep values across multiple calls to uart_task.)
>>
>>>
>>> However, this approach appears dangerous to me. You have to check and
>>> double-check if, when and where memory barriers are necessary, and it's
>>> easy to skip a barrier where it's needed or to add a barrier where it
>>> isn't.
>>
>> Memory barriers are certainly useful, but they are a shotgun approach -
>> they affect /everything/ involving reads and writes to memory.  (But
>> remember they don't affect ordering of calculations.)
>>
>>>
>>> So I'm thinking that a sub-optimal (regarding efficiency) but reliable
>>> (regarding the risk of skipping a barrier where it is needed) approach
>>> could be to enter a critical section (disabling interrupts) anyway, even
>>> where it isn't strictly needed.
>>>
>>>
>>> int uart_task(void) {
>>>    ENTER_CRITICAL_SECTION();
>>>    int c = -1;
>>>    if (rxfifo.out != rxfifo.in) {
>>>      c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
>>>      rxfifo.out++;
>>>    }
>>>    EXIT_CRITICAL_SECTION();
>>>    return c;
>>> }
>>
>> Critical sections for something like this are /way/ overkill.  And a
>> critical section with a division in the middle?  Not a good idea.
>>
>>>
>>>
>>> Another solution could be to apply the volatile keyword to rxfifo.in
>>> *AND* rxfifo.buf too, so the compiler can't change the order of accesses
>>> to them.
>>>
>>> Do you have other suggestions?
>>>
>>
>> Marking "in" and "buf" as volatile is /far/ better than using a critical
>> section, and likely to be more efficient than a memory barrier.  You can
>> also use volatileAccess rather than making buf volatile, and it is often
>> slightly more efficient to cache volatile variables in a local variable
>> while working with them.
>
> Yes, I think so too. Lately I have read many experts saying volatile is
> often a bad thing, so I'm re-thinking its use compared with other approaches.
>

People who say "volatile is a bad thing" are often wrong. Remember, all
generalisations are false :-)

"volatile" is a tool. It doesn't do everything that some people think
it does, but it is a very useful tool nonetheless. It has little place
in big systems - Linus Torvalds wrote a rant against it as being both
too much and too little, and in the context of writing Linux code, he
was correct. For Linux programming, you should be using OS-specific
features (which rely on "volatile" for their implementation) or atomics,
rather than using "volatile" directly.

But for small-systems embedded programming, it is very handy. Used
well, it is free - used excessively it has a cost, but an extra volatile
will not make an otherwise correct program fail.

Memory barriers are great for utility functions such as interrupt
enable/disable inline functions, but are usually sub-optimal compared to
specific and targeted volatile accesses.

>
>>> [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
>>> [2] https://blog.regehr.org/archives/28
>>
>
