I posted a new article on my website.
It describes a performance-optimized, high-level way of handling STM32 GPIOs using template metaprogramming (TMP).
Tags: C++, embedded
This entry was posted on December 23, 2009 at 2:37 pm and is filed under Articles.
December 25, 2009 at 9:09 pm |
Hi,
I’ve written a code snippet to show that the code in ‘GpioBase::doSetMode’ can be inlined and the branch on ‘pin>=8’ eliminated using TMP. Do you think it’s OK?
See:
http://pastebin.com/f7ae1a65f
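For readers who can't open the paste, here is a minimal sketch of the general idea, not the snippet itself. It assumes that pins 0-7 are configured in one register and pins 8-15 in the next, with four bits per pin; `GpioRegs`, `ModeSetter` and `setMode` are hypothetical names made up for illustration. Because the pin is a template parameter, the choice between the two registers is made by picking a specialization at compile time, so no `pin>=8` test should remain in the generated code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two GPIO configuration registers
// (assumed layout: pins 0-7 in crl, pins 8-15 in crh, 4 bits per pin).
struct GpioRegs {
    volatile std::uint32_t crl;
    volatile std::uint32_t crh;
};

// Primary template: chosen when Pin < 8, so it only touches crl.
template <unsigned Pin, bool High = (Pin >= 8)>
struct ModeSetter {
    static void set(GpioRegs& regs, std::uint32_t mode) {
        constexpr unsigned shift = Pin * 4;
        regs.crl = (regs.crl & ~(0xFu << shift)) | ((mode & 0xFu) << shift);
    }
};

// Partial specialization: chosen when Pin >= 8, so it only touches crh.
template <unsigned Pin>
struct ModeSetter<Pin, true> {
    static void set(GpioRegs& regs, std::uint32_t mode) {
        constexpr unsigned shift = (Pin - 8) * 4;
        regs.crh = (regs.crh & ~(0xFu << shift)) | ((mode & 0xFu) << shift);
    }
};

// Thin inline front end; the register choice is resolved by the compiler.
template <unsigned Pin>
inline void setMode(GpioRegs& regs, std::uint32_t mode) {
    static_assert(Pin < 16, "pin number out of range");
    ModeSetter<Pin>::set(regs, mode);
}
```

A call such as `setMode<6>(regs, 0x3)` then compiles down to a read-modify-write of the low register only, with the shift and mask known as constants.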
December 26, 2009 at 12:06 am |
As you might have guessed, I didn’t inline the mode() member function partly because I didn’t know an elegant way to do it (I’m still a beginner in TMP), and partly on purpose.
I’ll explain this in more detail: in many applications the GPIO mode settings are performed once at startup and never changed, so the mode() member function is not performance critical. Given that, I decided to apply Scott Meyers’ advice to factor template-independent code out of templates to avoid possible code bloat (a primary concern in embedded development).
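The factoring described above could look roughly like this. It is a sketch under assumptions, not the library's actual code: `GpioRegs`, `fakePortF` and the plain integer mode are hypothetical stand-ins so that the example compiles on a PC, while `GpioBase`, `doSetMode` and `mode()` are the names used in this discussion.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two GPIO configuration registers
// (assumed layout: pins 0-7 in crl, pins 8-15 in crh, 4 bits per pin).
struct GpioRegs {
    volatile std::uint32_t crl;
    volatile std::uint32_t crh;
};

// Non-template base class: the real work lives here, so a single copy
// is shared by every Gpio<...> instantiation.
class GpioBase {
protected:
    static void doSetMode(GpioRegs& regs, unsigned pin, std::uint32_t mode);
};

// Defined once, out of line; the register is chosen with a runtime branch.
void GpioBase::doSetMode(GpioRegs& regs, unsigned pin, std::uint32_t mode) {
    volatile std::uint32_t& reg = (pin >= 8) ? regs.crh : regs.crl;
    const unsigned shift = (pin % 8) * 4;
    reg = (reg & ~(0xFu << shift)) | ((mode & 0xFu) << shift);
}

// The template part is only a thin forwarder, so each instantiation
// adds little more than a function call.
template <GpioRegs& Port, unsigned Pin>
class Gpio : private GpioBase {
public:
    static void mode(std::uint32_t m) { doSetMode(Port, Pin, m); }
};

// Hypothetical port object standing in for a memory-mapped peripheral.
GpioRegs fakePortF = {0, 0};
typedef Gpio<fakePortF, 6> Led;
```

With this layout, `Led::mode(0x3)` costs a call into the shared function rather than a private copy of the bit manipulation for every port and pin combination.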
I’ll do some code size measurements to see if the eliminated branch compensates for the inlining, and then decide whether to apply your patch.
Anyway, thanks for the comment, I’ve learnt something new.
December 26, 2009 at 12:49 pm |
I’ve done some measurements, here are the results:
As for code size, here is the code generated by the non-inline version:
//16bytes
00000000 <_Z10set_gpio_av>:
   0: f641 4000   movw r0, #7168 ; 0x1c00
   4: b500        push {lr}
   6: f2c4 0001   movt r0, #16385 ; 0x4001
   a: 2106        movs r1, #6
   c: 2203        movs r2, #3
   e: f7ff fffe   bl 0 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E>
  12: bd00        pop {pc}

//64bytes
00000000 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E>:
   0: b2c9        uxtb r1, r1
   2: 2907        cmp r1, #7
   4: d80d        bhi.n 22 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E+0x22>
   6: 0089        lsls r1, r1, #2
   8: 230f        movs r3, #15
   a: 408b        lsls r3, r1
   c: fa12 f101   lsls.w r1, r2, r1
  10: f8d0 c000   ldr.w ip, [r0]
  14: ea2c 0303   bic.w r3, ip, r3
  18: 6003        str r3, [r0, #0]
  1a: 6803        ldr r3, [r0, #0]
  1c: 4319        orrs r1, r3
  1e: 6001        str r1, [r0, #0]
  20: 4770        bx lr
  22: 3908        subs r1, #8
  24: b2c9        uxtb r1, r1
  26: 0089        lsls r1, r1, #2
  28: 230f        movs r3, #15
  2a: 408b        lsls r3, r1
  2c: 408a        lsls r2, r1
  2e: f8d0 c004   ldr.w ip, [r0, #4]
  32: ea2c 0303   bic.w r3, ip, r3
  36: 6043        str r3, [r0, #4]
  38: 6843        ldr r3, [r0, #4]
  3a: 431a        orrs r2, r3
  3c: 6042        str r2, [r0, #4]
  3e: e7ef        b.n 20 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E+0x20>

And here is the code for the inline version:
As you can see, in the non-inline version there is a one-time cost of 64 bytes (the GpioBase::doSetMode() function), and every function call costs 16 bytes. In the inline version the cost is simply 24 bytes per inlined call. This is going to increase the per-call size, but only slightly (8 bytes).
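To see how these figures add up for a given number of calls, here is a small compile-time sketch. It takes the three quoted numbers at face value and assumes nothing else contributes to code size; the constant and function names are made up for illustration.

```cpp
#include <cstddef>

// Figures quoted above, in bytes.
constexpr std::size_t kSharedFunction = 64;  // one-time cost of GpioBase::doSetMode()
constexpr std::size_t kCallSite       = 16;  // per call, non-inline version
constexpr std::size_t kInlinedSite    = 24;  // per call, inline version

// Total size of the non-inline version for a given number of calls.
constexpr std::size_t nonInlineSize(std::size_t calls) {
    return kSharedFunction + kCallSite * calls;
}

// Total size of the inline version for a given number of calls.
constexpr std::size_t inlineSize(std::size_t calls) {
    return kInlinedSite * calls;
}

// Checked by the compiler: inline is smaller for a single call,
// both versions are equal at eight calls, inline is larger from nine on.
static_assert(inlineSize(1) < nonInlineSize(1), "24 vs 80 bytes");
static_assert(inlineSize(8) == nonInlineSize(8), "192 vs 192 bytes");
static_assert(inlineSize(9) > nonInlineSize(9), "216 vs 208 bytes");
```

Under these assumptions the inline version is actually the smaller one while mode() is called from fewer than eight places, and it grows by 8 bytes per additional call after that.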
Now to performance:
I made a simple test to count how many calls to Gpio::mode() could be done in a second, and here are the results (code running from external RAM, so both figures are rather small):
426033 calls to mode() per second (with inlining)
181360 calls to mode() per second (without inlining)
2.3x speed improvement
I think the large performance gain outweighs the small code size increase. Patch applied (your name has been added to the changelog).
December 26, 2009 at 5:06 pm |
Nice.
No size penalty until after 3 inlined calls. So possibly optimal in the common case (i.e. called just once at startup), and a nice trade-off thereafter.