STM32 GPIOs and Template Metaprogramming

I posted a new article on my website.

It describes a high-level, performance-optimized way of handling STM32 GPIOs using template metaprogramming.

4 Responses to “STM32 GPIOs and Template Metaprogramming”

  1. Lee R Says:

    Hi,

    I’ve written a code snippet showing that the code in ‘GpioBase::doSetMode’ can be inlined and the branch on ‘pin >= 8’ eliminated using TMP. Do you think it’s OK?

    See:
    http://pastebin.com/f7ae1a65f
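
The pastebin snippet is not reproduced here, so the following is only a minimal sketch of the general idea, not the actual patch. It assumes the STM32F1 layout in which pins 0-7 are configured through CRL and pins 8-15 through CRH, four bits per pin, and it models the registers with a plain struct so that it can run on a host. `GpioRegs`, `ConfigReg` and `setMode` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two STM32F1 configuration registers.
// On real hardware this would be a memory-mapped peripheral.
struct GpioRegs {
    volatile std::uint32_t CRL;  // pins 0-7
    volatile std::uint32_t CRH;  // pins 8-15
};

// Primary template: pins 0-7 are configured through CRL.
template <unsigned Pin, bool High = (Pin >= 8)>
struct ConfigReg {
    static volatile std::uint32_t& get(GpioRegs& r) { return r.CRL; }
    static constexpr unsigned shift = Pin * 4;
};

// Partial specialization: pins 8-15 are configured through CRH.
template <unsigned Pin>
struct ConfigReg<Pin, true> {
    static volatile std::uint32_t& get(GpioRegs& r) { return r.CRH; }
    static constexpr unsigned shift = (Pin - 8) * 4;
};

// The register and the shift are chosen at compile time,
// so no runtime test on the pin number remains.
template <unsigned Pin>
inline void setMode(GpioRegs& regs, std::uint32_t mode) {
    static_assert(Pin < 16, "a GPIO port has 16 pins");
    assert(mode <= 0xFu);  // each pin has a 4-bit configuration field
    volatile std::uint32_t& reg = ConfigReg<Pin>::get(regs);
    reg = reg & ~(0xFu << ConfigReg<Pin>::shift);  // clear the field
    reg = reg | (mode << ConfigReg<Pin>::shift);   // write the new mode
}
```

Because `Pin` is a template argument, the mask and the shifted value are constants at each call site, which is what lets an optimizing compiler reduce a call to a few load, mask and store instructions.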

  2. fedetft Says:

    As you might have guessed, I didn’t inline the mode() member function partly because I didn’t know an elegant way to do it (I’m still a beginner in TMP), and partly on purpose.
    To explain in more detail: in many applications the GPIO mode is set once at startup and never changed, so the mode() member function is not performance critical. Given that, I decided to follow Scott Meyers’ advice to factor template-independent code out of templates to avoid possible code bloat (a primary concern in embedded development).
    I’ll do some code size measurements to see whether eliminating the branch compensates for the inlining, and then decide whether to apply your patch.
    Anyway thanks for the comment, I’ve learnt something new :D
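
As an illustration of factoring template-independent code out of a template, here is a minimal sketch. The names `GpioBase::doSetMode` and `Gpio::mode` come from the discussion, but the signatures, the `GpioRegs` struct and the body are assumptions made so the example can run on a host; they are not the article's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the STM32F1 CRL/CRH register pair.
struct GpioRegs {
    volatile std::uint32_t CRL;  // pins 0-7
    volatile std::uint32_t CRH;  // pins 8-15
};

// Non-template base: the real work lives here. In a project the
// definition would sit in a .cpp file so exactly one copy exists.
class GpioBase {
protected:
    static void doSetMode(GpioRegs& regs, unsigned pin, std::uint32_t mode);
};

void GpioBase::doSetMode(GpioRegs& regs, unsigned pin, std::uint32_t mode) {
    assert(pin < 16 && mode <= 0xFu);
    // Runtime branch on the pin number: the price of sharing one function.
    volatile std::uint32_t& reg = (pin >= 8) ? regs.CRH : regs.CRL;
    const unsigned shift = (pin % 8) * 4;
    reg = reg & ~(0xFu << shift);
    reg = reg | (mode << shift);
}

// Thin template wrapper: each instantiation only forwards its pin number,
// so instantiating it for many pins adds very little code.
template <unsigned Pin>
class Gpio : private GpioBase {
public:
    static void mode(GpioRegs& regs, std::uint32_t m) {
        doSetMode(regs, Pin, m);
    }
};
```

With this layout every `Gpio<Pin>::mode()` call compiles to argument setup plus a call to the single shared function, trading a branch and a call for smaller per-instantiation code.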

  3. fedetft Says:

    I’ve done some measurements; here are the results.
    As for code size, here is the code generated by the non-inline version:

    // 16 bytes
    00000000 <_Z10set_gpio_av>:
       0:	f641 4000 	movw	r0, #7168	; 0x1c00
       4:	b500      	push	{lr}
       6:	f2c4 0001 	movt	r0, #16385	; 0x4001
       a:	2106      	movs	r1, #6
       c:	2203      	movs	r2, #3
       e:	f7ff fffe 	bl	0 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E>
      12:	bd00      	pop	{pc}
    
    // 64 bytes
    00000000 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E>:
       0:	b2c9      	uxtb	r1, r1
       2:	2907      	cmp	r1, #7
       4:	d80d      	bhi.n	22 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E+0x22>
       6:	0089      	lsls	r1, r1, #2
       8:	230f      	movs	r3, #15
       a:	408b      	lsls	r3, r1
       c:	fa12 f101 	lsls.w	r1, r2, r1
      10:	f8d0 c000 	ldr.w	ip, [r0]
      14:	ea2c 0303 	bic.w	r3, ip, r3
      18:	6003      	str	r3, [r0, #0]
      1a:	6803      	ldr	r3, [r0, #0]
      1c:	4319      	orrs	r1, r3
      1e:	6001      	str	r1, [r0, #0]
      20:	4770      	bx	lr
      22:	3908      	subs	r1, #8
      24:	b2c9      	uxtb	r1, r1
      26:	0089      	lsls	r1, r1, #2
      28:	230f      	movs	r3, #15
      2a:	408b      	lsls	r3, r1
      2c:	408a      	lsls	r2, r1
      2e:	f8d0 c004 	ldr.w	ip, [r0, #4]
      32:	ea2c 0303 	bic.w	r3, ip, r3
      36:	6043      	str	r3, [r0, #4]
      38:	6843      	ldr	r3, [r0, #4]
      3a:	431a      	orrs	r2, r3
      3c:	6042      	str	r2, [r0, #4]
      3e:	e7ef      	b.n	20 <_ZN8GpioBase9doSetModeEjhN4Mode5Mode_E+0x20>
    

    And here is the code for the inline version:

    // 24 bytes
    00000000 <_Z10set_gpio_av>:
       0:	f641 4300 	movw	r3, #7168	; 0x1c00
       4:	f2c4 0301 	movt	r3, #16385	; 0x4001
       8:	681a      	ldr	r2, [r3, #0]
       a:	f022 6270 	bic.w	r2, r2, #251658240	; 0xf000000
       e:	601a      	str	r2, [r3, #0]
      10:	681a      	ldr	r2, [r3, #0]
      12:	f042 7240 	orr.w	r2, r2, #50331648	; 0x3000000
      16:	601a      	str	r2, [r3, #0]
      18:	4770      	bx	lr
      1a:	46c0      	nop			(mov r8, r8)
    

    As you can see, the non-inline version has a one-time cost of 64 bytes (the GpioBase::doSetMode() function) plus 16 bytes for every call. The inline version simply costs 24 bytes per inlined call. With many call sites this increases code size, but only slightly.
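
The arithmetic can be checked with a short compile-time sketch. It takes the three byte counts quoted above at face value and assumes the shared function is only linked in when there is at least one call; the constant and function names are hypothetical.

```cpp
#include <cstddef>

// Byte counts taken from the measurements quoted above.
constexpr std::size_t kSharedFunction   = 64;  // doSetMode(), paid once
constexpr std::size_t kPerCallOutOfLine = 16;  // each call site, non-inline
constexpr std::size_t kPerCallInline    = 24;  // each call site, inline

// Total size of the non-inline variant for a given number of call sites.
constexpr std::size_t outOfLineTotal(std::size_t calls) {
    return calls == 0 ? 0 : kSharedFunction + calls * kPerCallOutOfLine;
}

// Total size of the inline variant for a given number of call sites.
constexpr std::size_t inlineTotal(std::size_t calls) {
    return calls * kPerCallInline;
}

static_assert(outOfLineTotal(1) == 80 && inlineTotal(1) == 24,
              "a single call is smaller when inlined");
static_assert(outOfLineTotal(8) == inlineTotal(8),
              "with these figures the totals are equal at eight calls");
static_assert(inlineTotal(9) > outOfLineTotal(9),
              "beyond that, each inlined call adds 8 bytes");
```

Under these assumptions the inline variant only becomes the larger one after eight call sites, and then grows by 8 bytes per additional call.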

    Now to performance:
    I wrote a simple test that counts how many calls to Gpio::mode() can be made in one second. Here are the results (the code was running from external RAM, so both figures are rather low):
    426033 calls to mode() per second (with inlining)
    181360 calls to mode() per second (without inlining)
    2.3x speed improvement

    I think the large performance gain outweighs the small code size increase. Patch applied (your name has been added to the changelog).

  4. Lee Richmond Says:

    Nice.

    No size penalty until after 3 inlined calls. So possibly optimal in the common case (i.e. called just once at startup), and a nice trade-off thereafter.
