Hi Balazs,
I confirm what you observe.
If I compile the source with:
clang --shared -o libalientest.so -Os alientest.c
then disassemble with:
objdump -D libalientest.so | less
I get this:
000000000000057c <atv>:
57c: c3 retq
000000000000057d <at34>:
57d: f2 0f 10 05 1b 00 00 movsd 0x1b(%rip),%xmm0 # 5a0 <_fini+0x10>
584: 00
585: f2 0f 10 0d 1b 00 00 movsd 0x1b(%rip),%xmm1 # 5a8 <_fini+0x18>
58c: 00
58d: c3 retq
The code for atv looks highly suspicious...
at34 just loads the constants into the double registers (dregs xmm0 and xmm1), then does nothing else (atv is inlined).
So this does not work as advertised...
The non-optimized version is very convoluted:
0000000000000580 <atv>:
580: 55 push %rbp
581: 48 89 e5 mov %rsp,%rbp
584: f2 0f 11 45 e8 movsd %xmm0,-0x18(%rbp)
589: f2 0f 11 4d e0 movsd %xmm1,-0x20(%rbp)
58e: f2 0f 10 45 e8 movsd -0x18(%rbp),%xmm0
593: f2 0f 11 45 d0 movsd %xmm0,-0x30(%rbp)
598: f2 0f 10 45 e0 movsd -0x20(%rbp),%xmm0
59d: f2 0f 11 45 d8 movsd %xmm0,-0x28(%rbp)
5a2: 0f 10 45 d0 movups -0x30(%rbp),%xmm0
5a6: 0f 29 45 f0 movaps %xmm0,-0x10(%rbp)
5aa: f2 0f 10 45 f0 movsd -0x10(%rbp),%xmm0
5af: f2 0f 10 4d f8 movsd -0x8(%rbp),%xmm1
5b4: 5d pop %rbp
5b5: c3 retq
5b6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
5bd: 00 00 00
00000000000005c0 <at34>:
5c0: 55 push %rbp
5c1: 48 89 e5 mov %rsp,%rbp
5c4: 48 83 ec 10 sub $0x10,%rsp
5c8: f2 0f 10 05 38 00 00 movsd 0x38(%rip),%xmm0 # 608 <_fini+0x10>
5cf: 00
5d0: f2 0f 10 0d 38 00 00 movsd 0x38(%rip),%xmm1 # 610 <_fini+0x18>
5d7: 00
5d8: e8 a3 ff ff ff callq 580 <atv>
5dd: f2 0f 11 45 f0 movsd %xmm0,-0x10(%rbp)
5e2: f2 0f 11 4d f8 movsd %xmm1,-0x8(%rbp)
5e7: f2 0f 10 45 f0 movsd -0x10(%rbp),%xmm0
5ec: f2 0f 10 4d f8 movsd -0x8(%rbp),%xmm1
5f1: 48 83 c4 10 add $0x10,%rsp
5f5: 5d pop %rbp
5f6: c3 retq
It's clear that the resulting structure is passed back via the first two dregs (xmm0 and xmm1).
Doesn't that qualify as a clang bug? Or is our interpretation of the ABI erroneous?
I would opt for the latter...
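
For reference, here is a minimal sketch of what I assume alientest.c boils down to, judging only from the disassembly. The type name vec2, the field names x and y, and the constants 3.0 and 4.0 are my guesses, not taken from the actual source. As far as I understand the System V x86-64 ABI, a 16-byte struct made of two doubles is classified as SSE for both eightbytes, so it is returned in xmm0 and xmm1 rather than through a hidden pointer. That would also explain why the optimized atv is a lone retq: the two double arguments already arrive in xmm0 and xmm1, exactly where the result has to be.

```c
/* Hypothetical reconstruction of alientest.c.
 * Assumed names: vec2, x, y. Assumed constants: 3.0 and 4.0. */
#include <assert.h>   /* static_assert (C11) */

typedef struct {
    double x;   /* first eightbyte, class SSE */
    double y;   /* second eightbyte, class SSE */
} vec2;

/* 16 bytes is the largest size that is still returned in registers
 * under the System V x86-64 ABI; larger structs go through memory. */
static_assert(sizeof(vec2) == 16, "vec2 is expected to be two eightbytes");

/* Arguments arrive in xmm0 (x) and xmm1 (y); the struct is returned
 * in xmm0 (r.x) and xmm1 (r.y), so no data movement is required. */
vec2 atv(double x, double y)
{
    vec2 r = { x, y };
    return r;
}

/* Loads the two constants and forwards them to atv. */
vec2 at34(void)
{
    return atv(3.0, 4.0);
}
```

If that reading is right, any foreign-function caller has to pick up both xmm0 and xmm1 after the call instead of expecting a pointer to the result in rax.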