A Quick Look behind a "for loop"

On a modern processor such as x86_64 or ARM, the machine code that actually runs for even a simple ‘for’ loop is remarkably complicated. Here’s a quick example I knocked up to demonstrate.
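Here is a plausible version of that example in C. It is reconstructed from the debug annotations in the assembly listing:

- the file is forloop.c;
- the loop is on line 14, counting i up to 100;
- there is a printf inside loop_body on line 24.

Treat it as a sketch, not the original listing. The "%d\n" format string and the exact layout are assumptions.

```c
#include <stdio.h>

/* Hypothetical reconstruction of forloop.c, inferred from the .loc
   debug lines in the assembly rather than copied from the original. */

/* Declared up front because the debug info places loop_body after
   example_for_loop in the file (its printf is on line 24). */
void loop_body(int i);

/* The loop itself: i runs from 0 to 99. */
void example_for_loop(void)
{
    for (int i = 0; i < 100; i++) {
        loop_body(i);
    }
}

/* The work done on each pass, kept in its own function so the loop
   is separate from what it does. The "%d\n" format string is an
   assumption: the assembly only shows a string constant, L_.str. */
void loop_body(int i)
{
    printf("%d\n", i);
}
```

The file has no main of its own. Call example_for_loop() from one of yours to see 0 through 99 printed, one per line.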
I pulled the loop body out into a second function, both to stop the compiler from optimizing an empty loop away into nothing and to keep the loop itself separate from the work it does. When the above code is compiled for x86_64, you get this:




  1. _example_for_loop: ## @example_for_loop
  2. Lfunc_begin0:
  3. .file 2 "/Users/grahamcox/Projects/test" "/Users/grahamcox/Projects/test/test/forloop.c"
  4. .loc 2 13 0 ## /Users/grahamcox/Projects/test/test/forloop.c:13:0
  5. .cfi_startproc
  6. ## %bb.0:
  7. pushq %rbp
  8. .cfi_def_cfa_offset 16
  9. .cfi_offset %rbp, -16
  10. movq %rsp, %rbp
  11. .cfi_def_cfa_register %rbp
  12. pushq %r14
  13. pushq %rbx
  14. .cfi_offset %rbx, -32
  15. .cfi_offset %r14, -24
  16. leaq L_.str(%rip), %r14
  17. xorl %ebx, %ebx
  18. Ltmp0:
  19. ##DEBUG_VALUE: i <- 0
  20. .p2align 4, 0x90
  21. LBB0_1: ## =>This Inner Loop Header: Depth=1
  22. ##DEBUG_VALUE: loop_body:i <- %ebx
  23. ##DEBUG_VALUE: i <- %ebx
  24. .loc 2 24 2 prologue_end ## /Users/grahamcox/Projects/test/test/forloop.c:24:2
  25. xorl %eax, %eax
  26. movq %r14, %rdi
  27. movl %ebx, %esi
  28. callq _printf
  29. Ltmp1:
  30. .loc 2 14 27 ## /Users/grahamcox/Projects/test/test/forloop.c:14:27
  31. incl %ebx
  32. Ltmp2:
  33. ##DEBUG_VALUE: i <- %ebx
  34. .loc 2 14 20 is_stmt 0 ## /Users/grahamcox/Projects/test/test/forloop.c:14:20
  35. cmpl $100, %ebx
  36. Ltmp3:
  37. .loc 2 14 2 ## /Users/grahamcox/Projects/test/test/forloop.c:14:2
  38. jne LBB0_1
  39. Ltmp4:
  40. ## %bb.2:
  41. .loc 2 18 1 is_stmt 1 ## /Users/grahamcox/Projects/test/test/forloop.c:18:1
  42. popq %rbx
  43. popq %r14
  44. popq %rbp
  45. retq
  46. Ltmp5:
  47. Lfunc_end0:
  48. .cfi_endproc
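This was compiled with optimization at max, so the compiler has made it as short and efficient as it knows how. Note, for example, that the call to loop_body has been inlined: line 28 calls printf directly. The counter i lives in the %ebx register. It is incremented on line 31 and compared against 100 on line 35, and line 38 jumps back to the top of the loop until it reaches 100.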
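Much of this isn’t executable code at all. The ‘.cfi_’ directives describe the stack frame for debuggers and exception unwinders, lines starting with ‘##’ are comments, and the ‘.loc’ lines are debugging hints. The ‘.loc’ lines are still useful, though: each one links the code that follows it back to a line and column of the source. For example, ‘.loc 2 14 27’ (line 30) marks the loop’s increment.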
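To make the mapping concrete, here is the optimized loop translated back into C by hand, with each statement annotated by the instructions it stands for. This is a reading aid, not compiler output. The function name example_for_loop_as_compiled is hypothetical, and the format string is an assumption, since the listing only shows the label L_.str.

```c
#include <stdio.h>

/* Hypothetical hand translation of the optimized x86_64 listing. */
void example_for_loop_as_compiled(void)
{
    /* leaq L_.str(%rip), %r14: the format string's address is loaded
       once, into a callee-saved register that survives each printf. */
    const char *fmt = "%d\n";

    /* xorl %ebx, %ebx: i = 0, also kept in a callee-saved register. */
    int i = 0;

    /* LBB0_1: the loop header. The compiler knows 0 < 100, so it
       skips the initial test and only checks at the bottom. */
    do {
        /* xorl %eax, %eax; movq %r14, %rdi; movl %ebx, %esi;
           callq _printf: the inlined body of loop_body(i). Under the
           System V ABI, %al = 0 tells the variadic printf that no
           vector registers carry arguments. */
        printf(fmt, i);

        /* incl %ebx */
        i++;

        /* cmpl $100, %ebx; jne LBB0_1: because i starts at 0 and steps
           by exactly 1, "i != 100" is equivalent to "i < 100". */
    } while (i != 100);
}
```

The pushq %r14 and pushq %rbx in the prologue save the caller’s values of those callee-saved registers. That is the price of keeping i and the format string safe across every call to printf.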
Not all machine code is this involved. Back when the 6502 was popular, a ‘for’ loop like this could be written in four or five instructions, which was just as well given that 1 MHz counted as ‘fast’.
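        LDX #100        ; X = 100: the loop counter
  LOOP: JSR LOOP_BODY   ; run the loop body (it must preserve X)
        DEX             ; X = X - 1; sets the zero flag when X reaches 0, so no CPX #0 is needed
        BNE LOOP        ; branch back while X is not zero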
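(n.b. this might not be assembler-ready 6502 code, but it follows the general idea. It counts X down from 100 to 1 rather than up from 0 to 99, and it assumes LOOP_BODY leaves the X register alone.) Hope it’s useful.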
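For comparison, here is the 6502 loop written back in C. It is a sketch under two assumptions. First, an unsigned char x stands in for the 8-bit X register. Second, loop_body_6502 and countdown_loop are hypothetical names standing in for the LOOP_BODY subroutine and the loop itself. It shows the shape of the 6502 version: the counter starts at 100, the body sees 100 down to 1, and the loop ends when the decrement reaches zero.

```c
#include <stdio.h>

/* Hypothetical stand-in for the 6502 LOOP_BODY subroutine. */
static void loop_body_6502(unsigned char x)
{
    printf("%d\n", x);
}

/* Hypothetical C rendering of the 6502 loop; x plays the X register. */
void countdown_loop(void)
{
    unsigned char x = 100;     /* LDX #100 */
    do {
        loop_body_6502(x);     /* JSR LOOP_BODY (assumed to leave X alone) */
        x--;                   /* DEX: sets the zero flag when x reaches 0 */
    } while (x != 0);          /* BNE LOOP: loop again while x != 0 */
}
```

Counting down to zero is the cheap direction on the 6502. The decrement itself tells the branch when to stop, whereas counting up to 100 would need an extra compare instruction on every pass.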
