In a modern processor, such as x86_64 or ARM, it’s remarkably complicated. Here’s a quick example I knocked up to demonstrate.
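A minimal reconstruction of that example, as a sketch rather than the exact source: the names example_for_loop and loop_body, the counter i, and the bound of 100 are taken from the assembly output further down, while the printf format string and the code layout are assumptions.

```c
#include <stdio.h>

/* Loop body kept in its own function so the loop itself stays easy to see. */
static void loop_body(int i)
{
    /* Assumed format string; the assembly only shows that printf is called
       with i as its second argument. */
    printf("%d\n", i);
}

/* Calls loop_body once for each i from 0 up to 99. */
void example_for_loop(void)
{
    for (int i = 0; i < 100; i++) {
        loop_body(i);
    }
}
```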
I pulled the loop body out into a second function to stop the compiler from optimizing the empty loop into nothing, and to keep the loop itself separate from the work it does. When the above code is compiled for x86_64, you get this:
- _example_for_loop: ## @example_for_loop
- Lfunc_begin0:
- .file 2 "/Users/grahamcox/Projects/test" "/Users/grahamcox/Projects/test/test/forloop.c"
- .loc 2 13 0 ## /Users/grahamcox/Projects/test/test/forloop.c:13:0
- .cfi_startproc
- ## %bb.0:
- pushq %rbp
- .cfi_def_cfa_offset 16
- .cfi_offset %rbp, -16
- movq %rsp, %rbp
- .cfi_def_cfa_register %rbp
- pushq %r14
- pushq %rbx
- .cfi_offset %rbx, -32
- .cfi_offset %r14, -24
- leaq L_.str(%rip), %r14
- xorl %ebx, %ebx
- Ltmp0:
- ##DEBUG_VALUE: i <- 0
- .p2align 4, 0x90
- LBB0_1: ## =>This Inner Loop Header: Depth=1
- ##DEBUG_VALUE: loop_body:i <- %ebx
- ##DEBUG_VALUE: i <- %ebx
- .loc 2 24 2 prologue_end ## /Users/grahamcox/Projects/test/test/forloop.c:24:2
- xorl %eax, %eax
- movq %r14, %rdi
- movl %ebx, %esi
- callq _printf
- Ltmp1:
- .loc 2 14 27 ## /Users/grahamcox/Projects/test/test/forloop.c:14:27
- incl %ebx
- Ltmp2:
- ##DEBUG_VALUE: i <- %ebx
- .loc 2 14 20 is_stmt 0 ## /Users/grahamcox/Projects/test/test/forloop.c:14:20
- cmpl $100, %ebx
- Ltmp3:
- .loc 2 14 2 ## /Users/grahamcox/Projects/test/test/forloop.c:14:2
- jne LBB0_1
- Ltmp4:
- ## %bb.2:
- .loc 2 18 1 is_stmt 1 ## /Users/grahamcox/Projects/test/test/forloop.c:18:1
- popq %rbx
- popq %r14
- popq %rbp
- retq
- Ltmp5:
- Lfunc_end0:
- .cfi_endproc
This was compiled with optimization at its maximum setting, which makes the code as short and efficient as the compiler knows how. Note, for example, that the call out to loop_body has been inlined: the loop calls printf directly (the ‘callq _printf’ instruction).
Some of this is just debugging information, such as the various ‘.loc’ and ‘##DEBUG_VALUE’ lines, and is not part of the executable code itself. It does, however, tell you which part of the source code each piece of assembly was derived from.
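Stripped of the debug directives and the register save and restore, the loop in the listing comes down to four steps: zero the counter (‘xorl %ebx, %ebx’), call printf, increment (‘incl %ebx’), then compare against 100 and jump back if not equal (‘cmpl’ and ‘jne’). Here is a rough C rendering of that control flow. It is only a sketch: the function name lowered_for_loop, the returned iteration count, and the format string are assumptions added for illustration and are not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: the same loop written the way the compiled code runs it.
   Returns the number of iterations so the behaviour is easy to check. */
int lowered_for_loop(void)
{
    int i = 0;              /* xorl %ebx, %ebx : i = 0 */
    int iterations = 0;     /* not in the listing; added for checking */

loop_top:                   /* LBB0_1 */
    printf("%d\n", i);      /* callq _printf (format string assumed) */
    iterations++;
    i++;                    /* incl %ebx */
    if (i != 100)           /* cmpl $100, %ebx */
        goto loop_top;      /* jne LBB0_1 */

    return iterations;      /* reached once i equals 100 */
}
```

Note that the test sits at the bottom of the loop. The compiler can see that i starts at 0, so it skips the initial ‘i < 100’ check and tests for ‘not equal to 100’ after each increment instead.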
Not all machine code is this difficult. Back in the day when the 6502 was popular, a ‘for’ loop like this could be written in 4 or 5 instructions. This was just as well, given that 1 MHz counted as ‘fast’.
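- LDX #100 ; X is the loop counter, counting down from 100
- LOOP: JSR LOOP_BODY ; call the loop body (assumed to preserve X)
- DEX ; X = X - 1; sets the zero flag when X reaches 0
- BNE LOOP ; repeat until X is 0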
(N.B. this might not be genuine 6502 assembly code, but it follows the general idea.) Hope it’s useful.