When a loop is unrolled, each iteration of the loop is replicated in
hardware and executes simultaneously if the iterations are independent. Unrolling loops trades
an increase in FPGA area use for a reduction in the latency of your component.
Consider the following basic loop with three stages and three iterations. Each
stage represents the operations that occur in the loop within one clock cycle.
Figure 31. Basic loop with three stages and three iterations

If each stage of this loop takes one clock cycle to execute and the iterations
run one after another, then this loop has a latency of nine cycles.
The following figure shows the loop from Figure 31 unrolled three times.
Figure 32. Unrolled loop with three stages and three iterations
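
The cycle counts for Figure 31 and Figure 32 can be reproduced with a small back-of-the-envelope model. This is only a sketch: the function name loop_latency and its parameters are hypothetical, it assumes every stage takes one clock cycle and that groups of iterations run back to back without pipelining, and the extension to partial unroll factors is an assumption rather than something the figures show.

```cpp
// Simplified latency model for the loops in Figure 31 and Figure 32.
// Assumptions (not guaranteed by the compiler): one clock cycle per stage,
// no pipelining between groups of iterations, unroll_factor >= 1.
constexpr unsigned loop_latency(unsigned stages,
                                unsigned iterations,
                                unsigned unroll_factor) {
  // Iterations are processed in groups of `unroll_factor`; each group
  // occupies the hardware for `stages` cycles.
  return ((iterations + unroll_factor - 1) / unroll_factor) * stages;
}

// Rolled loop: 3 iterations x 3 stages, one iteration at a time.
static_assert(loop_latency(3, 3, 1) == 9, "rolled loop takes nine cycles");
// Fully unrolled loop: all 3 iterations run side by side.
static_assert(loop_latency(3, 3, 3) == 3, "unrolled loop takes three cycles");
```

The model ignores the rescheduling that the compiler performs, so treat its results as rough estimates rather than reported latencies.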

Three iterations of the loop can now be completed in only three clock cycles, but three
times as many hardware resources are required.
You can control how the compiler unrolls a loop with the #pragma unroll directive, but this directive works only if the compiler knows the trip count for the loop in advance or if you specify the unroll factor.
In addition to replicating the hardware, the compiler also reschedules the circuit so that each operation runs as soon as the inputs for the operation are ready.
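
As a minimal sketch of the two cases described above, the following functions apply the directive once with a compile-time trip count and once with an explicit unroll factor. The names kSize, scale_fixed, and scale_variable are hypothetical and are not taken from the tutorial. The loop bodies are deliberately independent across iterations so that the replicated hardware could run them simultaneously. A standard C++ compiler that does not recognize the pragma ignores it (possibly with a warning), so the code behaves the same either way.

```cpp
#include <cstddef>

// Hypothetical size chosen to match the three-iteration loop in the figures.
constexpr std::size_t kSize = 3;

// Case 1: the trip count (kSize) is known at compile time, so the loop
// can be unrolled fully without giving an unroll factor.
void scale_fixed(const int (&in)[kSize], int (&out)[kSize], int factor) {
#pragma unroll
  for (std::size_t i = 0; i < kSize; ++i) {
    out[i] = in[i] * factor;  // no dependency between iterations
  }
}

// Case 2: the trip count (n) is known only at run time, so an explicit
// unroll factor (here 2, an arbitrary choice) is specified.
void scale_variable(const int* in, int* out, std::size_t n, int factor) {
#pragma unroll 2
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i] * factor;  // no dependency between iterations
  }
}
```

A larger unroll factor replicates more hardware, so the factor is a tuning knob between area and latency rather than a value with a single correct setting.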
For an example of using the #pragma unroll directive, see the best_practices/resource_sharing_filter tutorial.