Main reasons:
- A processor is designed to read memory one word at a time.
- A word is typically 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor, so a 32-bit processor reads 4 bytes at a time and a 64-bit processor reads 8 bytes at a time.
- Keeping each member aligned to these word boundaries lets the processor fetch it in a single access, which improves speed.
- To align the data in memory, the compiler inserts one or more unused bytes between structure members (and, if needed, after the last member) when it lays out the structure. This is called structure padding.
How to avoid structure padding in C?
- The #pragma pack(1) directive (supported by GCC, Clang and MSVC, though not part of standard C) tells the compiler to place each structure member immediately after the previous one, with no padding bytes in between.
Source, including an example program:
http://fresh2refresh.com/c/c-structure-padding/
Why processors need structure padding (to improve throughput):
Thanks to joshperry, who wrote this answer on Stack Overflow:
http://stackoverflow.com/questions/381244/purpose-of-memory-alignment
http://www.ibm.com/developerworks/library/pa-dalign/
The memory subsystem on a modern processor is restricted to accessing memory at the granularity and alignment of its word size; this is the case for a number of reasons.
SPEED
Modern processors have multiple levels of cache memory that data must be pulled through; supporting single-byte reads would severely limit the memory throughput of the processor (think PIO mode for hard drives).
The CPU always reads at its word size (4 bytes on a 32-bit processor). So when you make an unaligned access (on a processor that supports it), the processor has to read every word of memory that the requested data straddles. This can double the number of memory transactions needed to access the data.
Because an unaligned access can touch two words, reading two bytes can easily be slower than reading four. For example, say you have a struct in memory that looks like this:
struct mystruct {
    char c;   // one byte
    int i;    // four bytes
    short s;  // two bytes
};
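On a 32-bit processor it would most likely be laid out as follows: c at 0x0000, three padding bytes at 0x0001 through 0x0003, i at 0x0004 through 0x0007, s at 0x0008 through 0x0009, and two trailing padding bytes so that the total size (12 bytes) is a multiple of 4. The processor can read each of these members in one transaction.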
Say you had a packed version of the struct, perhaps received from the network, where it was packed for transmission efficiency. It might be laid out like this: c at 0x0000, i at 0x0001 through 0x0004, and s at 0x0005 through 0x0006.
Reading c, the first byte, works the same as before.
When you ask the processor for 16 bits from 0x0005, it has to read a word from 0x0004 and shift left 1 byte to place the value in a 16-bit register. This is some extra work, but most processors can handle it in one cycle.
When you ask for 32 bits from 0x0001, you get a 2x amplification. The processor reads from 0x0000 into the result register and shifts left 1 byte. It then reads again from 0x0004 into a temporary register, shifts right 3 bytes, and ORs the result into the result register.
RANGE
For any given address space, suppose the architecture can assume that the two least significant bits (LSBs) of an address are always 0. It can then address 4 times more memory with the same number of address bits, or address the same amount of memory and use those two bits for something like flags. Dropping the two LSBs from an address gives 4-byte alignment. Each time the stored address is incremented, it effectively increments bit 2 rather than bit 0, so successive addresses are 4 bytes apart. This is sometimes referred to as a "stride" of 4 bytes.
This can even affect the physical design of the system: if the address bus needs two fewer bits, there can be two fewer pins on the CPU and two fewer traces on the circuit board.
Atomicity
The CPU can read or write an aligned word of memory atomically, meaning that no other instruction can interrupt that access partway through. This is critical to the correct operation of many lock-free data structures and other concurrent computing paradigms.
Conclusion
The memory system of a processor is quite a bit more complex and involved than described here; if you'd like a more detailed discussion on how an x86 processor actually addresses memory take a look at this article.
You can read about many more benefits of memory alignment in the IBM article linked above.
Another kind of alignment for performance, which I alluded to earlier, is alignment on cache lines, which are (for example) 64 bytes, but that's a topic for another question!