13-Sep-2024 V.Wheeler
I've been intensely studying Computer Science and Software Development since January 1980 and continue to do so to this day: 1-4+ hours a day—a necessity for programmers amid quickly emerging new technologies. I have been working in the computing industry since early 1984 and have been programming professionally since 1992. I am perhaps unusual in that I am fond of working close to the CPU and close to the operating system. While I know many high-level languages and am expert in several, I have also studied 7 Assembly Languages, programmed in 5 of them, and written production software (usually for speed or other enhancements) in 2 of them that are currently out in the field: 6502, Z-80, 8080, 80x86, Harvard Instruction Set (used by 16-bit PIC MCUs), MIPS32 Instruction Set (used by 32-bit PIC MCUs and many others), and ARM Instruction Set (used by a wide range of ARM-based MCUs). That said, I can safely say that for many years I have lived and breathed bits and bytes, and am an expert in both Binary Math and Boolean Algebra, because one has to be to be successful programming close to the CPU.
That said, I just read Jannson Miller's nicely-written article entitled The Mighty Battle: MIB vs. MB[1], and I have a few things to say about it. To Jannson, the author of that article: you might feel that I am somewhat hard on you here, but both you and others need to understand the concepts below clearly, so that communication about digital storage (RAM, disk, and other) can be clear and precise when such precision matters. On the other side of that coin, communication about Internet service speeds can (apparently) afford to be ambiguous; I do not like that, but it is what I am seeing.
Note that while I fully agree with using the terms "mebibyte" (MiB) and "gibibyte" (GiB) whenever disambiguation is needed, in practice this is rare and would typically involve an audience that is, at least in part, not from the computer industry, because the terms "MB", "GB", etc. already have precise and unambiguous definitions.
Background Information
The scope of this article does not extend to teaching the reader binary math. For those who need that background, there are many great resources available.
The following symbols and terms are used below in this article. It is important that the reader know what they mean, since they are central to understanding the content of the article:
The symbol "<<" (from C and other programming languages) I use in this context to mean a left bit shift by the number of bits to the right of the symbol. So the expression (1 << 1) evaluates to 2, (1 << 2) = 4, (1 << 3) = 8, (1 << 8) = 256, (1 << 10) = 1024, and so on.
The numeric prefix 0b... indicates the digits that follow use the binary (base 2) number system.
The numeric representation 0x100 (from C and other programming languages) is a hexadecimal value with prefix "0x" to clarify that it is a base-16 value. Hexadecimal (or "hex") is a 4-to-1 "compressed" representation of binary values. Example: binary 0b0100 0010 = 0x42. With practice, programmers can read and interpret hex values nearly as quickly as they can decimal values. Hexadecimal is used instead of decimal when the bit pattern matters, since one can "envision" the bit values by looking at the hexadecimal digits: 0x01 = 0b0000 0001, 0x0123 = 0b0000 0001 0010 0011, and so on.
All other numeric values below (when not prefixed) are decimal (base 10) values.
The term "bus" below is an electronics term that refers to a data-flow path where more than one wire carries a "unit" of data simultaneously. On printed circuit boards (PCBs), wires are called "traces", and when busses are present on a PCB, they are often visible as 16, 32 or more parallel traces.
It is also important to understand that many words (and prefixes) in the English language have more than one meaning, and when this is the case, it is the writer or speaker's responsibility to be clear about which meaning is intended, by use of context (surrounding words) or other means. And if the reader or listener is not certain of the meaning, and knows he or she needs to understand it precisely, it is his or her responsibility to clear it up, typically by using a good dictionary. These different meanings are found in the different numbered definitions in dictionaries. Example: "I ran to my mailbox to get my mail" vs. "I ran my company in an easy-going manner" vs. "that engine tends to run hot" vs. "that engine runs at 2200 RPM at 55 mph" vs. "does that engine run?". In all five cases, one can go to a dictionary, find the "sense" that applies (i.e. which numbered definition), and see that these five meanings are not only quite different, but that each is correct as used, its meaning made clear by the context of its usage.
Jannson's Story
While I consider Jannson Miller a respectable writer, his MB vs MiB article[1] is misleading on several points, as follows.
In Jannson's article, his opening sentence states, "In the world of digital storage, the terms 'mebibyte' (MiB) and 'megabyte' (MB) are often used interchangeably." While these two terms could be used interchangeably, since they mean the same thing (1024² bytes), I have never seen computer-storage manufacturers (disk, RAM, other) use the term "MiB" at all. To prove this, walk into any Best Buy store and look at the specifications on disk and RAM boxes. The term is just not there: nowhere on the box; nowhere in the specifications. And in my long career in the computer industry, from what I have seen, the use of the term "MiB" has been very rare, which makes sense, because it appears to exist only as a disambiguation tool.
In the opening sentence of the first subsection of his article, Jannson states, "The megabyte is a term commonly using [sic] in the field of computing and digital storage. It is a unit of measurement that represents a large amount of data. In the International System of Units (SI), one megabyte is equal to 1,000,000 bytes, where a byte consists of 8 bits." This isn't true at all for computer industry professionals. "MB" had a clear and precise definition (1024² bytes) long before IEC came up with the term "MiB". See below.
In his second subsection, Jannson states, "The mebibyte was introduced by the International Electrotechnical Commission (IEC) in 1998 to clarify issues arising from the use of decimal-based prefixes for binary-based measurements. One mebibyte is equal to 1,048,576 bytes, where a byte consists of 8 bits." This is valid, but the term is, in fact, only used when clarification is needed — when "MB" would otherwise be ambiguous based on target audience or context. Because IEC introduced this term to disambiguate between the two meanings of "MB" outside the computer industry, Jannson appears to have mistaken it to mean that "mebibyte" is now the "official new term" for 1024² bytes, which it is not. There are certain contexts in which "MB" is indeed ambiguous (thus the motive for clarification), where the use of "MiB" is perfectly legitimate and helpful. But IEC doesn't dictate computer industry terms: this was only a suggested usage to resolve ambiguity in specific contexts. Again, exercise the proofs I mention above and below to see for oneself.
In the article's comparison table, Jannson states that "MB" is "Equal to 1,000,000 bytes". In fact, "MB" has always meant 1024 x 1024 (1,048,576) bytes in the computer industry and has never meant anything else. On the other hand, for "mega-anything" outside the computer industry, the meaning of the prefix "mega-" is taken from the International System of Units (or "SI units")[4], and thus means 1,000,000 of whatever one is referring to. The use of the prefix "mega-" in the context of computer storage simply has a different meaning, brought into official usage by 1964 by computer-industry professionals and enforced by the nature of digital electronics. More on this below.
In his article, Jannson includes data transfer rates as part of the context in which "MB" is ambiguous, and this is a topic on which I am in partial agreement with him. See below.
The Real Story
The real story about the terms "MB" and "MiB" is that since 1964 the following terms have been in common use when referring to RAM and disk storage sizes. In fact, the date was probably earlier, but we know for certain that they were in common use by 1964 because they are found in the IBM 360 documentation, according to the Wikipedia article titled Binary Prefix, section "History"[2].
- According to Donald Knuth[3], circa 1968 a "byte" was the size of the smallest "unit of data" used by any particular CPU (in its registers, and supported by its instruction set); from 1975 to the present day, it has come to always mean a unit of data comprised of 8 bits. This is reinforced (or perhaps enforced) by the fact that CPUs from then until the present day have registers and instruction sets that support handling bytes as 8-bit units of data, and are so specified in their documentation.
- A Kilobyte (KB) — despite its departure from the International System of Units, in which the prefix "kilo-" means "× 10³" — has always meant 1024 bytes (0x400, or 1 << 10, or 2¹⁰) due to the nature of how CPUs handle such values internally. To my knowledge, there was never any need nor tendency for it to mean anything else, and if a person thought so, it simply gave away that the person had more to learn (i.e. he or she certainly had not mastered Binary Math, nor Boolean Algebra, nor digital electronics, nor the electronic interface between CPU and RAM). RAM, by the nature of how the addressing electronics works, has always been manufactured and sold in "units" based on the binary number system. The limitations of the address bus enforce this. (Magnetic, spinning-platter-based disk drives did not have this restriction, since their capacity is based on how the storage surface area is divided up. But with solid-state storage, we are back to the same restrictions RAM has, and as far as I can tell, solid-state storage will always be sold in capacities based on the binary number system for this same reason.)
- A Megabyte (MB) has always meant 1024² bytes (or 1 << 20, or 2²⁰, or 0x100000, or 1,048,576).
- A Gigabyte (GB) has always meant 1024³ bytes (or 1 << 30, or 2³⁰, or 0x40000000, or 1,073,741,824).
- A Terabyte (TB) has always meant 1024⁴ bytes (or 1 << 40, or 2⁴⁰, or 0x10000000000, or 1,099,511,627,776).
- It has always been this way in the field of digital storage because CPUs (and other digital electronics) deal in those quantities in a way that is extremely convenient to their internal electronics. The actual restriction in digital electronics exists because one "wire" [a "trace" on a printed circuit board] can only carry the value of 1 binary bit. If you extrapolate that to an address bus where a memory address is conveyed and assert that:
- each address points to 1 byte, and
- all incoming addressing combinations are valid [represented by 16, 32, 48, 64 or more address "wires"],
that forces the internal storage itself to support ONLY a number of bytes that can be represented with a base-2 (binary) number that ends with a large number of binary 0's, as shown above.
The fact that these terms use prefixes that are also used by the International System of Units[4] makes them neither incorrect nor ambiguous! Quite simply, these prefixes have 2 meanings, and which one is intended by the writer or speaker depends on the context in which they are used. It does occur that people outside the computer industry think these prefixes (kilo-, mega-, giga-, tera-, etc.), when used in reference to computer storage, have the same meaning as they do in SI units[4], perhaps because that is all they have been exposed to (I speculate), so it is not surprising for them to confuse the meanings. Lacking a full understanding of the meaning of those prefixes inside the computer industry, and/or not being fluent with the terms used with those meanings, some people are understandably confused. Further, I'd like to point out that when these prefixes are attached to the term "byte", that places the context within the computer industry, and in that context they have never meant anything other than powers of 1024.
As regards ambiguity, the source of the term is important. On one hand, if it came from a retail box containing RAM or a solid-state storage drive (or another type of computer storage besides spinning-platter-based disk drives), the term is most certainly in alignment with the above (base-2) meanings, since the writer or speaker is squarely inside the computer industry, and, as stated earlier, in that context the terms have never meant anything else. On the other hand, if the writer or speaker comes from outside the computer industry, one might have to clarify with them to find out what they actually mean: whether they are using SI units[4] or computer-industry units.
In the sphere of data transfer rates, for RAM and disk drives, the units used in such specifications have always referred to base-2 units, where "MBps" always means megabytes (1024² bytes) per second, and "GBps" always means gigabytes (1024³ bytes) per second.
However, for Internet service speed specifications (and other long-distance transmission speeds), the measurement unit most commonly used is "bits per second" or "Mbps" (megabits per second—note the lower-case "b"), and this is where the industry does suffer from ambiguity; on that point I agree with Jannson that there is a need for better clarity. Some Internet Service Providers allow this ambiguity by not being clear about it, and those that are clear about it appear to be split about 50:50 between those favoring the base-2 meaning (where an Mbit = 1024² bits) and those favoring the base-10 meaning (using SI units[4]). I suspect this ambiguity is allowed to continue because their customer base is largely comprised of people who are not from the computer industry. Further, there is certainly ambiguity generated by articles that write about the meaning of "Mbps"[5], especially when the author is not from deep within the computer industry. Such confusions are understandable, but not correct. I further suspect that Internet Service Providers allow this ambiguity because it is "close enough": whether the reader or listener interprets the rates using the base-2 or the base-10 meaning of "Mbps", he or she can only be off by about 5% at most. When it comes to selling an Internet service, I suspect the weight of giving the reader just enough understanding to say "yes" to a sale probably outweighs the risk of losing that potential sale by spending the time to be very precise about it when, in that context, it doesn't really matter. Especially when the bit-rate they actually deliver is commonly well above what they promise. Food for thought.
RAM and Disk Storage Proof
If one wants to prove that the terms "mebibyte", "MiB", "gibibyte" and "GiB" are simply not in common use, just walk into any Best Buy store, pick up a disk drive or RAM box, and look at the specifications. Better yet, buy it, take it home, and prove what the terms mean on one's own computer. Say you buy and install 2 x 16-GB sticks of RAM. If you can, write a program in kernel mode to prove that the first and last valid addresses are exactly ((1024³ x 32) - 1) bytes apart. (If one subtracts the first valid RAM address from the address of 1 byte past the last valid RAM address, the math is simpler (1024³ x 32), but if your program attempted to access 1 byte past the last valid address, the CPU would generate an exception (due to the invalid address) and crash your program unless you had prepared for such an exception.) If you can't write such a program, acquire RAM-testing software which shows the addresses it is testing. You will see that the above pans out precisely as I have defined it.
What I DO see, for spinning-platter-based hard drives (where the above base-2 restrictions do not apply, since storage capacities depend on how the storage surface is divided up -- not on wires [traces] carrying bit values on an address bus), is that the manufacturer will use "GB¹" or "TB¹" units (note the footnote), and when you go and read the footnote, the explanation of what they actually mean CAN BE ambiguous, and sometimes even uses SI units[4], which annoys me because they are "blurring" the meaning of the terms "GB" and "TB" (and thus "MB" and "KB"). This practice (which I consider unethical) probably confuses non-computer-industry people into thinking that that is what "GB" and "TB" actually mean. Sadly, this practice has been going on for a long time with platter-based hard drives. But if you then look at solid-state drives (which are restricted by the base-2 restrictions already mentioned), you no longer see this blurring of meaning, and MB, GB and TB all mean powers of 1024, as they should.
One other thing to know about mass-storage devices (platter-based or not) is that the operating system has to write information onto them that helps it keep track of what files exist on a drive and what "space" each file occupies, and it helps a great deal to "pre-allocate" some of that tracking space in advance so that it is placed together in one big "block" of storage on the drive, which makes it more efficient to access later. That action, all by itself, even before any files are stored on the drive, uses some of the drive's storage capacity. And so the amount of storage the end user will be able to place on that drive in files is less than the drive's total capacity. This can be compared to keeping hundreds (or thousands) of different documents in a file cabinet. To help the end user find things afterwards, related documents are placed in folders, and related (broad) subjects are placed in different drawers. This folder and drawer organization, by necessity, occupies part of the space in each drawer and file cabinet. Due to that fact, the total capacity of each file cabinet for documents is somewhat less than if the documents had simply been stacked in all drawers, one on top of another.
Proof On Your Own Computer
One can prove right on one's own computer that in the computer industry, the terms MB and GB always have the base-2 meaning (i.e. powers of 1024). For example, on my Windows 10 x64 machine, the "System Information" program reports the installed RAM (screen shot from my own computer omitted here), and the number of addressable bytes in this RAM is 2³⁰ x 32, or 1024³ x 32, or 34,359,738,368 bytes. The same is true for RAM available in microcontrollers, and they are precise about the meaning in their documentation. See this example.
Right-clicking three different files on my disk drive and selecting "Properties" showed these reports (screen shots omitted): 7,926 bytes reported as 7.74 KB; 1,745,082 bytes reported as 1.66 MB; and 10,617,606,144 bytes reported as 9.88 GB.
And if one does the math:
- 7926 / 1024 = 7.740234375; truncated to 2 digits after the decimal point: 7.74 KB.
- 1745082 / 1024 = 1704.181640625; / 1024 again = 1.6642398...; truncated to 2 digits after the decimal point: 1.66 MB.
- 10617606144 / 1024 = 10368756; / 1024 again = 10125.73828125; / 1024 again = 9.888416...; truncated to 2 digits after the decimal point: 9.88 GB.
Conclusion
The above is the real story. "MB" means 1024² bytes, and the above is clear proof. The terms "mebibyte" and "gibibyte" are so new and so rarely used that they are not even in most computer spelling-correction dictionaries -- including text editors designed for programmers. Big dictionary publishers like Merriam-Webster and the Oxford English Dictionary don't even list the terms.
To Jannson, I say: while I fully believe your heart is in the right place in wanting to share your knowledge with people, I think you need to revise your article so that it is no longer misleading. Clarity of meaning of the terms "MB", "GB" and so forth can be available to all by using the correct terms in the right contexts, no matter what industry you are in or who your audience is. I fully agree with using the terms "mebibyte" (MiB) and "gibibyte" (GiB) whenever needed to avoid ambiguity. But I suggest that this need only be done for an audience that is, at least in part, not comprised of computer-industry people, or wherever else the terms "MB" and "GB" might be ambiguous. However, if one is writing or speaking to an audience that is exclusively computer-industry people, there is no need to use those terms at all, since there is no ambiguity. The terms "MB", "GB", etc. had precise meanings long before IEC decided to propose a clarification for certain contexts. And they still do.
References
[5] Articles that are ambiguous about the meaning of "Mbps". Note that only one article below is actually ambiguous about it, and one doesn't specify its meaning at all. But the fact that they conflict causes the meaning to be ambiguous to readers who are trying to discover the meaning of "Mbps" as used by their Internet Service Provider. So it is probably advisable to consult your Internet Service Provider about what they actually mean when they use the term "Mbps".
Articles that favor "Mbps" meaning 1024² bits per second:
Articles that favor "Mbps" meaning 1,000,000 (1000²) bits per second:
Articles that are in fact ambiguous about "Mbps" meaning:
Articles that use the term "Mbps", but don't specify what it means at all: