A checksum is a method of finding, and sometimes fixing, errors in files, message packets, strings, or any set of bits that has to be accurate. It's used most often in data communications, where getting a bit scrambled here and there happens a lot.
You can find a lot of libraries and "algorithms" for calculating a checksum. In fact, there are a lot of different things that are all called 'checksum'. In the Microsoft world, for example, the term "hash" or "hashvalue" is used instead of checksum. In my survey of the checksum literature for this article, it seemed to me that what was missing was a basic explanation of what a checksum actually is. That's what this article is all about.
There's no direct support for Checksum in Visual Basic. That is, there's no string.checksum method or checksum(file) function even though you need a checksum a lot in various programming problems.
Why not? They've taken the trouble to provide functions and methods for a lot of things that are used much less than checksum.
I haven't been able to find any kind of 'official' answer to this question. (Microsoft usually avoids questions that include the word "why". They focus on "how".) But my answer is that there isn't a consistent and accepted definition of what a 'checksum' is. There are very standardized definitions of the things that you use to create a checksum (CRC and MD5, for example; more about these later), but checksum is more of a concept than an exact process.
So ... what's the concept?
The idea is that you calculate a single number that is the result of a calculation done over a much longer bit string. The bit string is usually something like a file or a message packet, but since everything in programming is a 'bit string' at some level, you can verify the accuracy of 'everything' this way. You could, for example, verify that no file names in a directory have changed or that a web page hasn't been altered.
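To make the idea concrete, here is a minimal sketch in Python (used only for illustration; it is not Visual Basic code). It assumes CRC-32, as provided by Python's standard zlib module, is the calculation, and the names checksum and is_intact are hypothetical helpers invented for this example rather than part of any library.

```python
import zlib


def checksum(data: bytes) -> int:
    """Reduce a byte string of any length to a single 32-bit number (CRC-32)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_intact(data: bytes, expected: int) -> bool:
    """Recalculate the checksum and compare it with the one stored earlier."""
    return checksum(data) == expected


# Calculate the checksum once, while the data is known to be good.
original = b"The quick brown fox jumps over the lazy dog"
stored = checksum(original)

# Later, one character has changed ("dog" became "cog").
altered = b"The quick brown fox jumps over the lazy cog"

print(is_intact(original, stored))  # unchanged data passes the check
print(is_intact(altered, stored))   # the changed data no longer matches
```

The same two steps (calculate once, recalculate and compare later) apply whether the bytes come from a file, a message packet, or a list of file names joined into one string.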
It also helps to describe what checksum is not.
- It's not the sum of the bits in the bit string.
10101 sums to 3. But so do 11100 and 00111. Early (really early) checksum calculations were often simple modulo sums, but most checksums you'll run into today use a stronger calculation.
- It's not completely reliable.
A CRC32 checksum, for example, is a 32-bit number, which allows for 2^32 (about 4.3 billion) different values. So the 'best case' probability of two different bit strings giving the same checksum is about one in 4.3 billion. That's not zero, but it's good enough for catching accidental errors in most of what we're likely to need.
- It's not a way to encrypt or protect information.
Communications message packets usually carry the checksum right along with the data. Someone could change the data and escape detection by simply calculating a new checksum and substituting it for the old one.
- It's not the same calculation for all checksums.
There are a lot of formulas that can be used as a checksum. Which one you pick mainly depends on how reliable you need it to be versus how fast.
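Three of the points above (bit sums, tampering, and differing formulas) can be sketched in a few lines of Python, again used only for illustration. Everything here is an assumption made for the example: the packet layout (payload followed by a 4-byte CRC-32) is hypothetical and not taken from any real protocol, and bit_sum, make_packet, and verify_packet are made-up helper names. The CRC-32 and MD5 calculations come from Python's standard zlib and hashlib modules.

```python
import hashlib
import zlib


def bit_sum(bits: str) -> int:
    """The naive 'checksum': just count the 1 bits."""
    return bits.count("1")


# A plain bit count cannot tell these three strings apart, but CRC-32 can.
samples = ["10101", "11100", "00111"]
bit_sums = [bit_sum(s) for s in samples]
crc_values = [zlib.crc32(s.encode("ascii")) for s in samples]


def make_packet(payload: bytes) -> bytes:
    """Hypothetical layout: payload followed by its CRC-32 as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_packet(packet: bytes) -> bool:
    """Recalculate the CRC-32 of the payload and compare it with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


good = make_packet(b"PAY 100")

# Accidental damage: one bit of the payload flips, the trailer stays the same.
damaged = bytes([good[0] ^ 0x01]) + good[1:]

# Deliberate change: new payload, with a freshly calculated checksum attached.
forged = make_packet(b"PAY 900")

print(verify_packet(good))     # passes
print(verify_packet(damaged))  # fails: the accident is detected
print(verify_packet(forged))   # passes: the checksum offers no protection here

# Different formulas give results of different sizes for the same input.
crc_of_payload = zlib.crc32(b"PAY 100")               # a 32-bit integer
md5_of_payload = hashlib.md5(b"PAY 100").hexdigest()  # 128 bits, as 32 hex digits
```

The forged packet passing verification is the reason a checksum on its own only guards against accidents, not against someone who wants to change the data.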
On the next page, we dive into exactly how a CRC checksum works!

