Hamming Distance (Big Data Analytics / Machine Learning)



The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. 

In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. 

A major application is in coding theory, more specifically to block codes, in which the equal-length strings are vectors over a finite field.

OR

Hamming distance is a metric for comparing two binary data strings. While comparing two binary strings of equal length, Hamming distance is the number of bit positions in which the two bits are different.

The Hamming distance between two strings, a and b is denoted as d(a,b).

It is used for error detection or error correction when data is transmitted over computer networks. It is also used in coding theory for comparing equal length data words.

Fig: Example of Hamming Distance





Calculation of Hamming Distance

In order to calculate the Hamming distance between two strings, and , we perform their XOR operation, (a⊕ b), and then count the total number of 1s in the resultant string.

Example 

Suppose there are two strings 1101 1001 and 1001 1101.

11011001 ⊕ 10011101 = 01000100. Since, this contains two 1s, the Hamming distance, d(11011001, 10011101) = 2.


Minimum Hamming Distance

In a set of strings of equal lengths, the minimum Hamming distance is the smallest Hamming distance between all possible pairs of strings in that set.

Example 1]

Suppose there are four strings 010, 011, 101 and 111.

010 ⊕ 011 = 001, d(010, 011) = 1.

010 ⊕ 101 = 111, d(010, 101) = 3.

010 ⊕ 111 = 101, d(010, 111) = 2.

011 ⊕ 101 = 110, d(011, 101) = 2.

011 ⊕ 111 = 100, d(011, 111) = 1.

101 ⊕ 111 = 010, d(011, 111) = 1.

Hence, the Minimum Hamming Distance, dmin = 1.






Example 2]  

Example 3] The symbols may be letters, bits, or decimal digits, among other possibilities. For example, the Hamming distance between:

  • "karolin" and "kathrin" is 3.

  • "karolin" and "kerstin" is 3.

  • "kathrin" and "kerstin" is 4.

  • 1011101 and 1001001 is 2.

2173896 and 2233796 is 3.

Example 4]

No comments:

ads
Powered by Blogger.