Forward Error Correction

By Michael Spalter
June 2021

About the author

Michael Spalter


Michael Spalter has been a networking technician for over 30 years and has been the CEO of DrayTek in the UK since the company’s formation in 1997. He has written and lectured extensively on networking topics. If you’ve an idea for a blog or a topic you’d like explored, please get in touch with us.


Error Correction - The invisible but essential digital technology


There are many systems and protocols under the Internet bonnet (hood) that we users can just take for granted.  That our data arrives with us perfectly is just one of those luxuries and it's down to error correction systems - magical automatic systems that we didn't always have.  In this article I'll explain how data integrity is preserved on the Internet, and also most other modern digital storage and communication systems.

Error Detection

Error Detection protocols & algorithms are methods to provide a reliable indication that there is an error within an object, without checking every component of whatever you're checking. An analogue physical example is that, at the end of production, the sealed box of a DrayTek router is weighed. If it's not within tolerances, something is missing or extra in the box - a manual, an antenna, the PSU etc.

In digital transfer systems such as the Internet, detecting errors is important to maintain the integrity of the data you're receiving. Without it, web pages you read or files you download could contain errors, or be corrupted or unreadable.  You need to know that the data you've received is in the same state as when it left the sender - every byte the same, none missing, none added.


Above: Analogue dial-up with no error correction. The spurious characters are caused by line noise, or your Dad picking up another extension.

Digital Checksums

On digital systems, errors can also be detected by using checksums.  A checksum, in very simple terms, is a value calculated from the data payload (in the simplest case, literally the sum of its bytes) which is appended to the end of the packet before it is transmitted.  When that packet is received, the receiving device recalculates the checksum based on the payload received and compares it to the received checksum.  If they match, the received data is assumed to be intact and uncorrupted.

Taking a very simple method, imagine we have 10 bytes (in hexadecimal):

1B 04 E1 05 22 76 A6 DB A0 38

If we add up all of those bytes, we get 03F6 in hex. That would be two bytes, but we only want a one-byte checksum, so we binary AND it with FF, which gives us F6. That's our 8-bit checksum, and we now send the packet with the checksum appended:

1B 04 E1 05 22 76 A6 DB A0 38 F6

The receiving end, already knowing that every packet includes a checksum and the method in use, receives those (now) 11 bytes and calculates the checksum for the first 10 itself.  It then compares its calculation to the checksum received and can tell if the data is intact.
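As a rough sketch of that send-and-verify process, the Python below computes the 8-bit checksum for the ten example bytes, appends it, and then checks the packet the way a receiver would. The function names (checksum8, build_packet, verify_packet) are made up for illustration and don't come from any particular protocol.

```python
def checksum8(payload: bytes) -> int:
    """Add up every payload byte and keep only the lowest 8 bits (AND with FF)."""
    return sum(payload) & 0xFF


def build_packet(payload: bytes) -> bytes:
    """Sender side: append the one-byte checksum to the payload."""
    return payload + bytes([checksum8(payload)])


def verify_packet(packet: bytes) -> bool:
    """Receiver side: recalculate the checksum and compare it to the last byte."""
    payload, received_checksum = packet[:-1], packet[-1]
    return checksum8(payload) == received_checksum


payload = bytes.fromhex("1B 04 E1 05 22 76 A6 DB A0 38")
packet = build_packet(payload)
print(" ".join(f"{b:02X}" for b in packet))    # the 11 bytes that go on the wire
print("intact packet ok? ", verify_packet(packet))

# Simulate line noise flipping a single bit in the third byte.
damaged = bytearray(packet)
damaged[2] ^= 0x01
print("damaged packet ok?", verify_packet(bytes(damaged)))
```

The receiver never sees the original data, only the 11 bytes that arrived, so all it can do is recalculate and compare.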

XMODEM

A checksum like that is a very simple method. With only one byte (8 bits) there are only 256 possible results (00-FF), so it would be quite possible for a packet with errors to coincidentally still produce the correct checksum.  However, it's better than nothing and it's the exact method which was used in "XMODEM", an early and highly popular file transfer protocol used by Bulletin Board users in the 1980s to detect errors introduced by noisy phone lines with analogue modems and to have the affected packets re-sent. Modems were slow enough back then and XMODEM made things slower still: after every packet was received, the receiving end had to send an ACK (acknowledgement) message back to the sender to confirm safe receipt, or request a resend, before the next packet could be sent. With analogue modem speed and latency, this wasn't fast.  
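To see how easily a simple additive checksum can be fooled, here is a small sketch with two kinds of corruption that leave the 8-bit sum unchanged. The byte values are the example bytes from earlier; the two 'errors' are contrived for illustration.

```python
def checksum8(payload: bytes) -> int:
    """Simple additive checksum: sum of all bytes, lowest 8 bits only."""
    return sum(payload) & 0xFF


original = bytes.fromhex("1B 04 E1 05 22 76 A6 DB A0 38")

# Error 1: two bytes arrive in swapped order. Addition doesn't care about order.
swapped = bytearray(original)
swapped[0], swapped[1] = swapped[1], swapped[0]

# Error 2: one byte arrives 1 too high and another 1 too low. The changes cancel out.
offsetting = bytearray(original)
offsetting[3] += 1   # 05 becomes 06
offsetting[4] -= 1   # 22 becomes 21

for name, corrupted in (("swapped", swapped), ("offsetting", offsetting)):
    data_changed = bytes(corrupted) != original
    checksum_same = checksum8(bytes(corrupted)) == checksum8(original)
    print(f"{name}: data changed = {data_changed}, checksum unchanged = {checksum_same}")
```

In both cases the receiver would accept corrupted data as good.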

Later protocols (such as ZMODEM) used a process called sliding windows, whereby the next packet was sent without waiting for an acknowledgement. If the sender subsequently received a NACK (Negative Acknowledgement - a failure) for an earlier packet, it would resend it and the receiver would slot that missed packet back in order where it belonged once re-received.  ZMODEM also improved upon the 8-bit checksum by replacing it with a 16-bit Cyclic Redundancy Check (CRC), which is far less likely to let a corrupted packet pass as good.   CRCs are very widely used today to verify the integrity of binary files to confirm that there has been no corruption, interference or change in size.
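For comparison, here is a minimal bit-by-bit sketch of one common 16-bit CRC. The polynomial (0x1021) and the zero starting value are assumptions chosen for illustration (this combination is often labelled CRC-16/XMODEM); the article doesn't specify which parameters any given protocol uses. It is run against a pair of swapped bytes, an error that a plain additive checksum cannot see.

```python
def checksum8(payload: bytes) -> int:
    """Simple additive checksum, for comparison."""
    return sum(payload) & 0xFF


def crc16(data: bytes, poly: int = 0x1021, crc: int = 0x0000) -> int:
    """Bit-by-bit 16-bit CRC (assumed parameters: poly 0x1021, initial value 0)."""
    for byte in data:
        crc ^= byte << 8                      # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and 'subtract' the polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


original = bytes.fromhex("1B 04 E1 05 22 76 A6 DB A0 38")
swapped = bytearray(original)
swapped[0], swapped[1] = swapped[1], swapped[0]   # two neighbouring bytes change places
swapped = bytes(swapped)

print("checksum8 spots the swap?", checksum8(swapped) != checksum8(original))
print("crc16 spots the swap?    ", crc16(swapped) != crc16(original))
```

Unlike simple addition, the CRC calculation depends on the position of every bit, so reordered bytes produce a different result.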

All of the above are Error Detection methods - they enable a receiver to detect an error but must rely on the sender to re-send the data, hopefully intact the second time around.

Forward Error Correction

Line variations and factors such as jitter and bit errors can still affect fibre lines and can degrade signal quality. This can lead to transmission errors - data corruption.

Corrupted data on a transmission system can be fixed by the receiver detecting a mismatched checksum (which indicates corrupt data) and requesting that the transmitting device resends the data which was lost.  This is Error Detection.  It requires that a whole datagram has to be re-sent even if just for one or two corrupted bytes, slowing down the receiving speed.

Forward Error Correction (FEC), on the other hand, also detects corrupted data but it includes additional data, enabling the receiver to work out which part of the received data is corrupted and what it should have been, and therefore to correct the corrupted bytes. In this way, the receiver can correct the data immediately without having to request a retransmission from the sender. 
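The codes used in real systems are fairly involved, so the sketch below swaps in a much simpler one, Hamming(7,4), purely to illustrate the principle: three extra parity bits are sent with every four data bits, and the pattern of failed parity checks tells the receiver exactly which bit to flip back. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 bits laid out as: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4      # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(received):
    """Return (data_bits, error_position); position 0 means no error was found."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3    # the failed checks spell out the bad position
    if position:
        c[position - 1] ^= 1           # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]
sent = hamming74_encode(data)

corrupted = list(sent)
corrupted[4] ^= 1                      # noise flips the bit at position 5 in transit

recovered, position = hamming74_decode(corrupted)
print("sent:     ", sent)
print("received: ", corrupted)
print("corrected:", recovered, "after fixing position", position)
```

This small code can only repair a single flipped bit per 7-bit block, but no retransmission is needed to do it.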

Reed Solomon Coding

The Error Correction on GPON uses the Reed-Solomon (RS) method, developed in 1960 and named after the creators.  RS is a de facto standard used by many other technologies including CDs, DVDs, Satellite TV, DVB, Barcodes, DSL, cable modem (DOCSIS) and NAS Drives (RAID6).

An RS coded packet of 'n' bytes (symbols) is made up of a data payload of 'k' symbols and 'n-k' parity symbols (the error check code). The receiving end (decoder) can correct up to 't' symbols where 2t = n-k, or t = (n-k)/2.   If, for example, an RS format has 239 bytes of data payload and 16 bytes of error check code (255 bytes in total), it is denoted as an RS(255,239) code for short.

With an RS(255,239) code the receiver can detect AND correct up to 8 corrupted bytes anywhere in the 255-byte packet, whether they fall in the data payload or in the parity bytes themselves: (n-k)/2 = (255-239)/2 = 8.  If more than 8 bytes are corrupted, the receiver will have to request the whole packet be re-sent by the sender.
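The arithmetic for any RS(n,k) code can be sketched in a few lines. The helper name rs_parameters is made up for this example, and it only does the bookkeeping; it doesn't perform any Reed-Solomon encoding or decoding.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Basic figures for an RS(n, k) code: n total symbols, k of them data."""
    parity = n - k
    return {
        "parity_symbols": parity,             # n - k
        "correctable_symbols": parity // 2,   # t, where 2t = n - k
        "code_rate": k / n,                   # share of each packet that is payload
        "overhead": parity / n,               # share of each packet that is parity
    }


params = rs_parameters(255, 239)
print(f"parity symbols:      {params['parity_symbols']}")
print(f"correctable symbols: {params['correctable_symbols']}")
print(f"code rate:           {params['code_rate']:.1%}")
print(f"overhead:            {params['overhead']:.1%}")
```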

Using FEC in this way, we're 'wasting' 16 bytes in every packet which could otherwise carry data. However, having to resend the whole packet just because one or two bytes are corrupted would be even more wasteful. This is particularly so on a shared system like GPON, where a resend for one user with corrupted data means every other user on the line has to receive that same data again - bandwidth which could otherwise be used for new data. In other words, the volume of the FEC data is expected to be lower than the volume of data that would have to be resent without FEC.  
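That trade-off can be put into rough numbers with a toy model. The assumptions here are mine, not measurements from any real line: every byte is corrupted independently with the same probability (real noise is often bursty), a packet without FEC is resent if even one byte is bad, a packet with RS(255,239) is resent only if more than 8 bytes are bad, and acknowledgement traffic is ignored. 'Efficiency' means useful payload delivered per byte transmitted.

```python
from math import comb

N, K, T = 255, 239, 8   # RS(255,239): 255 bytes per packet, 239 data, corrects up to 8


def prob_at_most_t_errors(p_byte: float, n: int = N, t: int = T) -> float:
    """Chance that a packet of n bytes has t or fewer corrupted bytes (binomial model)."""
    return sum(comb(n, i) * p_byte**i * (1 - p_byte) ** (n - i) for i in range(t + 1))


def efficiency_without_fec(p_byte: float, n: int = N) -> float:
    """Every byte carries data, but any corrupted byte forces the whole packet to be resent."""
    return (1 - p_byte) ** n


def efficiency_with_fec(p_byte: float, n: int = N, k: int = K, t: int = T) -> float:
    """Only k of n bytes carry data, but up to t corrupted bytes are repaired on the spot."""
    return (k / n) * prob_at_most_t_errors(p_byte, n, t)


for p in (0.0, 1e-4, 1e-3, 1e-2):
    print(f"byte error rate {p:<7}: without FEC {efficiency_without_fec(p):.3f}, "
          f"with FEC {efficiency_with_fec(p):.3f}")
```

On a perfectly clean line the parity bytes are pure overhead, but in this model FEC pulls ahead once errors become even moderately common.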


The FEC settings will vary on different systems and media according to the assumed quality of that medium - one which is expected to suffer more noise, corruption or interference or has no mechanism to re-send, would likely have a higher level of FEC (see CDs below).

FEC on Read-Only Media

With GPON, DSL, DOCSIS and other bidirectional communication systems, if there are too many errors to correct, the receiver can ask the sender to resend the lost data.  With something like a CD or DVD there is, of course, no way for the data to be resent, so methods like FEC become even more vital in order that you can still listen to your music or watch your movie. A more aggressive coding setting is used, with double-layer RS coding and the equivalent of one parity byte for every three data bytes. That's about 5 times the proportion used in GPON, so there's a lot of 'wasted' capacity on a CD, but it means that even with a scratched CD, a dirty read head or a poorly aligned disc carriage, your CD or DVD is still playable without skips or loss of picture - up to a limit, of course.

CDs and DVDs use additional methods called Cross-Interleaved Reed–Solomon Coding (CIRC) which makes the error correction even more robust. Where an error cannot be corrected, another method called interpolation may be able to conceal the corrupted data by approximation but ultimately, if an error cannot be corrected, your CD or DVD will 'skip' and your picture or sound will be interrupted.
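The 'interleaved' part of the name refers to shuffling the order of symbols before they are written, so that a scratch (which damages a run of neighbouring symbols) ends up spread thinly across several codewords instead of wiping out one of them. The sketch below is a plain block interleaver with made-up sizes, far simpler than the real CIRC scheme, and is only meant to show the idea.

```python
ROWS, COLS = 4, 6   # 4 codewords of 6 symbols each (illustrative sizes only)


def interleave(symbols):
    """Symbols are laid out one codeword per row, then read out column by column."""
    return [symbols[r * COLS + c] for c in range(COLS) for r in range(ROWS)]


def deinterleave(symbols):
    """Undo interleave(): put each symbol back into its original row and column."""
    restored = [None] * (ROWS * COLS)
    i = 0
    for c in range(COLS):
        for r in range(ROWS):
            restored[r * COLS + c] = symbols[i]
            i += 1
    return restored


# Codeword A is A0..A5, codeword B is B0..B5, and so on.
codewords = [f"{'ABCD'[r]}{c}" for r in range(ROWS) for c in range(COLS)]
on_disc = interleave(codewords)

# A scratch destroys 4 neighbouring symbols on the disc.
for i in range(8, 12):
    on_disc[i] = "??"

recovered = deinterleave(on_disc)
damage_per_codeword = []
for r in range(ROWS):
    row = recovered[r * COLS:(r + 1) * COLS]
    damage_per_codeword.append(row.count("??"))
    print(" ".join(row), "-> damaged symbols:", row.count("??"))
```

Each codeword ends up with a single damaged symbol, which is well within reach of an error-correcting code, rather than one codeword losing four symbols in a row.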

Shannon's Law

Shannon's Law (developed in the 1940s by Dr. Claude Shannon) defines the maximum capacity of a communications channel. In very simple terms, given all of the factors of that communications channel - be it a phone line, a GPON fibre line, a satellite link or a CD-ROM - his 'rules' govern its maximum capacity. Given Shannon's limit, one can calculate the efficiency of a communications channel and determine what gain an FEC gives to a typical example and then adjust the FEC settings to give the best 'average' advantage.  Gain is measured in dB; it's a logarithmic scale and 3dB is equivalent to a doubling in the power ratio. So, given a channel of a known typical noise ratio, the right FEC can be applied to maximise throughput. The important factor is the ratio of signal (the good stuff) to noise (the bad stuff), more commonly referred to as the signal to noise ratio (SNR). You will see the SNR shown on the diagnostic information of many systems, including DSL/GPON/DOCSIS 'modems' (I put modems in quotes because otherwise someone will argue that a GPON ONT/OLT isn't a modem - I say it is!).
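The best-known form of Shannon's limit (the Shannon-Hartley theorem, which the article doesn't spell out) says that capacity equals bandwidth multiplied by log2(1 + signal/noise), with the ratio expressed as a plain power ratio, not in dB. A minimal sketch follows; the 3 kHz bandwidth and 30 dB SNR are illustrative figures I've picked, not measurements of any particular line.

```python
import math


def db_to_ratio(db: float) -> float:
    """Convert decibels to a plain power ratio (3 dB is roughly a factor of 2)."""
    return 10 ** (db / 10)


def ratio_to_db(ratio: float) -> float:
    """Convert a plain power ratio to decibels."""
    return 10 * math.log10(ratio)


def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley limit: C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + db_to_ratio(snr_db))


print(f"3 dB as a power ratio: {db_to_ratio(3):.3f}")
print(f"A doubling in dB:      {ratio_to_db(2):.2f}")

# Illustrative channel: 3 kHz of bandwidth with a 30 dB signal to noise ratio.
capacity = shannon_capacity_bps(3000, 30)
print(f"Capacity limit:        {capacity:,.0f} bits per second")
```

No coding scheme can push error-free data through the channel faster than this figure; a good FEC helps a real system get closer to it.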

Error Correction in Space

The factors that Shannon's law must consider include the communication channel's raw capacity but also all other factors which might affect the ability to make full use of it, notably noise. Noise is anything which affects a signal causing irregular fluctuations which may obscure the original signal causing loss or corruption of part of the signal. Noise can be continuous or bursty. A simple example is hearing interference on your FM radio as you pass by some object.

A radio or microwave signal beamed from a satellite will suffer from noise along its path. There is man-made noise (electrical and other communication systems) and natural noise (vapour, oxygen, clouds, sun, atmospheric gases etc.) within the atmosphere, but all matter, on Earth and extra-terrestrial, emits some noise, including cosmic background noise in space.

With a suitably strong signal over a short distance, this wouldn't be a problem, but the huge distances in projects such as NASA's Voyager space programme (11-15 billion miles or 150 astronomical units) weaken the signal and introduce a lot of noise. In applications such as that, Error Detection and Correction becomes vital and Reed-Solomon was therefore adopted. In deep space, noise tends to be Gaussian (continuous pseudo-random) rather than irregular bursts of noise as one might more commonly find terrestrially. Ordinarily RS is less effective on Gaussian noise. However, Voyager uses a double-layer combination of RS and other coding methods to make the error correction most effective.

Of course, it's not just distant projects like Voyager where error correction is vital. The thousands of geostationary communications satellites around the Earth are subject to the same interference. A satellite has to be self-sustaining (no power sockets in space!) so it relies on solar panels. One has to balance the power a solar panel can produce against its weight and the minimum viable power that will make the satellite's services usable. It may be more efficient to correct a weaker signal on Earth than to use more power in space.  Note: Some spacecraft have alternative power methods, such as pure battery or radioisotope power systems, but for geostationary satellites there's enough sun for solar power to keep batteries charged.


I hope you've found this article interesting - I vary between writing about issues and technology, so let us know what you think in the comments below or directly by email. Please do share a link on social and business networks / media.

