Consistent Overhead Byte Stuffing: Difference between revisions

Revision as of 12:16, 17 July 2017 by 5.199.181.105 (→Encoding examples: Removing additional 01 byte in example 6: not covered by algorithm as described; following the latter, the byte represents an additional trailing 0 in the message, falsifying the msg and expanding it illegally to 255 bytes.)
Latest revision as of 19:35, 7 September 2024 by Angeld23 (minor edit, no edit summary; tag: Visual edit)
(43 intermediate revisions by 27 users not shown)
{{Short description|Algorithm for encoding data bytes}}
'''Consistent Overhead Byte Stuffing''' ('''COBS''') is an [[algorithm]] for encoding data bytes that results in efficient, reliable, unambiguous [[Packet (information technology)#Packet framing|packet framing]] regardless of packet content, thus making it easy for receiving applications to recover from malformed packets. It employs a particular byte value, typically zero, to serve as a ''packet [[delimiter]]'' (a special value that indicates the boundary between packets). When zero is used as a delimiter, the algorithm replaces each zero data byte with a non-zero value so that no zero data bytes will appear in the packet and thus be misinterpreted as packet boundaries. The value substituted for each zero data byte is equal to one plus the number of non-zero data bytes that follow it, counted up to the next zero byte or block boundary.

'''Byte stuffing''' is a process that transforms a sequence of data bytes that may contain 'illegal' or 'reserved' values (such as the packet delimiter) into a potentially longer sequence that contains no occurrences of those values. The extra length of the transformed sequence is typically referred to as the [[Overhead (computing)|overhead]] of the algorithm. [[High-Level_Data_Link_Control#Asynchronous_framing|HDLC framing]] is a well-known example, used particularly in [[Point-to-point protocol|PPP]] (see [https://datatracker.ietf.org/doc/html/rfc1662#section-4.2 RFC 1662 § 4.2]). Although HDLC framing has an overhead of <1% in the ''average'' case, it suffers from a very poor ''worst''-case overhead of 100%; for inputs that consist entirely of bytes that require escaping, HDLC byte stuffing will double the size of the input.

The COBS algorithm, on the other hand, tightly bounds the worst-case overhead. COBS requires a minimum of 1 byte overhead, and a maximum of {{ceil|''n''/254}} bytes for ''n'' data bytes (one byte in 254, rounded up). Consequently, the time to transmit the encoded byte sequence is highly predictable, which makes COBS useful for real-time applications in which jitter may be problematic.
The algorithm is computationally inexpensive, and in addition to its desirable ''worst''-case overhead, its ''average'' overhead is also low compared to other unambiguous framing algorithms like HDLC.<ref>{{Cite journal | last1 = Cheshire | first1 = Stuart | last2 = Baker | first2 = Mary | authorlink = Stuart Cheshire | title = Consistent Overhead Byte Stuffing | journal = [[IEEE/ACM Transactions on Networking]] | volume = 7 | pages = 159–172 | number = 2 | date = April 1999 | doi = 10.1109/90.769765 | url = http://www.stuartcheshire.org/papers/COBSforToN.pdf | access-date = November 30, 2015 | citeseerx = 10.1.1.108.3143 | s2cid = 47267776 }}</ref><ref>{{Cite conference | last1 = Cheshire | first1 = Stuart | last2 = Baker | first2 = Mary | authorlink = Stuart Cheshire | title = Consistent Overhead Byte Stuffing | conference = [[Association for Computing Machinery|ACM]] [[SIGCOMM]] '97 | url = http://conferences.sigcomm.org/sigcomm/1997/papers/p062.pdf | date = 17 November 1997 | location = [[Cannes]] | access-date = November 23, 2010 }}</ref> COBS does, however, require up to 254 bytes of ''lookahead''. Before transmitting its first byte, it needs to know the position of the first zero byte (if any) in the following 254 bytes.
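The overhead bound and the lookahead requirement can be made concrete with a minimal C sketch. The helper names <code>cobs_max_encoded_size</code> and <code>cobs_first_code</code> are hypothetical and chosen here for illustration only; sizes exclude the trailing packet delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any published COBS code):
   worst-case encoded size for n data bytes, excluding the delimiter. */
static size_t cobs_max_encoded_size(size_t n)
{
    size_t overhead = (n + 253) / 254;  /* ceil(n / 254) */
    if (overhead == 0)
        overhead = 1;                   /* even empty input costs one byte */
    return n + overhead;
}

/* Hypothetical helper: the first byte an encoder must emit.
   It scans at most 254 bytes ahead, which is the lookahead
   the text describes. */
static uint8_t cobs_first_code(const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    size_t limit = n < 254 ? n : 254;
    size_t i = 0;
    while (i < limit && data[i] != 0)   /* count leading non-zero bytes */
        i++;
    return (uint8_t)(i + 1);            /* one plus the run length */
}
```

A caller could use the first function to size an output buffer and the second to see why the encoder cannot emit anything until it has inspected up to 254 input bytes.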
A 1999 [[Internet Draft]] proposed to standardize COBS as an alternative to HDLC framing in [[Point-to-point protocol|PPP]], due to the aforementioned poor worst-case overhead of HDLC framing.<ref name="ietf-draft">{{cite ietf|draft=draft-ietf-pppext-cobs-00.txt|title=PPP Consistent Overhead Byte Stuffing (COBS)|last1=Carlson|first1=James|last2=Cheshire|first2=Stuart|authorlink2=Stuart Cheshire|last3=Baker|first3=Mary|date=November 1997}}</ref>

==Packet framing and stuffing==
When packetized data is sent over any serial medium, some [[communications protocol|protocol]] is required to demarcate packet boundaries. This is done by using a framing marker, a special bit-sequence or character value that indicates where the boundaries between packets fall. Data stuffing is the process that transforms the packet data before transmission to eliminate all occurrences of the framing marker, so that when the receiver detects a marker, it can be certain that the marker indicates a boundary between packets.

COBS transforms an arbitrary string of bytes in the range [0,255] into bytes in the range [1,255]. Having eliminated all zero bytes from the data, a zero byte can now be used to unambiguously mark the end of the transformed data. This is done by appending a zero byte to the transformed data, thus forming a packet consisting of the COBS-encoded data (the ''[[payload (computing)|payload]]'') followed by a zero byte that unambiguously marks the end of the packet. (Any other byte value may be reserved as the packet delimiter, but using zero simplifies the description.)

For data of up to 254 bytes, the overhead of COBS encoding is constant regardless of data content: every such data set is encoded with an overhead of exactly one byte, so that N bytes are always transformed into exactly N+1 encoded bytes.
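The framing step for short data can be sketched as follows. This is a simplified illustration that assumes at most 254 input bytes, so a single overhead byte suffices; the function name <code>cobs_frame_short</code> is hypothetical, and the sketch is not a general COBS encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: frame a short message (n <= 254 bytes).
   Writes n + 2 bytes to out: n + 1 COBS bytes plus the zero delimiter. */
static size_t cobs_frame_short(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(n <= 254);
    size_t code_pos = 0;  /* output index that will hold the next offset */
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0) {
            /* Fill in the pending offset so it reaches this zero's slot. */
            out[code_pos] = (uint8_t)(i + 1 - code_pos);
            code_pos = i + 1;
        } else {
            out[i + 1] = in[i];  /* non-zero bytes are copied unchanged */
        }
    }
    out[code_pos] = (uint8_t)(n + 1 - code_pos);  /* offset to the delimiter */
    out[n + 1] = 0;                               /* packet delimiter */
    return n + 2;
}
```

Each data byte at input index <code>i</code> lands at output index <code>i + 1</code>, which shows directly why N data bytes become N+1 encoded bytes before the delimiter is appended.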
The overhead byte, which equals one plus the number of non-zero bytes that precede the first zero byte (or the end of the data), appears at the beginning of the encoded data. Note that the overhead byte is not a transformed data byte; it is an additional byte that precedes the transformed data bytes.

[[File:Cobs encoding with example.png|center|frameless|800px|Consistent Overhead Byte Stuffing (COBS) encoding process]]

For byte streams, or data sets larger than 254 bytes, COBS encodes the data a block at a time: no encoded block carries more than 254 data bytes, and each block begins with its own overhead byte.

The unambiguous zero byte packet delimiter allows a receiver to synchronize reliably with the beginning of the next packet, even after an error. It also allows new listeners, which might join a broadcast stream at any time, to reliably detect the beginning of the first complete packet in the received byte stream.

There are two equivalent ways to describe the COBS encoding process:

; Prefixed block description
: To encode some bytes, first append a zero byte, then break them into groups of either 254 non-zero bytes, or 0–253 non-zero bytes followed by a zero byte. Because of the appended zero byte, this is always possible.
: Encode each group by deleting the trailing zero byte (if any) and prepending the number of non-zero bytes, plus one. Thus, each encoded group is the same size as the original, except that 254 non-zero bytes are encoded into 255 bytes by prepending a byte of 255.
: As a special exception, if a packet ends with a group of 254 non-zero bytes, it is not necessary to add the trailing zero byte. This saves one byte in some situations.
; Linked list description
: First, insert a zero byte at the beginning of the packet, and after every run of 254 non-zero bytes. This encoding is obviously reversible. It is not necessary to insert a zero byte at the end of the packet if it happens to end with exactly 254 non-zero bytes.
: Second, replace each zero byte with the offset to the next zero byte, or the end of the packet. Because of the extra zeros added in the first step, each offset is guaranteed to be at most 255.

==Encoding examples==
These examples show how various data sequences would be encoded by the COBS algorithm. In the examples, all bytes are expressed as [[hexadecimal]] values, and encoded data is shown with text formatting to illustrate various features:
* '''Bold''' indicates a data byte which has not been altered by encoding. All non-zero data bytes remain unaltered.
* {{green|Green}} indicates a zero data byte that was altered by encoding. All zero data bytes are replaced during encoding by the offset to the following zero byte (i.e. one plus the number of non-zero bytes that follow). It is effectively a pointer to the next packet byte that requires interpretation: if the addressed byte is non-zero, it is the next group header byte or encoded zero data byte, which in turn points to the next byte requiring interpretation; if the addressed byte is zero, it is the {{blue|end of packet}}.
** The overhead byte at the start of the packet works the same way: it points to the location of the next zero symbol. If the byte at that location is non-zero (highlighted {{green|green}}), it is an encoded zero data byte pointing to the next zero symbol location. If it is zero, it is the {{blue|zero byte}} end-of-packet symbol.
** This chain of offsets is similar in concept to a linked list data structure.
* {{red|Red}} is an overhead byte which is also a group header byte containing an offset to a following group, but does not correspond to a data byte. These appear in two places: at the beginning of every encoded packet, and after every group of 254 non-zero bytes.
* A {{blue|blue}} zero byte appears at the end of every packet to indicate end-of-packet to the data receiver. This packet delimiter byte is not part of COBS proper; it is an additional framing byte that is appended to the encoded output.

{| class="wikitable"
|-
! Example !! Unencoded data (hex) !! Encoded with COBS (hex)
|-
| 1 || {{mono|00}} || {{mono|{{red|01}} {{green|01}} {{blue|00}}}}
|-
| 2 || {{mono|00 00}} || {{mono|{{red|01}} {{green|01 01}} {{blue|00}}}}
|-
| 3 || {{mono|00 11 00}} || {{mono|{{red|01}} {{green|02}} '''11''' {{green|01}} {{blue|00}}}}
|-
| 4 || {{mono|11 22 00 33}} || {{mono|{{red|03}} '''11 22''' {{green|02}} '''33''' {{blue|00}}}}
|-
| 5 || {{mono|11 22 33 44}} || {{mono|{{red|05}} '''11 22 33 44''' {{blue|00}}}}
|-
| 6 || {{mono|11 00 00 00}} || {{mono|{{red|02}} '''11''' {{green|01 01 01}} {{blue|00}}}}
|-
| 7 || {{mono|01 02 03 ... FD FE}} || {{mono|{{red|FF}} '''01 02 03 ... FD FE''' {{blue|00}}}}
|-
| 8 || {{mono|00 01 02 ... FC FD FE}} || {{mono|{{red|01}} {{green|FF}} '''01 02 ... FC FD FE''' {{blue|00}}}}
|-
| 9 || {{mono|01 02 03 ... FD FE FF}} || {{mono|{{red|FF}} '''01 02 03 ... FD FE''' {{red|02}} '''FF''' {{blue|00}}}}
|-
| 10 || {{mono|02 03 04 ... FE FF 00}} || {{mono|{{red|FF}} '''02 03 04 ... FE FF''' {{red|01}} {{green|01}} {{blue|00}}}}
|-
| 11 || {{mono|03 04 05 ... FF 00 01}} || {{mono|{{red|FE}} '''03 04 05 ... FF''' {{green|02}} '''01''' {{blue|00}}}}
|}

Below is a diagram using example 4 from the above table, to illustrate how each modified data byte is located, and how it is identified as a data byte or an end of frame byte.
<pre>
   [OHB]                  : Overhead byte (Start of frame)
     3+ -------------->|  : Points to relative location of first zero symbol
   [...]

OHB = Overhead Byte (Points to next zero symbol)
EOP = End Of Packet
</pre>

Examples 7 through 10 show how the overhead varies depending on the data being encoded for packet lengths of 255 or more.

==Implementation==
<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/** COBS encode data to buffer
    @param data Pointer to input data to encode
    @param length Number of bytes to encode
    @param buffer Pointer to encoded output buffer
    @return Encoded buffer length in bytes
    @note Does not output delimiter byte
    @note As written, the final code write can land one byte past the
          returned length (e.g. for exactly 254 non-zero bytes), so the
          buffer should have one spare byte beyond the worst-case size
*/
size_t cobsEncode(const void *data, size_t length, uint8_t *buffer)
{
    assert(data && buffer);

    uint8_t *encode = buffer;  // Encoded byte pointer
    uint8_t *codep = encode++; // Output code pointer
    uint8_t code = 1;          // Code value

    for (const uint8_t *byte = (const uint8_t *)data; length--; ++byte)
    {
        if (*byte) // Byte not zero, write it
            *encode++ = *byte, ++code;

        if (!*byte || code == 0xff) // Input is zero or block completed, restart
        {
            *codep = code, code = 1, codep = encode;
            if (!*byte || length)
                ++encode;
        }
    }
    *codep = code; // Write final code value

    return (size_t)(encode - buffer);
}

/** COBS decode data from buffer
    @param buffer Pointer to encoded input bytes
    @param length Number of bytes to decode
    @param data Pointer to decoded output data
    @return Number of bytes successfully decoded
    @note Stops decoding if delimiter byte is found
*/
size_t cobsDecode(const uint8_t *buffer, size_t length, void *data)
{
    assert(buffer && data);

    const uint8_t *byte = buffer;      // Encoded input byte pointer
    uint8_t *decode = (uint8_t *)data; // Decoded output byte pointer

    for (uint8_t code = 0xff, block = 0; byte < buffer + length; --block)
    {
        if (block) // Decode block byte
            *decode++ = *byte++;
        else
        {
            block = *byte++;             // Fetch the next block length
            if (block && (code != 0xff)) // Encoded zero, write it unless it's delimiter.
                *decode++ = 0;
            code = block;
            if (!code) // Delimiter code found
                break;
        }
    }

    return (size_t)(decode - (uint8_t *)data);
}
</syntaxhighlight>

== See also ==
* [[Bit stuffing]]
* [[Serial Line Internet Protocol]]

==References==
[...]
* [https://pypi.python.org/pypi/cobs Python implementation]
* [https://github.com/cmcqueen/cobs-c Alternate C implementation]
* [https://web.archive.org/web/20180719005741/http://www.jacquesf.com/2011/03/consistent-overhead-byte-stuffing/ Another implementation in C]
* [https://pythonhosted.org/cobs/cobsr-intro.html Consistent Overhead Byte Stuffing—Reduced (COBS/R)]
* [https://patents.google.com/patent/US9438411B1 A patent describing a scheme with a similar result but using a different method]

[[Category:Encodings]]