Maker Pro

Serial data decoding

Ed

Jan 1, 1970
Please help!
I have a string of data as the result of serial to parallel hardware
conversion.
I need to decode this string into correct bytes.
The bit position where a byte starts is unknown.
In this particular case, part of my data is as follows.
Serial data string: (not rs232, straight serial data)
01001010010010010101001001001001010100100010010001001010
This was read through a shift register to get 8-bit bytes to store in memory
as:
4A 49 52 49 52 24 4A
I need to decode bit pairs into the correct data bits:
00 = 0, 01 = 1, 10 = 0, 11 = illegal
So every 16 encoded bits make one 8-bit byte.
Ultimately I need to find certain values within this string which confirm
bit decode positions.
Somewhere within this string of bits there is a pattern of 4E 4E A1 and on
paper I do see it.
The actual data memory size to decode is 16k.
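A minimal sketch of the bit-pair decoding described above, assuming the captured bits are available as a string of '0'/'1' characters rather than packed bytes (the function name `decode_bit_string` is hypothetical). The `offset` argument stands in for the unknown byte start; the data bit is simply the second bit of each pair, and 11 is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode a string of '0'/'1' characters two encoded bits at a time,
 * starting 'offset' bits in: 00 -> 0, 01 -> 1, 10 -> 0, 11 -> illegal.
 * Whole decoded bytes go to 'out' (at most 'max_out' of them); leftover
 * bits that do not fill a byte are dropped. Returns the number of bytes
 * written, or -1 if an illegal 11 pair is met. */
int decode_bit_string(const char *bits, size_t offset,
                      unsigned char *out, size_t max_out)
{
    size_t len = strlen(bits);
    size_t n = 0;
    unsigned acc = 0;   /* decoded bits collected so far */
    int nbits = 0;      /* number of bits currently in 'acc' */

    assert(offset <= len);
    for (size_t i = offset; i + 1 < len && n < max_out; i += 2) {
        int first = bits[i] == '1';
        int second = bits[i + 1] == '1';

        if (first && second)
            return -1;                          /* 11 is illegal */
        acc = (acc << 1) | (unsigned)second;    /* data bit = 2nd bit of pair */
        if (++nbits == 8) {
            out[n++] = (unsigned char)acc;
            acc = 0;
            nbits = 0;
        }
    }
    return (int)n;
}
```

On the sample string, offset 6 gives 4E 4E 14 and offset 0 gives 89 C9 C2, so the choice of starting bit changes everything downstream.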
Thanks,
Ed
 
Dan Henry

Jan 1, 1970
Somewhere within this string of bits there is a pattern of 4E 4E A1 and on
paper I do see it.

Where do you see it? According to your coding rules, the encoded
nibble '4' would be 00010000 and I don't see any run of 4 zeroes in
the original encoded string.
 
Ed

Jan 1, 1970
Look at where I separated it.
Ignore the first 6 bits, then start.
010010 1001001001010100 - 1001001001010100 - 1000100100010010 - 10

4E = 0100 1110 = 1001001001010100

Remember that both 10 and 00 decode to 0.
 
Dan Henry

Jan 1, 1970
Look at where I separated it.
Ignore the first 6 bits, then start.
010010 1001001001010100 - 1001001001010100 - 1000100100010010 - 10

4E = 0100 1110 = 1001001001010100

Remember that both 10 and 00 decode to 0.

Please don't top-post.

That last bit about dual patterns for zero is what I missed.

Anyway, what's the issue? How to code something to do a post-process
search for 4E 4E A1? How to code something to sync up to 4E 4E A1 in
realtime as the data streams in?

--
Dan Henry

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
 
Ed

Jan 1, 1970
Dan Henry said:
Please don't top-post.

That last bit about dual patterns for zero is what I missed.

Anyway, what's the issue? How to code something to do a post-process
search for 4E 4E A1? How to code something to sync up to 4E 4E A1 in
realtime as the data streams in?

--
Dan Henry

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?



The decode can be done either way.
Decoding it on the fly would be faster I think.
But, I am able to store the data first then parse through it.
Thanks for any help.
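For the on-the-fly case, one possible sketch is to run two decoders at once, one per pair alignment, each holding its last 24 decoded bits and comparing them against the sync value on every pair. Everything here (`struct pair_sync`, `pair_sync_init`, `pair_sync_feed`) is hypothetical and assumes bits arrive one at a time in transmission order; an 11 pair simply marks that alignment as untrustworthy until 24 clean pairs have been seen again.

```c
#include <stdint.h>

/* Streaming sync sketch: encoded bits are fed in one at a time. Two
 * decoders run side by side, one per possible pair alignment ("phase").
 * Each keeps its last 24 decoded bits and reports when they equal 'sync'. */
struct pair_sync {
    uint32_t hist[2];   /* recent decoded bits for phase 0 and phase 1 */
    unsigned valid[2];  /* how many of those bits are trustworthy (max 24) */
    unsigned long pos;  /* index of the next encoded bit */
    int prev;           /* previous encoded bit, -1 before the first */
    uint32_t sync;      /* 24-bit decoded pattern to look for */
};

void pair_sync_init(struct pair_sync *s, uint32_t sync24)
{
    s->hist[0] = s->hist[1] = 0;
    s->valid[0] = s->valid[1] = 0;
    s->pos = 0;
    s->prev = -1;
    s->sync = sync24 & 0xFFFFFFu;
}

/* Feed one encoded bit. Returns the phase (0 or 1) whose last 24 decoded
 * bits just matched 'sync', or -1. Phase p means pairs start at encoded
 * bit positions p, p + 2, p + 4, ... */
int pair_sync_feed(struct pair_sync *s, int bit)
{
    int result = -1;

    bit = (bit != 0);
    if (s->prev >= 0) {
        /* The pair (prev, bit) started at pos - 1. */
        int phase = (int)((s->pos - 1) & 1u);

        if (s->prev && bit) {
            s->valid[phase] = 0;    /* 11 is illegal: distrust this phase */
        } else {
            /* 00 -> 0, 01 -> 1, 10 -> 0: the data bit is the second bit */
            s->hist[phase] = ((s->hist[phase] << 1) | (uint32_t)bit) & 0xFFFFFFu;
            if (s->valid[phase] < 24)
                s->valid[phase]++;
            if (s->valid[phase] == 24 && s->hist[phase] == s->sync)
                result = phase;
        }
    }
    s->prev = bit;
    s->pos++;
    return result;
}
```

Feeding the 56 sample bits with a sync value of 0x4E4E14 reports a match in phase 0 when bit 53 arrives; searching for 0x4E4EA1 reports nothing on this sample.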
 
Please don't top-post.
You worry about top posting and don't edit?!?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Interesting. But not an example of top posting. It's an example of
reversal of an interspersed reply, used, I assume, because Q1 Q2 A1 A2
isn't "order in which people normally read text" either. A bit
disingenuous, no?

Ponder why someone may already have read the question and may want to see
just the answer and skip to the next message, noting that in text groups
Usenet is reasonably reliable these days.*

Ponder why editing down to just the highlight of the question and doing an
interspersed reply might also work nicely.

Ponder whether the bottom-post "rule" letting you jump on a supposedly
less 1337 individual is really a sufficient reason for its existence.

3ch

*I know several people who read with speech interfaces and -really-
prefer not to have to read the same message over and over to get to the
replies, in case this one is stumping you.
 
I need to decode bit pairs into corrected bits.
00 = 0 01=1 10=0 11=Illegal
Is it correct that the bit pattern 11 is always illegal but otherwise 10
is the preferred encoding for 0 (i.e. clearly 10 cannot be encoded 01 10,
and clearly 00 cannot be encoded 00 00)?
If that is true, then I believe you only need to look for a single special
5-bit pattern (01000) to find the incorrect bit boundary.

If the last data bit was a zero, then the last bit of the encoded
stream is also a zero, and the next two data bits would encode as:
00->1010
01->1001
10->0100
11->0101
Starting after a one:
00->0010
01->0001
10->0100
11->0101
So the legal 5-bit patterns in the encoded stream (starting on the last
bit of a pair) are:
01010
01001
00100
00101
10010
10001
10100
10101
That's 8 out of 32 possible patterns, so there are 24 illegal 5-bit
patterns. Find one and you're off on your bit boundaries. 0 10 00 is -not-
on the list, but it does appear in a shifted position in 4E 4E A1:

4E 4E A1 encodes as
xx 01 00 10 01 01 01 00 10 01 00 10 01 01 01 00 01 00 01 00 10 10 10 01
^^ ^^ ^
So finding 01 00 01 confirms your current bit boundary, while finding
10 10 00 denies it, and one or the other must be present if 4E 4E A1 is
in the unencoded data.

Double-check the conversions, as I did them at the keyboard. But with 3/4
of the 5-bit combinations illegal, I'll bet there's one in the sample
string even if I've glitched the conversion.
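The double-check can be done mechanically. This sketch assumes the encoding rule Ed confirms later in the thread (1 -> 01, 0 -> 10 after a 0, 0 -> 00 after a 1), encodes every 3-bit data run, and keeps the last 5 of the 6 encoded bits, i.e. the windows that start on the last bit of a pair. `encode_bits` and `legal_windows` are hypothetical names.

```c
/* Encode 'n' data bits (most significant first) assuming the rule
 * 1 -> 01, 0 -> 10 after a 0, 0 -> 00 after a 1; 'prev' is the data bit
 * sent just before. Returns the 2n encoded bits right-justified. */
static unsigned encode_bits(unsigned data, int n, int prev)
{
    unsigned out = 0;

    for (int i = n - 1; i >= 0; i--) {
        int b = (int)((data >> i) & 1u);
        unsigned pair = b ? 1u : (prev ? 0u : 2u);   /* 01, 00 or 10 */
        out = (out << 2) | pair;
        prev = b;
    }
    return out;
}

/* Set legal[w] = 1 for every 5-bit pattern w that can appear starting on
 * the last bit of a pair (the last 5 of the 6 bits that encode 3 data
 * bits), and return how many distinct patterns there are. */
int legal_windows(unsigned char legal[32])
{
    int count = 0;

    for (int w = 0; w < 32; w++)
        legal[w] = 0;
    for (int prev = 0; prev <= 1; prev++) {
        for (unsigned d = 0; d < 8; d++) {
            unsigned w = encode_bits(d, 3, prev) & 0x1Fu;
            if (!legal[w]) {
                legal[w] = 1;
                count++;
            }
        }
    }
    return count;
}
```

It reports 8 patterns, the same eight listed above; 01000 and 00010 are not among them.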

3ch
 
4E 4E A1 encodes as
xx 01 00 10 01 01 01 00 10 01 00 10 01 01 01 00 01 00 01 00 10 10 10 01
                                          ^^ ^^ ^
The marker was wrong. 01 00 0 occurs later, so it is 01 00 01 which
confirms and 10 10 00 which denies correct alignment.

3ch
 
Ed

Jan 1, 1970
I see I made an error in copying the 3rd sequence.
010010 1001001001010100 - 1001001001010100 - 1000100100010010 - 10
Should have been
010010 1001001001010100 - 1001001001010100 - 0100010010001001 - 10
Sorry bout that.

Decoding it on the fly would be faster I think.
But, I am able to store the data first then parse through it.

My newsserver is having problems, so I've resorted to posting from
Google Groups.

If you can post-process it, I'd first decode the bit pairs, then shift
the pattern across the resulting decoded bits. The following code
should give you the gist of it. The code is not portable, assumes
little-endian byte ordering, could be optimized, etc.

Do notice that, of the 4E 4E A1 that you saw, I can't find the A1 by
hand and neither can the program. However, 4E 4E 14 is present.

-Dan Henry

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* int find_bit_pattern(buffer, length, pattern, width)
 *
 * The function searches 'length' bits of 'buffer' bit-by-bit for the
 * 'width'-bit 'pattern' ('width' must be <= 24) and returns the bit
 * position where the pattern was found or -1 if not found.
 *
 * NOTE: This function assumes little-endian byte ordering. Each step
 * reads four bytes starting at the current byte.
 */
int find_bit_pattern(unsigned char *buffer,
                     unsigned length,
                     uint32_t pattern,
                     unsigned width)
{
    union bits32 {
        uint32_t ul;            /* must be exactly 32 bits wide */
        unsigned char b[4];
    };
    union bits32 pat;           /* Shifting search pattern */
    union bits32 msk;           /* Shifting match mask */
    union bits32 xor;           /* Pattern match result */
    int pos = 0;                /* Search's bit position */
    uint32_t mask = 0;
    unsigned i;

    /* Left-justify pattern and pattern mask.
     */
    for (i = 32 - width; i != 0; i--) {
        pattern <<= 1;
        mask = (mask << 1) | 1;
    }
    mask = ~mask;
    msk.ul = mask;
    pat.ul = pattern;

    length -= width;
    while (length--) {
        xor.b[0] = (buffer[3] ^ pat.b[0]) & msk.b[0];
        xor.b[1] = (buffer[2] ^ pat.b[1]) & msk.b[1];
        xor.b[2] = (buffer[1] ^ pat.b[2]) & msk.b[2];
        xor.b[3] = (buffer[0] ^ pat.b[3]) & msk.b[3];

        if (xor.ul == 0) {
            return pos;
        } else {
            /* Shift the pattern and mask 1 bit. Every 8 bits, 'advance' the
             * pattern, mask, and buffer pointer to the next byte.
             */
            msk.ul >>= 1;
            if (msk.b[3] == 0) {    /* 8 bits shifted? */
                pat.ul = pattern;
                msk.ul = mask;
                buffer++;
            } else {
                pat.ul >>= 1;
            }
        }
        pos++;
    }
    return -1;
}

int decode_pairs(unsigned char *pairs, size_t pairs_len, unsigned char *decoded)
{
    unsigned char pairs_bit = 0x80;
    unsigned char decode_bit = 0x80;

    *decoded = 0;
    while (pairs_len != 0) {
        if (*pairs & (pairs_bit >> 1)) {
            if (*pairs & (pairs_bit >> 0)) {
                return -1;
            } else {
                *decoded |= decode_bit;
            }
        }
        if ((pairs_bit >>= 2) == 0) {   /* Advance pairs by two bits */
            pairs_bit = 0x80;
            pairs_len--;
            pairs++;
        }
        if ((decode_bit >>= 1) == 0) {  /* Advance decode by one bit */
            decode_bit = 0x80;
            printf("%02X", (unsigned)*decoded);
            *++decoded = 0;
        }
    }
    if (decode_bit != 0x80) {
        printf("%02X", (unsigned)*decoded);
    }
    putchar('\n');
    return 0;
}

/* 01001010 01001001 01010010 01001001 01010010 00100100 01001010 RAW
 *
 * 01001010010010010101001001001001010100100010010001001010 RAW
 *  1 0 0 0 1 0 0 1 1 1 0 0 1 0 0 1 1 1 0 0 0 0 1 0 1 0 0 0 DECODED
 *
 * NOTE: The Ch C interpreter's (www.softintegration.com) sizeof operator
 * seems to not be quite right and needs parens.
 */
unsigned char raw_serial[] = {0x4A, 0x49, 0x52, 0x49, 0x52, 0x24, 0x4A};
unsigned char decoded_serial[((sizeof raw_serial)/2)+1];

int main(void)
{
    int i;

    if (decode_pairs(raw_serial, (sizeof raw_serial), decoded_serial) != 0) {
        printf("Illegal bit pair in serial data string.\n");
        exit(EXIT_FAILURE);
    }
    i = find_bit_pattern(decoded_serial, (sizeof decoded_serial)*8,
                         0x004E4E14, 24);
    if (i < 0) {
        printf("Pattern not found.\n");
    } else {
        printf("Pattern found at decoded bit position %d.\n", i);
    }
    exit(EXIT_SUCCESS);
}
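Dan's `find_bit_pattern` relies on a 32-bit union and little-endian byte order. A portable alternative, not Dan's code, is to shift one bit at a time into a window and compare; it is slower but makes no byte-order assumptions and never reads past the last bit. `get_bit` and `find_bits` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 'pos' of 'buf', counting from the most significant bit of buf[0]. */
static int get_bit(const unsigned char *buf, size_t pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1;
}

/* Search the first 'nbits' bits of 'buf' for the 'width'-bit 'pattern'
 * (1 <= width <= 32). Returns the bit position of the first match or -1.
 * Bits are shifted into a window one at a time, so there are no unions,
 * no byte-order assumptions and no reads past bit nbits - 1. */
long find_bits(const unsigned char *buf, size_t nbits,
               uint32_t pattern, unsigned width)
{
    uint32_t mask, window = 0;

    if (width == 0 || width > 32)
        return -1;
    mask = (width == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << width) - 1);
    pattern &= mask;
    for (size_t pos = 0; pos < nbits; pos++) {
        window = ((window << 1) | (uint32_t)get_bit(buf, pos)) & mask;
        if (pos + 1 >= width && window == pattern)
            return (long)(pos + 1 - width);
    }
    return -1;
}
```

With the decoded bytes 89 C9 C2 80 and 28 valid bits, `find_bits(dec, 28, 0x4E4E14, 24)` returns 3.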
 
Ed

Jan 1, 1970
. ^^ ^^ ^
The marker was wrong. 01 00 0 occurs later, so it is 01 00 01 which
confirms and 10 10 00 which denies correct alignment.

3ch

01 01 01 01 01 is legal
10 10 10 10 10 is legal
10 00 01 is legal
11 is illegal
00 00 00 is illegal
Actual coding rules:
1 = 01
0 = 10 if following a 0 data bit
0 = 00 if following a 1 data bit
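A minimal encoder for the rules Ed lists, as a way to check hand conversions (the name `encode_byte` is hypothetical). These rules appear to be the scheme usually called MFM (modified frequency modulation), familiar from floppy disks, where the first bit of each pair is a clock bit and the second is the data bit.

```c
#include <stdint.h>

/* Encode one data byte, most significant bit first, with the rules Ed
 * gives: 1 -> 01, 0 -> 10 after a 0 data bit, 0 -> 00 after a 1 data bit.
 * '*prev' holds the data bit sent just before this byte and is updated, so
 * consecutive calls chain correctly. Returns the 16 encoded bits. */
uint16_t encode_byte(uint8_t data, int *prev)
{
    uint16_t out = 0;

    for (int i = 7; i >= 0; i--) {
        int bit = (data >> i) & 1;
        unsigned pair = bit ? 1u : (*prev ? 0u : 2u);   /* 01, 00 or 10 */
        out = (uint16_t)((out << 2) | pair);
        *prev = bit;
    }
    return out;
}
```

Encoding 4E after a 0 gives 0x9254 (1001001001010100), the group Ed shows for 4E. Encoding A1 next gives 0x44A9 (0100010010101001), whereas the corrected group Ed posted, 0100010010001001, is 0x4489: the same data bits with one clock bit missing. On IBM-format MFM floppies the A1 sync byte is deliberately written with a missing clock bit as 0x4489, so that it cannot be mistaken for ordinary data; if that is what this stream is, it would explain the 10 00 pair in Ed's group.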
 
01 01 01 01 01 is legal
10 10 10 10 10 is legal
Not under the encoding assumptions I listed.

How do you know when to encode 0 as 00 and when as 10? If you know that
rule you can almost certainly find a pair of fairly short bit patterns to
sync the bits.

Your example encoded string had no runs of more than three zeros in a
row and no pairs of ones. In fact, "11" was referred to as illegal, not
merely "unused". When I've seen funny schemes like this, it's often been to
ensure a transition at least every so often to keep things in sync (e.g.
RLL for a hard drive), so it seems to make some sense.
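The two properties mentioned here are easy to measure on a capture. This sketch assumes the bits are in a '0'/'1' string (`run_stats` is a hypothetical name); under the rules Ed gives above, a clean stream should never show a zero run longer than 3 or two adjacent 1s.

```c
#include <stddef.h>

/* Scan a '0'/'1' string and report the longest run of zeros and the
 * number of places where a 1 directly follows another 1. */
void run_stats(const char *bits, size_t *longest_zeros, size_t *adjacent_ones)
{
    size_t run = 0;

    *longest_zeros = 0;
    *adjacent_ones = 0;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        if (bits[i] == '0') {
            if (++run > *longest_zeros)
                *longest_zeros = run;
        } else {
            run = 0;
            if (i > 0 && bits[i - 1] == '1')
                (*adjacent_ones)++;
        }
    }
}
```

On Ed's 56-bit sample it reports a longest zero run of 3 and no adjacent 1s.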

If the rules say to use 10 unless it would make an 11 in the encoded
stream, then x0 10 00 will never occur, but 01 00 0x -will- occur if your
3 sample bytes are encoded.

So the question is: is it OK to encode 10010 as 0100000110 or not? Do
you have a sample of data? Is there -ever- an 11? A 0000? If not, then I
strongly suspect these or similar rules are in effect, and you can sync up
without even looking at the decoded data. And the sample data could occur
in both the true and the off-by-half-a-bit streams, so merely looking at the
decoded data can fail to identify the correct stream.
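Syncing without looking at the decoded data can be sketched as scoring both alignments against the rules (1 -> 01, 0 -> 10 after a 0, 0 -> 00 after a 1) and keeping the one with fewer violations. `count_violations` and `guess_phase` are hypothetical names. Picking the lower count, rather than insisting on zero, leaves room for the odd deliberate or noisy rule break in real data.

```c
#include <stddef.h>
#include <string.h>

/* Count pairs that break the assumed rules (1 -> 01, 0 -> 10 after a 0,
 * 0 -> 00 after a 1) when the '0'/'1' string is split into pairs starting
 * at 'phase' (0 or 1). The first pair, and the pair after an 11, are only
 * checked for 11 because the data bit before them is unknown. */
size_t count_violations(const char *bits, int phase)
{
    size_t len = strlen(bits);
    size_t bad = 0;
    int prev = -1;                      /* previous data bit, -1 = unknown */

    for (size_t i = (size_t)phase; i + 1 < len; i += 2) {
        int a = bits[i] == '1';
        int b = bits[i + 1] == '1';

        if (a && b) {                   /* 11 never occurs */
            bad++;
            prev = -1;
            continue;
        }
        if (!a && !b && prev == 0)      /* 00 after a 0 data bit */
            bad++;
        if (a && !b && prev == 1)       /* 10 after a 1 data bit */
            bad++;
        prev = b;                       /* the data bit is the second bit */
    }
    return bad;
}

/* Pick the phase with fewer violations; -1 if they tie. */
int guess_phase(const char *bits)
{
    size_t v0 = count_violations(bits, 0);
    size_t v1 = count_violations(bits, 1);

    if (v0 < v1)
        return 0;
    if (v1 < v0)
        return 1;
    return -1;
}
```

For the 48-bit encoding of 4E 4E A1 built from 3ch's pair listing (taking the leading xx as 10), phase 0 scores 0 violations and phase 1 scores 2, so `guess_phase` returns 0.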

3ch
 
Ed

Jan 1, 1970
So the question is: is it OK to encode 10010 as 0100000110 or not?

A 10010 would encode as 0100100100
Note that both 10 and 00 = 0
The rules stated that:
1 = 01
0 = 00 if following a 1
0 = 10 if following a 0
 
A 10010 would encode as 0100100100
Note that both 10 and 00 = 0
The rules stated that:
1 = 01
0 = 00 if following a 1
0 = 10 if following a 0
OK. I missed that you'd clarified. Sorry.


So 3 data bits would encode as follows (x means I can't tell without
the previous data):

000 - x0 10 10
001 - x0 10 01
010 - x0 01 00
011 - x0 01 01
100 - x1 00 10
101 - x1 00 01
110 - x1 01 00
111 - x1 01 01


Since those are all the possible unencoded three-bit patterns, the
resulting last 5 bits are the only patterns that can occur at that
position (starting on the last bit of a pair). But there are 32 possible
unrestricted 5-bit patterns, which means 24 of them cannot occur there in
an encoded message. Listing all 32, and noting that four 0s in a row
cannot occur:

00000 - 0000 rule
00001 - 0000 rule
00010 - only in a shifted stream
00011 - 11 rule
00100 - OK
00101 - OK
00110 - 11 rule
00111 - 11 rule
01000 - only in a shifted stream
01001 - OK
01010 - OK
01011 - 11 rule
01100 - 11 rule
01101 - 11 rule
01110 - 11 rule
01111 - 11 rule
10000 - 0000 rule
10001 - OK
10010 - OK
10011 - 11 rule
10100 - OK
10101 - OK
10110 - 11 rule
10111 - 11 rule
11000 - 11 rule
11001 - 11 rule
11010 - 11 rule
11011 - 11 rule
11100 - 11 rule
11101 - 11 rule
11110 - 11 rule
11111 - 11 rule

That leaves only two that might occur in a bit-shifted stream but that
cannot occur in the proper one. I found one of these in the bit-shifted
version of your sample data, so you know it must exist in the "wrong" stream.
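3ch's conclusion can also be checked by brute force. This sketch (hypothetical names, same assumed rules) encodes every 4-bit data run, collects the 5-bit windows that start one bit before a pair boundary (the aligned windows in 3ch's list) and those that start on a pair boundary (what a receiver one bit off sees in their place), and returns the patterns found only in the second group.

```c
#include <stdint.h>

/* Encode 'n' data bits (most significant first) with the assumed rule
 * 1 -> 01, 0 -> 10 after a 0, 0 -> 00 after a 1. */
static uint32_t encode_run(uint32_t data, int n, int prev)
{
    uint32_t out = 0;

    for (int i = n - 1; i >= 0; i--) {
        int b = (int)((data >> i) & 1u);
        out = (out << 2) | (b ? 1u : (prev ? 0u : 2u));
        prev = b;
    }
    return out;
}

/* Returns a bitmask (bit w set for pattern w) of the 5-bit patterns that
 * appear in windows starting on a pair boundary but never in windows
 * starting on the last bit of a pair. */
uint32_t shifted_only_patterns(void)
{
    uint32_t aligned = 0, shifted = 0;

    for (int prev = 0; prev <= 1; prev++) {
        for (uint32_t d = 0; d < 16; d++) {
            uint32_t enc = encode_run(d, 4, prev);          /* 8 bits */
            for (int start = 0; start <= 3; start++) {      /* left offset */
                uint32_t w = (enc >> (3 - start)) & 0x1Fu;
                if (start % 2 == 0)
                    shifted |= (uint32_t)1 << w;            /* pair boundary */
                else
                    aligned |= (uint32_t)1 << w;            /* last bit of pair */
            }
        }
    }
    return shifted & ~aligned;
}
```

It returns a mask with only bits 2 and 8 set, i.e. 00010 and 01000, matching the two patterns identified above.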

I believe that unless you know quite a bit about the data (e.g. that your
sample data cannot occur as a result of shifting other data), merely
scanning the decoded data will not work.

3ch
 