Discussion:
Storing a large number in four bytes
Test
2017-05-16 12:26:11 UTC
I have
...
char str[4];
..
where numbers up to 9999 were stored, as in:
strcpy(str,"9999");

The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would be much
better if I could fit a higher number. What I need is to maintain compatibility, i.e.
if str gets " 15" it means number 15. A hex number could otherwise suffice if not
for compatibility.

A quick solution would be a numbering system that runs 0..9998, 9999, AAAA, AAAB...
etc., where "AAAA" is 10000, "AAAB" is 10001 and so on.

Another one could be using a hex number, say "1000" is 4096 decimal.

This would make life a lot easier and provide backward compatibility: older
systems can continue providing 0...9999, but newer systems would also be able to
use higher numbers while still sending 0...9999 if the number is within the old-style
range. When the main program sees anything other than a space or a digit, it
assumes that a newer version wants to give a number larger than 9999.

Not wanting to reinvent the wheel, are there C source code examples to this
effect?
Test
2017-05-16 12:33:44 UTC
Post by Test
I have
...
char str[4];
..
strcpy(str,"9999");
Let me correct that: I am not using strcpy and causing an overflow. The above is a
lazy typo.
Malcolm McLean
2017-05-16 12:38:55 UTC
Post by Test
I have
...
char str[4];
..
strcpy(str,"9999");
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
A quick solution would have a numbering systems where 0..9998,9999, AAAA,AAAB...
etc. where "AAAA" is 10000 and "AAAb" 10001 and so on.
Another oen could be using a hex number say "1000" is 4096 decimal.
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
Not wanting to reinvent the wheel are there C source code examples to this
effect?
You've created a mess by using a poor interface.
(There's also a bug lurking with the nul when you call strcpy()).

If you must have four bytes, and must continue to support ASCII whilst also
supporting numbers greater than 9999, take advantage of the fact that
ASCII always has the top bit clear. Set that bit (OR with 0x80) if you have a number
in a non-ASCII format, which will probably be binary. You can store up to about 2 billion
in 31 bits of binary. However, you need to reconstruct the integer from your
binary format.

Mask off the top bit of the first byte (AND with 0x7F). Then read out the bytes, and use multiplication
to reconstruct the integer (multiply by 256 and add the next unsigned byte, once for each of the four bytes).
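
A minimal sketch of the top-bit idea described above, assuming the field is handled as unsigned char, the binary form is stored most significant byte first, and old-style fields contain only spaces and digits. The names encode_binary and decode_field are made up for illustration and do not come from the original program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a value below 2^31 in big-endian binary and
   set the top bit of the first byte to mark the field as non-ASCII. */
void encode_binary(unsigned char field[4], uint32_t value)
{
    assert(value < 0x80000000u);            /* only 31 bits are available */
    field[0] = (unsigned char)((value >> 24) | 0x80);
    field[1] = (unsigned char)((value >> 16) & 0xFF);
    field[2] = (unsigned char)((value >> 8) & 0xFF);
    field[3] = (unsigned char)(value & 0xFF);
}

/* Decode either format: flagged binary, or the old space-padded digits. */
uint32_t decode_field(const unsigned char field[4])
{
    uint32_t value = 0;
    int i;

    if (field[0] & 0x80) {
        value = field[0] & 0x7F;            /* mask off the flag bit */
        for (i = 1; i < 4; i++)
            value = value * 256 + field[i]; /* multiply by 256, add next byte */
        return value;
    }
    for (i = 0; i < 4; i++)                 /* old text form: spaces and digits */
        if (field[i] >= '0' && field[i] <= '9')
            value = value * 10 + (uint32_t)(field[i] - '0');
    return value;
}
```

Old writers never set the top bit, so their output still decodes through the second loop. Old readers, on the other hand, will see bytes they cannot interpret whenever a new writer sends a flagged value.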
GOTHIER Nathan
2017-05-16 13:36:46 UTC
On Tue, 16 May 2017 15:26:11 +0300
Post by Test
I have
...
char str[4];
..
strcpy(str,"9999");
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
A quick solution would have a numbering systems where 0..9998,9999, AAAA,AAAB...
etc. where "AAAA" is 10000 and "AAAb" 10001 and so on.
Another oen could be using a hex number say "1000" is 4096 decimal.
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
One solution to keep compatibility with your old numbering scheme is to set a
flag character such as:

[0000-9999] old numbering scheme for [0-9999]
[000a-999a] new numbering scheme plane 'a' for [10000-10999]
[000b-999b] new numbering scheme plane 'b' for [11000-11999]
...
[000Z-999Z] new numbering scheme plane 'Z' for [61000-61999]

Where decoded_number = 10000 + ((flag_value - 1) * 1000) + offset_value
For flag_value from [a-z, A-Z] as [1-52]
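
The formula above could be sketched in C roughly as follows. This assumes the flag is the fourth character, that planes run a-z and then A-Z, and that spaces in the first three positions are padding. The function names and the PLANES table are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Plane flags in order: a-z are planes 1-26, A-Z are planes 27-52. */
static const char PLANES[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Hypothetical encoder for the new planes only (10000..61999). */
void encode_plane(char field[4], long value)
{
    long rest;

    assert(value >= 10000 && value <= 61999);
    rest = value - 10000;
    field[0] = (char)('0' + (rest % 1000) / 100);
    field[1] = (char)('0' + (rest % 100) / 10);
    field[2] = (char)('0' + rest % 10);
    field[3] = PLANES[rest / 1000];
}

/* Decode old or new scheme; returns -1 for a field that fits neither. */
long decode_plane(const char field[4])
{
    long offset = 0;
    const char *p;
    int i;

    for (i = 0; i < 3; i++) {
        if (field[i] >= '0' && field[i] <= '9')
            offset = offset * 10 + (field[i] - '0');
        else if (field[i] != ' ')           /* spaces count as padding */
            return -1;
    }
    if (field[3] >= '0' && field[3] <= '9')
        return offset * 10 + (field[3] - '0');      /* old scheme, 0..9999 */
    if (field[3] == '\0' || (p = strchr(PLANES, field[3])) == NULL)
        return -1;
    /* p - PLANES is flag_value - 1 in the formula above */
    return 10000 + (long)(p - PLANES) * 1000 + offset;
}
```

With 52 planes of 1000 numbers each, this only extends the range to 61999, so it buys time rather than solving the problem for good.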
Post by Test
Not wanting to reinvent the wheel are there C source code examples to this
effect?
You alone invented this badly designed wheel, so only you can fix it.

Good luck! ;-)
Ben Bacarisse
2017-05-16 16:53:28 UTC
Post by Test
I have
...
char str[4];
..
<snip>
Post by Test
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
str is (and I'm sure you know) a bad name because the array is not a
string.

The best advice will depend on how these bytes are used. Do you rely on
some particular sorting order, for example? If so, that limits the
schemes you can use.

Given a free hand, I would consider using a union like this

    union number {
        int32_t as_int;
        char as_chars[4];
    };

not for actually storing the numbers -- if you could change the type you
use for them your first choice would be simply to use an int -- but for
inspecting and changing them:

    unsigned long valof(char n[4]) // the 4 is for documentation
    {
        union number u;
        memcpy(&u, n, sizeof u);
        return u.as_int < 0 ? -u.as_int : whatever_you_do_now(n);
    }

I hope you get the idea. It's a variation on what Malcolm was suggesting.
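
One way the idea might be fleshed out, as a sketch only: it assumes an ASCII-compatible character set (so that digits and spaces never set the top bit of a byte and an old-style field never reads as a negative int32_t), and it stores the integer in host byte order, so the bytes are not portable between machines with different endianness. parse_legacy is a made-up stand-in for whatever_you_do_now, and store_large is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union number {
    int32_t as_int;
    char as_chars[4];
};

/* Hypothetical stand-in for the existing parser of space-padded digits. */
unsigned long parse_legacy(const char n[4])
{
    unsigned long v = 0;
    int i;

    for (i = 0; i < 4; i++)
        if (n[i] >= '0' && n[i] <= '9')
            v = v * 10 + (unsigned long)(n[i] - '0');
    return v;
}

/* Store a large number as its negation: the sign bit marks the new format. */
void store_large(char n[4], int32_t value)
{
    union number u;

    assert(value > 0);
    u.as_int = -value;
    memcpy(n, u.as_chars, sizeof u.as_chars);
}

unsigned long valof(const char n[4])   /* the 4 is for documentation */
{
    union number u;

    memcpy(&u, n, sizeof u);
    /* widen before negating so INT32_MIN cannot overflow */
    return u.as_int < 0 ? (unsigned long)-(int64_t)u.as_int
                        : parse_legacy(n);
}
```

Under these assumptions, any field written by store_large reads back as a negative as_int, while every old-style field reads back as non-negative and falls through to the legacy parser.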

<snip>
--
Ben.
Keith Thompson
2017-05-16 18:48:21 UTC
Post by Ben Bacarisse
Post by Test
I have
...
char str[4];
..
<snip>
Post by Test
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
str is (and I'm sure you know) a bad name because the array is not a
string.
The best advice will depend on how these bytes are used. Do you rely on
some particular sorting order, for example? If so, that limits the
schemes you can use.
Given a free hand, I would using consider a union like this
union number {
int32_t as_int;
char as_chars[4];
};
not for actually storing the numbers -- if you could change the type you
use for them your first choice would be simply to use an int -- but for
I'd use unsigned char. A sequence of unsigned char is the definition of
an *object representation* (N1570 6.2.6.1p4).

If you're constrained to represent an integer value as a sequence of
plain char objects, it's going to be a little trickier. Knowing exactly
what the OP's requirements are would be helpful.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
d***@gmail.com
2017-05-17 01:16:16 UTC
Post by Test
I have
...
char str[4];
..
strcpy(str,"9999");
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
A quick solution would have a numbering systems where 0..9998,9999, AAAA,AAAB...
etc. where "AAAA" is 10000 and "AAAb" 10001 and so on.
Another oen could be using a hex number say "1000" is 4096 decimal.
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
Not wanting to reinvent the wheel are there C source code examples to this
effect?
Change the underlying data type to int if it can be negative, and to unsigned int if not.

You are going to have to change the interface to the data anyway. Any process that accesses that column will need to be reprogrammed or it will get wrong answers. So, because you will have to correct each and every process that touches the column anyway, you might as well fix it with a sensible change. The int type is compact and easy to understand and maintain. A "four byte kludge integer" is something very bad. You would have to have new read, write and display algorithms.

The conversion of the column will only have to be performed once.

Now, suppose that you write your spiffy new "four byte kludge integer" and it gets tested over a large enough dynamic range so that you are sure that it will work.

Now, the boss wants to write a report against the data using standards based tools like Business Objects, Crystal Reports, Pentaho Reporting, or what have you. How well is the spiffy new "four byte kludge integer" going to work in that environment?

IMO-YMMV.
Robert Wessel
2017-05-17 04:53:07 UTC
Post by Test
I have
...
char str[4];
..
strcpy(str,"9999");
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
A quick solution would have a numbering systems where 0..9998,9999, AAAA,AAAB...
etc. where "AAAA" is 10000 and "AAAb" 10001 and so on.
Another oen could be using a hex number say "1000" is 4096 decimal.
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
Not wanting to reinvent the wheel are there C source code examples to this
effect?
As an aside, a similar thing happened in many mainframe data
processing shops in the US in the early/mid eighties.

Many shops that stored addresses stored the ZIP code (postal code) as
a five character field, with five numeric digits in it. IOW, a Cobol
"PIC 99999 USAGE DISPLAY" field. The introduction of the extended
scheme ("ZIP Plus 4" - that's the 12345-6787 you see at the end of
many computer generated addresses - rarely is that written on hand
addressed envelopes) suddenly gave people *nine* digits to store.

The hack, repeated in hundreds of shops, was to redefine (create a
union of) the original (display) field and a *packed* (BCD) ZIP+4
field, and use the fact that no valid packed field would also be a
valid display field (and vice-versa, at least in EBCDIC) to
discriminate. So:

    05 ZIP-CODE PIC 9(5) USAGE DISPLAY.
    05 ZIP-PLUS-FOUR REDEFINES ZIP-CODE PIC 9(9) USAGE COMP-3. (="BCD")
    ...
    IF ZIP-CODE IS NUMERIC
        ...use 5 digit ZIP Code
    IF ZIP-PLUS-FOUR IS NUMERIC
        ...use 9 digit ZIP+4

That let them handle both formats, without getting all the files
converted, plus allowed them to update the fields without
rearranging the record layouts to accommodate a bigger field.

It would be a trivial thing to do something similar, and distinguish
between the two cases based on the "old" form being valid or not. A
simple solution would be to store a "z" in the first position, and
then a 24-bit binary number in the other three. That gets you some
16.7 million additional values. A bit more complex, you could code
the new numbers as a 24-bit binary number plus another number from
0..(255-11) in the first position (there being 11 valid characters in
the first position of an "old" style number), for a total of 4.1
billion new numbers.
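
A rough sketch of the simpler "z" variant in C. It assumes the 24-bit part is stored most significant byte first and that new values start right after 9999 (the OLD_LIMIT offset is a choice made for this example, not something stated above). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_LIMIT 10000u   /* assumption: new values start right after 9999 */

/* Hypothetical encoder: 'z' marker followed by a 24-bit big-endian value. */
void encode_z24(unsigned char field[4], uint32_t number)
{
    uint32_t raw;

    assert(number >= OLD_LIMIT && number - OLD_LIMIT <= 0xFFFFFFu);
    raw = number - OLD_LIMIT;
    field[0] = 'z';
    field[1] = (unsigned char)(raw >> 16);
    field[2] = (unsigned char)((raw >> 8) & 0xFF);
    field[3] = (unsigned char)(raw & 0xFF);
}

/* Returns 1 and stores the number in *out, or 0 if the field is in
   neither the new form nor the old space-and-digit form. */
int decode_z24(const unsigned char field[4], uint32_t *out)
{
    uint32_t value = 0;
    int i;

    if (field[0] == 'z') {
        *out = OLD_LIMIT + (((uint32_t)field[1] << 16) |
                            ((uint32_t)field[2] << 8) | field[3]);
        return 1;
    }
    for (i = 0; i < 4; i++) {
        if (field[i] >= '0' && field[i] <= '9')
            value = value * 10 + (uint32_t)(field[i] - '0');
        else if (field[i] != ' ')
            return 0;                       /* not a valid old-style field */
    }
    *out = value;
    return 1;
}
```

The last three bytes can take any value, including zero, so a field written this way must be copied with memcpy and never treated as a C string.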

Now whether or not this is a good idea is a completely different
question.
m***@gmail.com
2017-05-17 07:39:17 UTC
Post by Robert Wessel
As an aside, a similar thing happened in many mainframe data
processing shops in the US in the early/mid eighties.
I'm having flashbacks :-) (Skip the next bit if you object to
a bit of semi-relevant, but not strictly on-topic, reminiscence)

I was a newly recruited COBOL programmer in late 1978, working
on a system which managed production in a factory. A monthly
batch job scheduled and rescheduled work orders (internal orders
for the component parts which would be assembled into the
product) for the coming month.

The process started by setting the due dates of all work orders
to an arbitrary future date, taken to be far enough away to be
considered cancelled, and then scheduled them back in to meet
requirements, and raised new work orders as needed.

The system (written in the early 1970s) represented due dates in
YYMM format and the "arbitrary future date" chosen was "7912"...

My first job was to find all the hardcoded references to "7912" and
replace them with "9912", and write programs to patch the database
in like manner.

I'm presuming that everything had changed by the time Y2K appeared
on the horizon - I'd moved on long before then.
Robert Wessel
2017-05-17 16:21:02 UTC
Post by m***@gmail.com
Post by Robert Wessel
As an aside, a similar thing happened in many mainframe data
processing shops in the US in the early/mid eighties.
I'm having flashbacks :-) (Skip the next bit if you object to
a bit of semi-relevant, but not strictly on-topic, reminiscence)
I was a newly recruited COBOL programmer in late 1978, working
on a system which managed production in a factory. A monthly
batch job scheduled and rescheduled work orders (internal orders
for the component parts which would be assembled into the
product) for the coming month.
The process started by setting the due dates of all work orders
to an arbitrary future date, taken to be far enough away to be
considered cancelled, and then scheduled them back in to meet
requirements, and raised new work orders as needed.
The system (written in the early 1970s) represented due dates in
YYMM format and the "arbitrary future date" chosen was "7912"...
My first job was to find all the hardcoded references to "7912" and
replace them with "9912", and write programs to patch the database
in like manner.
I'm presuming that everything had changed by the time Y2K appeared
on the horizon - I'd moved on long before then.
IIRC, MVS still has special handling for some dates in 1999, since
those were used for the expiration date for "never expire" files.
Scott Lurndal
2017-05-17 13:04:47 UTC
Post by Robert Wessel
Post by Test
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
As an aside, a similar thing happened in many mainframe data
processing shops in the US in the early/mid eighties.
Many shops that stored addresses stored the ZIP code (postal code) as
a five character field, with five numeric digits in it. IOW, a Cobol
"PIC 99999 USAGE DISPLAY" field. The introduction of the extended
scheme ("ZIP Plus 4" - that's the 12345-6787 you see at the end of
many computer generated addresses - rarely is that written on hand
addressed envelopes) suddenly gave people *nine* digits to store.
A far more significant problem in that timeframe was the common
use of PIC 99 for the year (instead of PIC 9999) to save space,
which was tight in those days. That led to the massive Y2K effort
in the 1990s.
bartc
2017-05-17 13:32:59 UTC
Post by Scott Lurndal
A far more significant problem in that timeframe was the common
use of PIC 99 for the year (instead of PIC 9999) to save space,
which was tight in those days. That lead to the massive Y2K effort
in the 1990's.
I remember writing a POS system in the mid-90s that stored the year as 2
digits. But it was offset, so that year digits 90-99 became 1990 to
1999, but years 00 to 89 had an extra century added so they became 2000
to 2089 when displayed or printed. (I think it's still running.)
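
The offset described above amounts to a fixed pivot at 90. A minimal sketch in C, with a made-up function name and assuming the stored value is already a number from 0 to 99:

```c
#include <assert.h>

/* Expand a two-digit year with a fixed pivot of 90:
   90..99 become 1990..1999, 00..89 become 2000..2089. */
int expand_year(int yy)
{
    assert(yy >= 0 && yy <= 99);
    return yy >= 90 ? 1900 + yy : 2000 + yy;
}
```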

Of course, all the people that didn't think that far ahead, benefited by
being paid to fix the problem they'd created!
--
bartc
Robert Wessel
2017-05-17 16:24:06 UTC
Post by Scott Lurndal
Post by Robert Wessel
Post by Test
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
As an aside, a similar thing happened in many mainframe data
processing shops in the US in the early/mid eighties.
Many shops that stored addresses stored the ZIP code (postal code) as
a five character field, with five numeric digits in it. IOW, a Cobol
"PIC 99999 USAGE DISPLAY" field. The introduction of the extended
scheme ("ZIP Plus 4" - that's the 12345-6787 you see at the end of
many computer generated addresses - rarely is that written on hand
addressed envelopes) suddenly gave people *nine* digits to store.
A far more significant problem in that timeframe was the common
use of PIC 99 for the year (instead of PIC 9999) to save space,
which was tight in those days. That lead to the massive Y2K effort
in the 1990's.
While true, most Y2K solutions didn't try to overload a single field
with two formats, so that doesn't really seem on topic.
j***@gmail.com
2017-05-18 15:23:30 UTC
Post by Test
I have
...
char str[4];
I'd suggest

#define SERIALLENGTH 8

char str[ SERIALLENGTH ];
Post by Test
..
strcpy(str,"9999");
memcpy( str, "00009999", SERIALLENGTH );
Post by Test
The original design was that only 0-9999 would be enough (only positive
integers). Unfortunately this was not enough (after 10 years). It would much
better if I could fit a higher number. What I need to maintain compatibility, ie
if str gets " 15" it mean number 15. A hex number could otherwise suffice if not
for compatibility.
The problem with hexadecimal is telling hexadecimal 1234 from decimal 1234.
Post by Test
A quick solution would have a numbering systems where 0..9998,9999, AAAA,AAAB...
etc. where "AAAA" is 10000 and "AAAb" 10001 and so on.
Pseudo-hexadecimal could work, but if you stop at F you only get
6^4 (== 1296) new serial numbers.

You can go to Z and get 26^4 (== 456976) new serial numbers.

You're going to have to add code in various places to handle those numbers
in a meaningful way.

Validation means that you have to make sure the numbers are in one
range or the other, but not mixed. Trying to allow 1A34 is going to
eat your lunch. (It can be done, but you have to really think about
things, like whether 1A34 comes before 2000 or after 9999. Comparisons
can get really tricky.)
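
For illustration, a sketch of the A-Z variant with the validation described above. It assumes "AAAA" maps to 10000, the letters are treated as base-26 digits, the character set has contiguous uppercase letters (true for ASCII, not for EBCDIC), and mixed fields such as "1A34" are rejected outright. The function names are invented.

```c
#include <assert.h>

/* Returns the serial number, or -1 for an invalid or mixed field. */
long decode_serial(const char field[4])
{
    long value = 0;
    int i;

    if (field[0] >= 'A' && field[0] <= 'Z') {
        for (i = 0; i < 4; i++) {
            if (field[i] < 'A' || field[i] > 'Z')
                return -1;                  /* letters mixed with digits */
            value = value * 26 + (field[i] - 'A');
        }
        return 10000 + value;               /* "AAAA" is 10000 */
    }
    for (i = 0; i < 4; i++) {
        if (field[i] >= '0' && field[i] <= '9')
            value = value * 10 + (field[i] - '0');
        else if (field[i] != ' ')
            return -1;                      /* e.g. "1A34" */
    }
    return value;                           /* old range, 0..9999 */
}

/* Hypothetical encoder covering both ranges (0..466975). */
void encode_serial(char field[4], long value)
{
    int i;

    assert(value >= 0 && value < 10000 + 456976L);
    if (value < 10000) {
        for (i = 3; i >= 0; i--) {          /* zero-padded decimal */
            field[i] = (char)('0' + value % 10);
            value /= 10;
        }
    } else {
        value -= 10000;
        for (i = 3; i >= 0; i--) {          /* base 26, A = 0 */
            field[i] = (char)('A' + value % 26);
            value /= 26;
        }
    }
}
```

Comparing two serial numbers is then safest done on the decoded values rather than on the raw bytes, which sidesteps the ordering question between the two ranges.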
Post by Test
Another oen could be using a hex number say "1000" is 4096 decimal.
See above about telling the difference between 1000 hexadecimal and
1000 decimal.
Post by Test
This would make life a lot easier and provide backward comptibility: older
systems can continue providing 0...9999 but newer systems would be able to also
utilize higher numbers while haveing 0...9999 if the number is within oldstyle
range. When when the main program sees anything other than space of a digit it
assumes that a newer version want to give a number larger than 9999.
Not wanting to reinvent the wheel are there C source code examples to this
effect?
That's your homework, of course.

But it's fairly straightforward. Validation requires a bit more code
and comparison has to compare the two ranges separately.

Have fun.

--
Joel Rees

Ranting randomly:
http://reiisi.blogspot.com
