Saturday, October 04, 2003



by Lukas Oberhuber. Started 12/19/96.

The contents of a stream of information can only be understood/parsed if the structure of the information is understood.

Fuzzy version: Some of the info can be parsed if some of the structure is understood.

Consequences: Natural/continuous speech can only be correctly parsed if the computer understands language.
This also shows why it's such a pain to move data between disparate systems (like databases). What it also shows is that the implied information that is not in the stream can hold some, a lot, or almost all of the actual information.
So what cryptography really does is take a huge amount of information out of the stream and transfer it through other means. Think of it this way: the cryptographic algorithm mixes a large amount of information into the data stream (presumably so much that it outweighs the information in the stream) so that the actual data can't be found. It makes the data stream appear to be random, which is also why it is so hard to compress.
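As a rough illustration (my own sketch, not part of the original notes), Python's standard zlib makes the compressibility point concrete: repetitive text shrinks dramatically, while random-looking bytes, which is what good ciphertext looks like from the outside, barely shrink at all.

# Sketch: structured data compresses, random-looking data (a stand-in for
# ciphertext) does not. Uses only the Python standard library.
import os
import zlib

structured = b"the attack begins at dawn; " * 100   # repetitive, low-entropy text
random_like = os.urandom(len(structured))            # stand-in for encrypted output

print("structured :", len(structured), "bytes ->", len(zlib.compress(structured)))
print("random-like:", len(random_like), "bytes ->", len(zlib.compress(random_like)))

On a typical run the structured text compresses to a small fraction of its size, while the random-looking buffer stays essentially the same length.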
Look at the lowest level: the code word “sausage.” If, to an army, this means “start the attack,” none of that information--except the exact timing of the attack--is in the message. All of the information was communicated at another time.
Corollary: Spoken language has always been considered to be low in information content, but this is completely false in the general case. In fact, when looking at the information stream, one has to include all the info not spoken as part of the data. In most cases, the untransmitted data rivals or exceeds the transmitted data. It is not the ambiguity of language that makes it hard to recognize accurately, but lack of knowledge of the structure, i.e. the remaining data. This appears as ambiguity when in fact it is just lack of knowledge.

As a matter of fact, the possibility of ambiguity is accepted as part of the language. The amount of information that can be transmitted is very large, and the cost of rectifying miscommunication is comparatively small. I would strongly suggest that the focus on ambiguity in language has been a red herring. Humans easily resolve ambiguity in normal conversation, due to knowledge, otherwise known as context (in linguistics).


Theorem (maybe?):
This should be interpreted in the light of information theory. A data stream consists not only of the data/information inside the stream, on the wire or over the air, but also of the corresponding information outside the stream. We have to look at data streams as having two parts: what is transmitted, the in band part, and what is implied, the out band part.
Much current analysis of information has concentrated on the in band part and relegated the rest to a separate analysis. In AI linguistics this would be the context or even intelligence. They have focused on ambiguity, an in band problem, rather than context/knowledge, an out band problem which is considered intractable (or at least very difficult) and therefore something to be worked around. Compression theory speaks to this problem very clearly. A simple theorem states that you must know something about the data in order to compress it. It is impossible to devise an algorithm which will compress all possible data streams by at least 1 bit. And the higher your compression, the more you need to know about the data coming through.
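To spell out the counting behind that theorem (a standard pigeonhole argument, added here for illustration): there are 2^n possible streams of n bits, but only

2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1

streams that are strictly shorter. A lossless compressor must send different inputs to different outputs, so at least one n-bit stream cannot be shortened at all, and any scheme that shrinks some inputs must lengthen others. The only way to win on average is to know in advance which inputs are actually likely.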
Now we begin to see why computers have such a difficult time communicating: they don't know anything about protocols they haven't been programmed to accept. This is because the information needed to read the protocol is implied; it is out band. MIDI is a good example of a protocol with a great deal of out band information in it. It is very compact, but requires the correct voices to be at the other end of the line. Data caches are a similar concept, but in a very crude way.
I would say that there is a continuum in data streams from in band heavy to out band heavy. While computers are almost exclusively in band heavy, humans have the ability/knowledge to range far higher on this scale.

So let us call I the amount of data in band and O the amount of data out band. Then

B = O / I


B is the Band Ratio. As B approaches infinity, the data stream becomes harder and harder to interpret. Let us call D the difficulty of interpreting a stream; then, if we have two streams X and Y, we get:

Bx < By  =>  Dx < Dy
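To make the ratio concrete with invented numbers: if stream X carries I = 1000 bits on the wire and relies on O = 100 bits of shared knowledge, then Bx = 0.1; if stream Y carries only I = 100 bits but leans on O = 1000 bits of shared knowledge, then By = 10. On this account Y should be far harder to interpret for a receiver that does not already hold that out band knowledge, even though far less travels over the wire.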


I believe this relationship always holds, regardless of the level at which it is looked at (e.g. bits on a network, bytes, ASCII text, human speech, or voltages). I think this also speaks for the philosophical theory that humans maintain an internal representation of the world (it is simply more efficient). It is part of what allows us to make sense of the world. Thus, by processing a small amount of data in our world, we can still get an amazingly accurate view of it.
So how do you get levels of compression impossible today? Put the knowledge at both ends and only transmit triggers to it (this is the hybrid CD-ROM/Internet concept, for example). What is useful here? I think we get a clearer idea of what it means to communicate, and we can begin to address issues of computer communication and why it is so bad/difficult today.
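Here is a minimal sketch of the "knowledge at both ends" idea in Python (the codebook and messages are invented for illustration): the dictionary is shared out band ahead of time, and only tiny trigger values travel in band.

# Sketch: the shared codebook is the out band knowledge; only a small
# trigger index is transmitted in band. All names and messages are invented.
CODEBOOK = {
    0: "start the attack",
    1: "hold position",
    2: "retreat to the river",
}

def send(message: str) -> int:
    # Sender looks up the trigger for a message both ends already know.
    return next(code for code, text in CODEBOOK.items() if text == message)

def receive(trigger: int) -> str:
    # Receiver expands the trigger using the shared codebook.
    return CODEBOOK[trigger]

wire = send("start the attack")    # only a couple of bits go over the wire
print(wire, "->", receive(wire))   # the full meaning is reconstructed out band

The "compression" here is extreme precisely because almost all of the information was moved out band before the transmission ever happened.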
T is the total information transmitted.

T = O + I


then

Bx < By  =>  Dx < Dy   (for Tx = Ty)


Dictionaries help (if you have the right dictionary).

Addendum 7/14/97

One of the issues immediately raised by the foregoing:
What is included for the purposes of measuring B (Band Ratio)?
The problem is this: O (out band) info could range from the topic of conversation all the way up to the entire universe (in this model we add that any piece of matter is also information, to an arbitrary level of detail). This problem of context for this “context-based” theory could be fatal. I propose, however, that there is some function which can loosely be looked at as a distance function from the topic of transmission/communication. This would include distance in terms of level of detail as well as distance in topic. The function would return a value inversely proportional to the distance from the topic. The value of this function would be the relevance of the additional data to the current transmission. This allows us to come up with reasonable values for O, rather than having to assume O is infinite. Unfortunately, I believe we will only be able to estimate the values of O in real-world situations.
However, for computer-to-computer communications I believe O can be accurately determined (in fact, the programmer of the communicating systems must know what O is and program it in manually).
So back to math:
P is an arbitrary piece of information
P_all is all the information in the universe: Integral(P)
T is the transmission topic

R( T, P ) is the range/distance between the topic and some piece of information

V( R( T, P ) ) is the value ratio that some P contributes to O (V increases with increasing relevance).
Then the value of O is

O = integral( V( R( T, P ) ) )

A possible definition of V

V = 1 / (R( T, P ))^2

Then T would equal:
T = integral ( V( R( T, P ) ) ) + I
where I is the bits over the wire. One step further:
T = integral ( 1 / square(R( T, P )) ) + I
Remember
R = Square Root( (R_topic)^2 + (R_level of detail)^2 )
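As a small sketch of how these formulas might be evaluated in practice (my own reading, with invented distances and bit counts, and with the integral replaced by a sum over discrete pieces of information, each weighted by its relevance ratio):

# Sketch: estimate O by summing each piece's information weighted by
# V(R) = 1 / R^2, where R combines topic distance and level-of-detail
# distance. All sample values are invented for illustration.
import math

def R(r_topic: float, r_detail: float) -> float:
    # Distance from the transmission topic.
    return math.sqrt(r_topic**2 + r_detail**2)

def V(r: float) -> float:
    # Relevance ratio: inverse square of the distance.
    return 1.0 / r**2

# Each piece P: (bits of information, distance in topic, distance in detail)
pieces = [(512, 1.0, 0.5), (2048, 3.0, 1.0), (10_000, 10.0, 4.0)]

O = sum(bits * V(R(rt, rd)) for bits, rt, rd in pieces)
I = 1024  # bits actually sent over the wire
print(f"O ~= {O:.0f} bits, I = {I} bits, B = O/I ~= {O / I:.2f}")

Distant, highly detailed pieces of information contribute almost nothing, which is what keeps O finite.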
9/2/97
Data streams can have levels of detail, or more precisely, information can be retrieved in parts. A computer can almost never extract useful information from a data stream (DS) without “understanding” the entire contents of the stream.

3/5/07
One of the issues which arises is what happens when these equations are applied to smaller and smaller parts of a transmission stream, as integrals do. Do the results of the equations tend to infinity or 0, thereby making them useless? I don't think so, as the only real change is that the rest of the transmission is added to the O value, so that
T = integral ( 1 / square( R( T, P ) ) ) + I(rest) + I(slice)
where I(rest) is the remainder of the transmission, thus
O = integral ( 1 / square( R( T, P ) ) ) + I(rest)
Cryptography Example
Imagine a 128-bit key K plus a 128-byte plaintext Z.
Assume all we want to do is retrieve Z from the transmission.
What is O and what is I (in terms of order of magnitude)?
Knowledge of K is worth 2^128 and the plaintext is worth (2^8)^128 in terms of random complexity (assuming the plaintext is random, which in practice it won't be).
Now we ship the 128-byte (1024-bit) ciphertext (I) across the wire after encrypting the data with algorithm C, so
I = C(Z, K)
O in this case is what?
O = C + K + Z - C(Z,K)
Does this help at all?
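One rough way to put numbers on it, under one reading of the example above (treating the description of algorithm C as some fixed number of bits, |C|): the wire carries I ~= 1024 bits, the 128-byte ciphertext. Out band sit the 128 key bits plus the |C| bits needed to describe the algorithm; the plaintext's own ~1024 bits ride inside I and so cancel out of O, which matches O = C + K + Z - C(Z,K) ~= |C| + 128. Then B = O / I ~= (|C| + 128) / 1024: a modest-looking ratio, yet without those few out band bits the entire in band stream is unreadable.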

Copyright 2003-2007 Lukas Oberhuber