Wednesday, August 17, 2005

Semantic Information Hiding: Mary thinks Joe is intelligent, but how smart is she? Perhaps I am not smart enough to be called truly intelligent.=0110

The English language contains a lot of redundancy. If one chooses to convey some information X, there exists a collection of sentences {L1,...,Ln} that could express X. For example, replacing words with their synonyms is one simple operation that does not alter meaning. Consider a sentence L* that contains a word W', which has synonyms {W1,...,Wk}; then replacing W' with Wi, where i \in [1,k], does not alter the meaning X. Now for an example.

The statement "Tombone is smart" is semantically equivalent (up to a threshold) to "Tombone is intelligent." Imagine that I write a program called Propaganda to fetch thousands of random blogs (which will inevitably contain the words "smart" and "intelligent") and concatenate them into one gigantic blog. Inside this gigantic blog, every time I encounter the word "smart" or "intelligent" I am free to replace one with the other or keep the original word. I perform whichever swaps I need and repost the blog on the web. I also write another program called InterpretBlog, which simply reads in this gigantic blog (which looks like a bunch of text that normal people would write on a daily basis, such as rants, love stories, debates, etc.). While reading in the gigantic blog, the InterpretBlog program writes another file. When InterpretBlog encounters the word "intelligent" it writes a 0, and when it encounters the word "smart" it writes a 1. This new file that InterpretBlog writes can be anything we desire, such as an executable program, an image, a hidden message, or a Dickens novel (given that the number of "intelligent"s plus the number of "smart"s is large enough).
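Here is a minimal Python sketch of the two programs, just to make the idea concrete. The function names propaganda and interpret_blog are my own stand-ins for Propaganda and InterpretBlog, and the sketch makes some simplifying assumptions: it skips the blog-fetching step entirely, it only matches the lowercase words, and it does not fix the article in front of a swapped word ("a smart" versus "an intelligent").

```python
import re

# Mapping from the post: "intelligent" writes a 0, "smart" writes a 1.
BIT_TO_WORD = {"0": "intelligent", "1": "smart"}
WORD_TO_BIT = {"intelligent": "0", "smart": "1"}

# Assumption: only whole, lowercase occurrences count as carrier words.
CARRIER = re.compile(r"\b(smart|intelligent)\b")


def propaganda(cover_text, bits):
    """Rewrite the carrier words in cover_text so they spell out bits.

    Carrier words left over after the last bit are kept unchanged.
    """
    remaining = iter(bits)

    def swap(match):
        bit = next(remaining, None)
        return match.group(0) if bit is None else BIT_TO_WORD[bit]

    stego_text = CARRIER.sub(swap, cover_text)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few carrier words for the message")
    return stego_text


def interpret_blog(stego_text):
    """Read the carrier words back as a string of 0s and 1s."""
    return "".join(WORD_TO_BIT[m.group(0)] for m in CARRIER.finditer(stego_text))


title = ("Mary thinks Joe is intelligent, but how smart is she? "
         "Perhaps I am not smart enough to be called truly intelligent.")
print(interpret_blog(title))                      # 0110
print(interpret_blog(propaganda(title, "1001")))  # 1001
```

Each carrier word holds one bit, so hiding N bytes takes 8N occurrences of "smart" or "intelligent". Since leftover carrier words still decode as bits, the receiver also has to know where the message ends, for instance by agreeing on a length in advance.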

The point is that two hosts can communicate via hidden messages embedded into blogs.

An hour later:
After a bit of googling, I found that the scientific term which most accurately captures the essence of my idea is "Lexical Steganography." I found the following webpage on Lexical Steganography very helpful.