Tuesday, 12 February 2008




Chat Bots Exposed! GoogleTalk?

Chat bots are applications designed to mimic a person and engage people in

conversation using normal human language.

The challenge is to conceal the identity of the application, and be

for all intents and purposes, perceived as just another person on the

other end of a communication medium.

The best attempt to date is Alicebot from http://alicebot.org/. Its

underlying architecture has been released as open source and packaged

as AIML (Artificial Intelligence Markup Language).

The basic premise behind it is to map input patterns to output

responses, or input patterns to other simplified input patterns for

re-processing (similar to recursion). There is also a framework for

executing functions, and for defining and retrieving variables during a

session; these can possibly be saved and reloaded for others.

Consider a simple input pattern and output response pair, a single

conversation thread:

>How are you today?

Never better!
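
To make the idea concrete, here is a minimal sketch in Python of the pattern-to-response mapping, the reduction of one input to a simpler input for re-processing, and a session variable. It is not AIML itself (real AIML is XML and far richer); the rule table, the `REDUCTIONS` dictionary and the function names are all hypothetical and only illustrate the premise.

```python
import re

# Hypothetical rules: (input pattern, output template, session variable to set).
# A trailing " *" is a wildcard that captures the rest of the input.
RULES = [
    ("HOW ARE YOU TODAY", "Never better!", None),
    ("MY NAME IS *", "Nice to meet you, {star}.", "name"),
    ("WHAT IS MY NAME", "Your name is {name}.", None),
]

# Hypothetical reductions: map an input to a simpler input for re-processing.
REDUCTIONS = {
    "HOW ARE YOU": "HOW ARE YOU TODAY",
    "HOW ARE YOU DOING TODAY": "HOW ARE YOU TODAY",
}


def normalize(text):
    """Uppercase the input and drop punctuation."""
    return re.sub(r"[^A-Z0-9 ]", "", text.upper()).strip()


def respond(text, session, depth=0):
    """Return a reply for text, reading and writing variables in session."""
    key = normalize(text)
    # Re-process a reduced input; the depth limit guards against loops.
    if key in REDUCTIONS and depth < 10:
        return respond(REDUCTIONS[key], session, depth + 1)
    for pattern, template, variable in RULES:
        star = ""
        if pattern.endswith(" *"):
            prefix = pattern[:-1]  # keep the trailing space, drop the "*"
            if not (key.startswith(prefix) and len(key) > len(prefix)):
                continue
            star = key[len(prefix):].title()
        elif key != pattern:
            continue
        if variable:
            session[variable] = star  # define a session variable
        return template.format(star=star, name=session.get("name", "unknown"))
    return "I do not understand."
```

Calling `respond("how are you?", {})` first reduces the input to "HOW ARE YOU TODAY" and then answers from the single rule for that pattern, which is why every phrasing needs a botmaster to write either a rule or a reduction for it.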

Although it is the most believable system to date, it's still not machine

scalable. The only method for training is via a botmaster who simply

edits and modifies the existing AIML text file.

Even worse, conversations can become predictable and stale, since the

only variation comes from a random function. A typical AIML set will handle

about 40,000 input/response pairs.
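
A small sketch of why a random function gives so little variety: however often you ask, the reply is always one of the few alternatives the botmaster typed in. The list of replies here is hypothetical.

```python
import random

# Hypothetical alternatives a botmaster might write for one input pattern.
REPLIES = ["Never better!", "I am fine, thanks.", "Doing well. And you?"]


def random_reply(rng=random):
    """Pick one of the hand-written replies at random."""
    return rng.choice(REPLIES)


# Ask the same question 1000 times and collect the distinct answers.
seen = {random_reply() for _ in range(1000)}
```

After a thousand tries, `seen` can never hold more than the three replies in the list.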

In contrast, Google Sets took about 2,000,000 elements. Google Suggest

took about 80,000,000. For translation, Google admitted they processed

2,000,000,000 United Nations transcripts.

If such a system is going to work, you'll probably need trillions of

conversation threads.

Why not use aggregated Instant Messaging data, such as GoogleTalk's,

and/or emails as the training corpus, since they can be represented as

conversations? It's impossible for a single botmaster to train

anything even remotely scalable in comparison.
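
As a rough sketch of what "represented as conversations" could mean in practice, the Python below turns a chat history into (message, reply) pairs that could feed a pattern/response table. The log format, a list of (speaker, text) tuples, is an assumption of mine; real Instant Messaging exports each have their own format and would need their own parser.

```python
def conversation_pairs(log):
    """Return (message, reply) pairs from a list of (speaker, text) tuples."""
    turns = []
    for speaker, text in log:
        if turns and turns[-1][0] == speaker:
            # Merge consecutive lines typed by the same speaker into one turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    # Each turn is treated as the reply to the turn before it.
    return [(a[1], b[1]) for a, b in zip(turns, turns[1:])]


# Hypothetical chat history used only for illustration.
example_log = [
    ("ann", "How are you today?"),
    ("bob", "Never better!"),
    ("bob", "You?"),
    ("ann", "Good."),
]
pairs = conversation_pairs(example_log)
```

Here `pairs` holds two entries: the opening question paired with bob's merged reply, and that reply paired with ann's answer. Every saved chat history adds pairs like these without a botmaster typing anything.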

I've already started the development. I need your help. Start saving,

and on a frequent basis, send me (mailto://questsin@rogers.com) your

chat history from your various Instant Messaging applications like

ICQ, AIM, MSN Messenger, Yahoo Messenger and Email.

In return, when the application gets enough critical mass, I will

email you the application, its source code and all underlying datasets,

completely free with no strings attached.

I will release the code as open source to everyone else.

Have you seen Google Mail? Google is probably doing this today. They

openly admit they never delete any information including the emails

you delete.

They say it's for building a profile for targeted advertising.

A possible tell:

1. We organize your emails into conversations

This technology cannot be left to, and dominated by, private hands, even

Google's.

Especially a Corporation with such a powerful presence in the media

and controlled by relatively few.

Why does a company have to say its number 1 rule is "Don't be evil"?

I cannot help but make an analogy to conquerors with mantras "I come in

