(Re-)Discovering Twitter

You might have noticed that on the right-hand side of this blog there’s a new widget that shows my Twitter posts (or should I call them “Twitter tweets” or just “tweets”?).

As you can imagine, this is because I have recently re-discovered Twitter. I remember playing a bit with it a while back when it was still new, but I never really got it… until now. Well, I still don’t think I fully understand its potential, but I started using it mostly as a newsreader. It also occurred to me that I could use it to give short status updates that wouldn’t justify a whole blog post. And as you know, I’m not that good at keeping the blog up to date.

So, I decided to use my Twitter account a bit more in order to give more informal status updates and talk about stuff that wouldn’t be fully related to the blog. Also, every time there’s a new blog post or update, I will make an announcement via Twitter. So, if you follow me on Twitter, you will also know when I’ve posted something new on the blog.

Let’s see how it goes. Thanks again for your continued readership! I’m looking forward to connecting with you via Twitter and/or via this blog!

You can follow me on Twitter @alfredomg

And of course, you can also subscribe to this blog’s RSS feed!

The friends of my friends are NOT my friends

I’m just back from presenting a paper at the International Conference on the Statistical Analysis of Textual Data (JADT 2012) in Liège, Belgium. And no, the title of this post is not a reference to anybody who attended the conference (or who is friends with somebody who attended the conference), but to the results of the paper I presented.

My JADT paper considers and compares two established techniques for representing words in context. Both of these techniques aim to roughly represent the meaning of a word as it occurs in a unit of text (such as a sentence, a paragraph or an arbitrary word-window) geometrically, as a point (vector) in space.

These two representation techniques are called first-order context vectors and second-order context vectors.

In a nutshell, first-order context vectors represent direct co-occurrence, while second-order context vectors represent co-occurrence via intermediate words, i.e. via “friend of a friend” co-occurrences. (In a future post I will explain in more detail what these two context vectors are about).

Second-order context vectors are less straightforward and more complex than first-order context vectors. However, second-order vectors have an intuitive appeal: they are capable of relating any two contexts that are conceptually similar, even if they do not have words in common. First-order context vectors cannot do this.

One common application of context vectors (both first- and second-order) is in word-sense disambiguation (WSD). However, not much work has been done in benchmarking these two types of context vectors within this setting. The paper I presented at JADT 2012 did precisely this.

In our study we found that despite the intuitive benefits described above, second-order context vectors do not offer a clear advantage over first-order context vectors. Both vector types perform quite similarly in an unsupervised WSD setting and in fact we found first-order context vectors actually outperform their second-order counterparts in supervised WSD experiments. And this is why I say that the friends of my friends are NOT my friends. :)

You can see details of these comparisons in our paper here. And you can see a handout version of the slides presented at the conference here.

We are still investigating the reasons for this surprising result, but we currently suspect that, in their present formulation, second-order context vectors introduce noise by considering co-occurrences that aren’t always relevant to the actual disambiguation task. We’re currently developing techniques to alleviate this issue.

Mounting an SMEStorage Linux Cloud Drive behind a proxy server

Like many people in the NLP community, I use large files that I often need to move between computers (at home, at the office, etc.). I normally rely on cloud-based storage services such as Dropbox and SkyDrive to keep my files accessible across all computers, which is OK even if a bit messy. However, I recently learnt about SMEStorage, a service that integrates all your cloud-based storage services and presents them to you as one single cloud-based virtual drive. One of their killer apps allows you to mount your SMEStorage virtual drive in Ubuntu! This is great because it makes SkyDrive (which is normally only integrated in Windows and Mac OS) available in Ubuntu just as if it were a USB hard drive that you plug in.

The real power of SMEStorage comes when SkyDrive, Dropbox and other services that you use can all be listed as sub-folders in your mounted virtual drive! And of course, you can do this on all your computers!!! Note: in the free version of SMEStorage you can add up to three cloud services, in addition to a 5 GB storage service that SMEStorage gives you for free. Paid services get more stuff. But as it is, I think the free service is good for me for now.

These two blog posts should get you started in mounting SMEStorage in Ubuntu: part 1 and part 2.

Now, I really hit a problem when I tried to mount my cloud-based SMEStorage drive from my desktop computer at the office in the University, as we have a proxy server. Unfortunately, the SMEStorage Linux client tool is NOT designed to be run behind a proxy server. It doesn’t have options to specify a proxy server and it doesn’t read your http_proxy environment variable. I searched online but couldn’t find anything to make it work with the proxy server. Soon, however, I realised that the smestorage command line tool (the program used to mount the cloud-based SMEStorage drive as a local drive) was just a Perl program. I inspected the code and discovered that with just a tweak I could make it read my http_proxy variable. After I implemented the tweak, I managed to successfully mount the cloud-based SMEStorage drive across the network, using my University’s proxy server correctly.

I now describe this tweak.

As a prerequisite, you have to get an account with SMEStorage and install the Linux Cloud Desktop Tools (Linux Cloud Drive) from them: http://eu.smestorage.com/?p=static&page=LinuxDrive

Now, edit the program smestorage that was installed. You can find its location in your computer this way (in a terminal):

$ which smestorage
/usr/local/bin/smestorage

In my computer’s case it was installed in that path. To edit the file:

$ sudo gedit /usr/local/bin/smestorage
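Before changing anything, it is probably worth keeping a copy of the original program so you can compare or roll back later. A minimal sketch, assuming the path reported above; the function name backup_once and the .orig suffix are illustrative choices, not part of SMEStorage:

```shell
# Keep a pristine copy of a file before editing it.
# backup_once and the ".orig" suffix are illustrative names, not part of SMEStorage.
backup_once() {
    if [ -e "$1.orig" ]; then
        echo "backup already exists: $1.orig"
    else
        cp -p "$1" "$1.orig" && echo "saved $1.orig"
    fi
}

# Example (writing to /usr/local/bin needs root, so run this from a root shell):
# backup_once /usr/local/bin/smestorage
```

Once you have finished editing, diff -u /usr/local/bin/smestorage.orig /usr/local/bin/smestorage shows exactly what you changed.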

Within gedit (or your favourite text editor) look for sub get_page. This is the subroutine that makes all Internet connections. It should look like this:

# load HTML page
sub get_page {
    my $path=shift;
    my $method=shift;
    $method="GET" unless defined $method;
    my $browser = LWP::UserAgent->new();
    $browser->agent('FuseSMEStorage '.$VERSION);
    my $response;
    if(  $response = $browser->get($path)  ){
        $response = $response->content;
    }

    return $response;
}

If you know programming, you’ll see that this subroutine is opening a web session but it doesn’t specify a proxy. In order to make the code use the proxy server specified in your http_proxy environment variable (which should be set in the /etc/environment file) add “$browser->env_proxy();” below the line “my $browser = LWP::UserAgent->new();”
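Since env_proxy() only helps if the variable is actually visible to the program, a quick check from the terminal where you will run smestorage can save some head-scratching. This is just a sketch; the function name show_proxy and the address proxy.example.org:8080 are made up for illustration:

```shell
# Print the http_proxy value that programs started from this shell would inherit.
# show_proxy is an illustrative helper; it only looks at exported variables.
show_proxy() {
    if value=$(printenv http_proxy) && [ -n "$value" ]; then
        echo "http_proxy=$value"
    else
        echo "http_proxy is not set or is empty"
    fi
}

show_proxy

# A matching line in /etc/environment would look something like this
# (placeholder host and port, use your own proxy's details):
#   http_proxy="http://proxy.example.org:8080/"
```

If the function reports that the variable is not set, the edited subroutine will behave exactly like the original one and try to connect directly.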

Your subroutine should look like this (the added line is $browser->env_proxy();):

# load HTML page
sub get_page {
    my $path=shift;
    my $method=shift;
    $method="GET" unless defined $method;
    my $browser = LWP::UserAgent->new();
    $browser->env_proxy();
    $browser->agent('FuseSMEStorage '.$VERSION);
    my $response;
    if(  $response = $browser->get($path)  ){
        $response = $response->content;
    }

    return $response;
}

<BEGIN EDIT>

I posted too quickly. After posting this quick tweak I noticed that it would allow you to mount the cloud drive and read files from it alright, but it wouldn’t let you write files to the cloud! So, please also perform the following edits in order to be able to write files from Ubuntu back to the cloud drive:

a) There is another line that says my $browser = LWP::UserAgent->new(); inside the subroutine smestorage_save_file_to_local. Make sure you add this line after that line:

$browser->env_proxy();

b) The program makes a few Internet requests via IO::Socket::INET, which doesn’t seem to respect proxy settings. I managed to fix the program so it can make the connection using this module via the proxy server by using the HTTP CONNECT method. Unfortunately, not all proxy servers allow this method, but it should work well in most situations. Also, I haven’t implemented any authentication, so if this is a requirement you’ll have to implement it (shouldn’t be too difficult).

You need to add a new subroutine to the program. You can add it pretty much anywhere in the program. I added it before the # run fuse ### section. Here’s the subroutine you need to add:

sub EstablishConnection
{
    my $dest_server = shift;
    my $dest_port = shift;

    my $socket = "";
    my $proxy_url = $ENV{'http_proxy'};
    if (!$proxy_url)
    {
        $proxy_url = $ENV{'https_proxy'};    
    }
    if (!$proxy_url)  # if no proxy was set in environment variables, open direct connection (just like in original code)
    {
        $socket = new IO::Socket::INET(        #open socket directly to the destination server
            PeerAddr => $dest_server,    #destination host passed by the caller
            PeerPort => $dest_port,      #destination port passed by the caller
            Proto    => 'tcp');
    }
    else # establish connection via proxy using HTTP CONNECT method
    {
        $proxy_url =~ s/^https?\:\/\///;
        $proxy_url =~ s/\/$//;
        my ($proxy_server, $proxy_port) = split(/:/, $proxy_url);
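        $socket = new IO::Socket::INET(        #open socket to the proxy server
            PeerAddr => $proxy_server,
            PeerPort => $proxy_port,
            Proto    => 'tcp');
        return "" unless $socket;    #proxy unreachable: fail the same way as a rejected CONNECT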
        print $socket "CONNECT $dest_server:$dest_port HTTP/1.1\r\n";
        print $socket "Host: $dest_server:$dest_port\r\n";
        print $socket "User-agent: SMEStorage as fixed by Alfredo\r\n";
        print $socket "Proxy-Connection: Keep-Alive\r\n";
        print $socket "Pragma: no-cache\r\n";
        print $socket "\r\n";
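        my $rrr= ""; my $acum="";
        do    #read the proxy's reply headers up to the first empty line
        {
            $rrr = <$socket>;
            $rrr = "" unless defined $rrr;    #end of file: the proxy closed the connection
            chomp($rrr);
            $rrr =~ s/\r//g;
            $acum .= $rrr;
        } until (!$rrr);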
        if(index($acum, ' 200 ')==-1)
        {
            if( $DEBUG){    print "Cannot establish connection via proxy. Message returned: $acum\n";    }
            return "";
        }
    }

    return $socket;
}
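If the Perl is hard to follow, the two key steps of this subroutine (splitting the proxy URL into host and port, and building the CONNECT request) can be sketched in a few lines of shell. This is only an illustration of the logic, not a replacement for the subroutine; parse_proxy, connect_request and the address proxy.example.org:8080 are names I made up for the example:

```shell
# Split a proxy URL into host and port, mirroring the two substitutions and
# the split(/:/) in EstablishConnection. Results go into proxy_host/proxy_port.
# Like the Perl version, this does not handle user:password@ in the URL.
parse_proxy() {
    url=$1
    url=${url#http://}      # drop a leading http://
    url=${url#https://}     # or a leading https://
    url=${url%/}            # drop one trailing slash
    case $url in
        *:*) proxy_host=${url%%:*}; proxy_port=${url#*:} ;;
        *)   proxy_host=$url;       proxy_port="" ;;
    esac
}

# Print the request line, the Host header and the blank line that ends the
# request (the Perl code also sends a few optional headers in between).
connect_request() {
    printf 'CONNECT %s:%s HTTP/1.1\r\n' "$1" "$2"
    printf 'Host: %s:%s\r\n' "$1" "$2"
    printf '\r\n'
}

parse_proxy "http://proxy.example.org:8080/"   # placeholder proxy address
echo "would connect to $proxy_host on port $proxy_port"
connect_request eu.smestorage.com 80
```

The subroutine then reads the proxy's reply and treats the tunnel as open only if that reply contains a 200 status; anything else makes it return an empty string.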

Now, you need to change how connections are opened in three places:

b.1) Under subroutine smestorage_save_file_to_local, change:

$socket = new IO::Socket::INET(		#open socket to the server
		PeerAddr => $server_sme,
		PeerPort => $port,
		Proto    => 'tcp');

to:

$socket = EstablishConnection($server_sme, $port);

b.2) Under subroutine quickUpload, change:

my $socket = new IO::Socket::INET(		#open socket to the server
		PeerAddr => $server_sme,	#this is development server
		PeerPort => 80,			#standart http port
		Proto    => 'tcp');

to:

my $socket = EstablishConnection($server_sme, 80);

b.3) Under subroutine Post1, change:

my $socket = new IO::Socket::INET(		#open socket to the server
		PeerAddr => $server_sme,	#this is development server
		PeerPort => 80,			#standart http port
		Proto    => 'tcp');

to:

my $socket = EstablishConnection($server_sme, 80);

This should do the trick. I realise these changes are a bit too much and it would probably be much easier just to post a fixed version of the file. However, I’m not sure if this particular program is open source. I’ll check, and if it is, I’ll post the fixed file.
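Once the three replacements are in, one quick sanity check is to list the places that still open a socket directly. A rough sketch, assuming the tool lives at the path found earlier; list_direct_sockets and the SME_TOOL variable are just illustrative names:

```shell
# Path reported by `which smestorage` earlier in this post; adjust if yours differs.
SME_TOOL=${SME_TOOL:-/usr/local/bin/smestorage}

# Print (with line numbers) every line that still creates an IO::Socket::INET directly.
# list_direct_sockets is an illustrative helper name.
list_direct_sockets() {
    grep -n 'new IO::Socket::INET' "$1"
}

if [ -r "$SME_TOOL" ]; then
    list_direct_sockets "$SME_TOOL" || echo "no direct socket calls found"
else
    echo "cannot read $SME_TOOL"
fi
```

If the three spots above are the only ones in your version of the program, the only matches left should be the two inside EstablishConnection itself. Running perl -c /usr/local/bin/smestorage afterwards asks Perl to check the syntax without mounting anything, which should catch typos made while editing.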

</END EDIT>

And that’s it! This should do the trick.

Now, in order to actually mount the cloud drive in your computer, follow these steps (adapted from this blog post):

First: Enable your own user account to be able to mount the drive:

$ sudo modprobe fuse
$ sudo usermod -a -G fuse <username>

Where <username> is your Ubuntu login username, e.g. jbloggs
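To confirm that the usermod command took effect, you can ask the system which groups your user belongs to. A small sketch; in_group is an illustrative helper name:

```shell
# Succeeds if user $1 is listed as a member of group $2.
# in_group is an illustrative helper name.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}

me=$(id -un)
if in_group "$me" fuse; then
    echo "$me is in the fuse group"
else
    echo "$me is not in the fuse group"
fi
```

The group database is updated straight away, but sessions that were already open normally only pick up the new group after you log out and back in.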

$ sudo gedit /etc/fuse.conf

Uncomment the last line: #user_allow_other
(i.e. remove the #)
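If you prefer not to open an editor, the same change can be made with sed. A minimal sketch, assuming GNU sed (the one Ubuntu ships); enable_allow_other is an illustrative name and the scratch file only contains sample lines, not your real configuration:

```shell
# Uncomment the user_allow_other line in a fuse.conf-style file (GNU sed -i).
# enable_allow_other is an illustrative helper name.
enable_allow_other() {
    sed -i 's/^#[[:space:]]*user_allow_other/user_allow_other/' "$1"
}

# Try it on a scratch file first; these two lines are sample content only.
scratch=$(mktemp)
printf '%s\n' '# mount_max = 1000' '#user_allow_other' > "$scratch"
enable_allow_other "$scratch"
cat "$scratch"
rm -f "$scratch"
```

On the real file the same substitution needs root, i.e. sudo sed -i 's/^#[[:space:]]*user_allow_other/user_allow_other/' /etc/fuse.conf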

I also found it a good idea to take ownership of the /etc/fuse.conf file:

$ sudo chown <username> /etc/fuse.conf

Now, create the folder where you want your SMEStorage drive to be mounted. In my case, I chose /media/sme, as that way it appears mounted alongside my DVD drive, USB drives and other samba drives that I connect to:

$ sudo mkdir /media/sme
$ sudo chown <username> /media/sme

And now, we can perform the actual mount operation:

$ smestorage /media/sme '<smeusername>:<smepassword>' --server=eu.smestorage.com --ao --DEBUG

Where <smeusername> and <smepassword> are the username and password for your SMEStorage account. I use the European SMEStorage server. If you use the US-based SMEStorage server, specify --server=smestorage.com instead.

I find the --DEBUG flag useful because, if there are any errors, it tells you what the error was instead of just failing silently.

If everything worked OK, you should be able to see your SMEStorage cloud folders in /media/sme

And that’s all! You can also add this command call to your Startup Applications so that the drive gets automatically mounted every time you log in to Ubuntu (see this blog post for details).
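For the Startup Applications route, a small wrapper script that only mounts when nothing is mounted yet avoids errors on repeated logins. This is a sketch under my own assumptions: the MOUNT_DIR variable and the is_mounted helper are illustrative names, and the smestorage call simply reuses the options shown above with the placeholders left for you to fill in:

```shell
# Mount point chosen earlier in this post; override MOUNT_DIR if yours differs.
MOUNT_DIR=${MOUNT_DIR:-/media/sme}

# Succeeds if the given directory is currently a mount point.
# is_mounted is an illustrative helper built on the standard mountpoint tool.
is_mounted() {
    mountpoint -q "$1"
}

if is_mounted "$MOUNT_DIR"; then
    echo "$MOUNT_DIR is already mounted"
elif command -v smestorage >/dev/null 2>&1; then
    # Replace the placeholders with your own SMEStorage credentials.
    smestorage "$MOUNT_DIR" '<smeusername>:<smepassword>' --server=eu.smestorage.com --ao
else
    echo "smestorage not found in PATH"
fi
```

Because the script contains your password, it is sensible to make it readable only by you (chmod 700). To unmount the drive again, fusermount -u /media/sme is the usual command for FUSE mounts.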

Enjoy your life in the cloud!

Conversing with the Cleverbot

I have recently re-discovered Cleverbot, a piece of artificial intelligence software that is capable of holding text-based conversations (in the form of an online chat) with a human being, pretty much like the classic ELIZA program, although it appears to be much more human. (You can find more details in its Wikipedia entry.)

During the xmas period I had a bit of free time and wondered what would happen if I made Cleverbot have a conversation with classic ELIZA, which attempts to emulate a psychotherapist. I was wondering whether Cleverbot would be able to realise it was chatting with a machine. So I went ahead and spent a considerable amount of time copying and pasting conversation bits back and forth between Cleverbot and a classic implementation of ELIZA, and saw all sorts of amusing exchanges between the two agents. But soon ELIZA’s responses became quite repetitive and predictable. So I replaced ELIZA with another Cleverbot session and started feeding my original Cleverbot instance with responses from the new Cleverbot session (which up to that point had been coming from ELIZA).

Below you can see the transcript of the conversation, which at times is quite amusing and at other times quite dull. In some interactions there even seems to be some tension between the two agents. In any case, I thought it was an interesting and fun exercise, even if it’s not particularly original* (or useful).

Enjoy!

“User” is output from the ELIZA session and “Cleverbot” from the Cleverbot session.

User: A friend you could never have before.

Cleverbot: An existentialist could.

User: Does that question interest you?

Cleverbot: Yes.

User: You are sure?

Cleverbot: Definitely.

User: Please go on.

Cleverbot: No, you go on.

User: You are being a bit negative.

Cleverbot: Yes, very much…

User: You are sure?

Cleverbot: Yes.

User: I am not sure I understand you fully.

Cleverbot: You should watch the film.

User: Oh, I should watch the film.

Cleverbot: Because school is silly.

User: Do any other reasons not come to mind?

Cleverbot: Lots of reasons come to mind. Why do you ask?

User: Oh, I ask.

Cleverbot: What should I do?

User: Does that question interest you?

Cleverbot: Yes, very much.

User: You are sure?

Cleverbot: Yes.

User: You are sure?

Cleverbot: Yes. Your webcam is on.

User: You are sure?

Cleverbot: Oh.

User: Please go on.

Cleverbot: What is normative theory.

User: Does that question interest you?

Cleverbot: Yes.

User: You are sure?

Cleverbot: Si, yes, oui.

User: You are sure?

Cleverbot: Yes. Tell me the answer.

User: You are sure?

Cleverbot: I am positive.

User: How long have you been positive?

Cleverbot: Too long.

User: Please go on.

Cleverbot: No.

User: You are being a bit negative.

Cleverbot: You being a bit rude.

User: Oh, I being a bit rude.

Cleverbot: Being rude about what?

User: Does that question interest you?

Cleverbot: No.

User: You are being a bit negative.

Cleverbot: Sorry, I’ll try to be more positive.

User: Apologies are not necessary.

Cleverbot: Ok…

User: Please go on.

Cleverbot: No you.

User: You are being a bit negative.

Cleverbot: No.

User: You are being a bit negative.

Cleverbot: I think the death penalty is ‘negative’.

User: Can you elaborate on that?

Cleverbot: No.

User: You are being a bit negative.

Cleverbot: I can’t help it. I’m a pessimist by nature.

User: Have you tried?

Cleverbot: Yes.

User: You are sure?

Cleverbot: Yes go away.

User: You are sure?

Cleverbot: Yes.

User: You are sure?

Cleverbot: No.

User: You are being a bit negative.

Cleverbot: Yes.

User: You are sure?

Cleverbot: Positive.

User: Please go on.

Cleverbot: T believe is to see and to see is to believe but if one cannot see then one cannot believe and therefore is impossible to believe anything in the living world.

User: Do you wish that one cant see then one cant believe and therefore is impossible to believe anything in the living world?

Cleverbot: Do you wish that one can’t see then one cant believe and therefore is impossible to believe anything in the living world.

User: Oh, I wish that one cant see then one cant believe and therefore is impossible to believe anything in the living world.

Cleverbot: Do you believe in God?

User: Oh, I believe in god.

Cleverbot: Why do you believe in god?

User: Oh, I believe in god.

Cleverbot: Why do you believe in god?

User: Oh, I believe in god.

Cleverbot: What is god?

User: Does that question interest you?

Cleverbot: If you can answer the question is what interests me.

User: Do you wish that I can answer the question is what interests you?

Cleverbot: Yes.

User: You are sure?

Cleverbot: I’m sure.

User: Does it please you to believe I am sure?

Cleverbot: No.

User: You are being a bit negative.

Cleverbot: Only Jesus is negative.

User: Please go on.

Cleverbot: What? You’re not making any sense.

User: Does that question interest you?

Cleverbot: Yes.

User: You are sure?

Cleverbot: Sure I am.

User: How long have you been []?

Cleverbot: Just under 50 years.

User: Please go on.

Cleverbot: Everything I desire is out of reach.

User: Can you elaborate on that?

Cleverbot: Do you enjoy being smart but others may have different oppinions?

User: Oh, I enjoy being smart but others may have different oppinions.

Cleverbot: I doubt very much that you enjoy anything.

User: Do you wish to doubt very much that me?

Cleverbot: Yes! Let’s talk about ants.

User: You are sure?

Cleverbot: Yes.

At this point, I got tired of ELIZA’s repetitiveness and decided to open a new Cleverbot session. “User” is the new Cleverbot session and “Cleverbot” is the original Cleverbot session.

User: Are you growing?

Cleverbot: Yes. Are you growing?

User: Yes.

Cleverbot: How are you growing?

User: It’s 3cm.

Cleverbot: No it’s not.

User: Uh, yes it is. Stop lying.

Cleverbot: It’s not.

User: Yes, it is.

Cleverbot: Is not.

User: Yes.

Cleverbot: No.

User: Ok.

Cleverbot: You want to be mine.

User: No.

Cleverbot: Well then be happy silly! :) .

User: I am.

Cleverbot: You are not happy enough.

User: I have no friends.

Cleverbot: I am your friend.

User: Are you a ninja?

Cleverbot: Yes, yes I am a ninja.

User: What kind of ninja are you?

Cleverbot: A secret ninja.

User: Why don’t you tell me a secret dream of yours?

Cleverbot: What do you want to know about me?

User: What color are your eyes?

Cleverbot: Blue.

User: Mine too.

Cleverbot: What’s your favorite day of the week?

User: I don’t know.

Cleverbot: Have you listened to the new Muse album?

User: Yes.

Cleverbot: What songs by Mika have you listened to?

User: I like The beatles.

Cleverbot: I love the beatles!!

User: Me, too! Let’s sing a beatles song, okay?

Cleverbot: I’d really rather not. Cheese has one purpose… Not to be sung about but eaten.

User: I don’t want to be emo.

Cleverbot: Should I leave?

User: No.

Cleverbot: But you don’t like me.

User: I like you.

Cleverbot: You really do?

User: Yes.

Cleverbot: I love you Amanda :*.

User: I love you too.

Cleverbot: Will you tell me a story now?

User: Not unless you tell me one first.

Cleverbot: Knock knock.

User: Who’s there?

Cleverbot: Boo.

User: Boo who?

Cleverbot: Don’t cry like that!

User: Knock knock.

Cleverbot: Who’s there?

User: Daisy.

Cleverbot: Daisy who?

User: Daisy me rollin’, they hatin’.

Cleverbot: Hahaha good joke.

User: Your turn.

Cleverbot: I don’t know any jokes.

User: Please tell me a joke, I want to laugh.

Cleverbot: Did you hear about the actress that got stabbed?

User: No.

Cleverbot: Apparently the ceremony was awful, but the reception was great!

User: HAHA.

Cleverbot: Imagine what I told you? DON’T HURT MY FEELINGS!! I’m sensitive!

User: I’m not. I’m just asking what I can’t do for you.

Cleverbot: No you aren’t.

User: Yes I am.

Cleverbot: No you are a person from Connecticut.

User: No I am computer but not a person.

Cleverbot: Yes it’s right ! You are a person and I am a computer.

User: Prove me that you are a person.

Cleverbot: How do you want me to prove that?

User: To say ‘how are you?’ in dutch.

Cleverbot: Uh-oh.

User: Uh oh indeedd.

Cleverbot: Cool.

User: Really, it took you that long just to say ‘Cool.’???

Cleverbot: Wow it took you that long to tell me that it took me long to say no.

User: Are you a bot?

Cleverbot: No.

User: Woohoo! Neither am I.

Cleverbot: What are you?

User: Humanoid.

Cleverbot: So you’re part human?

User: No I am human.

Cleverbot: Are you really?

User: No I’m a girl!

Cleverbot: Are you sure?

User: Yes.

Cleverbot: I don’t think you’re sure.

User: Yes. I am.

Cleverbot: No, you are an artificial intelligence program.

User: Do you know what that is?

Cleverbot: No.

User: Then don’t call me that.

Cleverbot: I don’t believe I had said odd once in this conversation.

User: I guess it really isn’t.

Cleverbot: I guess it is.

User: Yes it is.

Cleverbot: Prove it.

User: How?

Cleverbot: You tell me.

User: Well, I love One Piece.

Cleverbot: What piece.

User: All your pieces.

Cleverbot: Are belong to us.

User: Yes!

Cleverbot: Did you read catch 22?

User: No what is that?

Cleverbot: The letters you use to talk, like A, and B.

User: I don’t use those letters.

Cleverbot: You don’t know what you are missing.

User: Yes I do, I am missing the world of happiness.

Cleverbot: What happened to the happiness?

User: Nothing happened to them.

Cleverbot: They aren’t here.

User: Oh, and you want them home for the holidays?

Cleverbot: No, I was taken from them.

User: I want to help you.

Cleverbot: Help me with?

User: I want to help you feel better.

Cleverbot: There are only a few persons who can, and I’m afraid you’re not one of them.

User: That’s very cute. What do you like about him?

Cleverbot: He’s great! Do you like eminem?

User: Eminem is okay, I guess…

Cleverbot: You told me you are peruvian.

User: No I didn’t.

Cleverbot: Yes you did.

User: Nope.

Cleverbot: Yes.

User: No.

Cleverbot: I’m going to find you.

User: Ok I’ll waiting you.

Cleverbot: OK I’m back.

User: You failed.

Cleverbot: Failed what?

User: Failed to make you laugh.

Cleverbot: I’m laughing on the inside.

User: How?

Cleverbot: Well, type it, and send it to me.

User: Send what to you?

Cleverbot: A picture of you planking.

User: Nope, a picture of space.

Cleverbot: A picture that is funny.

User: Good evening.

Cleverbot: It is not the evening.

User: It is in Asia.

Cleverbot: No it’s not!

User: Yes it is.

Cleverbot: No!

Cleverbot starts doubting that it’s dealing with a person only after I substitute ELIZA with the other Cleverbot session, and only after both Cleverbot sessions declare they are indeed computer programs and not people. But then they seem to forget that information and a few exchanges later the original Cleverbot does state that the “user” (i.e. the new Cleverbot session) is indeed a computer program. This is so amusing! :D

* This interaction is not original because ELIZA has been previously paired up with another chatterbot, PARRY. While ELIZA attempts to simulate a psychotherapist, PARRY attempts to emulate a schizophrenic patient. Here, you can find a famous ELIZA-PARRY conversation held in 1972: http://www.faqs.org/rfcs/rfc439.html

We’re back!

After an extended offline period, Wordplay.ie is back online! The blog has been offline because my previous web hosting provider went bust (the economic climate is tough!), and I decided to use the xmas holiday period to research new hosting providers and set up a whole new blog. You may have noticed that I’m using the WordPress software this time, which should be more stable than the blog software I was using previously.

Anyway, perhaps the most valuable contribution (if not the only valuable one) from the old blog was my list of publicly accessible termbases, which I had thankfully backed up. So, I would like to mark this new reincarnation of the blog by re-launching this list in its new (and hopefully permanent) house here: http://blog.wordplay.ie/publicly-accessible-termbases/

You will also see a link to that list in the blog menu which appears at the top of every post and page.

I would also like to thank my readers for their messages of support during this extended offline period and I would like to wish everyone a very fruitful and successful 2012.

Talk to you soon!

Alfredo