* How To Add Keywords To Your Blog With MT-KeywordExtractor

Automatically tag your Movable Type entries with keywords for use with del.icio.us or otherwise.

I have been experimenting with some Movable Type plugins. Here’s how I added keywords to my blog entries with MT-KeywordExtractor.

MT-KeywordExtractor (http://www.sixapart.com/pronet/plugins/plugin/keyword_extract.html) is a Movable Type plugin that takes the contents of the “Entry Body” field of a weblog entry, sends it to Yahoo’s Term Extraction API (http://developer.yahoo.com/search/content/V1/termExtraction.html), and then populates the entry’s “Keywords” field with the terms that come back. MT-KeywordExtractor adds keywords to an entry when you save it, whether the entry is new or edited. Are the keywords perfect? No. But other APIs may be added in the future, and Yahoo may improve its algorithm. Either way, using Yahoo’s Term Extraction API is certainly a lot easier than manually going through each of your entries to add keywords.
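To make that round trip concrete, here is a rough Python sketch of the two halves: building the request and turning Yahoo’s reply into a Keywords string. It is not the plugin’s own Perl code. The endpoint URL, the `appid` and `context` parameter names, and the reply shape (a `ResultSet` holding one `Result` element per term) are assumptions about the V1 service, which Yahoo has since retired, so the sketch only builds the request and never sends it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlencode

# Assumed V1 endpoint; the service is retired, so nothing is sent here.
TERM_EXTRACTION_URL = (
    "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction"
)


def build_request(app_id: str, entry_body: str) -> Tuple[str, bytes]:
    """Return (url, form-encoded POST body) for a term-extraction request.

    The parameter names `appid` and `context` are assumptions; `context`
    carries the text of the entry's "Entry Body" field.
    """
    body = urlencode({"appid": app_id, "context": entry_body})
    return TERM_EXTRACTION_URL, body.encode("utf-8")


def parse_terms(xml_text: str) -> List[str]:
    """Collect the text of every <Result> element in the reply.

    Tags are compared by local name, so a default XML namespace on the
    reply does not get in the way.
    """
    root = ET.fromstring(xml_text)
    terms = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix
        if local_name == "Result" and elem.text and elem.text.strip():
            terms.append(elem.text.strip())
    return terms


def to_keywords_field(terms: List[str]) -> str:
    """Join terms into the comma-separated form used in the Keywords field."""
    return ", ".join(terms)


if __name__ == "__main__":
    # A made-up reply in the assumed shape.
    sample = (
        "<ResultSet>"
        "<Result>movable type</Result>"
        "<Result>keyword field</Result>"
        "<Result>plugins</Result>"
        "</ResultSet>"
    )
    print(to_keywords_field(parse_terms(sample)))
    # movable type, keyword field, plugins
```

Joining with a comma and a space matches the format of the keyword list shown at the end of this post.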

If you’re planning to add keywords to your weblog entries (e.g. for Technorati (http://www.technorati.com/), del.icio.us (http://del.icio.us/), or other folksonomy-oriented or taxonomy-oriented applications) but don’t want to take the time to tag each entry individually, then MT-KeywordExtractor may be for you.

1. Use the most recent version of the plugin. For some reason, Six Apart’s ProNet plugin directory (http://www.sixapart.com/pronet/plugins/) is not always current. It is also not exactly user-friendly: it is difficult to search and browse; it lacks useful features like documentation, sorting, and user ratings; and it generally feels like a neglected corner of Six Apart’s Movable Type support resources. If you wait long enough, the most popular plugins will likely be rolled into a future release of Movable Type, but if you don’t want to wait, dive into the Movable Type plugin directory and take it for what it’s worth. In this case, Six Apart lists v0.5 as the most recent version, while the plugin author lists v0.7. Go with v0.7.
You can get the most recent version of MT-KeywordExtractor from the MT-KeywordExtractor developer’s website (http://blog.socklabs.com/keywordextractor/).

2. Read the instructions, including the comments to the instructions.

3. Test rigorously, read the error messages, and debug rigorously. Quoting NASA Flight Director Gene Kranz from the movie “Apollo 13,” “Let’s work the problem people. Let’s not make things worse by guessing.” If you installed MT-KeywordExtractor and it does not appear on your plugins page, or it does not seem to do anything, there are several likely problems:

(1) you installed MT-KeywordExtractor in the wrong directory,
(2) you need to make keywordextractor.pl executable, and/or
(3) you need to install a correct version (notice that I did not say “the correct version”) of the XML::Simple Perl module (http://blog.socklabs.com/keywordextractor/2006/04/02/release_06/#comments).

3.1. Install MT-KeywordExtractor in the correct directory. MT-KeywordExtractor should be installed (via FTP, SCP, SFTP, or however you get files onto your web server) in the Movable Type “plugins” directory. There are two pieces: a lib folder (containing the KeywordExtractor.pm file) and the keywordextractor.pl file. Your installation should look like this:

.../plugins/KeywordExtractor/keywordextractor.pl
.../plugins/KeywordExtractor/lib/WWW/Yahoo/KeywordExtractor.pm
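If the plugin does not show up on your plugins page, a quick check of this layout can rule out problem (1). The Python sketch below simply tests for the two files; `missing_plugin_files` is a hypothetical helper, not part of Movable Type, and the paths are the ones listed above, relative to your “plugins” directory.

```python
import sys
from pathlib import Path
from typing import List, Union

# Expected files, relative to Movable Type's "plugins" directory (step 3.1).
EXPECTED_FILES = (
    Path("KeywordExtractor") / "keywordextractor.pl",
    Path("KeywordExtractor") / "lib" / "WWW" / "Yahoo" / "KeywordExtractor.pm",
)


def missing_plugin_files(plugins_dir: Union[str, Path]) -> List[str]:
    """Return the expected plugin files that are not present under plugins_dir."""
    base = Path(plugins_dir)
    return [str(rel) for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    # Usage: python check_layout.py /path/to/mt/plugins
    target = sys.argv[1] if len(sys.argv) > 1 else "plugins"
    missing = missing_plugin_files(target)
    print("OK" if not missing else "Missing: " + ", ".join(missing))
```

The script name and path in the usage comment are placeholders; the script prints `OK` or the list of files it could not find.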

3.2. Make keywordextractor.pl executable. If you have command-line (shell) access, type “chmod 755 keywordextractor.pl” to make the Perl file readable and executable by owner, group, and world (only the owner can write to it). The KeywordExtractor.pm file does not need to be executable, but go ahead and type “chmod 644 KeywordExtractor.pm” to make sure that its file permissions are also correct (readable by everyone, writable only by the owner).
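On a Unix-like server, the same two modes can also be applied from Python with `os.chmod` (on Windows, `os.chmod` can only toggle the read-only flag, so this does not apply there). This is a sketch assuming the directory layout from step 3.1; `fix_permissions` and `mode_string` are hypothetical helpers. `stat.filemode` renders a mode the way `ls -l` does, which makes it easy to confirm that 755 means `-rwxr-xr-x` and 644 means `-rw-r--r--`.

```python
import os
import stat
from pathlib import Path
from typing import Union


def fix_permissions(plugin_dir: Union[str, Path]) -> None:
    """Apply the step 3.2 modes inside the KeywordExtractor plugin folder."""
    base = Path(plugin_dir)
    # 755 = rwxr-xr-x: owner may write; owner, group, and world may read/execute.
    os.chmod(base / "keywordextractor.pl", 0o755)
    # 644 = rw-r--r--: the module is loaded by Perl, never executed directly.
    os.chmod(base / "lib" / "WWW" / "Yahoo" / "KeywordExtractor.pm", 0o644)


def mode_string(path: Union[str, Path]) -> str:
    """Return an `ls -l` style permission string such as '-rwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)
```

For example, call `fix_permissions("/path/to/mt/plugins/KeywordExtractor")` (a placeholder path), then `mode_string` on each file to confirm the result.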

3.3. Install a correct version (notice that I did not say “the correct version”) of the XML::Simple Perl module (and possibly others). I had this problem and found a hint to the solution in the documentation (http://blog.socklabs.com/keywordextractor/2006/04/02/release_06/#comments).

CPAN (Comprehensive Perl Archive Network) (http://www.cpan.org/) is both a directory/database of user-created Perl modules and a program (or series of programs) that provides a semi-automated interface for installing Perl modules from the CPAN database. I have the cpan program installed on my server and access it primarily from the command line, sometimes from the Webmin interface. (Webmin (http://www.webmin.com/) is a web-based interface for system administration for Unix and Unix-like operating systems.)

Type cpan and then try to install XML::Simple on your system by entering the following command at the cpan prompt:

cpan> install XML::Simple

Read the error messages. Did XML::Simple install, compile, and test successfully? If not, then you may need to install different versions of other modules, including XML::Parser and/or Storable. If you get an error message from cpan that includes text like this:

Can't locate auto/Storable/dclone.al

then it probably means you are missing the Storable module (or, possibly, that you have the wrong version of it). This might work for you:

cpan> install Storable
cpan> install XML::Parser
cpan> install XML::Simple
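cpan output scrolls by quickly, so it can help to pull out just the “Can’t locate” lines. The Python helper below (hypothetical, not part of cpan) relies on Perl’s standard wording: a missing module is reported as `Can't locate Foo/Bar.pm in @INC`, and a missing AutoLoader file as `Can't locate auto/Package/function.al`, which is how `dclone.al` points back at the Storable module.

```python
import re
from typing import List

# Perl reports a missing module as "Can't locate Foo/Bar.pm in @INC ..."
# and a missing AutoLoader file as "Can't locate auto/Foo/func.al ...".
CANT_LOCATE = re.compile(r"Can't locate (\S+?\.(?:pm|al))\b")


def missing_modules(cpan_output: str) -> List[str]:
    """Map each "Can't locate" path in cpan/perl output to a module name."""
    found: List[str] = []
    for path in CANT_LOCATE.findall(cpan_output):
        parts = path.split("/")
        if path.endswith(".pm"):
            # XML/Parser.pm -> XML::Parser
            module = "::".join(parts)[: -len(".pm")]
        elif parts[0] == "auto" and len(parts) >= 3:
            # auto/Storable/dclone.al -> Storable (the function is dclone)
            module = "::".join(parts[1:-1])
        else:
            continue
        if module not in found:
            found.append(module)
    return found


if __name__ == "__main__":
    sample = "Can't locate auto/Storable/dclone.al in @INC (@INC contains: ...)"
    print(missing_modules(sample))
    # ['Storable']
```

Each name it returns is a candidate to install at the cpan prompt, keeping in mind that the newest version is not always the one that works.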

And this is what I love/hate about Perl and the Perl modules available from CPAN and elsewhere: you never know what magic combination of Perl versions and Perl module versions you are going to need. CPAN will try, by default, to install the most current version of a particular Perl module, which is not always what you want. The above did not work for me.

I will skip to the punch line. CPAN installed Storable version 2.14 by default. I had to download and manually install an older version, Storable version 1.0.14 (http://www.cpan.org/modules/by-module/Storable/), which shows up as 1.014 in the list below, in order to get MT-KeywordExtractor working. So here is one combination of Perl and Perl module versions that worked for me on my server’s operating system, FreeBSD 4.7, with Movable Type 3.2:

# Package                        Version
#  perl                           5.8.6
#  XML::Simple                    2.14
#  Storable                       1.014
#  XML::Parser                    2.34
#  XML::SAX                       0.12
#  XML::NamespaceSupport          1.08
#  XML::SAX::PurePerl             0.90
#  XML::LibXML::SAX::Parser       1.50
#  XML::LibXML::SAX               1.00
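To compare your own server against that combination, the Python sketch below asks perl for each module’s version (via `perl -MModule -e 'print Module->VERSION'`, so a perl must be on your PATH) and lists the modules that differ. It compares version strings exactly, which is a simplification (Perl itself would treat `0.9` and `0.90` as equal), and a difference is only a lead to investigate, not proof of a problem. `KNOWN_GOOD`, `installed_version`, and `differences` are hypothetical names.

```python
import subprocess
from typing import Dict, Optional, Tuple

# The combination reported above (FreeBSD 4.7, Movable Type 3.2, perl 5.8.6).
KNOWN_GOOD = {
    "XML::Simple": "2.14",
    "Storable": "1.014",
    "XML::Parser": "2.34",
    "XML::SAX": "0.12",
    "XML::NamespaceSupport": "1.08",
    "XML::SAX::PurePerl": "0.90",
    "XML::LibXML::SAX::Parser": "1.50",
    "XML::LibXML::SAX": "1.00",
}


def installed_version(module: str) -> Optional[str]:
    """Ask perl for a module's version; None if the module fails to load."""
    result = subprocess.run(
        ["perl", f"-M{module}", "-e", f"print {module}->VERSION"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def differences(
    installed: Dict[str, Optional[str]],
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {module: (installed, known_good)} for every module that differs.

    Versions are compared as exact strings, a deliberate simplification.
    """
    return {
        name: (installed.get(name), good)
        for name, good in KNOWN_GOOD.items()
        if installed.get(name) != good
    }
```

Usage: `differences({m: installed_version(m) for m in KNOWN_GOOD})`. On a server where cpan installed Storable 2.14 and everything else matched, that would return `{"Storable": ("2.14", "1.014")}`.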

Needless to say, your mileage may vary (YMMV). Don’t guess. Do test and debug rigorously, read the error messages, and read the documentation.

4. Understand how the MT-KeywordExtractor plugin works (and hack it to do what you want).

  • If you create a new entry, then MT-KeywordExtractor will automatically add keywords to that entry when you save it.
  • If you edit an existing entry, then MT-KeywordExtractor will automatically add keywords to that entry when you save it.
  • If you edit an existing entry that has existing keywords, MT-KeywordExtractor will automatically replace the existing keywords with new keywords.
  • If you rebuild your entire weblog, MT-KeywordExtractor will not add keywords to rebuilt entries (unless their content has changed). Here’s a hack to get around this feature/bug: do a search and replace across all of your entries, searching for something like “<p>” (lower case) and replacing it with “<P>” (upper case), or any other harmless search/replace combination, such as replacing “2006” with “2006”. Every entry the search finds is “touched,” that is, marked as changed. After the search/replace process completes, all of the found entries will have new keywords added by MT-KeywordExtractor.
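The search/replace hack only reaches the entries that actually contain the search string. The Python sketch below models that with a hypothetical `Entry` record and `touch_entries` function; it does not use Movable Type’s API. The practical consequence: an entry that does not contain the search string is not touched and gets no new keywords, so pick a string that appears in every entry you want to re-keyword.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical stand-in for a weblog entry; not Movable Type's API."""

    entry_id: int
    body: str


def touch_entries(
    entries: List[Entry], find: str = "<p>", replace: str = "<P>"
) -> List[int]:
    """Apply a harmless search/replace; return the IDs of the entries that matched.

    Only matching entries are rewritten, so only those would be re-saved and
    get fresh keywords. Everything else is left untouched.
    """
    touched = []
    for entry in entries:
        if find in entry.body:
            entry.body = entry.body.replace(find, replace)
            touched.append(entry.entry_id)
    return touched


if __name__ == "__main__":
    posts = [Entry(1, "<p>Hello</p>"), Entry(2, "Plain text, no tags")]
    print(touch_entries(posts))
    # [1]
```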

MT-KeywordExtractor is a very cool plugin for Movable Type, one that Six Apart should definitely incorporate into the next released version of Movable Type. The keywords for my entries are not yet displayed on my weblog’s public pages – that’s the next project. But trust me, they are there. FYI, the keywords that MT-KeywordExtractor produced for this entry are:

keywords: added keywords, sixapart, movable type, keyword field, pronet, plugins, api, weblog, http, html, add, search content, apis, algorithm, adds, re planning, blog
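If you want to reuse a Keywords field like this one as tags elsewhere, multi-word keywords such as “movable type” need some handling, because del.icio.us separated tags with spaces. The Python sketch below (a hypothetical `keywords_to_tags` helper) lowercases each keyword, joins its words into a single token, and drops duplicates; the joiner is a parameter because services differ in what they accept.

```python
from typing import List


def keywords_to_tags(keywords: str, joiner: str = "") -> List[str]:
    """Turn a comma-separated Keywords field into single-token tags.

    Multi-word keywords are collapsed by joining their words with `joiner`.
    Empty items are skipped and duplicates dropped (first occurrence wins).
    """
    tags: List[str] = []
    for raw in keywords.split(","):
        words = raw.strip().lower().split()
        if not words:
            continue
        tag = joiner.join(words)
        if tag not in tags:
            tags.append(tag)
    return tags


if __name__ == "__main__":
    field = "added keywords, sixapart, movable type, keyword field"
    print(" ".join(keywords_to_tags(field)))
    # addedkeywords sixapart movabletype keywordfield
```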

Hope it helps!

4 Replies to “* How To Add Keywords To Your Blog With MT-KeywordExtractor”

  1. Erik;
    I’ve bombed out trying to get MT-KeywordExtractor to work. I’m on a Windows server, so maybe that’s part of it. I think I’ve installed all the required Perl modules, but I don’t know my way around PPM well enough (and haven’t found a good “brain dump” on it yet).

    I’m a programmer with lots of ASP, VB, MySQL, MS-SQL, etc. experience, but I’m new to Perl and MT. I can’t seem to find the right place to look for error messages related to this. MT-KeywordExtractor shows up in the Plug-In list, but never creates any keywords. There’s nothing in the MT Activity Log. Any ideas or suggestions?

  2. A couple of suggestions.

    Perl. Perl is fussy. You need the right version of Perl and the right version of the CPAN modules. I don’t run MT on Windows, but I do run a Perl-powered web server (my intranet) on Windows. I use ActivePerl v5.8.4 from ActiveState (http://www.activestate.com/Products/ActivePerl/).

    MT-KeywordExtractor. Check that MT-KeywordExtractor is installed and activated as a system-level plugin and as a blog-specific plugin. When you click the “Show Setting” link, there are four options:

    1. Enabled [ON]
    2. Set Keywords [ON]
    3. Set Technorati Tags [OFF]
    4. Set Technorati Tags (more) [OFF]

    Make sure that the first two are enabled (checked). Selecting “Set Technorati Tags” will append Technorati links to the end of the EntryBody field. Selecting “Set Technorati Tags (more)” will append Technorati links to the EntryMore (extended entry) field. I didn’t want MT-KeywordExtractor to add Technorati tags, in part because if you add them to the body of your post, you’ll get another set of tags each time you modify the entry. Also, I wanted more control over the location and format of the Technorati tags; MT-KeywordExtractor doesn’t provide this, but Tagwire does it nicely.

    Remove Conflicting Plugins. It is possible that MT-KeywordExtractor conflicts with other MT plugins. Try removing every plugin except MT-KeywordExtractor and the ones Six Apart ships as part of the MT distribution.

    Use Apache For Debugging. Point your web browser at the keywordextractor.pl file on your server, and then look at the Apache error log to see what it says. There should be some useful info in there. Common problems include path problems (the script not knowing where the Perl interpreter is located) and permission problems (files not being executable). My Apache error log lives at C:\Program Files\Apache Group\Apache\logs.

    Go Linux-like with Cygwin. A final option is to run Apache, Perl, and MT under the Cygwin environment (http://www.cygwin.com/) instead of natively under Windows. Getting a Linux-like environment under Windows may be more work than it is worth, but it is an option.

    Hope it helps!

  3. Erik;
    That did it… I didn’t have the correct checkboxes checked at BOTH the system and blog plug-in levels. Once I did that, it started working.
    Thanks so much for your help!
