Proxying Pingbacks and Trackbacks

Introduction

Trackbacks and pingbacks are both implementations of the linkback approach, which provides links between digital resources and the things that hyperlink to them. The approach is most common in blogs, where trackbacks are used to notify authors that another blogger has linked to their work. The trackback appears as a comment on the original blog, with a link back to the post that links there. Trackbacks therefore provide a bidirectional link between two blog posts.

There is nothing, though, that limits this protocol to blog posts. It would be desirable, within the life sciences, to make research objects trackback-enabled. By research objects, I mean not just publications (though these are the most obvious primary output of research), but entries in databases (such as GenBank and UniProt) and even entire blobs of data (e.g. a microarray experiment stored in ArrayExpress or GEO).

Pingbacks

The pingback protocol works as follows. Consider a blog at site A, linking to a resource at site B. The blog engine for site A parses the new post, extracting all the hyperlinks, and finds a link to a resource at B. The blog engine can then locate the pingback endpoint for B in two ways: first, it can check the HTTP headers of the resource at B for an X-Pingback header; second, if the resource is HTML, it can parse it looking for a <link> element with rel="pingback". The former has the advantage that it does not require parsing of the resource; in fact, A does not even need to download the resource body. In practice, it has the disadvantage that it will generally require alteration of the web server configuration.
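
The discovery step can be sketched in Python as follows. This is a minimal illustration rather than a complete client: the function and class names are invented for this example, and it assumes the caller has already fetched the response headers (and, if needed, the HTML body) from site B.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class _PingbackLinkParser(HTMLParser):
    """Records the href of the first <link rel="pingback"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.endpoint: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.endpoint is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; values may be None.
        rels = (attr.get("rel") or "").lower().split()
        if "pingback" in rels and attr.get("href"):
            self.endpoint = attr["href"]


def discover_pingback_endpoint(headers: Mapping[str, str],
                               body: str = "") -> Optional[str]:
    """Return the pingback endpoint advertised by a resource, or None.

    The X-Pingback header is checked first, so the body is only
    parsed when the header is absent.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("x-pingback"):
        return lowered["x-pingback"].strip()
    parser = _PingbackLinkParser()
    parser.feed(body)
    return parser.endpoint
```

Checking the header first means a client could issue a HEAD request and skip the download entirely when the header is present.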

This provides the blog engine at site A with a location to which it can send an XML-RPC request. The XML-RPC request is relatively simple, readable XML. In addition, there is no requirement that the location be on the same domain as the resource, although some clients may require this.
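
To give a sense of how small the request is, the sketch below builds the XML-RPC payload with Python's standard xmlrpc.client module. The method name pingback.ping and its two arguments (source URI, then target URI) come from the pingback specification; the function name and the example URLs are made up for illustration.

```python
import xmlrpc.client


def build_pingback_request(source_uri: str, target_uri: str) -> str:
    """Serialise a pingback.ping call as an XML-RPC request body.

    source_uri is the post on site A that contains the link;
    target_uri is the resource on site B being linked to.
    """
    return xmlrpc.client.dumps((source_uri, target_uri),
                               methodname="pingback.ping")


if __name__ == "__main__":
    # Hypothetical URLs, printed only to show the shape of the request.
    print(build_pingback_request("http://a.example/post",
                                 "http://b.example/entry/1"))
```

To actually send the ping, a client could instead call xmlrpc.client.ServerProxy(endpoint).pingback.ping(source_uri, target_uri), where endpoint is the location discovered from site B.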

Implementation

Although this is formally all that needs to be done to pingback-enable a resource, in practice there are two other requirements. For each XML-RPC pingback, site B receives two URLs: the URL of the originator (on site A) and the URL of the link target (on site B). First, to be useful, site B has to do something with this information, such as display a comment. The second requirement stems from the absence of security within the protocol; as with email, pingback and trackback servers tend to attract a significant amount of spam, often far more than genuine links, so site B must have some mechanism for moderation and automated spam detection.
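
One inexpensive first check that site B could apply before displaying anything is to confirm that the originating page really does link to the target. The sketch below assumes site B has already fetched the HTML of the source page; the names are invented for this example, and a check like this complements moderation and spam filtering rather than replacing them.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _AnchorCollector(HTMLParser):
    """Collects absolute, fragment-free hrefs of all <a> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the source page, drop #fragment.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.add(absolute)


def source_links_to_target(source_uri: str, source_html: str,
                           target_uri: str) -> bool:
    """Return True if the source page contains a link to the target."""
    collector = _AnchorCollector(source_uri)
    collector.feed(source_html)
    target, _fragment = urldefrag(target_uri)
    return target in collector.links
```

A ping whose source page contains no such link can be rejected outright, which removes the crudest spam before any heavier filtering runs.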

From this, there are a number of different methods by which a research object could be linkback-enabled, each with different implications. First, the resource (site B, following the earlier terminology) could simply implement the relevant XML-RPC server. A lighter-weight alternative would be to use a pingback proxy such as that at http://software.hixie.ch/utilities/cgi/pingback-proxy/. This avoids the need to implement the XML-RPC server by translating the request into something simpler, such as a standard HTTP GET call. However, this still leaves site B with the task of displaying the links and dealing with spam.
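
The kind of translation such a proxy performs might look roughly like the sketch below. It does not describe the interface of the proxy linked above: the handler URL and the query parameter names source and target are hypothetical, chosen only to show how an XML-RPC call could be flattened into a GET request.

```python
import xmlrpc.client
from urllib.parse import urlencode


def pingback_to_get_url(xml_body: str, handler_url: str) -> str:
    """Turn a pingback.ping XML-RPC body into a plain GET URL.

    handler_url is a (hypothetical) simple HTTP endpoint on site B;
    the "source" and "target" parameter names are assumptions.
    """
    params, method = xmlrpc.client.loads(xml_body)
    if method != "pingback.ping" or len(params) != 2:
        raise ValueError("not a pingback.ping call")
    source, target = params
    return handler_url + "?" + urlencode({"source": source,
                                          "target": target})
```

Site B then only needs an ordinary request handler that reads two query parameters, rather than an XML-RPC stack.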

A second technique would be to use a proxy URL. In this case, instead of using, for example, http://www.uniprot.org/uniprot/OPSD_HUMAN, a proxy URL such as http://www.uniprot.org.pingback.knowledgeblog.org/uniprot/OPSD_HUMAN could be used. This would have the significant disadvantage that the author at site A would be required to change the URL they link to. This problem could be overcome with minimal support from http://uniprot.org, through the addition of either an X-Pingback header or a <link> element. Alternatively, it could be overcome for specific sites through use of a plugin that modified the pingback client at site A, or rewrote URLs to include this modified form. Clearly this is not a general solution, as it will only work for specific clients with plugins installed.
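
The URL rewriting involved is mechanical, as the following sketch shows. It uses the proxy domain from the example above; the function names are invented here, and the code assumes simple URLs with no port or user information in the host part.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix taken from the example proxy URL in the text.
PROXY_SUFFIX = ".pingback.knowledgeblog.org"


def to_proxy_url(url: str) -> str:
    """Append the proxy domain to the host of the original URL."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=parts.netloc + PROXY_SUFFIX))


def from_proxy_url(url: str) -> str:
    """Recover the original URL from its proxy form."""
    parts = urlsplit(url)
    if not parts.netloc.endswith(PROXY_SUFFIX):
        raise ValueError("not a proxy URL: " + url)
    original_host = parts.netloc[:-len(PROXY_SUFFIX)]
    return urlunsplit(parts._replace(netloc=original_host))
```

A client-side plugin would apply something like to_proxy_url before sending the ping, while the proxy itself would use the reverse mapping to find the resource on site B.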

Finally, this would still leave the problem of presenting the information and dealing with spam. One solution here would be to use a tool such as WordPress, which already presents this form of information and has a spam detection engine. The proxy URL above could be resolved by WordPress, autogenerating a post with no content except a link to the original resource on site B. To be useful, site B would itself need to display a link to this proxy URL, so that users of the resources on site B would be able to navigate to the proxy.

Summary

In short, there are a number of techniques that can enable pingbacks to research objects, which require different degrees of buy-in from the different stakeholders. The relative simplicity of the pingback protocol makes implementation straightforward in theory, but doing so usefully requires more work: displaying the results and avoiding spam.