Behavior: Links in Reblog cannot be ‘archived’ or ‘published’
Problem: Reblog wasn’t sanitizing the ‘link’ field properly before writing it to the MySQL table, so unescaped single quotes (') from RSS feeds ended up in the database. Later queries on such an item would fail, so the item could never be archived or published.
File: Controller.class.php in /refeed/library/RF
Version: $Revision: 1.40 (reBlog version 2.0b2)
Status: Bug reported with code fix implemented below. I didn’t see a solution posted in the official Reblog help forum. I have posted about the issue there and linked here.
Example database call (the http part is stripped so WP doesn’t make a link; the offending character is the single quote in "today's"):
mysql> select link from items where link like '%\'%';
| businessweek.com/technology/content/jun2006/tc20060615_290127.htm?chan=technology_technology+index+page_more+of+today's+top+stories |
1 row in set (1.81 sec)
Note: this link isn’t being sent properly by del.icio.us either. They should have converted it to a safe URL before passing it through the RSS feed. As it stands, the aggregator has to sanitize the link before it enters the database, which should always be done anyway.
This URL bug ran through three places: businessweek RSS feed (source), delicious popular (tagged) and reblog (aggregator, and into the database).
The proper code to sanitize query strings from URLs like this is:
$the_link = "?chan=technology_technology+index+page_more+of+today's+top+stories";
print "before: $the_link\n<hr />After: " . urlencode($the_link);
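One thing to watch with the snippet above: urlencode() applied to the whole string also encodes the ?, = and + characters, not just the single quote. A minimal sketch of encoding only the parameter values is shown here. The function name encode_query_string() is my own invention and is not part of Reblog, and the sketch assumes the URL has no #fragment and that the parameter names contain no dots or spaces (parse_str() rewrites those to underscores).

```php
<?php
// Hypothetical helper, not part of Reblog: re-encode only the query string of a URL.
function encode_query_string($url)
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to encode
    }
    $base  = substr($url, 0, $pos);
    $query = substr($url, $pos + 1);

    // parse_str() decodes "+" and %XX sequences into raw parameter values
    parse_str($query, $params);

    // http_build_query() urlencodes each name and value again
    return $base . '?' . http_build_query($params);
}

$link = "businessweek.com/technology/content/jun2006/tc20060615_290127.htm"
      . "?chan=technology_technology+index+page_more+of+today's+top+stories";

echo encode_query_string($link), "\n";
// prints: businessweek.com/technology/content/jun2006/tc20060615_290127.htm?chan=technology_technology+index+page_more+of+today%27s+top+stories
```

The ? and = separators survive because only the decoded names and values are passed back through the encoder.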
If this step had been taken before the URL went into the source RSS feed (businessweek) or during tagging (delicious popular), it wouldn’t be needed in the RSS aggregator (Reblog). Since that’s not the case, I decided to modify the Reblog code to deal with unsanitized URLs before they reach the database and render future database writes in Reblog dysfunctional. Because Reblog can use multiple input sources, it makes more sense to patch the point just before the save routine than to alter the code of each input source (like MagpieRSS).
Reblog
1. locate Controller.class.php in /refeed/library/RF
2. line to patch is in saveItem() function starting around line 1679 as follows:
Locate this:
function saveItem(&$item)
{
$dbhw =& $this->getWriteHandle();
$data = $item->columnNamesValues();
And CHANGE to this:
function saveItem(&$item)
{
$dbhw =& $this->getWriteHandle();
$data = $item->columnNamesValues();
// sanitize all links being saved with single quote
$data['link'] = str_replace("'", "%27", $data['link']);
Note that this patch replaces every single quote character in the link with its percent-encoded form, %27. A more comprehensive fix would be to urlencode the query string portion of the URL, as shown earlier in this post. My concern was fixing the specific character that kept causing problems.
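To see what the patched line does without touching a Reblog install, here is a small standalone sketch. The $data array is only a stand-in for whatever $item->columnNamesValues() returns. The 'title' key and its value are made up for illustration, and the isset() guard is an extra precaution of mine, not part of the patch above.

```php
<?php
// Stand-in for the array returned by $item->columnNamesValues().
// Only the 'link' key comes from the post; 'title' is a made-up example column.
$data = array(
    'title' => 'Example item',
    'link'  => "businessweek.com/technology/content/jun2006/tc20060615_290127.htm"
             . "?chan=technology_technology+index+page_more+of+today's+top+stories",
);

// Same replacement as the patch, guarded in case the column is missing.
if (isset($data['link'])) {
    $data['link'] = str_replace("'", "%27", $data['link']);
}

echo $data['link'], "\n";
// prints: businessweek.com/technology/content/jun2006/tc20060615_290127.htm?chan=technology_technology+index+page_more+of+today%27s+top+stories
```

Everything else in the link is left alone, which is why this is a narrower fix than encoding the whole query string.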
You can test this issue in your version of Reblog with the broken del.icio.us feed cached here (not linked):
php-scripts.com/examples/popular-delicious_06172006.xml