MySpace event parser
The idea
I was building a site for my friend Clara, a wonderful singer-songwriter I went to school with. She wanted to keep track of the gigs she had coming up, but without having to update her site every time she had changes. She had a MySpace page and was happy to keep that up to date, however.
The excellent Make Data Make Sense provided a basic MySpace event parser. But the only information it provided was where, when and who - and some of that information was being duplicated. The problem was to deliver the description of the event provided by the "all shows" page on a MySpace profile.
More inspiration
The Make Data Make Sense page is very handy, and nice and simple, but because it's all done behind the scenes it wasn't much help to a very amateur coder like myself. The PCRE (Perl-compatible regular expressions) used to tease out the appropriate data are provided, but not much else.
A little digging revealed the originator of some PHP code, and the full source for it, here - http://www.kainjow.com/?p=11 - with the added bonus that it was open to tweak as required. The code produces an RSS feed for events; this can be parsed with a standard PHP RSS parser and presented however you want.
I used this for a while, but as Clara added more and more gigs, more and more problems came up. Line spaces broke it, as did different characters, and it got very frustrating.
So I re-wrote it all from scratch. It's probably a bit bloated; I make no claims about its safety on your server, or processor load, or anything like that. If someone who knows what they're doing would like to try things out and clean things up, I'd be very grateful. It's presented here for information only, and I don't even pretend to completely understand what I'm doing. OK?
If you don't know what you're doing either, I'd recommend you visit Amazon and buy some excellent PHP books - so you know, that's an affiliate link, which helps me cover my hosting costs.
The code
The code itself has a pretty basic function. It opens the MySpace events page, puts each of the attributes it wants into different arrays, and then churns out an RSS item for each step through the code. Make sense? Good.
The same principle can be applied to any page with repeating blocks of information on it fairly simply. Let's look at the first bit of code I ended up making:
<?php
// Function readPage takes a URL, opens it, finds the start and end of the events section and puts it all into a variable.
function readPage ($url)
{
$html = file_get_contents($url);
$arrayFull = explode('<table width="615" border="0" cellspacing="0" cellpadding="0" align="center">', $html);
// arrayFull is the page split into an array of chunks
array_shift($arrayFull); // arrayFull has now had the bits before the events table removed
$i = count($arrayFull); // $i now corresponds to the key of the final element +1
$j = $i - 1; // the key of the final element
$final = $arrayFull[$j]; // extract the final event - will deal with this in a minute
array_splice($arrayFull, $j); // remove the final event from arrayFull and discard
// now split "final" and remove all the stuff from the end
$clean = explode('<hr color="#6699CC" width="100%" align="center" size="2" noshade />', $final);
// keep only the first element of clean (everything before the <hr>) and discard the rest:
array_splice($clean, 1);
// finally we stitch it all back into one array which is *just* the clean events table ready to parse
$merged = array_merge($arrayFull, $clean);
The code above opens the page up ready to read it, and then starts playing with arrays.
First up we split the page according to the first line of the events table code - removing that code as we go, using explode() - and discard everything before that point using array_shift.
Next we find out how many items there are in the array. Remember that array keys start at 0, so the key of the final item is one less than the count - we subtract 1 to refer to the right bit of the array.
After this we need to clean up the end of the code - we want to remove everything after the event table. To do this, we take the final chunk from the array, split it in two just after the table, and then discard everything but the first item in the array.
Finally in this section we stitch it back into one big array.
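The trimming steps above can be sketched on a small made-up page. The function name isolateSection and the marker strings are hypothetical stand-ins, not MySpace's real markup, and the sketch uses array_pop rather than the count-and-splice approach, so treat it as an illustration of the idea only.

```php
<?php
// Hypothetical helper: returns the chunks of $html that follow each
// $startMarker, with everything from $endMarker onwards cut off the last one.
function isolateSection(string $html, string $startMarker, string $endMarker): array
{
    $chunks = explode($startMarker, $html);
    array_shift($chunks);              // drop everything before the first marker
    if (count($chunks) === 0) {
        return [];                     // marker not found: nothing to parse
    }
    $last = array_pop($chunks);        // the final chunk still has the page footer attached
    $parts = explode($endMarker, $last);
    $chunks[] = $parts[0];             // keep only the text before the end marker
    return $chunks;
}

// Made-up page: two "events" followed by a footer.
$page = 'header<table>gig one<table>gig two<hr />footer';
$events = isolateSection($page, '<table>', '<hr />');
print_r($events);
```

Each element of the returned array is one event's worth of markup, ready for the field-by-field matching that comes next.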
// set the iteration variables so we can count through
$iter = 0;
foreach ($merged as $event) {
// make array of times
if (preg_match('~<input type="hidden" name="calEvtDateTime" value="(.*)">~is', $event, $match))
{
array_shift($match);
foreach ($match as $time)
{
$position_event = strpos($time, '"');
$time_final = substr_replace($time, '', $position_event);
$month = substr($time_final, 0, 2);
$day = substr($time_final, 3, 2);
$year = substr($time_final, 6, 4);
$hour = substr($time_final, 11, 2);
$minute = substr($time_final, 14,2);
$unix = mktime((int) $hour, (int) $minute, 0, (int) $month, (int) $day, (int) $year);
// $rfc = date('r', $unix);
$time_array[$iter] = $unix;
}
}
// make array of cities
if (preg_match('~<input type="hidden" name="calEvtCity" value="(.*)">~is', $event, $match))
{
array_shift($match);
foreach ($match as $city)
{
$position = strpos($city, '"');
$city_final = htmlspecialchars(substr_replace($city, '', $position));
$city_array[$iter] = $city_final;
}
}
// make array of location/venue
if (preg_match('~<input type="hidden" name="calEvtLocation" value="(.*)">~is', $event, $match))
{
array_shift($match);
foreach ($match as $venue)
{
$position_venue = strpos($venue, '"');
$venue_final = htmlspecialchars(substr_replace($venue, '', $position_venue));
$venue_array[$iter] = $venue_final;
}
}
// make array of street
if (preg_match('~<input type="hidden" name="calEvtStreet" value="(.*)">~is', $event, $match))
{
array_shift($match);
foreach ($match as $street)
{
$position_street = strpos($street, '"');
$street_final = htmlspecialchars(substr_replace($street, '', $position_street));
$street_array[$iter] = $street_final;
}
}
// make array of description
if (preg_match('~<br /><br />(.*)</td>~is', $event, $match))
{
array_shift($match);
foreach ($match as $description)
{
$description_array[$iter] = htmlspecialchars(preg_replace('([\n])', '<br />', $description));
}
}
// now add one to the count
$iter++;
}
Here we take the big array and, for each event within it, look for certain bits of data marked by specific bits of code. In a similar way to before, we shift out the first value of the match, which is the whole matched tag rather than just the bit we want, and remove everything after the value we want.
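As a rough sketch of the same extraction step, the match-then-trim routine can be folded into one pattern that stops at the closing quote. The helper name extractField and the sample values are made up for illustration; only the field names come from the parser.

```php
<?php
// Hypothetical helper: returns the value attribute of a hidden input
// with the given name, or null when the chunk has no such field.
function extractField(string $chunk, string $name): ?string
{
    // [^"]* stops at the closing quote, so nothing after the value needs trimming.
    $pattern = '~<input type="hidden" name="' . preg_quote($name, '~') . '" value="([^"]*)"~i';
    if (preg_match($pattern, $chunk, $match)) {
        return $match[1];              // $match[0] is the whole tag, $match[1] the value
    }
    return null;
}

// Made-up chunk using field names that appear in the parser.
$chunk = '<input type="hidden" name="calEvtCity" value="Leeds">'
       . '<input type="hidden" name="calEvtLocation" value="The Cellar Bar">';

var_dump(extractField($chunk, 'calEvtCity'));
var_dump(extractField($chunk, 'calEvtStreet'));
```

Returning null for a missing field makes it easy to tell "no street given" apart from an empty value.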
We then create an array for each of the elements we want. Event 1's details will be in position 0 of every array, event 2's in position 1, and so on. After going through each in turn - we just move on if there's nothing down for town, venue or whatever - we increment the count by one, go back to the start, and put everything in the next spot of the arrays.
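The position-by-position idea can be shown with a tiny made-up data set. This sketch assumes the details have already been pulled out into small arrays (the parser itself works on raw markup), and it stores an empty string for anything missing so the positions never drift out of step.

```php
<?php
// Made-up events; the second has no city recorded.
$events = [
    ['venue' => 'The Cellar Bar', 'city' => 'Leeds'],
    ['venue' => 'Town Hall'],
];

$venue_array = [];
$city_array = [];
$iter = 0;
foreach ($events as $event) {
    // Store an empty string when a field is missing so every array
    // has an entry at position $iter.
    $venue_array[$iter] = $event['venue'] ?? '';
    $city_array[$iter]  = $event['city'] ?? '';
    $iter++;
}

// Event 2's details sit at position 1 of every array.
echo $venue_array[1] . ' / "' . $city_array[1] . "\"\n";
```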
Finally, we need to create the RSS:
// now take each value in turn and present it:
$event_number = count($merged); // set event number - remember first value is 0!
$f = 0; // will use $f to count the number of times through we are.
// start with the RSS headers
print '<?xml version="1.0" encoding="utf-8"?>';
print '<rss version="0.91"><channel><title>Clara Kousah shows</title><description>Clara Kousah events</description><link>http://www.clarakousah.com/gigs</link>';
while ($f < $event_number)
{
print "<item>\n";
print "<title>" . $venue_array[$f] . " - " . $street_array[$f] . "," . $city_array[$f] . "</title>\n";
print "<description>" . $description_array[$f] . "</description>\n";
print "<pubDate>" . $time_array[$f] . "</pubDate>\n";
print "<link>" . htmlspecialchars($url) . "</link>\n";
print "</item>\n";
$f++;
}
print '</channel></rss>';
}
readPage ('http://collect.myspace.com/index.cfm?fuseaction=bandprofile.listAllShows&friendid=41827648&n=Clara+Kousah');
?>
Here we go through, printing the value at each position of the arrays in RSS format. We've counted how many events there are, and as long as $f is smaller than that, we keep going - printing the value at position $f of each array. Each time we finish an event, we put $f up by one, and check that we're still under the total number of events.
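One thing worth knowing: the loop prints the raw Unix timestamp into pubDate, whereas feed readers generally expect an RFC 822 style date there. A minimal sketch of building an item that way follows. The helper name rssItem, the sample gig and the example.com link are all made up, the UTC timezone is an assumption, and the helper expects values that have not already been through htmlspecialchars.

```php
<?php
date_default_timezone_set('UTC');      // assumption: the real timezone of the listings is not known

// Hypothetical helper: builds one <item> from unescaped values.
function rssItem(string $title, string $description, int $unix, string $link): string
{
    return "<item>\n"
         . "<title>" . htmlspecialchars($title) . "</title>\n"
         . "<description>" . htmlspecialchars($description) . "</description>\n"
         . "<pubDate>" . date(DATE_RSS, $unix) . "</pubDate>\n"   // RFC 822 style date
         . "<link>" . htmlspecialchars($link) . "</link>\n"
         . "</item>\n";
}

$unix = mktime(20, 30, 0, 3, 14, 2008);   // made-up gig time: 14 March 2008, 20:30
$item = rssItem('The Cellar Bar - Leeds', 'Doors at 8 & music from 8:30', $unix, 'http://www.example.com/gigs');
echo $item;
```

If whatever reads the feed relies on the Unix timestamp, as a home-made parser might, leave pubDate as it is.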
And that's about it. Paste it all together, and you have the final source.
Remember to replace the address of the events page with the one you want. When it's up and running you'll be outputting some pretty simple RSS, which you can later parse using some more PHP.
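For the "parse it later" step, here is a small sketch using PHP's SimpleXML extension. The feed string is made up, shaped like the parser's output with a Unix timestamp in pubDate; in practice you would load your own feed's address instead, and the date format is just one choice.

```php
<?php
date_default_timezone_set('UTC');      // assumption, so the example prints the same date everywhere

// Made-up feed in the shape the parser prints (pubDate as a Unix timestamp).
$feed = '<?xml version="1.0" encoding="utf-8"?>'
      . '<rss version="0.91"><channel><title>Example shows</title>'
      . '<item><title>The Cellar Bar - 1 High Street,Leeds</title>'
      . '<description>Doors at 8</description><pubDate>1205526600</pubDate></item>'
      . '</channel></rss>';

$xml = simplexml_load_string($feed);
$lines = [];
foreach ($xml->channel->item as $item) {
    // Turn the timestamp back into something readable.
    $when = date('j F Y, H:i', (int) (string) $item->pubDate);
    $lines[] = $when . ': ' . $item->title . ' (' . $item->description . ')';
}
echo implode("\n", $lines), "\n";
```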
There might be some issues with your hosting not allowing you to access pages which produce outputs like this - I don't even pretend to understand all that stuff, so if you have access to another host and you're having trouble getting it to work, try that.
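One possible cause (an assumption, not something confirmed here) is the allow_url_fopen setting: file_get_contents() can only open http:// addresses when it is switched on. A quick check you can run on your own host:

```php
<?php
// file_get_contents() can only open http:// addresses when the
// allow_url_fopen setting is switched on in php.ini.
$setting = ini_get('allow_url_fopen');
$enabled = filter_var($setting, FILTER_VALIDATE_BOOLEAN);

if ($enabled) {
    echo "allow_url_fopen is on: remote pages can be read.\n";
} else {
    echo "allow_url_fopen is off: ask the host to enable it, or try another host.\n";
}
```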
The RSS parser you use must use the UTF-8 character encoding, otherwise you'll end up with some odd character behaviour going on.
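If odd characters do turn up, it can help to check whether a given piece of text really is UTF-8. The sketch below relies on preg_match() with the /u modifier failing on invalid UTF-8 input; the helper name isUtf8 is made up.

```php
<?php
// Hypothetical helper: true when $text is valid UTF-8.
// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function isUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

$good = "Caf\xC3\xA9";   // "Café" encoded as UTF-8
$bad  = "Caf\xE9";       // "Café" encoded as ISO-8859-1 (Latin-1)

var_dump(isUtf8($good));
var_dump(isUtf8($bad));
```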
Remaining issues
It still needs a bit of testing to iron out all the problems - I'm waiting to see what happens if Clara's not busy and has no gigs coming up, and I'll update this page should problems arise. In the meantime, please let me know if you have any comments or ideas...