RSS Feed with PHP, Part 1

RSS as an acronym has stood for various things, but the current expansion is Really Simple Syndication, the most recent variation of this very common and very useful standard. Back when the Internet was young(er), a piece of software called PointCast pushed data to a screensaver application on a user’s computer, providing news updates of all kinds. Browser developers such as Netscape and Microsoft then worked to create something similar to this immensely popular service. Netscape produced the most widely accepted variant, which was released into the development wilds of the Internet and eventually became the RSS of today.

RSS distributes recently updated information to many receivers, much like a broadcast system. Once you have a substantial number of users, the RSS feed acts like a beacon to draw your users back to look at updates. It is little wonder that RSS has increased in popularity and use among content providers, as it provides a much needed method of maintaining an audience’s attention.

When you see the RSS icon on a site, you can bet that an RSS feed is available there. This icon is the de facto standard for indicating that a site offers RSS updates. The curved lines represent radio waves, a symbol of the broadcast nature of the RSS feed.

Benefits of RSS Feeds

Your site has content that you want to get out to the masses, which is why you put it on the Internet in the first place. Once a substantial number of users know about your site and content, will they come back each day to check for updates? Probably not. This is where RSS steps in.

For your users, RSS can be a huge benefit, especially if they value opinions or news listed on your site. Without having to return to your site frequently, they will know exactly when you update or add content, allowing them to save time and effort, and they won’t miss anything either.

Content generation becomes less of a problem if you use RSS feeds to fuel content aggregation for your own site. Pulling data off a feed and including it in your pages can add a good amount of content for only a small investment of time.

Getting Ready

An RSS feed is an XML file used to describe the contents of your website. As your website content changes, your RSS feed changes. Other computer systems, known as aggregators or harvesters, read your RSS feed every once in a while. If you have provided new information, the aggregator takes that information and sends it to readers around the world. Thus information about your site’s contents is syndicated, that is, rebroadcast to a much larger audience.

The RSS standard defines the structure and content of a feed. A feed can draw on any data source that describes Internet documents; in a very basic sense, it is a list of links and their descriptions.

An RSS file is a plain text file. This means it can be created with any ordinary text editor. Whatever you choose, be sure to save your file as plain text. Word-processor formatting breaks the RSS file.

The easiest way to create an RSS file is to start from a template and replace the relevant content. The listing below is a short, simple template file you can use. Save it as mysite.rss, for example.

Simple RSS Template

<?xml version="1.0"?>
<rss version="2.0">

    <channel>
    <title>My Amazing Website</title>
    <link>http://www.myamazingwebsite.com/</link>
    <description>My Amazing Website description</description>
    <language>en-us</language>
    <pubDate>Wed, 27 Jan 2010 07:23:15 GMT</pubDate>
    <lastBuildDate>Wed, 27 Jan 2010 07:23:15 GMT</lastBuildDate>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <generator>Weblog Editor 2.0</generator>
    <managingEditor>editor@myamazingwebsite.com</managingEditor>
    <webMaster>webmaster@myamazingwebsite.com</webMaster>

    <item>
        <title>Item 1 Title</title>
        <link>http://www.item1link.com/</link>
        <description>
            This is a description of Item 1.
        </description>
        <pubDate>Wed, 27 Jan 2010 09:39:21 GMT</pubDate>
        <guid>http://www.myamazingwebsite.com#item271</guid>
    </item>

    <item>
        <title>Item 2 Title</title>
        <link>http://www.item2link.com/</link>
        <description>
            This is a description of Item 2.
        </description>
        <pubDate>Wed, 27 Jan 2010 09:39:23 GMT</pubDate>
        <guid>http://www.myamazingwebsite.com#item274</guid>
    </item>

    </channel>
</rss>

You can save your file using any name you choose, but it is a lot easier if you include the name of your site and use .rss as an extension, so people can tell from the file name what your file does.
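To see what an aggregator does with a file like this, here is a minimal sketch in Python (chosen only for illustration) that reads a shortened copy of the template with the standard library’s xml.etree.ElementTree module. The FEED string and the read_feed helper are made up for this example and are not part of any RSS tool.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the template above, kept inline so the sketch is self-contained.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>My Amazing Website</title>
    <link>http://www.myamazingwebsite.com/</link>
    <description>My Amazing Website description</description>
    <item>
      <title>Item 1 Title</title>
      <link>http://www.item1link.com/</link>
      <description>This is a description of Item 1.</description>
    </item>
    <item>
      <title>Item 2 Title</title>
      <link>http://www.item2link.com/</link>
      <description>This is a description of Item 2.</description>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return the channel title and a list of (title, link) pairs, one per item."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


channel_title, items = read_feed(FEED)
print(channel_title)
for title, link in items:
    print(title, "->", link)
```

A real aggregator would fetch the file over HTTP and handle missing elements; this sketch only shows how the channel and item structure maps onto a list of links.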

Defining the Elements

Near the top of the file, notice the line <channel>. The lines following this tag (and above the closing tag </channel>) are where you describe your website as a whole. A channel is simply the feed itself and its associated information. An RSS 2.0 file contains a single channel; if you want to separate content by some filter, publish a separate feed file for each.

You need to enter three pieces of information, each of which goes between its pair of tags. Delete the example data and type in the information about your site. The title, link and description elements are required by the channel; they define the basic descriptive information about the feed. The optional elements are: language, copyright, managingEditor, webMaster, pubDate, lastBuildDate, category, generator, docs, cloud, ttl, image, rating, textInput, skipHours, and skipDays.

Required elements:

  • title – The name of the channel. It’s how people refer to your service. If you have an HTML website that contains the same information as your RSS file, the title of your channel should be the same as the title of your website.
  • link – The URL to the HTML website corresponding to the channel.
  • description – Phrase or sentence describing the channel.

Optional elements:

  • language – The language the channel is written in. This allows aggregators to group all Italian language sites, for example, on a single page. You may use values defined by the W3C.
  • copyright – Copyright notice for content in the channel.
  • managingEditor – Email address for person responsible for editorial content.
  • webMaster – Email address for person responsible for technical issues relating to channel.
  • pubDate – The publication date for the content in the channel. All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).
  • lastBuildDate – The last time the content of the channel changed.
  • category – Specify one or more categories that the channel belongs to.
  • generator – A string indicating the program used to generate the channel.
  • docs – A URL that points to the documentation for the format used in the RSS file.
  • cloud – Allows processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.
  • ttl – Time to live. It’s a number of minutes that indicates how long a channel can be cached before refreshing from the source. This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella.
  • image – Specifies a GIF, JPEG or PNG image that can be displayed with the channel.
  • textInput – Specifies a text input box that can be displayed with the channel.
  • skipHours – A hint for aggregators telling them which hours they can skip.
  • skipDays – A hint for aggregators telling them which days they can skip.

You only need to write this part once. After that, only the date elements (pubDate and lastBuildDate) need to change when you update the feed.
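The pubDate and lastBuildDate values are easy to get wrong when typed by hand. As a rough sketch, Python’s standard email.utils module can produce and parse the RFC 822 style dates used in the template. The rss_date helper is a hypothetical name, and the timestamp is simply the one from the template.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def rss_date(moment):
    """Format a timezone-aware datetime as an RFC 822 style string in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# The timestamp used for pubDate in the template.
published = datetime(2010, 1, 27, 7, 23, 15, tzinfo=timezone.utc)
formatted = rss_date(published)
print(formatted)  # Wed, 27 Jan 2010 07:23:15 GMT

# Reading a date back, as an aggregator might when sorting items.
parsed = parsedate_to_datetime("Wed, 27 Jan 2010 07:23:15 GMT")
print(parsed == published)
```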

Your site may contain one or more articles you want readers to read. This is the part that changes from time to time, whenever there is a change or an update you want to tell your visitors about. Each article is described by an <item> element. In the example above, there are two items.

A channel may contain any number of <item>s. An item may represent a story, much like a story in a newspaper or magazine; if so, its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself; if so, the description contains the text (entity-encoded HTML is allowed), and the link and title may be omitted. All elements of an item are optional; however, at least one of title or description must be present.
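The rule that every item needs at least a title or a description is simple to check mechanically. The following Python sketch does only that; the items_missing_text helper and the SAMPLE feed are invented for illustration.

```python
import xml.etree.ElementTree as ET


def items_missing_text(xml_text):
    """Return the 1-based positions of items that have neither title nor description."""
    channel = ET.fromstring(xml_text).find("channel")
    bad = []
    for position, item in enumerate(channel.findall("item"), start=1):
        if item.find("title") is None and item.find("description") is None:
            bad.append(position)
    return bad


# Hypothetical feed: the third item has only a link, which breaks the rule.
SAMPLE = """<rss version="2.0"><channel>
<title>T</title><link>http://www.myamazingwebsite.com/</link><description>D</description>
<item><title>Only a title</title></item>
<item><description>Only a description</description></item>
<item><link>http://www.item1link.com/</link></item>
</channel></rss>"""

problems = items_missing_text(SAMPLE)
print(problems)  # [3]
```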

In between the <item> tags are some tags containing information about the article. To create information about an article, fill in the following tags:

  • title – The title of the item.
  • link – The URL of the item.
  • description – The item synopsis.
  • author – Email address of the author of the item. For newspapers and magazines syndicating via RSS, the author is the person who wrote the article that the <item> describes. For collaborative weblogs, the author of the item might be different from the managing editor or webmaster. For a weblog authored by a single individual it would make sense to omit the <author> element.
  • category – Includes the item in one or more categories.
  • comments – URL of a page for comments relating to the item.
  • enclosure – Describes a media object that is attached to the item.
  • guid – A string that uniquely identifies the item; guid stands for globally unique identifier. When present, an aggregator may choose to use this string to determine if an item is new.
    There are no rules for the syntax of a guid. Aggregators must view them as a string. It’s up to the source of the feed to establish the uniqueness of the string.
    If the guid element has an attribute named isPermaLink with a value of true, the reader may assume that it is a permalink to the item, that is, a URL that can be opened in a Web browser and that points to the full item described by the <item> element. An example:

    <guid isPermaLink="true">http://inessential.com/2002/09/01.php#a2</guid>

    isPermaLink is optional; its default value is true. If its value is false, the guid may not be assumed to be a URL, or a URL to anything in particular.

  • pubDate – Indicates when the item was published. All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).
  • source – The RSS channel that the item came from.
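The guid element is what lets an aggregator tell new items from ones it has already shown. One possible way to do that, sketched in Python, is to remember the guid strings from earlier visits in a set. The new_items helper and the inline FEED are assumptions made for this example; items without a guid are simply skipped here, which a real aggregator would probably handle differently.

```python
import xml.etree.ElementTree as ET

# Shortened feed using the two guid values from the template.
FEED = """<rss version="2.0"><channel>
<title>My Amazing Website</title>
<link>http://www.myamazingwebsite.com/</link>
<description>My Amazing Website description</description>
<item><title>Item 1 Title</title>
<guid>http://www.myamazingwebsite.com#item271</guid></item>
<item><title>Item 2 Title</title>
<guid>http://www.myamazingwebsite.com#item274</guid></item>
</channel></rss>"""


def new_items(xml_text, seen_guids):
    """Return titles of items whose guid is not yet in seen_guids, and record them."""
    channel = ET.fromstring(xml_text).find("channel")
    fresh = []
    for item in channel.findall("item"):
        guid = item.findtext("guid")
        if guid is None or guid in seen_guids:
            continue  # nothing to compare against, or already seen
        seen_guids.add(guid)  # compared as a plain string, as the text above requires
        fresh.append(item.findtext("title"))
    return fresh


seen = {"http://www.myamazingwebsite.com#item271"}  # remembered from an earlier visit
first_pass = new_items(FEED, seen)
second_pass = new_items(FEED, seen)
print(first_pass)   # ['Item 2 Title']
print(second_pass)  # []
```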

Create an item for each article and save your .rss file.
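If you would rather not edit the file by hand, the same structure can be generated from a list of articles. This is a minimal sketch in Python using the standard xml.etree.ElementTree module; build_feed and the article dictionaries are hypothetical, and only a handful of the elements described above are handled.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, articles):
    """Build an RSS 2.0 document (as a string) from a list of article dictionaries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for article in articles:
        item = ET.SubElement(channel, "item")
        # Copy only the item elements that the article actually provides.
        for tag in ("title", "link", "description", "pubDate", "guid"):
            if tag in article:
                ET.SubElement(item, tag).text = article[tag]
    return ET.tostring(rss, encoding="unicode")


articles = [
    {
        "title": "Tom & Jerry",
        "link": "http://www.item1link.com/",
        "description": "This is a description of Item 1.",
        "guid": "http://www.myamazingwebsite.com#item271",
    },
]
feed_xml = build_feed(
    "My Amazing Website",
    "http://www.myamazingwebsite.com/",
    "My Amazing Website description",
    articles,
)
print(feed_xml)
```

The serializer escapes the ampersand in the item title on its own, so the output contains Tom &amp;amp; Jerry written as `Tom &amp; Jerry` in the XML. The returned string has no XML declaration line; you would add one when writing it to mysite.rss.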

Escape Characters

An RSS file is an XML file, and some characters have a special meaning in XML. You have to escape them, that is, insert a text string in their place. Here is the list of escape sequences to use in the RSS feed file:

  • & – Replace all instances of & with the following: &amp;
    Don’t forget the ampersands in URLs.
  • " – Change every double quote to &quot;
  • ' – Change every apostrophe to &apos;
  • > – Change every greater-than character to &gt;
    Do not change them in the tags.
  • < – Change every less-than character to &lt;
    Do not change them in the tags.
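The replacements above can also be applied in code before text is placed between tags. As a sketch, Python’s standard xml.sax.saxutils.escape function handles &, < and > by default, and the two quote characters can be added through its optional mapping argument. The escape_text wrapper is a made-up name for this example.

```python
from xml.sax.saxutils import escape


def escape_text(raw):
    """Escape &, <, >, double quotes and apostrophes for use inside an element."""
    # escape() replaces & first, so the entities added here are not escaped twice.
    return escape(raw, {'"': "&quot;", "'": "&apos;"})


sentence = escape_text('Q&A: is 3 < 5 > 2? "Yes", it\'s true')
url = escape_text("http://www.myamazingwebsite.com/?a=1&b=2")
print(sentence)
print(url)
```

Apply this only to the text that goes between tags, never to a whole file, or the tags themselves would be escaped too.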

Upload your file to your web server. You now have a working RSS feed. Congratulations.

Validating RSS Feed

This step is optional, but highly recommended. To validate your RSS file, go to an RSS validator on the Web and enter the full URL of your RSS file into the form. If there are any errors, the validator will tell you about them. Otherwise, it will confirm that your feed is valid.
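Before uploading, you can catch plain XML syntax mistakes locally. The Python sketch below only checks that the text is well-formed XML, for example that no ampersand was left unescaped; it does not check RSS rules and is not a replacement for a validator. The check_well_formed helper and both sample strings are invented for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


good = '<rss version="2.0"><channel><title>Q&amp;A</title></channel></rss>'
bad = '<rss version="2.0"><channel><title>Q&A</title></channel></rss>'

good_result = check_well_formed(good)
bad_result = check_well_formed(bad)
print(good_result)  # None
print(bad_result)   # an error message pointing at the unescaped ampersand
```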

Advertising RSS Feed

You need to tell people that your feed exists. They are not (necessarily) going to find it through Google. There are two major ways to do this.

First, add an RSS button to your home page and link it to your RSS file. Get a copy of the image (one is located at http://dusan.kuzmanovic.net/download/icon-rss.jpg) and upload it to your website. Then place the image, with a link, on your home page. For example, place the following code on your home page:

<a href="http://www.myamazingwebsite.com/mysite.rss">
    <img src="icon-rss.jpg" width="36" height="14" alt="RSS feed for this site" border="0" />
</a>

Second, you can submit your site’s RSS feed URL to various aggregators, such as Syndic8. This will tell them to start checking your RSS feed for updates.

Conclusion

These instructions cover creating a basic RSS feed. The RSS standard can be extended in a variety of ways, but this is a good starting point: your feed will be fully functional and, most important, it will work. Get to know your feed. After a time, you may want to look at the world of RSS 1.0 and RSS 2.0 and the various ways to add more information.

More info here:

http://feedvalidator.org/docs/rss2.html
http://cyber.law.harvard.edu/rss/rss.html
