TL;DR: I created a WordPress plugin that lets me ingest tons of posts 24/7 from curated RSS feeds (see below). While building it, I also gave myself a huge headache and had to recover from it.
RSS feeds used to be all the rage, and feed readers were a major way to keep up with news and interests, but now RSS is mostly treated as an API, which is a shame.
Anyway, I digress.
The plugin worked gangbusters, so I thought I could use it for a client who needed a copy of his newsletter archive.
Two problems occurred.
First, the system he uses has a weird way of embedding widgets into posts: a rich, JSON-like format that isn't actually JSON.
That would have been fine, except that I had already imported the posts by the time I realized it.
Second, only 20 items were available in the RSS feed, so I am going to have to scrape his archive anyway.
I had been feeling pretty pleased with myself. I had even managed to set his featured images after a lot of swearing and debugging, but now I had these nasty, messed-up embeds.
No problem, I thought, search and replace!
But ... nope.
The embeds are HTML-encoded, and search-and-replace tools carefully sanitize your form submissions.
So I had to go to the database itself, but I had neither direct MySQL access nor access to his phpMyAdmin.
Arrgh!
Wonderful news, though: a nice fellow created a plugin that uses Adminer as a phpMyAdmin replacement, AND it works from inside WordPress!
Now I just had to run an SQL statement with the right escape characters:
-- Turn every backslash-escaped quote (\') back into a plain quote (')
UPDATE wp_posts SET post_content = REPLACE(post_content, '\\\'', '\'')
WHERE post_status = 'draft' AND post_type = 'post';
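If the triple backslash looks alarming, here is what it means. Inside a MySQL string literal (with the default SQL mode, where backslash escapes are enabled), `'\\\''` is the two-character text `\'` and `'\''` is a single `'`. The query therefore turns every backslash-escaped quote back into a plain quote. Here is a small Python sketch of the same string transformation, run on a made-up sample of mangled embed text rather than on the database:

```python
# Mirrors MySQL's REPLACE(post_content, '\\\'', '\'') on a plain string.
# In Python source, "\\'" is also the two characters backslash + quote.
ESCAPED_QUOTE = "\\'"
PLAIN_QUOTE = "'"


def unescape_quotes(post_content: str) -> str:
    """Replace every backslash-escaped quote with a plain single quote."""
    return post_content.replace(ESCAPED_QUOTE, PLAIN_QUOTE)


# Hypothetical example of an embed as it looked after import.
mangled = "{\\'type\\': \\'widget\\', \\'id\\': 42}"
print(mangled)                   # {\'type\': \'widget\', \'id\': 42}
print(unescape_quotes(mangled))  # {'type': 'widget', 'id': 42}
```

Like the SQL, this rewrites every `\'` it finds, including any that were meant to be there. That is a good reason to keep the WHERE clause narrow (drafts only).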
These are my top WordPress plugins if you get into a similar pickle:
- debug-log-manager
- pexlechris-adminer
- WP Crontrol
RSS Feed Importer WordPress Plugin
This WordPress plugin imports posts from a specified RSS or Atom feed into your WordPress site on a recurring schedule.
Features
- Scheduled Imports: Automatically checks a specified feed URL for new posts based on standard WordPress cron schedules (e.g., hourly, twice daily, daily).
- Configurable Import Quantity: Set the maximum number of posts to attempt importing during each scheduled check.
- Post Status Control: Choose the status (Draft, Published, Pending Review, Private) for newly imported posts.
- Category Assignment: Assign all imported posts to a specific category selected from a dropdown list.
- Image Handling:
  - Imports images found within the feed item's content, uploads them to the WordPress Media Library, and updates the image URLs in the post content.
  - Detects image enclosures in the feed and attempts to set them as the post's Featured Image.
- Duplicate Prevention: Stores the feed item's GUID (or permalink as fallback) in post meta to prevent importing the same item multiple times.
- Simple Settings Page: Configure the feed URL, schedule, quantity, status, and category under "Tools" -> "RSS Importer" in the WordPress admin area.
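The Duplicate Prevention, Import Quantity and enclosure rules above can be sketched in a few lines. This is Python rather than the plugin's PHP (which uses `fetch_feed()`), it handles plain RSS 2.0 only, and the function names are hypothetical. Two further assumptions apply. The quantity limit caps how many feed items are examined per run (the first N). An item with neither a GUID nor a link is skipped.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Sketch of the importer's item-selection rules in Python (the real plugin is
# PHP and uses fetch_feed()). Plain RSS 2.0 only; Atom is not handled here.


def item_key(item: ET.Element) -> Optional[str]:
    """Dedup key: the item's <guid>, falling back to its <link> permalink."""
    for tag in ("guid", "link"):
        text = (item.findtext(tag) or "").strip()
        if text:
            return text
    return None


def image_enclosure(item: ET.Element) -> Optional[str]:
    """URL of the first <enclosure> whose MIME type is image/*, if any."""
    for enclosure in item.findall("enclosure"):
        url = enclosure.get("url")
        if url and enclosure.get("type", "").startswith("image/"):
            return url
    return None


def select_new_items(rss_xml: str, already_imported: set, max_items: int) -> list:
    """Examine the first max_items items and keep those not imported yet.

    already_imported stands in for the _rss_importer_guid post meta lookup.
    """
    items = ET.fromstring(rss_xml).findall("./channel/item")
    selected = []
    for item in items[:max_items]:
        key = item_key(item)
        if key is None or key in already_imported:
            continue  # no usable identifier, or imported on an earlier run
        selected.append({
            "key": key,
            "title": item.findtext("title", default=""),
            "featured_image": image_enclosure(item),
        })
    return selected


# Hypothetical feed: item 1 was imported before; item 2 has no <guid>.
feed = """<rss version="2.0"><channel>
<item><title>One</title><guid>urn:post:1</guid></item>
<item><title>Two</title><link>https://example.com/two</link>
  <enclosure url="https://example.com/two.jpg" type="image/jpeg" length="0"/></item>
<item><title>Three</title><guid>urn:post:3</guid></item>
</channel></rss>"""

for entry in select_new_items(feed, already_imported={"urn:post:1"}, max_items=2):
    print(entry)
```

With `max_items=2`, item One is skipped as a duplicate and item Two is keyed by its permalink. Item Three is not examined on this run.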
How it Works
- Settings: Configure the plugin via the admin settings page. Saving the settings schedules (or reschedules) the import task using WP-Cron.
- Cron Job: On the chosen schedule, the `rss_importer_cron` function runs.
- Fetch Feed: It fetches the specified RSS/Atom feed using WordPress's built-in `fetch_feed()` function.
- Check for Duplicates: For each item in the feed (up to the specified quantity), it checks whether a post with the same GUID has already been imported by checking post meta (`_rss_importer_guid`).
- Process Content Images: It parses the feed item's content (`the_content`), finds `<img>` tags, downloads the images using `download_url()`, uploads them to the Media Library using `media_handle_sideload()`, and updates the `src` attributes in the content with the new URLs.
- Insert Post: It creates a new post using `wp_insert_post()`, mapping the feed item's title, processed content, date, and selected status/category. It also stores the original GUID, feed URL, and item permalink as post meta.
- Featured Image: If the feed item has an image enclosure, it attempts to download and upload that image using `media_handle_sideload()` and sets it as the Featured Image for the newly created post using `set_post_thumbnail()`.
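The Process Content Images step works in three stages. Collect each `<img>` tag's `src`, sideload each distinct URL once, then rewrite the `src` attributes to point at the new copies. Below is a Python sketch of that flow, not the plugin's PHP. The `sideload` callable is a hypothetical stand-in for `download_url()` plus `media_handle_sideload()`; it returns the new Media Library URL, or `None` on failure. The attribute rewrite assumes quoted `src` values.

```python
import html
import re
from html.parser import HTMLParser
from typing import Callable, Optional

# Matches a quoted src attribute (but not data-src or srcset) inside one tag.
SRC_ATTR = re.compile(r"""(?<![\w-])(src\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


class ImgCollector(HTMLParser):
    """Records (raw tag text, decoded src) for every <img> in a fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via handle_startendtag.
        src = dict(attrs).get("src")
        if tag == "img" and src:
            self.images.append((self.get_starttag_text(), src))


def localize_images(content: str, sideload: Callable[[str], Optional[str]]) -> str:
    """Sideload each distinct image once and point its <img> tags at the copy."""
    collector = ImgCollector()
    collector.feed(content)
    collector.close()
    new_urls: dict = {}
    for raw_tag, src in collector.images:
        if src not in new_urls:
            new_urls[src] = sideload(src)
        new_url = new_urls[src]
        if new_url is None:
            continue  # download failed: keep the original remote URL
        quoted = html.escape(new_url, quote=True)
        new_tag = SRC_ATTR.sub(
            lambda m: m.group(1) + m.group(2) + quoted + m.group(2), raw_tag, count=1
        )
        content = content.replace(raw_tag, new_tag)
    return content


def fake_sideload(url: str) -> str:
    # Hypothetical stand-in for download_url() + media_handle_sideload().
    return "https://mysite.example/wp-content/uploads/" + url.rsplit("/", 1)[-1]


sample = ('<p>Hi</p><img src="https://cdn.example.com/a.png" alt="A">'
          "<img class='x' src='https://cdn.example.com/b.jpg' />")
print(localize_images(sample, fake_sideload))
```

If a download fails, the original URL stays in place. The post remains readable, at the cost of still hot-linking that image.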
Requirements
- WordPress (only loosely tested against recent versions)
- PHP (with standard extensions like DOM, mbstring, and an image library like GD or Imagick for thumbnail generation)
- A functioning WP-Cron system (standard WordPress cron or a server-side cron job triggering `wp-cron.php`)
Installation
- Download the plugin files (e.g., `rss_importer.php`).
- Place the `rss_importer.php` file (or the directory containing it, if packaged) into your `/wp-content/plugins/` directory.
- Activate the "RSS Feed Importer" plugin through the "Plugins" menu in WordPress.
- Go to "Tools" -> "RSS Importer" to configure the settings.
Notes
- Import scheduling relies on WP-Cron, which only fires when the site receives traffic. For more reliable timing, consider setting up a server-level cron job that requests `wp-cron.php` on a fixed interval.
- Image import success depends on the remote server allowing downloads, on PHP memory limits, and on a correctly configured image library (GD or Imagick) on your server.
- The author for imported posts is currently hardcoded to user ID 1 (the initial admin user). This could be made configurable in future versions.
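To see why traffic matters, here is a toy Python model of pseudo-cron scheduling. It is a simplified assumption, not WordPress's exact rescheduling code. A due job runs only when a request arrives, and afterwards its next due time moves to the next slot on the original grid. The function name and visit times are made up.

```python
def simulate_pseudo_cron(first_due: int, interval: int, request_times: list) -> list:
    """Return the request times at which a recurring job actually runs.

    Simplified model (an assumption, not WordPress's exact code): the job can only
    run when a request arrives, runs at most once per request, and afterwards its
    next due time is the next slot on the original grid that lies after the request.
    """
    runs = []
    due = first_due
    for now in sorted(request_times):
        if now >= due:
            runs.append(now)
            # Skip every slot that came and went while nobody was visiting.
            due += ((now - due) // interval + 1) * interval
    return runs


HOUR = 3600
quiet_site = [100, 5_000, 20_000]             # hypothetical visits, in seconds
print(simulate_pseudo_cron(0, HOUR, quiet_site))        # [100, 5000, 20000]

pinged = list(range(0, 6 * HOUR, 300))        # server cron hitting wp-cron.php every 5 min
print(simulate_pseudo_cron(0, HOUR, pinged))  # [0, 3600, 7200, 10800, 14400, 18000]
```

On the quiet site, six hourly slots come due but only three late runs happen. With a regular server-side ping, every hourly run happens on time in this model.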