You’re probably not reading the post you thought you clicked on. Scroll down for the explanation.
When I wrote the post entitled Requirements for permalink /%postname%/ I didn’t realise that in this site my permalink structure was already set to `/%postname%/`. This means that I can demonstrate the problem.
The above link was created by inserting a link to the post, not the page. But if you click on the link you should end up back here, on the page.
Current post’s fields
ID: 21124, Post Status: publish, Post Name: requirements-for-permalink-postname, Post Type: page
Duplicate posts
Title | ID | Post Status | Post Name | Post Type |
---|---|---|---|---|
Requirements for permalink /%postname%/ | 21123 | publish | requirements-for-permalink-postname | post |
Requirements for permalink /%postname%/ | 21124 | publish | requirements-for-permalink-postname | page |
Can we create a link to the post?
If you click on this link, which uses the post ID, you would expect to be shown the correct post, wouldn’t you?
/?p=21123
No. That doesn’t work!
How about adding the post type?
/?p=21123&post_type=post
How about adding a prefix, as if the permalink structure was `/%year%/%postname%/`?
/2022/?p=21123&post_type=post
Clutching at straws…
/page_id=22919
Results for different values of Status and Visibility
OK, so what about when we change the status of this page? Do we get to see the post?
Status | Logged in? | Logged out? |
---|---|---|
Published (publish) | No | 404 |
Draft (draft) | No | 404 |
Pending (pending) | No | 404 |
Scheduled (future) | No | 404 |
Deleted (trash) | Yes | Yes |
Visibility: Private (private) | No | 404 |
Visibility: Password protected (publish / future ) | No | 404 |
The original post
For the time being, I can’t find a way to make a link to the post actually resolve to the post. So here it is – embedded directly into this page using a shortcode that references the post by its ID.
At the WordPress Portsmouth Online Meetup, on 16th March 2022, we had a long discussion about a bug that was raised 12 years ago. It’s never been closed and hasn’t been worked on for a while. The issue in question is #13459 Conflict between post and page slugs/permalinks when permalink setting is set to /%postname%/
In a nutshell the problem can be summarised by one of these two quotes.
I clicked on the link and got shown the wrong content.
I clicked on the link and got a 404 Not found.
- The issue is assigned to the Permalink component.
- There have been 16 duplicates of this issue.
- Each duplicate has been closed, even though the problem hasn’t been fixed. That’s Business As Usual.
- Abha suggested we chivvy it along by raising it at the weekly bug scrub.
- We did and got some response… more work needed.
So here’s my effort at describing the problem, the requirements to be satisfied, and some thoughts on possible solutions that may work even when there are duplicate slugs between post types.
Reproducing the problem
The problem occurs when the site is configured with permalinks that just use the `postname`. Also known as the `slug`, this is the field in the post that contains what is supposed to be a unique identifier for the post. Unfortunately, it turns out that it’s not unique and this can lead to unexpected results when clicking on permalinks to the website’s content.
The permalink of this post is currently `requirements-for-permalink-postname`, since it’s been automatically generated from the post’s title. If I were to have my permalink structure set to `/%postname%/` and I were to create a page or other custom post type with the same title… and hence the same slug, then attempting to view the post can lead to me seeing the page, not the post.
For the majority of users this is an unexpected result.
If I were to make the page private, then users who aren’t logged in would get the 404.
These are also unexpected results. The user knew the post was there, but wasn’t shown it.
Steps to reproduce the problem
- Set permalinks to `/%postname%/`
- Create a page called XXX
- Create a post called XXX
- View posts archive
- Choose XXX
Expected result: The post called XXX
Actual result: The content of the page called XXX
You can actually reduce the number of steps to produce the problem.
- Set permalinks to `/%postname%/`
- Create a post with the same name as an existing page.
- Save the post
- View it.
Expected result: The post
Actual result: The page
Fun with post status
Once you’ve published the post you’ll find that you can’t even Preview it. This is because the URL for the Preview uses the permalink eg https://herbmiller.me/requirements-for-permalink-postname/?preview=true.
You’ll find that you can preview the post when it’s a Draft or Scheduled post. What’s more surprising is that, when you’re logged in, you can see the post if its status is Draft or Scheduled (future).
This is because the URL uses the post ID rather than the permalink… but it only works when you’re logged in and the post isn’t (yet) published!
Summary of posts with this post name
This post: Requirements for permalink /%postname%/
This post’s fields: ID: 21123, Post Status: publish, Post Name: requirements-for-permalink-postname, Post Type: post
Title | ID | Post Status | Post Name | Post Type |
---|---|---|---|---|
Requirements for permalink /%postname%/ | 21123 | publish | requirements-for-permalink-postname | post |
Requirements for permalink /%postname%/ | 21124 | publish | requirements-for-permalink-postname | page |
Requirements
The basic requirement is to be able to satisfy the user’s request to view the content they were offered.
The additional requirement implied by the `/%postname%/` permalink structure is
- either for WordPress to prevent duplicate URLs, when using this permalink structure.
- or for WordPress to resolve duplicates using a documented / logical algorithm.
For completeness, the solution should work
- for all permalink structures
- for all hierarchical permalinks, for posts, pages, attachments, taxonomies and Custom Post Types (CPTs).
There are additional requirements to be satisfied:
- pages are allowed to have non-unique slugs across hierarchies
- CPTs are allowed to have non-unique slugs across hierarchies
- solution should work when non-unique items have been trashed
- solution should support paginated content
Test cases
Prior to developing any automated test cases I believe it’s necessary that we document the requirements clearly enough for expected results to be articulated and agreed. One way of achieving this is to document the scenarios where the wrong result is produced for the given input, what the correct result should be and why. In other words, to clearly document what we expect.
This is quite a challenge as there are so many combinations. The problem is not limited to posts and pages. It also extends to attachments and custom post types.
Other custom permalinks don’t work
The problem is not limited to `/%postname%/`. The table below summarises the results obtained with several custom permalinks.
Custom permalink | Works? | Comments |
---|---|---|
/%postname%/ | No | See above. |
/%postname%/%post_id%/ | No | Different results. Given that the permalink contained the post ID, these results were even more unexpected. Sometimes we get a 404. |
/%post_id%/ | Yes | But very unsatisfactory URL |
/%post_id%/%postname%/ | Yes | Fairly unsatisfactory URL |
/-/%postname%/ | Sort of | It works for posts but not for the Attachment scenario. |
/%year%/ | No | The date archive for year is displayed |
Any other combination not including /%postname%/ or /%post_id%/ | No | You’ll get some archive display for every post you click on. |
Note: I’ve not yet created / seen a scenario which applies to taxonomies and/or their permalink prefixes.
I did, however, try setting the Optional Category base to a single blank character. It got converted to `%20`. This led to a 403 error when attempting to view the posts in a selected category.
Similarly, entering a question mark into the Optional Tag base field led to a 404.
Possible solutions
Following analysis of `wp_unique_post_slug()` many moons ago, three options were proposed:
- Always prevent posts, pages (and CPTs?) from having the same slug (require unique slugs across all post types). Since having the same slug is actually fine with most permalink structures, this sounds like an unnecessary restriction.
- Only do the above for the `/%postname%/` permalink structure. However, if the structure changes to `/%postname%/` later (after the page and the post are created), we’ll still end up with a conflict.
- Leave this to a plugin, since `wp_unique_post_slug()` is filterable.
These options focused on preventing the problem in the first place.
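The plugin option could look roughly like the sketch below. It is not taken from any existing plugin. The `wp_unique_post_slug` filter is part of WordPress core, but the function name `dupes_next_free_slug()` is hypothetical, and the direct `$wpdb` lookup is only one assumed way of checking whether a post or page already uses the slug.

```php
<?php
/**
 * Returns $slug if it is free, otherwise the first free "$slug-N" (N = 2, 3, ...).
 * Hypothetical helper: $is_taken is any callable that reports whether a slug is in use.
 */
function dupes_next_free_slug( string $slug, callable $is_taken ): string {
	if ( ! $is_taken( $slug ) ) {
		return $slug;
	}
	$suffix = 2;
	while ( $is_taken( $slug . '-' . $suffix ) ) {
		$suffix++;
	}
	return $slug . '-' . $suffix;
}

// Only hook in when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter(
		'wp_unique_post_slug',
		function ( $slug, $post_id, $post_status, $post_type ) {
			// Assumption: only posts and pages compete for /%postname%/ URLs here.
			if ( ! in_array( $post_type, array( 'post', 'page' ), true ) ) {
				return $slug;
			}
			$is_taken = function ( string $candidate ) use ( $post_id ): bool {
				global $wpdb;
				$found = $wpdb->get_var(
					$wpdb->prepare(
						"SELECT ID FROM $wpdb->posts
						WHERE post_name = %s AND post_type IN ('post','page')
						AND post_status <> 'trash' AND ID <> %d LIMIT 1",
						$candidate,
						$post_id
					)
				);
				return null !== $found;
			};
			return dupes_next_free_slug( $slug, $is_taken );
		},
		10,
		4
	);
}
```

A filter like this only affects slugs saved while it is active, so duplicates that already exist would still need to be dealt with separately.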
An alternative approach would be to deal with the URL request taking into account the permalink structure. A third would be to detect and alter the duplicate slug on permalink creation such that when the link was clicked WordPress would find the correct post.
I already use this technique for taxonomies which are attached to several CPTs.
e.g. https://blocks.wp-a2z.org/letters/b/?post_type=oik-plugins will display posts of type `oik-plugins` which are classified in the `letters` taxonomy as `b`.
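Qualifying a link with its post type can be sketched in a few lines. This is only an illustration: `dupes_add_post_type()` is a hypothetical name, and whether WordPress would then resolve a duplicate post/page slug correctly is untested here. Inside WordPress, `add_query_arg()` does the same job.

```php
<?php
/**
 * Appends ?post_type=... (or &post_type=...) to a permalink.
 * Simplification: URLs containing a #fragment are not handled.
 */
function dupes_add_post_type( string $permalink, string $post_type ): string {
	$separator = ( false === strpos( $permalink, '?' ) ) ? '?' : '&';
	return $permalink . $separator . 'post_type=' . rawurlencode( $post_type );
}

echo dupes_add_post_type( 'https://blocks.wp-a2z.org/letters/b/', 'oik-plugins' ), "\n";
```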
What happens when the URL request is being processed?
When the request’s query is parsed, WordPress uses the rewrite rules to help it construct the query to run to find the requested content.
For `/%postname%/`, tracing of the `wp` hook showed the WP object containing:
[query_vars] => Array
[page] => (string) ""
[pagename] => (string) "trouble-with-urls-paged"
[query_string] => (string) "pagename=trouble-with-urls-paged"
[request] => (string) "trouble-with-urls-paged"
[matched_rule] => (string) "(.?.+?)(?:/([0-9]+))?/?$"
[matched_query] => (string) "pagename=trouble-with-urls-paged&page="
[did_permalink] => (boolean) 1
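To see how that rule produces the `pagename` query, the match can be replayed outside WordPress. This is a sketch, assuming the rule is anchored with `^` and wrapped in `#` delimiters in the way WordPress core applies its rewrite rules; the function name `dupes_match_page_rule()` is made up for illustration.

```php
<?php
/**
 * Applies the matched rule from the trace to a request path and
 * returns the resulting pagename and page values, or null on no match.
 */
function dupes_match_page_rule( string $request ): ?array {
	$rule = '(.?.+?)(?:/([0-9]+))?/?$';
	if ( ! preg_match( '#^' . $rule . '#', $request, $matches ) ) {
		return null;
	}
	return array(
		'pagename' => $matches[1],
		// The second group is absent when the URL has no page number.
		'page'     => $matches[2] ?? '',
	);
}

print_r( dupes_match_page_rule( 'trouble-with-urls-paged' ) );
print_r( dupes_match_page_rule( 'trouble-with-urls-paged/2' ) );
```

Nothing in the pattern itself says whether the slug belongs to a post or a page; it simply captures the path and an optional page number.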
Note: The global `$wp_rewrite` object’s rules array doesn’t differentiate between posts and pages.
This means that the code’s already decided to load the page. The query that was run came from `get_page_by_path()`:
SELECT ID, post_name, post_parent, post_type
FROM wp_posts
WHERE post_name IN ('trouble-with-urls-paged')
AND post_type IN ('page','attachment')
This didn’t make any sense to me. Why didn’t the `post_type` clause include `post`?
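A small model of that lookup, run against the two rows from the table of duplicates, shows the effect. It is a sketch only: it imitates the `WHERE` clause in plain PHP rather than calling WordPress, and `dupes_lookup()` is a hypothetical name.

```php
<?php
// The two rows from the "Duplicate posts" table.
$rows = array(
	array( 'ID' => 21123, 'post_name' => 'requirements-for-permalink-postname', 'post_type' => 'post' ),
	array( 'ID' => 21124, 'post_name' => 'requirements-for-permalink-postname', 'post_type' => 'page' ),
);

/**
 * Imitates: WHERE post_name IN (...) AND post_type IN (...).
 * Returns the IDs of the matching rows.
 */
function dupes_lookup( array $rows, string $post_name, array $post_types ): array {
	$ids = array();
	foreach ( $rows as $row ) {
		if ( $row['post_name'] === $post_name && in_array( $row['post_type'], $post_types, true ) ) {
			$ids[] = $row['ID'];
		}
	}
	return $ids;
}

$slug = 'requirements-for-permalink-postname';
// The post types used in the traced query: only the page can be found.
print_r( dupes_lookup( $rows, $slug, array( 'page', 'attachment' ) ) );
// With 'post' included, both rows match and a tie-break would be needed.
print_r( dupes_lookup( $rows, $slug, array( 'post', 'page', 'attachment' ) ) );
```

With only `page` and `attachment` in the list, the post with ID 21123 is never a candidate, whatever its status.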
Looking at the code I found where `get_page_by_path()` is being called from `parse_request()`.
if ( $wp_rewrite->use_verbose_page_rules && preg_match( '/pagename=\$matches\[([0-9]+)\]/', $query, $varmatch ) ) {
// This is a verbose page match, let's check to be sure about it.
$page = get_page_by_path( $matches[ $varmatch[1] ] );
if ( ! $page ) {
continue;
}
I believe that it’s this logic that’s finding the page and therefore ignoring the post.
So now I need to understand why `use_verbose_page_rules` is set to true… and what I might be able to do to convince WordPress to have another look for any other posts that could satisfy the user’s request.
More investigation necessary…
I’ve started writing a plugin that attempts to intercept the current logic. It’s looking promising in some respects, but fails in others. Basically it has a look at what WordPress has decided the query should be and overrides it, changing `pagename` to `name` in the `$query_vars` array.
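As an illustration of the idea, and not the actual plugin code, the override could be done on the `request` filter, which WordPress applies to the parsed query variables. The sketch assumes that a published post should win over a page with the same slug; the function name `dupes_prefer_post()` and that tie-break rule are assumptions made for this example.

```php
<?php
/**
 * Swaps 'pagename' for 'name' when a post with that slug exists.
 * $post_exists is a callable( string $slug ): bool, injected so the
 * function can be exercised without WordPress.
 */
function dupes_prefer_post( array $query_vars, callable $post_exists ): array {
	if ( empty( $query_vars['pagename'] ) ) {
		return $query_vars;
	}
	$slug = $query_vars['pagename'];
	// Hierarchical page paths such as parent/child are left alone.
	if ( false !== strpos( $slug, '/' ) || ! $post_exists( $slug ) ) {
		return $query_vars;
	}
	unset( $query_vars['pagename'] );
	$query_vars['name'] = $slug;
	return $query_vars;
}

// Only hook in when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter(
		'request',
		function ( $query_vars ) {
			$post_exists = function ( string $slug ): bool {
				$ids = get_posts(
					array(
						'name'        => $slug,
						'post_type'   => 'post',
						'post_status' => 'publish',
						'numberposts' => 1,
						'fields'      => 'ids',
					)
				);
				return ! empty( $ids );
			};
			return dupes_prefer_post( $query_vars, $post_exists );
		}
	);
}
```

With this rule the page with the duplicate slug becomes the one that can’t be reached, so it moves the problem rather than solving it.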
I will continue with my hacky workaround, but it would be nice to see a proper solution to this issue in the not too distant future.
In that respect I have started to develop PHPUnit test cases to test the different scenarios. See bobbingwide/dupes.