Steem improvement proposal: Canonical links

in #steemdev, 6 years ago (edited)

As you may have noticed, the ranking of "steemit.com" has gone up a lot these last few months. Currently, it is around the top 1000 most visited websites in the world (according to alexa.com).

This is great! However, it can also be bad news for the steem ecosystem because of one thing:

Duplicate content

When you search for keywords on a search engine, it looks them up in its index. If it finds the same content twice, it has to choose which copy to display first. One way to choose is by looking at the rankings of the sites.

For instance, if the same text appears both on wallstreetjournal.com and mysuperblog.com, it's unlikely that the article originated from mysuperblog.com. Since wallstreetjournal.com has a much higher ranking, it will be shown first.

But with the steem blockchain, the same content is displayed on many different websites which are, in the end, just front ends for the steem blockchain. And this is where it gets tricky:

If I make a contribution on utopian, the text will be available on steemit.com and on utopian.io

So what will the search engine do?

It will show steemit.com, even though I posted it on utopian and would like to keep the formatting of utopian. The same goes for other applications like steepshot and zappl.

As a result, it's very hard for dapps to do marketing because they are always in competition with steemit.com, which makes them less visible. This is the reason why you can't see articles when you go to utopian.io and you are not logged in.

This is even harder for small bloggers who want to post both on their own blog and on steem to reach a broader audience and try to earn a secondary source of income (via SteemPress for instance). By using the steem blockchain they are basically hurting their own SEO (search engine optimization). Although this sacrifice will be worth it for smaller bloggers, it may play out badly in the long run and is therefore a problem.

This is a shame, because it can be solved easily with what are called canonical links.

A proposed solution

Canonical links are a standard that lets a page tell search engines: "OK, I display this content, but it originally came from this source."

In practice it looks like this:

<link rel="canonical" href="http://original-article.com/article">

The issue is that this can only be set in the header part of the page (the <head> element), which is a place that we, as post authors, have no control over.

So this is a proposal for a standard. It's heavily inspired by @jesta's work from last year.

Json metadata

I propose a new optional tag in the json metadata: "canonical"

The behaviour would be as follows:

If the tag is not set, the front end (whatever it may be) won't add the line to its header. If, however, the canonical tag is set, then it will add the provided url to the header.

The changes are fairly simple, and it would greatly help the whole ecosystem.

So for instance, if steemit.com sees:

{
  "author": "jesta",
  "permlink": "test-post",
  "category": "test",
  "json_metadata": {
    "canonical" : "http://mysuperblog.com/testpost"
  }
}

Then in the header there will be:

<link rel="canonical" href="http://mysuperblog.com/testpost">

I think the canonical tag needs to hold the complete url, so that anyone can add their own site regardless of how its urls are shaped.
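To make the behaviour concrete, here is a minimal sketch of what a front end could do. It is written in Python purely for illustration (a real front end like condenser is not written in Python), the function name canonical_link_tag is hypothetical, and it assumes json_metadata has already been parsed into a dictionary. Escaping the url is my own addition, not part of the proposal.

```python
from html import escape
from typing import Optional


def canonical_link_tag(post: dict) -> Optional[str]:
    """Return a <link rel="canonical"> tag for the post, or None if the tag is not set."""
    metadata = post.get("json_metadata")
    if not isinstance(metadata, dict):
        return None  # missing or unparsed metadata: add nothing to the header

    url = metadata.get("canonical")
    if not isinstance(url, str) or not url.strip():
        return None  # optional tag not set: add nothing to the header

    # Escape the url so a crafted value cannot break out of the href attribute.
    return '<link rel="canonical" href="{}">'.format(escape(url.strip(), quote=True))


post = {
    "author": "jesta",
    "permlink": "test-post",
    "category": "test",
    "json_metadata": {"canonical": "http://mysuperblog.com/testpost"},
}
print(canonical_link_tag(post))
# <link rel="canonical" href="http://mysuperblog.com/testpost">
```

Returning None for the "not set" case keeps the tag truly optional: existing posts render exactly as they do today.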

Canonical links for the popular dapps

Storing a full url in every post causes some overhead on the blockchain, which will grow to be quite large if all the big dapps add a canonical tag. So I propose that if the "app" is recognized (that is, it is part of a small list of whitelisted apps), the front end automatically generates the canonical link:

{
  "author": "jesta",
  "permlink": "test-post",
  "category": "test",
  "json_metadata": {
    "app": "busy/0.1"
  }
}

From that the front end knows that the post originated from busy, and thanks to the small list it can map busy -> https://busy.org. Since almost all dapps use this url scheme:

category/@username/permlink

it can easily recreate the canonical link:

https://busy.org/test/@jesta/test-post

and add it to the header:

<link rel="canonical" href="https://busy.org/test/@jesta/test-post">

Which allows us to save a few bits on the blockchain.
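Here is a rough sketch of that lookup, again in Python just for illustration. KNOWN_APPS and canonical_from_app are hypothetical names, the whitelist only contains the busy entry from the example above, and it assumes the app tag always has the form name/version.

```python
from typing import Optional

# Hypothetical whitelist mapping an app name to its base url.
# Only the busy entry comes from the example above.
KNOWN_APPS = {"busy": "https://busy.org"}


def canonical_from_app(post: dict) -> Optional[str]:
    """Rebuild the canonical url from the "app" tag, or return None if the app is unknown."""
    metadata = post.get("json_metadata")
    if not isinstance(metadata, dict):
        return None

    app = metadata.get("app")
    if not isinstance(app, str):
        return None

    name = app.split("/", 1)[0]  # "busy/0.1" -> "busy"
    base = KNOWN_APPS.get(name)
    if base is None:
        return None  # not whitelisted: no canonical link is generated

    # Assumed url scheme: category/@username/permlink
    return "{}/{}/@{}/{}".format(base, post["category"], post["author"], post["permlink"])


post = {
    "author": "jesta",
    "permlink": "test-post",
    "category": "test",
    "json_metadata": {"app": "busy/0.1"},
}
print(canonical_from_app(post))
# https://busy.org/test/@jesta/test-post
```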

Of course this means that a list of the most popular dapps needs to be hardcoded in the front ends, but it's not as if there are 50 of them, and other apps can still use the canonical tag for the same effect.

This is just a space optimization and is in no way a necessity. Since it brings ongoing work, like having to keep a list of all the popular apps and their urls, the work may not be worth the saved bits.

Cascading links

What if steemit encounters a json_metadata that has both the "app" and the "canonical" tags?

{
  "author": "jesta",
  "permlink": "test-post",
  "category": "test",
  "json_metadata": {
    "app": "busy/0.1",
    "canonical": "http://mysuperblog.com/testpost"
  }
}

We disregard the app tag and place the content of "canonical" in the link. So we cascade the links like so:

json_metadata.canonical > json_metadata.app > empty
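The cascade could be sketched like this, in Python for illustration only. resolve_canonical and KNOWN_APPS are hypothetical names, the whitelist holds only the busy entry used in this post, and json_metadata is assumed to be already parsed.

```python
from typing import Optional

# Hypothetical whitelist; only the busy entry comes from this post.
KNOWN_APPS = {"busy": "https://busy.org"}


def resolve_canonical(post: dict, known_apps: dict = KNOWN_APPS) -> Optional[str]:
    """Apply the cascade: json_metadata.canonical > json_metadata.app > empty."""
    metadata = post.get("json_metadata")
    if not isinstance(metadata, dict):
        return None

    # 1. An explicit canonical url always wins.
    canonical = metadata.get("canonical")
    if isinstance(canonical, str) and canonical.strip():
        return canonical.strip()

    # 2. Otherwise fall back to a url derived from a whitelisted app.
    app = metadata.get("app")
    if isinstance(app, str):
        base = known_apps.get(app.split("/", 1)[0])
        if base is not None:
            return "{}/{}/@{}/{}".format(
                base, post["category"], post["author"], post["permlink"]
            )

    # 3. Nothing usable: the front end adds no canonical link.
    return None


base_post = {"author": "jesta", "permlink": "test-post", "category": "test"}
both = dict(base_post, json_metadata={"app": "busy/0.1", "canonical": "http://mysuperblog.com/testpost"})
app_only = dict(base_post, json_metadata={"app": "busy/0.1"})
neither = dict(base_post, json_metadata={})

print(resolve_canonical(both))      # http://mysuperblog.com/testpost
print(resolve_canonical(app_only))  # https://busy.org/test/@jesta/test-post
print(resolve_canonical(neither))   # None
```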

So in the end, that is how it would work.

I would be very interested in your thoughts on this, and if everyone agrees that this is the way to go, then I'll put the work into a pull request.


Update:

Created an issue in condenser with the proposal: https://github.com/steemit/condenser/issues/2505
