Python Script to Save All Your Posts as Markdown or HTML Files

in HiveDevs, 4 years ago (edited)

I've always wanted to be able to save my Steem posts locally. Once they're on disk, better search tools are available than the ones we have at the blockchain level.

I've only recently started poking around the Steem development APIs, and this is the first Python script with a real purpose that I've written. On top of that, I'm also fairly new to Ubuntu. :)

If you are a dev and have been doing this for a while, you can probably write a more efficient script.

I wasn't aiming for efficiency when I wrote it. I wanted to learn, and maybe help others who are also Python beginners or haven't yet tried coding against the Steem APIs. Hence the extensive comments.

Features and options:

  • saves all your markdown posts as .md files
  • saves all your raw HTML posts as .html files
  • you can set a main sub-directory or sub-path in the current directory where the files will be placed
  • posts are placed in subdirectories based on either the creation date (year-month) or the primary tag; set this option at the beginning of the script
  • you can save the posts for any account
  • you can choose whether to also save resteemed posts
  • you can choose whether to add the tags at the end of the post
  • the title is automatically added as an H1 heading at the beginning of the post
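The options above come down to two decisions per post: which subdirectory the post goes into, and what text gets written to the file. The sketch below pulls that logic out into two pure functions so it can be tested without a node connection. It is not the script itself.

It assumes a comment dict with the keys the script reads from get_blog (category, created, permlink, title, body, json_metadata). The names choose_subdir and render_post are hypothetical.

```python
import json


def choose_subdir(comment, dir_struct_option):
    """Return the relative subdirectory for a post, mirroring the script's two options."""
    if dir_struct_option == 'primary-tag':
        return 'tags/' + comment['category']
    if dir_struct_option == 'year-month':
        # 'created' looks like '2019-10-05T12:34:56'; the first 7 characters are 'YYYY-MM'
        return 'date/' + comment['created'][0:7]
    raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)


def render_post(comment, add_tags=True):
    """Return (filename, text) for a post, as Markdown (.md) or raw HTML (.html)."""
    # json_metadata may be an empty string or invalid JSON on some posts; fall back to {}
    try:
        metadata = json.loads(comment['json_metadata'] or '{}')
    except ValueError:
        metadata = {}
    if not isinstance(metadata, dict):
        metadata = {}

    post_format = metadata.get('format', 'markdown+html')
    tags = metadata.get('tags', []) if add_tags else []

    if post_format in ('markdown', 'markdown+html'):
        ext = '.md'
        header = '# ' + comment['title'] + '\n\n'
        tags_str = ''.join('#' + tag + ' ' for tag in tags)
    else:
        ext = '.html'
        header = '<h1>' + comment['title'] + '</h1>\n\n'
        tags_str = ''.join('<a id="' + tag + '" href="#' + tag + '">' + tag + '</a> '
                           for tag in tags)

    # only separate the tags from the body when there actually are tags
    if tags_str:
        tags_str = '\n\n' + tags_str
    return comment['permlink'] + ext, header + comment['body'] + tags_str
```

With these helpers, the main loop would only need to create the directory and write the returned text.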

I've tested the script on Python 3.7.4, and I believe it should also work on earlier Python 3 versions. The script was written for Linux/Ubuntu, so on Windows the parts that handle paths and create directories may need adapting.
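If you are adapting the path handling, here is a minimal sketch using the standard pathlib module. Path joins components with the current platform's separator. mkdir(parents=True, exist_ok=True) creates any intermediate directories and ignores ones that already exist, which replaces the FileExistsError handling.

ensure_post_dir is a hypothetical helper name. Treat the sketch as untested on Windows.

```python
from pathlib import Path


def ensure_post_dir(main_save_dir, *subdirs):
    """Create main_save_dir/subdir1/subdir2/... if needed and return it as a Path."""
    target = Path(main_save_dir).joinpath(*subdirs)
    # parents=True creates missing parent directories; exist_ok=True ignores existing ones
    target.mkdir(parents=True, exist_ok=True)
    return target
```

For example, ensure_post_dir('steem-posts-testuser123', 'date', '2019-10') would give the same layout the script builds with '/' strings.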

You will also need a good Markdown viewer/editor to see the saved files. I used Typora, but it looks like it will become paid software when it leaves beta, so a good free alternative would be nice.

So, here's the Python script. Note that the settings are hard-coded, so you'll have to change them manually.

While I'm far from being a Python or Steem dev expert, let me know if you have questions.

Feedback from more experienced devs is welcome as well. :)

import os
import sys
import json
from steem import Steem  # steem-python library

s = Steem()

# script parameters
# =================

# author
author_name = 'testuser123'

# relative directory under which the posts will be saved (don't add a final "/"!)
main_save_dir = 'steem-posts-' + author_name

# structure of directories under which posts will be saved
# Options:
# primary-tag - posts are saved under their primary tag subdirectory
# year-month - posts are saved under the year-month of their creation date subdirectory
dir_struct_option = 'year-month'
print('Save posts by ' + dir_struct_option)

# bool flag to determine if tags are added at the end of the post or not
adding_tags_to_saved_post = True
print('Adding tags to the end of each post? ' + str(adding_tags_to_saved_post))

# bool flag to determine if to save resteemed posts of other authors as well
include_resteem_posts = False
print('Include resteemed posts? ' + str(include_resteem_posts))

# =====================
# end script parameters
#

# fail early on a typo in dir_struct_option (otherwise subdir_name would be undefined later)
if dir_struct_option not in ('primary-tag', 'year-month'):
    sys.exit('Unknown dir_struct_option: ' + dir_struct_option)

# os.curdir is always the string '.'; os.getcwd() returns the actual current directory
cur_dir = os.getcwd()

# create main save directory (as a subdirectory or sub-path of the current directory)
try:
    os.makedirs(main_save_dir)
    print('Directory ' + main_save_dir + ' created in current directory ' + cur_dir)
except FileExistsError:
    print('Directory ' + main_save_dir + ' already exists in current directory ' + cur_dir)
except OSError:
    print('Directory ' + main_save_dir + ' couldn\'t be created in current directory ' + cur_dir)

# loop through all the posts of the given author
# we break out of the loop after we reach the last post of the author
i = 1
while True:

    # retrieve current blog post info
    # theoretically we can retrieve more than one blog per call, but in my tests anything
    # more than 2 generated an error, so I preferred to take them one by one
    try:
        blogs = s.get_blog(author_name, i, 1)
    except Exception:
        print('Couldn\'t get blog #' + str(i) + '. Trying again. Ctrl+C to interrupt.')
        continue
    # is it empty? then we reached the end and we should break out of the loop
    if not blogs:
        break

    comment = blogs[0]['comment']

    # is it the author's post or a resteem?
    # if it's a resteem and resteems are not to be included, skip to the next iteration
    if comment['author'] != author_name:
        if not include_resteem_posts:
            print('Post #' + str(i) + ' author is ' + comment['author'] + '. Skipping it.')
            i += 1
            continue
        else:
            print('Post #' + str(i) + ' author is ' + comment['author'] + '. Including it.')

    # choose the name of the subdir where to place the saved posts
    # (i.e. posts can be saved by primary-tag or date [year-month])
    if dir_struct_option == 'primary-tag':
        subdir_name = os.path.join('tags', comment['category'])
    else:  # 'year-month'; 'created' looks like 'YYYY-MM-DDThh:mm:ss'
        subdir_name = os.path.join('date', comment['created'][0:7])

    # path of the subdir, relative to the current directory
    dir_name = os.path.join(main_save_dir, subdir_name)

    # create the subdirectory/ies where we will place our files
    try:
        os.makedirs(dir_name)
        print('Directory ' + dir_name + ' created.')
    except FileExistsError:
        pass
    except OSError:
        print('Directory ' + dir_name + ' couldn\'t be created.')
        raise  # re-raise the original error with its details

    # deserialize json_metadata (it can be an empty string or invalid JSON on some posts)
    json_metadata_str = comment['json_metadata']
    try:
        json_metadata_dict = json.loads(json_metadata_str) if json_metadata_str else {}
    except ValueError:
        print('Invalid json_metadata. Ignoring it.')
        json_metadata_dict = {}
    if not isinstance(json_metadata_dict, dict):
        json_metadata_dict = {}

    # "format" would shadow the built-in format(), so use post_format instead
    try:
        post_format = json_metadata_dict['format']
    except KeyError:
        print('No "format" key in json_metadata. Defaulting to "markdown+html".')
        post_format = 'markdown+html'

    # get post title and body
    title = comment['title']
    body = comment['body']

    # is the post markdown?
    if post_format == 'markdown+html' or post_format == 'markdown':
        # choose the filename as the blog post's permlink + ".md" extension
        filename = comment['permlink'] + '.md'

        if adding_tags_to_saved_post:
            # get tags and create a string with them (as #tag) to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '#' + x + ' '
            except KeyError:
                tags_str = ''
        else:
            tags_str = ''

        # format the body to also include the title at the beginning as H1 and tags at the end
        body_with_title_and_tags = '# ' + title + '\n\n' + body + tags_str
    # or is the post raw html?
    else:
        # choose the filename as the blog post's permlink + ".html" extension
        filename = comment['permlink'] + '.html'

        if adding_tags_to_saved_post:
            # get tags and create a string with them (as anchor links) to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '<a id="' + x + '" href="#' + x + '">' + x + '</a> '
            except KeyError:
                tags_str = ''
        else:
            tags_str = ''

        # format the body to also include the title at the beginning as H1 and tags at the end
        body_with_title_and_tags = '<h1>' + title + '</h1>\n\n' + body + tags_str

    # write post to file (overwrite if exists); posts often contain non-ASCII text, so force UTF-8
    file_path = os.path.join(dir_name, filename)
    try:
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(body_with_title_and_tags)
        print('Post #' + str(i) + ': ' + file_path + ' successfully saved.')
    except OSError:
        print('Something went wrong while attempting to write file ' + file_path)
        raise

    i += 1

print('No (more) posts.')
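
The main loop retries a failed get_blog call indefinitely, which is why it mentions Ctrl+C. One possible alternative is a generator with a retry limit and a pause between attempts, sketched below.

A few names here are stand-ins:

  • fetch is any callable taking (account, entry_id, limit), like the s.get_blog call in the script. This keeps the sketch testable without a node.
  • iter_blog_entries, max_retries and delay are hypothetical names, and their defaults are arbitrary.

```python
import time


def iter_blog_entries(fetch, author_name, include_resteems=False,
                      start=1, max_retries=5, delay=2.0):
    """Yield (index, entry) pairs one by one until fetch returns an empty list."""
    i = start
    while True:
        # try the same entry up to max_retries times, pausing between attempts
        for attempt in range(1, max_retries + 1):
            try:
                blogs = fetch(author_name, i, 1)
                break
            except Exception as err:
                print('Couldn\'t get blog #%d (attempt %d): %s' % (i, attempt, err))
                if attempt == max_retries:
                    raise  # give up and propagate the last error
                time.sleep(delay)
        if not blogs:
            return  # no more entries
        entry = blogs[0]
        # skip resteems of other authors unless they were requested
        if entry['comment']['author'] == author_name or include_resteems:
            yield i, entry
        i += 1
```

In the script, the while loop could then become "for i, entry in iter_blog_entries(s.get_blog, author_name, include_resteem_posts):", with the per-post saving code as its body.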

Update: I edited the post because the original had some errors introduced when I copy-pasted the code into HTML, which I hadn't tested at first.
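Because the settings at the top of the script are hard-coded, saving another account means editing the file each time. One alternative is to pass the settings on the command line with the standard argparse module, sketched below.

The flag names, the parse_settings helper and the file name save_posts.py are hypothetical choices. The defaults mirror the hard-coded values.

```python
import argparse


def parse_settings(argv=None):
    """Parse the script's settings from the command line (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description='Save Steem posts as .md/.html files.')
    parser.add_argument('author_name', help='account whose blog will be saved')
    parser.add_argument('--dir-struct', choices=['primary-tag', 'year-month'],
                        default='year-month', help='subdirectory layout')
    parser.add_argument('--no-tags', action='store_true',
                        help='do not append tags at the end of each post')
    parser.add_argument('--include-resteems', action='store_true',
                        help='also save posts resteemed from other authors')
    args = parser.parse_args(argv)
    # map the flags back onto the variable names used in the script
    return {
        'author_name': args.author_name,
        'main_save_dir': 'steem-posts-' + args.author_name,
        'dir_struct_option': args.dir_struct,
        'adding_tags_to_saved_post': not args.no_tags,
        'include_resteem_posts': args.include_resteems,
    }
```

A run would then look like: python save_posts.py testuser123 --dir-struct primary-tag --include-resteems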


As a note, I use VS Code (because I'm a dev, I guess) with an extension to preview .md files as I write them (basically like writing a post with a preview). There are probably similar free apps that aren't as massive as VS Code, though.

Yes, I used VS Code to write this Python script as well. Didn't try it for md though, but I will. Thanks for mentioning it.

Just checked: I was using the Markdown Preview Enhanced extension, though it looks like there are a few. No problem, nice script, man!

Great, I'll check it out. Thanks again!

Great. I will try this out.
