I.T. Spices The LINUX Way

Python In The Shell: The STEEMIT Ecosystem – Post #98

THE BEAUTY OF PYTHON AND LINUX SHELL COMBINED

This continues from the previous post here:

https://steemit.com/blockchain/@lightingmacsteem/3vfqfb-i-t-spices-the-linux-way

With the line counting and identification done in the previous lines of Python code, it is time to use that data to extract the matching JSON lines and save them to a temp file:

1 ###EXTRACT THE LINE STARTING FROM MAXID UP TO LAST AND SAVE TO DESTFILE
2 ###DELETE DESTFILE FIRST IF EXIST
3 try:
4   os.remove(destfile)
5 except OSError:
6   pass
7 if int(line) != 0:
8   eee = ("sed -n '" + str(line) + "," + str(lastline) + "p;" + str(endline) + "q' " + insertfile + " > " + destfile)
9   os.system(eee)
10 else:
11  print('LINE is equal or greater than COUNTLINE.......')



JSON Lines Extraction And Saving To A File

1 ###EXTRACT THE LINE STARTING FROM MAXID UP TO LAST AND SAVE TO DESTFILE
2 ###DELETE DESTFILE FIRST IF EXIST

Lines 1 and 2 are just comments, once again telling us that the lines that follow will extract a certain range of lines from the source file and save them to a temp file. If the temp file already exists, it is deleted first before a new one is written.


3 try:
4   os.remove(destfile)
5 except OSError:
6   pass

Lines 3 to 6 form a try block. In Python, try together with except is a very useful way to capture errors so that the script will not exit prematurely. Here it wraps a very simple deletion of the temp file: if the file does not exist, os.remove raises an OSError, the except line catches it, and pass tells Python to do nothing and just continue with the next line instead of exiting.
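The delete-first step can also be wrapped in a small helper. This is only a sketch: the name remove_if_exists is hypothetical and not part of the original script, and it catches FileNotFoundError (a subclass of OSError in Python 3) so that other problems, such as a permission error, still show up instead of being silently passed over.

```python
import os


def remove_if_exists(path):
    """Delete path if it is there; return True only when a file was removed.

    Hypothetical helper, not from the original script. It catches only
    FileNotFoundError, which is narrower than the OSError used above.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to delete, so just carry on with the script.
        return False
```

Calling remove_if_exists(destfile) before the extraction would then behave like lines 3 to 6 for the usual case of a missing temp file.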


7 if int(line) != 0:
8   eee = ("sed -n '" + str(line) + "," + str(lastline) + "p;" + str(endline) + "q' " + insertfile + " > " + destfile)
9   os.system(eee)

Lines 7 to 9 are a guard that runs whenever the line variable is not equal to zero. This is a very inclusive condition, as only a value of zero in the line variable keeps the script from acting in this routine. The routine is another one-shot go using the sed command of the Linux shell: it prints only the lines from line up to lastline, then quits at endline instead of reading the rest of the file, which extracts the specific JSON lines in the fastest way possible. The shell redirection saves the results into the file named by the destfile variable.

Once line 9 is executed by Python, we can be sure that only 300 thousand lines out of the multi-million-line source file of JSON blockchain records are extracted and saved into a temp file.
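For comparison, here is a minimal sketch of the same range extraction written two other ways. All function names are hypothetical, and the sketch assumes the quit line is simply one past the last wanted line, which may differ from how endline is computed in the real script. The first variant still calls sed but through subprocess.run with a list of arguments, so no shell string has to be glued together; the second one uses itertools.islice in pure Python.

```python
import itertools
import subprocess


def sed_range_script(first, last, quit_at):
    """Build the sed script: print lines first..last, then quit at quit_at."""
    return "{0},{1}p;{2}q".format(int(first), int(last), int(quit_at))


def extract_with_sed(source, dest, first, last):
    """Run sed without a shell; its output is written straight into dest.

    Assumption: quitting at last + 1 is enough to stop sed early.
    """
    script = sed_range_script(first, last, last + 1)
    with open(dest, "w") as out:
        subprocess.run(["sed", "-n", script, source], stdout=out, check=True)


def extract_with_python(source, dest, first, last):
    """Pure-Python variant: copy lines first..last (1-based, inclusive)."""
    with open(source) as src, open(dest, "w") as out:
        # islice stops reading after `last` lines, much like sed's q command.
        out.writelines(itertools.islice(src, first - 1, last))
```

Both variants stop reading once the last wanted line has passed, so the rest of the multi-million-line file is never touched. Which one is faster on a given machine is something to measure rather than assume.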


10 else:
11  print('LINE is equal or greater than COUNTLINE.......')

Lines 10 to 11 are the routines executed if the value of the line variable is equal to zero, which also means that the database is still empty. We are just printing an informational message about the line numbers here.


JSON Lines Extraction Is Now Done

Up to this point we are finished gathering lines from the source file. Whatever those lines contain, the next routines will just have to examine them one by one from the temp file as we tell Python to insert and save them to the database.

By the way, we are placing the temp file in a memory-backed folder in Linux (/dev/shm) to make reads and writes much faster, which is very important in the overall approach for speed and stability.
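A small sketch of how the temp file location could be chosen is shown below. The function name pick_temp_dir and the file name extract.json are made up for illustration; the original script may simply hard-code the path. The idea is to use /dev/shm when it exists and is writable, and otherwise fall back to the system temp folder.

```python
import os
import tempfile


def pick_temp_dir(preferred="/dev/shm"):
    """Return the memory-backed folder if usable, else the system temp dir.

    Hypothetical helper, not taken from the original script.
    """
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


# Illustrative file name only; the real destfile name is not shown in the post.
destfile = os.path.join(pick_temp_dir(), "extract.json")
```

On a machine without /dev/shm the script would still run, only without the speed benefit of keeping the temp file in memory.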

This code may seem long because of the explanations, but these routines are actually very fast once executed. We will deal with the database manipulations in the next post, this time in pure Python.

Just imagine all these routines as climbing up a ladder, one step at a time; it makes things much simpler.


“Anything Made By Men Can Be Fixed By Men…….”