A better methodology is to count the spaces to obtain the word count, after first trimming out any markdown syntax, links or HTML using regex.
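A rough sketch of that idea in Python, in case it helps. The function names `strip_markup` and `word_count` are made up for illustration, and the regexes only cover the common cases (images, inline links, HTML tags, bare URLs, emphasis and heading markers), so treat it as a starting point rather than a full markdown parser. It splits on whitespace instead of literally counting spaces, so runs of spaces and newlines don't inflate the count.

```python
import re


def strip_markup(text: str) -> str:
    """Remove common markdown/HTML noise so only readable words remain."""
    # Images: ![alt](url) -> dropped entirely
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)
    # Inline links: [label](url) -> keep only the label
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # HTML tags: <b>, </div>, <img src="..."> -> dropped
    text = re.sub(r"<[^>]+>", " ", text)
    # Bare URLs left in the text
    text = re.sub(r"https?://\S+", " ", text)
    # Leftover markdown markers (headings, emphasis, code, quotes, strikethrough)
    text = re.sub(r"[#*_`>~]", "", text)
    return text


def word_count(text: str) -> int:
    """Count whitespace-separated words after stripping markup."""
    return len(strip_markup(text).split())
```

For example, `word_count("# Hello **world**")` gives 2, since the heading and bold markers are stripped before counting.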
Have you had a play with my Hive Report Card tool? It does a lot of analytics on a per-author basis and pulls from the condenser API. I do not include rewards in this analysis, as I want it to be about the content :)