I made a wordcounter!

Prognostic Hannya · Dec 3, 2019

Hi everyone,

I'm new to QQ, and an amateur Python programmer.

I was reading With this Ring by Mr. Zoat, and was shocked by how absolutely massive it is compared to most other fanfics. But for the life of me, I couldn't find a way to see the wordcount of the story-only thread that didn't just count the threadmarks. The story's currently at 3,400,000 words, if you were wondering.

So, like the amateur programmer I am, I decided to write a script to do it myself! Just input the URL for a QQ thread (one that's not behind an account wall), and it will spit out the wordcount. It even has a loading bar for longer threads!

Please let me know if this post is in the wrong place, or if you have any improvements to my code!

Code:

## Made by Sam Ravenwood
 
import bs4 as bs
import requests
import re
from tqdm import tqdm
 
#creates a list of the urls of every page
def iterate(url):
	http = requests.get(url)
	page = bs.BeautifulSoup(http.text, 'html.parser')
	#finds max pagecount; a single-page thread has no "Next >" link, so fall back to 1
	next_link = page.find(string="Next >")
	pagecount = int(next_link.previous_element.previous_element.previous_element) if next_link else 1
	if not url.endswith("/"):
		url = url + "/"
	url = url + "page-"
	return [url + str(i) for i in range(1, pagecount+1)]
 
def get(cat, url):
	assert cat in ["posts","title"], "Incorrect category for get request!"
	http = requests.get(url)
	page = bs.BeautifulSoup(http.text, 'html.parser')
	if cat == "posts":
		return page.find_all(class_="message")
	if cat == "title":
		return page.find("title").get_text().replace(" | Questionable Questing", "")
 
def counter(msg):
	msg = str(msg)
	msg = re.sub(r"[^a-zA-Z0-9_\s]", "", msg) #deletes all characters that aren't alphanumeric, underscore, or whitespace
	return len(msg.split()) #split() with no argument splits on any run of whitespace and skips empty strings
 
#totals the wordcount of every post in the thread
def wordcount(url):
	links = iterate(url)
	total = 0
	#for each page of the thread, get wordcount of each post, add it to total
	for link in tqdm(links):  #for each page
		for post_loc in get("posts", link): #for each post in page
			total += counter(post_loc.get_text())
	return total
 
url = input("Enter url:\n").strip()
if not url.startswith(("http://", "https://")):
	url = "https://" + url
 
print("Analyzing pages...")
print(f"\nThread '{get('title', url)}' has total wordcount of {wordcount(url):,} words.")
input()
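
The counting and URL-building steps of the script above can be tried offline, without hitting the site. Here is a minimal sketch of just those two steps, using only the standard library. The names `count_words` and `page_urls` are hypothetical (they are not in the script), the example.com URL is a placeholder, and the sketch assumes a "word" is any whitespace-separated token left after punctuation is stripped.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words after stripping punctuation."""
    # Drop everything that is not a letter, digit, underscore, or whitespace.
    cleaned = re.sub(r"[^a-zA-Z0-9_\s]", "", text)
    # split() with no argument splits on any run of whitespace (spaces,
    # tabs, newlines) and never yields empty strings.
    return len(cleaned.split())


def page_urls(thread_url: str, pagecount: int) -> list:
    """Build the URL of every page, assuming a '<thread>/page-N' scheme."""
    base = thread_url.rstrip("/") + "/page-"
    return [f"{base}{i}" for i in range(1, pagecount + 1)]


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(count_words("Hello,   world!\nThis is a test."))
    print(page_urls("https://example.com/threads/demo.1/", 3))
```

Keeping these steps as pure functions makes it easy to check the arithmetic on a small sample before running the full scrape on a long thread.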

Nekraa · Dec 3, 2019

Well, first things first. Your link doesn't seem to work. You could put the code in a code block using [code]code here[/code]. Unless it's too long, I guess.

I also moved your thread to General, as I believe that it is more topical to your thread.

Prognostic Hannya · Dec 3, 2019

Nekraa said:
Well, first things first. Your link doesn't seem to work. You could put the code in a code block using [code]code here[/code]. Unless it's too long, I guess.

I also moved your thread to General, as I believe that it is more topical to your thread.

Whoops, sorry! I fixed the link. Also the code's like 60 lines, idk if that's "too long"

Nekraa · Dec 3, 2019

I would say it's likely not too long then.
