I made a wordcounter!

Prognostic Hannya

Hi everyone,

I'm new to QQ, and an amateur Python programmer.

I was reading With this Ring by Mr. Zoat, and was shocked by how absolutely massive it is compared to most other fanfics. But for the life of me, I couldn't find a way to see the wordcount of the story-only thread that didn't just count the threadmarks. The story's currently at 3,400,000 words, if you were wondering.

So, like the amateur programmer I am, I decided to write a script to do it myself! Just input the url for a QQ thread (that's not behind an account wall), and it will spit out the wordcount. It even has a loading bar for longer threads!

Please let me know if this post is in the wrong place, or if you have any improvements to my code!


Code:
## Made by Sam Ravenwood
 
import bs4 as bs
import requests
import re
from tqdm import tqdm
 
#creates a list of the urls of every page
def iterate(url):
	http = requests.get(url)
	page = bs.BeautifulSoup(http.text, 'html.parser')
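	#finds max pagecount (assumes a single-page thread has no "Next >" link)
	next_link = page.find(string="Next >")
	if next_link is None:
		pagecount = 1
	else:
		pagecount = int(next_link.previous_element.previous_element.previous_element)
	if url.endswith("/"):
		url = url + "page-"
	else:
		url = url + "/page-"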
	links = []
	for i in range(1, pagecount+1):
		links.append(url + str(i))
	return links
 
def get(cat, url):
	assert cat in ["posts","title"], "Incorrect category for get request!"
	http = requests.get(url)
	page = bs.BeautifulSoup(http.text, 'html.parser')
	if cat == "posts":
		return page.find_all(class_="message")
	if cat == "title":
		return page.find("title").get_text().replace(" | Questionable Questing", "")
 
def counter(msg):
	msg = str(msg)
	msg = re.sub(r"[^a-zA-Z0-9_\s]", "", msg) #deletes all characters that aren't alphanumeric, underscore or whitespace
	#split() with no argument splits on any run of whitespace and drops empty strings
	return len(msg.split())
 
#adds up the wordcount of every post in the thread
def wordcount(url):
	links = iterate(url)
	total = 0
	#for each page of the thread, get wordcount of each post, add it to total
	for link in tqdm(links):  #for each page
		for post in get("posts", link): #for each post in page
			total += counter(post.get_text())
	return total
 
url = input("Enter url:\n").strip()
#only prepend a scheme if the url doesn't already start with one
if not url.startswith(("http://", "https://")):
	url = "https://" + url
 
print("Analyzing pages...")
print(f"\nThread '{get('title', url)}' has total wordcount of {wordcount(url):,} words.")
input()
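
For anyone who wants to try the counting step without hitting the forum, here is a minimal offline sketch. The name count_words and the sample strings are made up for illustration and are not part of the script above. It shows why splitting on any whitespace is safer than splitting on a single space: in Python, "a  b".split(" ") yields an empty string between the two spaces, which would get counted as a word.

```python
import re

def count_words(text: str) -> int:
    """Count whitespace-separated words after stripping punctuation."""
    # Drop everything except letters, digits, underscores and whitespace.
    cleaned = re.sub(r"[^a-zA-Z0-9_\s]", "", text)
    # split() with no argument collapses runs of whitespace and ignores
    # leading/trailing whitespace, so empty strings are never counted.
    return len(cleaned.split())

if __name__ == "__main__":
    # Hypothetical sample inputs, chosen to include doubled spaces,
    # an empty post, and mixed newlines/tabs.
    for sample in ["Hello,  world!", "", "  one\ntwo\tthree  "]:
        print(repr(sample), "->", count_words(sample))
```

Note that this treats a contraction like "don't" as one word, since the apostrophe is removed before splitting.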
 
Well, first things first. Your link doesn't seem to work. You could put the code in a code block using [code]code here[/code]. Unless it's too long, I guess.

I also moved your thread to General, as I believe it is a better fit for your thread.
 
Whoops, sorry! I fixed the link. Also the code's like 60 lines, idk if that's "too long"
 
I would say it's likely not too long then.
 
