Prognostic Hannya
Hi everyone,
I'm new to QQ and an amateur Python programmer.
I was reading With this Ring by Mr. Zoat, and was shocked by how absolutely massive it is compared to most other fanfics. But for the life of me, I couldn't find a way to see the wordcount of the story-only thread that didn't just count the threadmarks. The story's currently at 3,400,000 words, if you were wondering.
So, like the amateur programmer I am, I decided to write a script to do it myself! Just input the URL for a QQ thread (one that's not behind an account wall), and it will spit out the wordcount. It even has a loading bar for longer threads!
Please let me know if this post is in the wrong place, or if you have any improvements to my code!
Code:
## Made by Sam Ravenwood
import re

import bs4 as bs
import requests
from tqdm import tqdm


def iterate(url):
    """Return a list of the URLs of every page in the thread."""
    http = requests.get(url)
    page = bs.BeautifulSoup(http.text, 'html.parser')
    # Find the max page count: the last page number sits just before the "Next >" link.
    next_link = page.find(string="Next >")
    if next_link is None:
        # No "Next >" link found, so assume the thread has a single page.
        pagecount = 1
    else:
        pagecount = int(next_link.previous_element.previous_element.previous_element)
    if url.endswith("/"):
        url = url + "page-"
    else:
        url = url + "/page-"
    return [url + str(i) for i in range(1, pagecount + 1)]


def get(cat, url):
    """Fetch a page and return its posts ("posts") or the thread title ("title")."""
    assert cat in ["posts", "title"], "Incorrect category for get request!"
    http = requests.get(url)
    page = bs.BeautifulSoup(http.text, 'html.parser')
    if cat == "posts":
        return page.find_all(class_="message")
    if cat == "title":
        return page.find("title").get_text().replace(" | Questionable Questing", "")


def counter(msg):
    """Count the words in a block of text."""
    msg = str(msg)
    # Delete all characters that aren't alphanumeric, an underscore, or whitespace.
    msg = re.sub(r"[^a-zA-Z0-9_\s]", "", msg)
    # split() with no argument splits on any run of whitespace, so newlines
    # and double spaces don't produce empty "words".
    return len(msg.split())


def wordcount(url):
    """Sum the wordcount of every post on every page of the thread."""
    links = iterate(url)
    total = 0
    for link in tqdm(links):  # for each page
        for post_loc in get("posts", link):  # for each post in page
            total += counter(post_loc.get_text())
    return total


url = input("Enter url:\n")
if not url.startswith(("http://", "https://")):
    url = "https://" + url
print("Analyzing pages...")
print(f"\nThread '{get('title', url)}' has total wordcount of {wordcount(url):,} words.")
input()  # keeps the console window open until Enter is pressed
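
For anyone who wants to try the counting step without hitting the site, here is a minimal sketch that runs on a plain string. The helper name `count_words` and the sample text are made up for illustration. It assumes a "word" is any run of letters, digits, or underscores separated by whitespace, which is only one of several reasonable definitions.

```python
import re


def count_words(text):
    """Count whitespace-separated words after stripping punctuation."""
    # Remove everything except letters, digits, underscores, and whitespace.
    cleaned = re.sub(r"[^a-zA-Z0-9_\s]", "", text)
    # split() with no argument splits on any run of whitespace,
    # so double spaces and newlines do not produce empty "words".
    return len(cleaned.split())


sample = "Hello,  world!\nThis is a   test."
print(count_words(sample))  # prints 6
```

Note that contractions such as "don't" collapse to a single word ("dont") under this approach, and a stand-alone dash disappears entirely, so totals may differ slightly from what a word processor reports.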