I am a researcher at Wharton. I need to extract the text of companies' 10-K annual reports, which are available online in the EDGAR database. The difficulty is that no single format is required: many companies file plain-text documents, while others file HTML or XML.
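Because filings arrive as plain text, HTML, or XML, one common first step is to normalize every filing to plain text before looking for sections. The sketch below uses only Python's standard library (`html.parser`, `re`). The names `to_plain_text` and `_TextCollector`, the tag list, and the "contains anything tag-like" test are illustrative assumptions, not part of any EDGAR tool.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects character data from HTML/XML, skipping <script>/<style>."""

    # Assumed set of tags that should start a new line of text when they open or close.
    BLOCK_TAGS = {"p", "div", "br", "tr", "li", "table", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &nbsp; into characters for us.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def to_plain_text(raw: str) -> str:
    """Hypothetical normalizer: return plain text for a filing in text, HTML, or XML form."""
    # Heuristic (assumption): treat the filing as markup if it contains anything tag-like.
    if re.search(r"<[A-Za-z/!?][^>]*>", raw):
        parser = _TextCollector()
        parser.feed(raw)
        parser.close()
        text = "".join(parser.parts)
    else:
        text = raw
    # Replace non-breaking spaces, collapse runs of spaces/tabs, and squeeze blank lines.
    text = text.replace("\xa0", " ")
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" *\n *", "\n", text)
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```

For example, `to_plain_text("<p>Item 1.&nbsp;Business</p><p>We make widgets.</p>")` yields `"Item 1. Business\nWe make widgets."`. Normalizing first means a single set of section-matching rules can then be applied to every filing regardless of its original format.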
The 10-Ks are divided into sections, and I need both the text of each section and the code that extracts these sections. So far I have been using regular expressions (regex) to obtain them.
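A regex-based section splitter might look like the minimal sketch below. It assumes the filing has already been converted to plain text and that each section heading starts a line with "Item", a number, an optional letter, and a punctuation mark (for example "Item 1A." or "ITEM 7:"). The pattern, the name `split_items`, and the "keep the longest body" rule are hypothetical choices for illustration, not the poster's existing regex.

```python
import re
from typing import Dict

# Assumed heading pattern: at the start of a line, "Item" + 1-2 digits + optional
# letter A-C + ".", ":" or "-". [^\S\n] means "whitespace other than a newline",
# so non-breaking spaces between "Item" and the number are also accepted.
ITEM_HEADING = re.compile(
    r"^[^\S\n]*item[^\S\n]+(\d{1,2}[a-c]?)[^\S\n]*[.:\-]",
    re.IGNORECASE | re.MULTILINE,
)


def split_items(text: str) -> Dict[str, str]:
    """Hypothetical splitter: map each item label (e.g. "1A") to the text after its heading."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: Dict[str, str] = {}
    for i, m in enumerate(matches):
        label = m.group(1).upper()
        # A section runs until the next item heading, or to the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        # The table of contents repeats every heading with almost no text after it,
        # so keep the longest body seen for each label (a heuristic, not a rule).
        if len(body) > len(sections.get(label, "")):
            sections[label] = body
    return sections
```

Usage: `split_items(text)["1A"]` would return the text under the Item 1A heading. The longest-body rule is one simple way to skip table-of-contents entries; filings that cross-reference items in the body text, or that omit a heading, may still need per-filing checks.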
Hi, I'm glad to meet you, and thank you for your posting. I've read it carefully. I'm a web scraping expert; let's discuss the details in the chat room. Regards, Gao M.