Use Scrapy to crawl 70,000+ apps across the entire Wandoujia site and conduct an exploratory analysis. If you are not interested in the data capture section, you can skip straight down to the data analysis section.

1. Analysis background

Previously, we used Scrapy to crawl and analyze 6,000+ apps on Kuan.com. Why is this article about crawling apps again? Because I like tinkering with apps, haha. Of course, it is mainly because of the following points:

First, the previously crawled website was very simple

When crawling Kuan.com, we used a for loop to traverse a few hundred pages and collect all of the content. That was very simple, but in reality it is often not so easy, for example when you need the data of an entire website. To strengthen my crawler skills, this article chose the Wandoujia website.
The goal is to crawl the app information under every category of the website and download the app icons. There are about 70,000 apps, an order of magnitude more than on Kuan.

Second, practice using the powerful Scrapy framework again

So far I have only used Scrapy for basic crawling and have not fully appreciated how powerful it is, so this article tries to use Scrapy in more depth, adding settings such as a random User-Agent, proxy IPs and image download.

Third, compare the two websites Kuan and Wandoujia

I believe many people use Wandoujia to download apps, while I use Kuan more, so I also want to compare the app features of these
two websites. Without further ado, let's start the crawling process.

1. Analytical goals

First of all, let's take a look at what the Wandoujia pages to be crawled look like. You can see that the apps on the website are divided into many categories, including "App Play", "System Tools", etc. There are 14 major categories in total, and each category is subdivided into multiple subcategories; for example, video playback includes "video", "live broadcast", etc.

Click "Video" to enter the second-level subcategory page, where you can see some information about each app, including its icon, name, number of installations, size, comments, etc.

Then we can go to the third-level page, that is, the details page of each app, where we can see parameters such as the number of downloads, the