Crawler

The crawler plugin allows you to crawl websites and extract data from them without using a browser.

Installation

pip install botcity-crawler-plugin

Linux System Dependencies

If you intend to use this package on Linux with JavaScript enabled, there are system dependencies that must be installed.

For Debian/Ubuntu please run the following command:

apt install libxcomposite1 libxcursor1 libxdamage1 \
libxfixes3 libxi6 libxtst6 libnss3 libnspr4 libcups2 \
libdbus-1-3 libxrandr2 libasound2 libatk1.0-0 libatk-bridge2.0-0 \
libgtk-3-0 libx11-xcb1 --no-install-recommends

Please make sure to install the equivalent libraries for your Linux distribution.
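For example, on Fedora and other RHEL-based distributions, the rough equivalents can be installed with dnf. The package names below are a best-effort mapping and may vary between releases, so treat this as a starting point rather than an exact list:

dnf install libXcomposite libXcursor libXdamage \
libXfixes libXi libXtst nss nspr cups-libs \
dbus-libs libXrandr alsa-lib atk at-spi2-atk \
gtk3 libX11-xcb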

Importing the Plugin

After installing this package, the next step is to import it into your code and start using its functions.

from botcity.plugins.crawler import BotCrawlerPlugin

Making the Request

To make the request, use the request method, which takes a URL as an argument.

# Instantiate the plugin and enable JavaScript
crawler = BotCrawlerPlugin(javascript_enabled=True)

url = "https://www.youtube.com/c/BotCityComputerVisionAutomationRPA"

# Make the request
html = crawler.request(url)
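Enabling JavaScript is only necessary for pages that render their content dynamically. For a plain static page you can disable it, which avoids the Linux system dependencies listed above. A minimal sketch, using a hypothetical example URL:

# Instantiate the plugin with JavaScript disabled for a static page
crawler = BotCrawlerPlugin(javascript_enabled=False)

# Make the request; the returned object holds the parsed HTML
html = crawler.request("https://www.example.com")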

Locating an Element

Looking at the page source from the previous example, we can see that the element holding the subscriber information has its id attribute set to subscriber-count.

Here is how we can read the value of the element:

# This sets the current element on the HTML object to the one found
html.get_element_by_id("subscriber-count")

# Read the value into the subscribers variable
subscribers = html.value()
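Putting the steps together, the complete flow looks like this. The print call at the end is just for illustration; the exact value returned depends on how the page renders at request time:

from botcity.plugins.crawler import BotCrawlerPlugin

# Instantiate the plugin and enable JavaScript
crawler = BotCrawlerPlugin(javascript_enabled=True)

# Make the request
html = crawler.request("https://www.youtube.com/c/BotCityComputerVisionAutomationRPA")

# Set the current element on the HTML object to the subscriber count
html.get_element_by_id("subscriber-count")

# Read the value of the element
subscribers = html.value()
print(subscribers)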