
Help Center Spider

The Open Telekom Cloud Helpcenter Spider is a tool that visits all links, starting from the landing page at https://docs.otc.t-systems.com/, to find and identify URLs that are not correct. It parses all types of hyperlinks and normalizes them into a canonical format. The spider descends into the document tree via […] breadth or width first search. [and does what?] [when is logged which event?]
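The link normalization and tree traversal described above could look roughly like the following sketch. It is not taken from the spider's code: the names normalize, crawl and links_of are hypothetical, the canonical form (absolute URL, lower-case scheme and host, no fragment, "/" for an empty path) is an assumption, and breadth-first order is used only as one of the strategies mentioned.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit, urlunsplit

BASE = "https://docs.otc.t-systems.com/"


def normalize(link, page_url):
    """Resolve a raw href against the page it was found on and canonicalize it.

    Assumed canonical form: absolute URL, lower-case scheme and host,
    fragment removed, empty path replaced by "/".
    """
    absolute = urljoin(page_url, link)
    absolute, _fragment = urldefrag(absolute)
    parts = urlsplit(absolute)
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def crawl(start, links_of):
    """Visit pages breadth-first and return them in visiting order.

    links_of(url) is a hypothetical callback returning the raw hrefs
    found on that page; a real spider would fetch and parse the page.
    """
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for href in links_of(url):
            target = normalize(href, url)
            # Only descend into pages below the landing page.
            if target.startswith(BASE) and target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Keeping the set of seen URLs in canonical form is what prevents the same page from being visited twice under different spellings of its address.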

Getting started

Once you have installed the code and its required packages into a virtual environment and checked its configuration file config.json, start the web spider by invoking the tool without any arguments. Results are listed in [… TBD].

Requirements and Installation

After cloning this repository, you need to prepare an environment to run the tool. The easiest way to do this is a Python virtual environment:

$ cd _local_folder_/
$ git clone https://gitea.eco.tsi-dev.otc-service.com/infra/hc-spider.git
$ cd hc-spider
$ python -m venv venv/
$ source venv/bin/activate
(venv)$ python -m pip install -r requirements.txt

Configuration

In config.json you can define several items:

  • watchdog_file: path of the control FIFO. If you run the tool in the background and want to stop it properly (without sending a signal with kill), write an exit message into the watchdog file: echo exit > watchdog.fifo.
  • timer_runtime: maximum runtime limit in seconds.
  • log_dir: logging folder.
  • logging_interval: interval at which log files are dumped.
  • workers: number of workers (background processes) you want to run. If set to 0, the number is derived from the number of CPU cores (number_of_cores - 1).
  • starting_point: base URL where the spider starts.
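As an illustration of the items above, here is a minimal sketch of a configuration and of how the workers value might be resolved. The key names come from the list above; all values are made-up examples, the unit of logging_interval is not specified here, and the helper resolve_workers (including its lower bound of one worker) is a hypothetical stand-in, not the tool's actual code.

```python
import json
import os

# Example content for config.json; every value here is illustrative only.
EXAMPLE_CONFIG = """
{
  "watchdog_file": "watchdog.fifo",
  "timer_runtime": 3600,
  "log_dir": "log",
  "logging_interval": 60,
  "workers": 0,
  "starting_point": "https://docs.otc.t-systems.com/"
}
"""


def resolve_workers(configured, cores=None):
    """Return the number of worker processes to start.

    A positive value is used as is; 0 means "number of cores minus one".
    Never returning less than 1 is an assumption of this sketch.
    """
    if configured > 0:
        return configured
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, cores - 1)


config = json.loads(EXAMPLE_CONFIG)
config["workers"] = resolve_workers(config["workers"])
```

On an 8-core machine this sketch would start 7 workers when workers is set to 0, which leaves one core free for the main process.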

Operations

There are two ways to start the spider:

In the foreground

$ source venv/bin/activate
(venv)$ python main.py

In the background

$ source venv/bin/activate
(venv)$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- &

Stopping the process politely

To stop the tool when it runs in the background, send the exit command to the control FIFO:

(venv)$ echo exit > _watchdog_file_
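How a process can react to such a message is sketched below. This is an assumption about the mechanism, not the spider's implementation: the function name wait_for_exit is hypothetical, and the sketch relies on POSIX named pipes (os.mkfifo), so it will not run on Windows.

```python
import os
import threading


def wait_for_exit(fifo_path, stop_event):
    """Block until the word "exit" is written to the FIFO, then set stop_event.

    Intended to run in a background thread while the workers do their job.
    """
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    while not stop_event.is_set():
        # open() blocks until a writer connects, e.g. `echo exit > fifo`.
        with open(fifo_path) as fifo:
            for line in fifo:
                if line.strip() == "exit":
                    stop_event.set()
                    return
        # The writer closed the pipe without sending "exit": reopen and wait.
```

The main process would start this function in a thread and check stop_event between units of work, which allows it to flush logs and shut down its workers instead of being killed by a signal.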
