
Help Center Spider

About

Help Center Spider is a crawler that visits every link on https://docs.otc.t-systems.com to find URLs that are broken or incorrect.

Requirements

After cloning the repository, prepare an environment to run the tool. The simplest way is a Python virtual environment:

$ cd <local_folder>/
$ python -m venv venv/
$ source venv/bin/activate
$ python -m pip install -r requirements.txt

Configuration

In config.json you can define the following settings:

  • watchdog_file: path of the watchdog file. If you run the tool in the background and want to stop it cleanly (without using kill), write an exit message to this file: echo exit > watchdog.fifo
  • timer_runtime: maximum runtime in seconds
  • log_dir: directory where log files are written
  • logging_interval: how often log files are dumped
  • workers: number of workers (background processes) to run. If set to 0, the count is derived from the number of CPU cores (number_of_cores - 1)
  • starting_point: base URL where the crawl starts
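
As an illustration of how these settings fit together, the sketch below parses a configuration with the keys listed above and resolves the workers value. Only the key names come from this README; the example values, the unit of logging_interval, and the helper resolve_workers are assumptions made for illustration, not the tool's actual code.

```python
import json
import os

# Hypothetical example values; only the key names are taken from the README.
# The unit of logging_interval is not documented, so 60 is a placeholder.
EXAMPLE_CONFIG = """
{
    "watchdog_file": "watchdog.fifo",
    "timer_runtime": 3600,
    "log_dir": "log",
    "logging_interval": 60,
    "workers": 0,
    "starting_point": "https://docs.otc.t-systems.com"
}
"""


def resolve_workers(configured, cpu_count=None):
    """Return the worker count, treating 0 as 'number of cores minus one'."""
    if configured > 0:
        return configured
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Assumption: keep at least one worker even on a single-core machine.
    return max(1, cores - 1)


config = json.loads(EXAMPLE_CONFIG)
workers = resolve_workers(config["workers"])
print(f"Crawling {config['starting_point']} with {workers} worker(s)")
```

Leaving one core free when workers is 0 presumably keeps the main process responsive while the background workers crawl.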

How to run

There are two ways to run the tool:

In foreground

$ python main.py

In background

$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- &

If you run the tool in the background, you can stop it with $ echo exit > <watchdog_file>
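
To show how a stop signal of this kind can be handled, here is a minimal sketch of a watchdog that listens on a named pipe (FIFO) and sets a flag when it reads the word exit. It is not taken from the tool's source: the function names watch_for_exit and start_watchdog are hypothetical, and the use of a thread with threading.Event is an assumption. It relies on os.mkfifo, so it only works on POSIX systems.

```python
import os
import threading


def watch_for_exit(fifo_path, stop_event):
    """Read the FIFO until a line equal to 'exit' arrives, then set stop_event."""
    while not stop_event.is_set():
        # open() blocks until a writer (e.g. `echo exit > fifo`) connects.
        with open(fifo_path, "r") as fifo:
            for line in fifo:
                if line.strip() == "exit":
                    stop_event.set()
                    return
        # The writer closed without sending 'exit'; reopen and keep waiting.


def start_watchdog(fifo_path):
    """Create the FIFO if it is missing and watch it from a daemon thread."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)  # POSIX only
    stop_event = threading.Event()
    thread = threading.Thread(
        target=watch_for_exit, args=(fifo_path, stop_event), daemon=True
    )
    thread.start()
    return stop_event
```

A main loop could then poll stop_event.is_set() between work items and shut down its workers in an orderly way, which is the advantage over kill.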
