# Help Center Spider

## About

This is a spider tool that visits all links on https://docs.otc.t-systems.com to find incorrect URLs.

## Requirements

After cloning the repository, you need to prepare an environment to run the tool. You can easily do this with a Python virtual environment:

```
$ cd /
$ python -m venv venv/
$ source venv/bin/activate
$ python -m pip install -r requirements.txt
```

## Configuration

In _config.json_ you can define the following items:

- _watchdog_file_: if you run the tool in the background and want to stop it cleanly (without using `kill`), write an exit message into the watchdog file: `echo exit > watchdog.fifo`
- _timer_runtime_: maximum runtime in seconds
- _log_dir_: logging folder
- _logging_interval_: how often log files are dumped
- _workers_: number of workers (background processes) to run. If set to 0, the number is derived from the CPU core count (_number_of_cores_ - 1)
- _starting_point_: base URL to start crawling from

## How to run

There are two ways to run the tool: in the foreground or in the background.

### In foreground

```
$ source venv/bin/activate
$ python main.py
```

### In background

```
$ source venv/bin/activate
$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- &
```

If you run the tool in the background, you can stop the execution by writing `exit` into the watchdog file (see _watchdog_file_ above): `$ echo exit > `
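The watchdog mechanism described under _watchdog_file_ could work roughly as sketched below. This is not the tool's actual implementation: the helper names `watch_fifo` and `run_until_stopped` are hypothetical, and the sketch assumes the watchdog file is a named pipe (FIFO) that receives the line `exit`, based only on the `echo exit > watchdog.fifo` example. It relies on `os.mkfifo`, so it runs on POSIX systems only.

```python
import os
import threading


def watch_fifo(path: str, stop_event: threading.Event) -> None:
    """Block on the FIFO and set stop_event once a line reading 'exit' arrives.

    Hypothetical helper: assumes the watchdog file is a named pipe and that
    writing 'exit' into it is the stop signal.
    """
    while not stop_event.is_set():
        # Opening a FIFO for reading blocks until a writer connects.
        with open(path, "r") as fifo:
            for line in fifo:
                if line.strip() == "exit":
                    stop_event.set()
                    return
        # The writer closed without sending 'exit'; reopen and keep waiting.


def run_until_stopped(fifo_path: str, max_runtime: float) -> bool:
    """Wait until 'exit' arrives on the FIFO or max_runtime seconds elapse.

    Returns True if stopped via the watchdog, False if the runtime limit was hit.
    max_runtime stands in for the timer_runtime setting (assumption).
    """
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)  # POSIX only
    stop_event = threading.Event()
    # Daemon thread, so a watcher still blocked in open() cannot keep the process alive.
    watcher = threading.Thread(
        target=watch_fifo, args=(fifo_path, stop_event), daemon=True
    )
    watcher.start()
    # A real spider would crawl here and check stop_event between requests;
    # this sketch only waits. Event.wait returns False if the timeout expires.
    return stop_event.wait(timeout=max_runtime)
```

For example, `run_until_stopped("watchdog.fifo", 3600)` would return `True` as soon as another shell runs `echo exit > watchdog.fifo`, and `False` after an hour otherwise. Checking a shared event between units of work lets in-flight requests and log dumps finish, which a `kill` signal would interrupt.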