add details to user documentation

Nils Magnus 2024-03-24 00:38:41 +00:00
parent 7e6d4ee581
commit 3f485011b8

@@ -1,42 +1,61 @@
# Help Center Spider # Help Center Spider
## About
This is a spider tool with which you can visit all links on https://docs.otc.t-systems.com to find urls that are not correct.
## Requirements The Open Telekom Cloud Helpcenter Spider is a spider tool visiting all
After you cloned the repository you need to prepare an environment to run the tool. You can easily do this with links starting from its landing page on https://docs.otc.t-systems.com/
python virtual environment: to find and identify urls that are not correct. It parses all types of
hyperlinks and normalizes them into a canonical format. The spider
descends into the document tree via [...] breadth-first or depth-first search.
[and does what?] [when is logged which event?]
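The normalization and traversal described above can be sketched as follows. This is a minimal illustration, not the spider's actual code: the function names `normalize` and `crawl`, the `get_links` callback, and the choice of breadth-first order are all assumptions, and the exact canonical format the tool uses is not documented here.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalize(base_url: str, href: str) -> str:
    """Resolve href against base_url and reduce it to one canonical form."""
    absolute = urljoin(base_url, href)
    parts = urlsplit(absolute)
    path = parts.path or "/"
    # Assumed canonical form: lowercase scheme and host, no fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def crawl(start: str, get_links) -> list[str]:
    """Visit pages breadth first.

    get_links(url) is a hypothetical callback that returns the raw
    hrefs found on a page; a real spider would fetch and parse HTML.
    """
    start = normalize(start, "")
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            target = normalize(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Swapping `queue.popleft()` for `queue.pop()` would turn the same loop into a depth-first traversal.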
## Getting started
Once you have installed the code and its required packages into a
virtual environment and checked its configuration file `config.json`,
start the web spider by invoking the tool without any arguments.
Results are listed in [... TBD].
## Requirements and Installation
After you have cloned this repository, you need to prepare an
environment to run the tool. You can easily do this with a Python
virtual environment:
``` ```
$ cd <local_folder>/ $ cd _local_folder_/
$ git clone https://gitea.eco.tsi-dev.otc-service.com/infra/hc-spider.git
$ cd hc-spider
$ python -m venv venv/ $ python -m venv venv/
$ source venv/bin/activate $ source venv/bin/activate
(venv)$ python -m pip install -r requirements.txt (venv)$ python -m pip install -r requirements.txt
``` ```
## Configuration ## Configuration
In _config.json_ you can define a couple items: In _config.json_ you can define several items:
- _watchdog_file_: if you run the tool in the background and want to stop it properly (not using `kill`), - _watchdog_file_: if you run the tool in the background and want to
just send an exit message into the watchdog file: `echo exit > watchdog.fifo` stop it properly (without sending a signal with `kill`), just send
- _timer_runtime_: maximum runtime limit in seconds an exit message into the watchdog file: `echo exit > watchdog.fifo`.
- _log_dir_: logging folder - _timer_runtime_: maximum runtime limit in seconds.
- _logging_interval_: frequency of dumping log files - _log_dir_: logging folder.
- _workers_: number of workers (background processes) you want to run. If you set to 0 it will count from the number of cores (_number_of_cores_ - 1) - _logging_interval_: frequency of dumping log files.
- _workers_: number of workers (background processes) you want to run.
If you set it to 0, the count is derived from the number of cores
(_number_of_cores_ - 1).
- _starting_point_: base url where to start - _starting_point_: base url where to start
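As an illustration of how these settings fit together, the sketch below parses a sample configuration and resolves the worker count. Only the key names come from the list above; the values, the `load_config` helper, and the lower bound of one worker are assumptions made for this example, and the units of _logging_interval_ are not specified here.

```python
import json
import os


def load_config(text: str) -> dict:
    """Parse the JSON text of config.json and resolve the worker count."""
    config = json.loads(text)
    if config.get("workers", 0) == 0:
        # 0 means "derive from the CPU count" (number_of_cores - 1).
        # Keeping at least one worker is an assumption of this sketch.
        config["workers"] = max(1, (os.cpu_count() or 2) - 1)
    return config


# Hypothetical values; only the key names are taken from the documentation.
EXAMPLE = """
{
  "watchdog_file": "watchdog.fifo",
  "timer_runtime": 3600,
  "log_dir": "log",
  "logging_interval": 60,
  "workers": 0,
  "starting_point": "https://docs.otc.t-systems.com/"
}
"""

config = load_config(EXAMPLE)
print(config["workers"])  # core count minus one on a multi-core machine
```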
## How to run ## Operations
There are two ways to do it There are two ways to start the spider:
### In foreground ### In the foreground
``` ```
$ source venv/bin/activate $ source venv/bin/activate
$ python main.py (venv)$ python main.py
``` ```
### In background ### In the background
``` ```
$ source venv/bin/activate $ source venv/bin/activate
$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- & (venv)$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- &
``` ```
In case you running the tool in background you can stop the execution with `$ echo exit > <watchdog_file>` ### Stopping the process politely
To stop the tool when run in the background, send a command to the
control fifo with: `(venv)$ echo exit > _watchdog_file_`
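
How a watchdog of this kind might work can be sketched in a few lines. This is an assumption about the mechanism, not the tool's actual implementation: `wait_for_exit` and `send_exit` are hypothetical names, and the sketch expects the FIFO to exist already (for example created with `mkfifo` on a POSIX system).

```python
def wait_for_exit(fifo_path: str) -> None:
    """Block until a line containing 'exit' is written to the FIFO."""
    while True:
        # open() blocks until a writer connects; the inner loop ends
        # when that writer closes its end, and we then wait for the next.
        with open(fifo_path) as fifo:
            for line in fifo:
                if line.strip() == "exit":
                    return


def send_exit(fifo_path: str) -> None:
    """Python equivalent of `echo exit > <watchdog_file>`."""
    with open(fifo_path, "w") as fifo:
        fifo.write("exit\n")
```

A reader running `wait_for_exit` in a background thread or process can then trigger an orderly shutdown once the function returns, which avoids interrupting workers with a signal.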