# Help Center Spider

## About

This is a spider tool that visits all links on https://docs.otc.t-systems.com to find incorrect URLs.

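
This README does not describe how the crawl works internally. As an illustration only, a spider like this one would typically parse each page, collect the `href` targets, resolve them against the page URL, and follow only links on the same host. The sketch below assumes exactly that behaviour and uses only the standard library; `LinkCollector` and `same_host_links` are hypothetical names, not this repository's code.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(page_url: str, html: str) -> list[str]:
    """Return absolute, fragment-free, de-duplicated links on page_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    result: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links against the page and drop "#section" fragments.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Skip mailto:, javascript:, and links to other hosts.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in result:
                result.append(absolute)
    return result


if __name__ == "__main__":
    sample = '<a href="/en/index.html">Home</a> <a href="https://example.com/">Ext</a>'
    print(same_host_links("https://docs.otc.t-systems.com/", sample))
```

Each returned URL could then be requested and its HTTP status recorded; that checking step is left out of the sketch.
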
## Requirements

After cloning the repository, prepare an environment to run the tool. The easiest way is a Python virtual environment:

```
$ cd <local_folder>/
$ python -m venv venv/
$ source venv/bin/activate
$ python -m pip install -r requirements.txt
```
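
If the virtual environment is not activated, `pip` installs the requirements into the system interpreter instead. A small check, assuming Python 3.3+ where an environment created by `venv` gives `sys.prefix` a different value than `sys.base_prefix` (the helper name `in_virtualenv` is hypothetical):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created environment.

    venv points sys.prefix at the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("No virtual environment active; run 'source venv/bin/activate' first")
```
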

## Configuration

In _config.json_ you can define the following items:

- _watchdog_file_: if you run the tool in the background and want to stop it cleanly (without using `kill`), send an exit message into the watchdog file: `echo exit > watchdog.fifo`
- _timer_runtime_: maximum runtime in seconds
- _log_dir_: folder where log files are written
- _logging_interval_: how often log files are dumped
- _workers_: number of workers (background processes) to run. If set to 0, the number is derived from the CPU core count as _number_of_cores_ - 1
- _starting_point_: base URL where crawling starts

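
The exact layout of _config.json_ is not documented beyond the list above. The sketch below assumes a flat JSON object with those six keys and shows one way the `workers: 0` rule could be resolved. The example values, the `load_config` and `resolve_workers` helpers, and the fallback to at least one worker on a single-core machine are all assumptions, not taken from this repository.

```python
import json
import os

# Hypothetical example values; only the key names come from the README.
EXAMPLE_CONFIG = """
{
    "watchdog_file": "watchdog.fifo",
    "timer_runtime": 3600,
    "log_dir": "log",
    "logging_interval": 60,
    "workers": 0,
    "starting_point": "https://docs.otc.t-systems.com"
}
"""

REQUIRED_KEYS = {
    "watchdog_file",
    "timer_runtime",
    "log_dir",
    "logging_interval",
    "workers",
    "starting_point",
}


def resolve_workers(workers: int) -> int:
    """Map workers == 0 to (number_of_cores - 1), never going below 1 (assumption)."""
    if workers > 0:
        return workers
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - 1)


def load_config(text: str) -> dict:
    """Parse the JSON config, check the documented keys, and resolve workers."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    config["workers"] = resolve_workers(int(config["workers"]))
    return config


if __name__ == "__main__":
    cfg = load_config(EXAMPLE_CONFIG)
    print(cfg["workers"], cfg["starting_point"])
```

With `workers` set to 0 on an 8-core machine, `resolve_workers` returns 7; a positive value is used unchanged.
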
## How to run

There are two ways to run the tool.

### In foreground

`$ python main.py`

### In background

`$ nohup python main.py > log/hc_spider.log 2> log/hc_spider.err <&- &`

The `log/` folder must already exist, otherwise the shell cannot create the output files. If you run the tool in the background, you can stop it with `$ echo exit > <watchdog_file>`.

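
How the tool reads the watchdog file is not shown in this README. Assuming _watchdog_file_ is a POSIX named pipe (the `.fifo` name suggests so) and _timer_runtime_ caps the run, the stop logic could be sketched as follows; `watch_fifo` and `run_until_stopped` are hypothetical names, and `os.mkfifo` is only available on POSIX systems.

```python
import os
import threading


def watch_fifo(path: str, stop: threading.Event) -> None:
    """Block on the named pipe and set `stop` once a line reading 'exit' arrives."""
    while not stop.is_set():
        # open() blocks until a writer (e.g. `echo exit > path`) opens the pipe;
        # after the writer closes it, the loop reopens and waits again.
        with open(path, "r", encoding="utf-8") as fifo:
            for line in fifo:
                if line.strip() == "exit":
                    stop.set()
                    return


def run_until_stopped(fifo_path: str, timer_runtime: float) -> bool:
    """Return True if stopped via the watchdog, False if timer_runtime elapsed."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)  # POSIX only
    stop = threading.Event()
    # daemon=True so a watcher still blocked in open() does not keep the process alive.
    threading.Thread(target=watch_fifo, args=(fifo_path, stop), daemon=True).start()
    # The real crawl would run here; this sketch only waits for either condition.
    return stop.wait(timeout=timer_runtime)
```

Running `echo exit > watchdog.fifo` from another shell unblocks the watcher and ends the run before the timer expires.
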