Components¶

Autocontinue¶

Why do we need it?¶

The auto-continue logic was mainly created for one purpose: testing long files inside Travis CI. As Travis CI has a time limit of 45 minutes, it became vital for us to be able to stop the test before that limit and continue it later from where we left off. That’s how it started.

Today (and this might be controversial), it is mostly used by people who aren’t running inside a Travis CI container, to resume a test after their machine or the tool crashes.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.engine.auto_continue.AutoContinue()!

We log every subject that has already been tested. When the same file path is given again, we remove those subjects from the list to test, so the test resumes where it stopped.
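A minimal sketch of that bookkeeping, assuming a hypothetical in-memory layout of `{file_path: {status: [subject, ...]}}`; the real file format and the names `remember()` and `remaining_subjects()` are illustrative only.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: {file_path: {status: [subject, ...]}}
ContinueDB = Dict[str, Dict[str, List[str]]]


def remember(db: ContinueDB, file_path: str, subject: str, status: str) -> None:
    """Log a subject as already tested for the given file."""
    db.setdefault(file_path, {}).setdefault(status, []).append(subject)


def remaining_subjects(db: ContinueDB, file_path: str, subjects: Iterable[str]) -> List[str]:
    """Drop the subjects already tested for this file path, keeping the original order."""
    already_tested = {s for tested in db.get(file_path, {}).values() for s in tested}
    return [subject for subject in subjects if subject not in already_tested]
```

Keying on the file path means that a different list starts from scratch, while the same list skips everything already logged.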

How to use it?¶

It is activated by default. If it is not, simply change

auto_continue: False

to

auto_continue: True

in your personal .PyFunceble.yaml, or use the --auto-continue argument from the CLI to reactivate it.

Autosave¶

Warning

This component is not activated by default.

Why do we need it?¶

This component goes hand in hand with the auto-continue one: once we had built the auto-continue logic, we needed something to save our progress automatically.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.engine.auto_save.AutoSave()!

After a given number of minutes, we stop the tool, generate the percentage, run a given command (if any), commit all the changes we made to the repository and, finally, push them to the Git repository.
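The sequence above can be sketched as follows. This is an illustration, not PyFunceble's code: the step callables and the names `autosave_due()` and `autosave()` are hypothetical, and only the order of the steps comes from the paragraph above.

```python
import time
from typing import Callable, Optional


def autosave_due(started_at: float, ci_autosave_minutes: int, now: Optional[float] = None) -> bool:
    """Return True once ci_autosave_minutes have elapsed since started_at (monotonic seconds)."""
    current = time.monotonic() if now is None else now
    return current - started_at >= ci_autosave_minutes * 60


def autosave(
    stop: Callable[[], None],
    generate_percentage: Callable[[], None],
    run_command: Callable[[str], None],
    commit: Callable[[str], None],
    push: Callable[[], None],
    commit_message: str,
    command: Optional[str] = None,
) -> None:
    """Run the autosave steps in the documented order."""
    stop()
    generate_percentage()
    if command:  # only when a command was configured
        run_command(command)
    commit(commit_message)
    push()
```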

How to use it?¶

For Travis CI and GitLab CI/CD¶

The following configuration indexes (or their equivalents from the CLI) are required. Switch ci to True to activate the component.

ci: False
ci_autosave_commit: "Your awesome commit message"
ci_autosave_final_commit: "Your awesome final commit message"
ci_autosave_minutes: 15
ci_branch: master

Note

If you set the command index, we will run it for each commit except the last one.

For the last commit, the command given in the command_before_end index is executed instead.

Certificate verification¶

Warning

This component is not activated by default.

Why do we need it?¶

You might sometimes want to be sure that every URL tested with PyFunceble has a valid certificate. That’s what this component is all about!

How does it work?¶

If the certificate is not valid (caught with requests), an INACTIVE status is returned (if this component is activated, of course).
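A minimal sketch of that rule. PyFunceble relies on requests; to stay dependency-free, this sketch swaps in the standard library's `urllib` with a default `ssl` context, which verifies the certificate chain and hostname. The names `certificate_status()` and `fetch_with_verification()` are hypothetical, and `None` stands for "no verdict from this component".

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_with_verification(url: str) -> None:
    """Open the URL with a default SSL context, which verifies the chain and the hostname."""
    context = ssl.create_default_context()
    with urllib.request.urlopen(url, timeout=10, context=context):
        pass


def certificate_status(
    url: str,
    verify_ssl_certificate: bool,
    fetch: Callable[[str], None] = fetch_with_verification,
) -> Optional[str]:
    """Return "INACTIVE" when verification is on and the certificate is rejected.

    None means "no verdict from this component": the regular checks decide.
    """
    if not verify_ssl_certificate:
        return None
    try:
        fetch(url)
    except ssl.SSLCertVerificationError:
        return "INACTIVE"
    except urllib.error.URLError as exception:
        # urlopen() wraps handshake failures into URLError.reason.
        if isinstance(exception.reason, ssl.SSLCertVerificationError):
            return "INACTIVE"
    return None
```

With requests, the equivalent check would catch `requests.exceptions.SSLError` raised by a call made with `verify=True`.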

How to use it?¶

Simply change

verify_ssl_certificate: False

to

verify_ssl_certificate: True

in your personal .PyFunceble.yaml, or use the --verify-ssl-certificate argument from the CLI to activate it.

Cleaning¶

Why do we need it?¶

Because we constantly need to remove files that are no longer needed before starting a new test, we embedded our own cleaning logic.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.output.clean.Clean()!

It has an internal map of what has to be deleted and how.
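The idea of an internal map can be sketched like this. The contents of `CLEAN_MAP` are hypothetical (they reuse file names mentioned elsewhere on this page), and database tables are left out; only the protected files come from the warnings below.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Hypothetical map: which paths (relative to a base directory) each mode removes.
CLEAN_MAP: Dict[str, List[str]] = {
    "clean": ["output"],
    "clean_all": ["output", "dir_structure.json", "iana-domains-db.json", "public-suffix.json"],
}
# Never deleted, even by --clean-all (see the warnings below).
PROTECTED = {".pyfunceble-env", "whois_db.json"}


def clean(base_dir: Path, mode: str = "clean") -> List[str]:
    """Delete what the map says for the given mode and return what was removed."""
    removed = []
    for relative in CLEAN_MAP[mode]:
        if relative in PROTECTED:
            continue
        target = base_dir / relative
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        else:
            continue  # nothing to delete
        removed.append(relative)
    return removed
```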

How to clean?¶

For a simple clean, run PyFunceble with the --clean argument.

For a complete cleaning, run PyFunceble with the --clean-all argument.

Differences between simple and complete cleaning?¶

The --clean logic cleans the output/ directory along with the pyfunceble_tested table when the mariadb or mysql database type is chosen.

The --clean-all logic deletes all the files we generated along with the content of all database tables when the mariadb or mysql database type is chosen.

Warning

--clean-all does not delete the following files (even if generated by us):

.pyfunceble-env
whois_db.json

Warning

--clean-all does not delete the content of the following tables:

pyfunceble_whois

Complements Generation¶

Warning

This component is not activated by default.

Why do we need it?¶

Let’s say we have example.org in our list, but not www.example.org (or vice versa). This component (if activated) lets us test www.example.org (or vice versa) even if it’s not in the given list.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.get_complements()!

At the end of the normal test process, we generate the list of complements and test them.
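A minimal sketch of the complement generation described above, assuming only the www. prefix is toggled; `generate_complements()` is a hypothetical name, not PyFunceble's API.

```python
from typing import Iterable, List


def generate_complements(subjects: Iterable[str]) -> List[str]:
    """Return the www./non-www. counterparts that are not already in the list."""
    given = list(dict.fromkeys(subjects))  # de-duplicate, keep order
    known = set(given)
    complements = []
    for subject in given:
        candidate = subject[len("www."):] if subject.startswith("www.") else f"www.{subject}"
        if candidate not in known:
            known.add(candidate)
            complements.append(candidate)
    return complements
```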

How to use it?¶

You can simply change

generate_complements: False

to

generate_complements: True

in your personal .PyFunceble.yaml, or use the --complements argument from the CLI to activate it.

Configuration¶

Why do we need it?¶

As we wanted to be hybrid and allow different modes and options, we introduced the configuration logic.

How does it work?¶

Note

Want to read the configuration loader code? It’s here: PyFunceble.config.load.Load()!

We first look for the .PyFunceble.yaml file. If it is not found, we fetch or generate it. Then we parse it and load it into the system.

Note

Because we also wanted end-users to be able to avoid editing the configuration file, almost all configuration indexes can be updated from the CLI.

In that case, we update the configuration with the arguments you give us before loading it into the system.
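A minimal sketch of applying CLI values on top of the parsed configuration. It assumes the CLI arguments were already mapped to configuration indexes and that `None` means "argument not given"; in practice many PyFunceble flags toggle a value rather than set one. `apply_cli_overrides()` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def apply_cli_overrides(config: Dict[str, Any], cli_args: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a copy of config where every given CLI value replaces the file value."""
    updated = dict(config)
    for index, value in cli_args.items():
        if value is not None:  # None = the argument was not given
            updated[index] = value
    return updated
```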

Note

If in the future a new configuration key is introduced, you will be asked to choose if you want to merge it into your .PyFunceble.yaml.

In that case, we get a copy of the new configuration file and keep all the indexes you previously set. This means that you don’t have to reconfigure them.
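That merge can be sketched as a recursive update where the new upstream file provides the structure and your previously set values win. This is an illustration under that assumption; `merge_config()` is a hypothetical name.

```python
import copy
from typing import Any, Dict


def merge_config(upstream: Dict[str, Any], local: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the new upstream configuration and keep every previously set index."""
    merged = copy.deepcopy(upstream)
    for key, value in local.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```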

How to configure?¶

Update the .PyFunceble.yaml file or use the CLI.

Custom DNS Server¶

Why do we need it?¶

Sometimes the testing environment is set up to use a DNS server which isn’t suited for a PyFunceble test of actually expired or active domains or URLs. This could, for example, be your own DNS firewall.

To avoid these situations, PyFunceble allows you to set up the DNS server(s) to use for testing.

How does it work?¶

Thanks to dnspython, we can send our queries to the given DNS server(s).

How to use it?¶

By default, PyFunceble will use the system-wide DNS settings. This can be changed by configuring which DNS servers you would like PyFunceble to use during the test.

You can set this up with the --dns argument from the CLI, or by changing the following in your personal .PyFunceble.yaml

dns_server: null

to

dns_server:
    - "8.8.8.8"
    - "8.8.4.4"

Since v3.0.0, it is possible to assign a specific port to each DNS server.

Hint

--dns 95.216.209.53:53 116.203.32.67:53 9.9.9.9:853

Warning

If you don’t append a port number, the default DNS port (53) will be used.
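A minimal sketch of how the host:port values shown above could be split, with the default port 53 when none is appended. It only handles hostnames and IPv4 addresses with an optional port; a value with several colons is assumed to be a bare IPv6 address without a port. `parse_dns_servers()` is a hypothetical name.

```python
from typing import List, Tuple

DEFAULT_DNS_PORT = 53


def parse_dns_servers(values: List[str]) -> List[Tuple[str, int]]:
    """Turn "host[:port]" strings into (host, port) pairs."""
    servers = []
    for value in values:
        value = value.strip()
        host, separator, port_text = value.rpartition(":")
        if separator and value.count(":") == 1:  # e.g. 9.9.9.9:853
            servers.append((host, int(port_text)))
        else:  # no port given (or a bare IPv6 address)
            servers.append((value, DEFAULT_DNS_PORT))
    return servers
```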

Custom User-Agent¶

Why do we need it?¶

As we connect to web servers the way a regular browser would, we need to send a realistic user agent, and the custom user agent is there for that!

How does it work?¶

We set the User-Agent header every time we request something over the HTTP and HTTPS protocols.

If a custom user agent is given, it will be used.

Otherwise, every 24 hours, we update our user-agents.json file which will be fetched by your local version to determine the user-agent to use.
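The selection rule can be sketched as follows, assuming (hypothetically) that user-agents.json maps a browser to a platform to a User-Agent string. The `browser`, `platform` and `custom` parameters mirror the user_agent configuration keys shown below; `pick_user_agent()` is a hypothetical name.

```python
from typing import Dict, Optional

# Hypothetical layout of user-agents.json: {browser: {platform: user_agent}}
UserAgents = Dict[str, Dict[str, str]]


def pick_user_agent(user_agents: UserAgents, browser: str, platform: str, custom: Optional[str] = None) -> Optional[str]:
    """A custom user agent always wins; otherwise look up browser and platform."""
    if custom:
        return custom
    return user_agents.get(browser, {}).get(platform)
```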

How to use it?¶

Simply choose your browser and platform or provide us your custom one!

user_agent:
    browser: chrome
    platform: linux
    custom: null

in your personal .PyFunceble.yaml, or use the --user-agent (custom UA) argument from the CLI.

Available Browser¶

Here is a list of available and accepted browsers at this time.

chrome
edge
firefox
ie
opera
safari

Available Platform¶

Here is a list of available and accepted platforms at this time.

linux
macosx
win10

What if we don’t give a custom User-Agent?¶

If you don’t set a custom User-Agent, we will try to get the latest one for the chosen browser and platform.

Databases¶

Why do we use “databases”?¶

We use databases to store data while we run the tests. When we talk about databases in general, we are indirectly talking about the following subsystems.

Autocontinue
InactiveDB
Mining
WhoisDB

Warning

There is a difference between what we are talking about here and the --database argument, which only enables/disables the InactiveDB subsystem.

How do we manage them?¶

They consist of simple JSON files which are read and updated on the fly.

Warnings around Database (self) management¶

Warning

If you plan to delete everything and still want to use PyFunceble in the future, please use the --clean-all argument.

Indeed, it will delete everything related to what we generated, except things like the whois database file/table, which saves (almost) static data that can be reused in the future.

Deleting, for example, the whois database file/table will just make your test run for a much longer time if you retest subjects that used to be indexed in the whois database file/table.

Databases types¶

Since PyFunceble 2.0.0 (equivalent of >=1.18.0.dev), we offer multiple database types which are (as per configuration) json (default), mariadb and mysql.

Why different database types?¶

With the introduction of the multiprocessing logic, it became natural to introduce other database formats, as it’s a nightmare to update a JSON-formatted file from several processes.

In order to write or use a JSON-formatted database, we have to load it and overwrite it completely. That works great with a single CPU/process, but as soon as we get out of that scope, it becomes unmanageable.

How to use the `mysql` or `mariadb` format?¶

Create a new user, password and database (optional) for PyFunceble to work with.
Create a .pyfunceble-env file at the root of your configuration directory.
Complete it with the following content (example):

PYFUNCEBLE_DB_CHARSET=utf8mb4
PYFUNCEBLE_DB_HOST=localhost
PYFUNCEBLE_DB_NAME=PyFunceble
PYFUNCEBLE_DB_PASSWORD=Hello,World!
PYFUNCEBLE_DB_PORT=3306
PYFUNCEBLE_DB_USERNAME=pyfunceble

Note

Since version 2.4.3.dev, it is possible to use a UNIX socket for the PYFUNCEBLE_DB_HOST environment variable.

The typical location for mysqld.sock is /var/run/mysqld/mysqld.sock.

This has been done to:

1. Make it easier to use the socket in conjunction with a supported CI environment/platform.
2. Leave more space on the IP stack for local DB installations.
3. Take advantage of the UNIX socket, which is usually faster than an IP connection on local runs.

PYFUNCEBLE_DB_CHARSET=utf8mb4
PYFUNCEBLE_DB_HOST=/var/run/mysqld/mysqld.sock
PYFUNCEBLE_DB_NAME=PyFunceble
PYFUNCEBLE_DB_PASSWORD=Hello,World!
PYFUNCEBLE_DB_PORT=3306
PYFUNCEBLE_DB_USERNAME=pyfunceble

Switch the db_type index of your configuration file to mysql or mariadb.
Play with PyFunceble!

Note

If the environment variables are not found, you will be prompted for the information.
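A minimal sketch of reading these variables, prompting for any that are missing, and treating an absolute path in PYFUNCEBLE_DB_HOST as a UNIX socket. The returned keys are shaped like the keyword arguments of a MySQL client such as PyMySQL (`host`, `port`, `user`, `password`, `database`, `charset`, `unix_socket`); that shape and the name `db_settings()` are assumptions of this sketch.

```python
from typing import Callable, Dict, Mapping

DB_VARIABLES = (
    "PYFUNCEBLE_DB_CHARSET",
    "PYFUNCEBLE_DB_HOST",
    "PYFUNCEBLE_DB_NAME",
    "PYFUNCEBLE_DB_PASSWORD",
    "PYFUNCEBLE_DB_PORT",
    "PYFUNCEBLE_DB_USERNAME",
)


def db_settings(env: Mapping[str, str], prompt: Callable[[str], str] = input) -> Dict[str, object]:
    """Build connection settings, asking for every variable that is not set."""
    values = {name: env.get(name) or prompt(f"{name}: ") for name in DB_VARIABLES}
    settings: Dict[str, object] = {
        "charset": values["PYFUNCEBLE_DB_CHARSET"],
        "database": values["PYFUNCEBLE_DB_NAME"],
        "user": values["PYFUNCEBLE_DB_USERNAME"],
        "password": values["PYFUNCEBLE_DB_PASSWORD"],
    }
    host = values["PYFUNCEBLE_DB_HOST"]
    if host.startswith("/"):  # absolute file path: UNIX socket
        settings["unix_socket"] = host
    else:
        settings["host"] = host
        settings["port"] = int(values["PYFUNCEBLE_DB_PORT"])
    return settings
```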

Directory Structure¶

Why do we need it?¶

As we wanted end-users to be able to work from anywhere in the filesystem, we created some logic which creates and maintains an output/ directory that complies with our code.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.output.constructor.Constructor()!

For each new version, the maintainer runs PyFunceble with --production, which prepares the repository for production. As a side effect, it maps the maintainer’s version of the output/ directory into a file called dir_structure_production.json.

Once pushed, on the end-user side, when testing a file, that file is downloaded into a file called dir_structure.json, which is then used to restore/create a perfect copy of the output directory the maintainer had when pushing the new version.
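The map-then-restore idea can be sketched with pathlib and json. This simplified version only records directories (the real file may also describe files and their content); `map_structure()`, `save_structure()` and `restore_structure()` are hypothetical names.

```python
import json
from pathlib import Path
from typing import List


def map_structure(root: Path) -> List[str]:
    """List every sub-directory of root as a POSIX-style relative path."""
    return sorted(path.relative_to(root).as_posix() for path in root.rglob("*") if path.is_dir())


def save_structure(root: Path, destination: Path) -> None:
    """Maintainer side: write the map (e.g. into dir_structure_production.json)."""
    destination.write_text(json.dumps(map_structure(root), indent=4), encoding="utf-8")


def restore_structure(structure_file: Path, target_root: Path) -> None:
    """End-user side: recreate every mapped directory under target_root."""
    for relative in json.loads(structure_file.read_text(encoding="utf-8")):
        (target_root / relative).mkdir(parents=True, exist_ok=True)
```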

Note

If a directory is not found, please first try deleting the dir_structure*.json files to force a resynchronization.

How to generate it manually?¶

You can’t by hand. However, the --dir-structure argument will generate it on demand.

DNS Lookup¶

Why do we need it?¶

As our main purpose is to check the availability of the given subjects, we make a DNS lookup to determine it.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.lookup.dns.DNSLookup.request()!

For domains¶

In order:

Request the NS record.
If not found, request the A record.
If not found, request the AAAA record.
If not found, request the CNAME record.
If not found, request the DNAME record.

Warning

If none is found, we call the UNIX/C equivalent of getaddrinfo().
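The order above can be sketched like this. The default query uses dnspython (2.0 or later) and turns any resolution failure into an empty list; `socket.getaddrinfo()` is Python's wrapper around the C function mentioned in the warning. The names `lookup_domain()` and `dnspython_query()` are hypothetical, and the return shape is an assumption of this sketch.

```python
import socket
from typing import Callable, Dict, List

# query(name, record_type) returns the text of the matching records ([] if none).
Query = Callable[[str, str], List[str]]

# Order in which the record types are requested, as listed above.
RECORD_ORDER = ("NS", "A", "AAAA", "CNAME", "DNAME")


def dnspython_query(name: str, record_type: str) -> List[str]:
    """Default query built on dnspython; any resolution failure yields []."""
    import dns.exception
    import dns.resolver

    try:
        answer = dns.resolver.resolve(name, record_type)
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        return []
    return [record.to_text() for record in answer]


def lookup_domain(domain: str, query: Query = dnspython_query, fallback=socket.getaddrinfo) -> Dict[str, List[str]]:
    """Stop at the first record type found, then fall back to getaddrinfo()."""
    for record_type in RECORD_ORDER:
        records = query(domain, record_type)
        if records:
            return {record_type: records}
    try:
        infos = fallback(domain, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return {}
    addresses = sorted({info[4][0] for info in infos})  # sockaddr[0] is the address
    return {"getaddrinfo": addresses} if addresses else {}
```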

For IP¶

We request the PTR record for the IP.

Warning

If none is found, we call the UNIX/C equivalent of gethostbyaddr().
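For IPs, a similar sketch: build the reverse name with the standard `ipaddress` module, request its PTR record, then fall back to `socket.gethostbyaddr()`. The `query` callable (for example, built on dnspython) and the name `lookup_ip()` are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, List

# Hypothetical query(name, record_type) returning the text of the matching records.
Query = Callable[[str, str], List[str]]


def lookup_ip(ip: str, query: Query, fallback: Callable = socket.gethostbyaddr) -> List[str]:
    """Request the PTR record, then fall back to gethostbyaddr()."""
    reverse_name = ipaddress.ip_address(ip).reverse_pointer  # e.g. 1.2.0.192.in-addr.arpa
    records = query(reverse_name, "PTR")
    if records:
        return records
    try:
        hostname, aliases, _addresses = fallback(ip)
    except OSError:  # socket.herror / socket.gaierror
        return []
    return [hostname, *aliases]
```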

Environment variables¶

Dotenv files¶

Since PyFunceble 2.0.0 (equivalent of PyFunceble-dev >=1.18.0), we load (thanks to python-dotenv) the content of the following files into the (local) list of environment variables.

.env (current directory)
.pyfunceble-env (current directory)
.env (configuration directory)
.pyfunceble-env (configuration directory)
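A minimal sketch of loading those files in the listed order with python-dotenv's `load_dotenv()`. It keeps python-dotenv's default `override=False`, so a variable that is already set is not replaced; whether PyFunceble overrides is not covered here. `dotenv_candidates()` and `load_dotenv_files()` are hypothetical names.

```python
from pathlib import Path
from typing import List

ENV_FILE_NAMES = (".env", ".pyfunceble-env")


def dotenv_candidates(current_dir: Path, config_dir: Path) -> List[Path]:
    """Return the dotenv files in the order listed above."""
    return [base / name for base in (current_dir, config_dir) for name in ENV_FILE_NAMES]


def load_dotenv_files(current_dir: Path, config_dir: Path) -> List[Path]:
    """Load every existing candidate into os.environ and return the loaded paths."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    loaded = []
    for path in dotenv_candidates(current_dir, config_dir):
        if path.is_file():
            load_dotenv(dotenv_path=path)  # override=False by default
            loaded.append(path)
    return loaded
```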

To quote the python-dotenv documentation, a .env should look like the following:

# a comment that will be ignored.
REDIS_ADDRESS=localhost:6379
MEANING_OF_LIFE=42
MULTILINE_VAR="hello\nworld"

What do we use and why?¶

Here is the list of environment variables we use and how we use them if they are set.

Environment Variable	How do we use it?
`DEBUG_PYFUNCEBLE`	Same as `PYFUNCEBLE_DEBUG`; only present for backward compatibility.
`DEBUG_PYFUNCEBLE_ON_SCREEN`	Same as `PYFUNCEBLE_DEBUG_ON_SCREEN`; only present for backward compatibility.
`PYFUNCEBLE_AUTO_CONFIGURATION`	Tells us whether we have to install/update the configuration file automatically.
`PYFUNCEBLE_DB_CHARSET`	Tells us the MySQL charset to use.
`PYFUNCEBLE_DB_HOST`	Tells us the host or the Unix socket (absolute file path) of the MySQL database.
`PYFUNCEBLE_DB_NAME`	Tells us the name of the MySQL database to use.
`PYFUNCEBLE_DB_PASSWORD`	Tells us the MySQL user password to use.
`PYFUNCEBLE_DB_PORT`	Tells us the MySQL connection port to use.
`PYFUNCEBLE_DB_USERNAME`	Tells us the MySQL username to use.
`PYFUNCEBLE_DEBUG`	Tells us to log everything into the `output/logs/*.log` files.
`PYFUNCEBLE_DEBUG_ON_SCREEN`	Tells us to log everything to `stdout`.
`PYFUNCEBLE_CONFIG_DIR`	Tells us the location of the directory to use as the configuration directory.
`PYFUNCEBLE_OUTPUT_DIR`	Same as `PYFUNCEBLE_CONFIG_DIR`; only present for backward compatibility.
`PYFUNCEBLE_OUTPUT_LOCATION`	Tells us where we should generate the `output/` directory.
`APPDATA`	Used under Windows to construct/get the configuration directory if `PYFUNCEBLE_CONFIG_DIR` is not found.
`GH_TOKEN`	Tells us the GitHub token to set in the repository configuration when using PyFunceble under Travis CI.
`GL_TOKEN`	Tells us the GitLab token to set in the repository configuration when using PyFunceble under GitLab CI/CD.
`GIT_EMAIL`	Tells us the Git `user.email` configuration to set when using PyFunceble under any supported CI environment.
`GIT_NAME`	Tells us the Git `user.name` configuration to set when using PyFunceble under any supported CI environment.
`TRAVIS_BUILD_DIR`	Used to confirm that we are running under a Travis CI container.
`GITLAB_CI`	Used to confirm that we are running under a GitLab CI/CD environment.
`GITLAB_USER_ID`	Used to confirm that we are running under a GitLab CI/CD environment.
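A sketch of how a few of these variables could be interpreted. Assumptions of this sketch: `PYFUNCEBLE_CONFIG_DIR` takes precedence over its backward-compatible alias, `APPDATA` is returned as a base directory only (the sub-directory PyFunceble appends is not covered here), and GitLab CI/CD detection requires both variables. `config_directory()` and `detected_ci()` are hypothetical names.

```python
from typing import Mapping, Optional


def config_directory(env: Mapping[str, str], is_windows: bool) -> Optional[str]:
    """Return the configuration directory hinted by the environment, if any."""
    for name in ("PYFUNCEBLE_CONFIG_DIR", "PYFUNCEBLE_OUTPUT_DIR"):
        if env.get(name):
            return env[name]
    if is_windows and env.get("APPDATA"):
        return env["APPDATA"]  # base directory only
    return None  # fall back to the platform default (not covered here)


def detected_ci(env: Mapping[str, str]) -> Optional[str]:
    """Tell which supported CI environment we seem to be running under."""
    if env.get("TRAVIS_BUILD_DIR"):
        return "travis"
    if env.get("GITLAB_CI") and env.get("GITLAB_USER_ID"):
        return "gitlab"
    return None
```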

Execution time¶

Warning

This component is not activated by default.

Why do we need it?¶

As it is always nice to see how long we worked, we added this logic!

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.cli.execution_time.ExecutionTime()!

It shows the execution time on screen (stdout) and at the end of the output/logs/percentage/percentage.txt file if show_percentage is activated.
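A minimal sketch of measuring and printing a run's duration with `time.perf_counter()`. The HH:MM:SS.ss layout produced by the hypothetical `format_duration()` is an assumption; PyFunceble's own output may differ.

```python
import time


def format_duration(seconds: float) -> str:
    """Format a duration as HH:MM:SS.ss (hours are not wrapped into days)."""
    total = round(seconds, 2)  # round first so 59.999 s becomes 00:01:00.00
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:05.2f}"


started = time.perf_counter()
# ... the test run would happen here ...
print("Execution time:", format_duration(time.perf_counter() - started))
```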

How to use it?¶

You can simply change

show_execution_time: False

to

show_execution_time: True

in your personal .PyFunceble.yaml, or use the --execution argument from the CLI to activate it.

List filtering¶

Warning

This component is not activated by default.

Why do we need it?¶

While testing a file, you may find yourself in a situation where you only want to test the subjects that match a given pattern. That’s what this component does.

How does it work?¶

We scan the list against the given pattern/regex and only test the subjects that match it.
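A minimal sketch of that filtering with Python's `re` module. It assumes the pattern may match anywhere in the subject (`re.search`) and that an empty filter keeps everything; `filter_subjects()` is a hypothetical name.

```python
import re
from typing import Iterable, List


def filter_subjects(subjects: Iterable[str], pattern: str) -> List[str]:
    """Keep only the subjects in which the pattern is found."""
    if not pattern:  # empty filter: test everything
        return list(subjects)
    regex = re.compile(pattern)
    return [subject for subject in subjects if regex.search(subject)]
```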

How to use it?¶

You can simply change

filter: ""

to

filter: "\.org"

(for example)

in your personal .PyFunceble.yaml, or use the --filter argument from the CLI.

IANA Root Zone Database¶

Why do we need it?¶

We use it to check if an extension is valid/exists.

How does it work?¶

Note

Want to read the parser code? It’s here: PyFunceble.lookup.iana.IANA()!

The root zone database is saved into the iana-domains-db.json file. It is formatted like below and is automatically merged for the end-user before each test run.

{
    "extension": "whois_server"
}

In-app, while testing for a domain, we check if the extension is listed there before doing some extra verifications. If not, domain(s) will be flagged as INVALID.
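The extension check can be sketched against a dictionary shaped like the JSON above. The name `extension_status()` is hypothetical, and `None` stands for "continue with the other verifications".

```python
from typing import Mapping, Optional


def extension_status(domain: str, iana_db: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return "INVALID" when the extension is unknown, None to continue testing."""
    extension = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if extension not in iana_db:
        return "INVALID"
    return None
```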

How to generate it manually?¶

You can’t, and you shouldn’t, as we automatically generate it every 24 hours. However, the --iana argument will generate it on demand.

Test in/for local hostnames, IPs, components¶

Warning

This component is not activated by default.

Why do we need it?¶

As we may need to test local hostnames, IPs or components in a local network, this component allows a less aggressive syntax validation.

How does it work?¶

We simply use a less aggressive syntax validation so that everything you give us gets tested.
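To illustrate the difference, here is a sketch with hypothetical rules: both modes require well-formed labels, but only the default mode also requires at least two labels and an alphabetic last label. PyFunceble's actual rules differ; `is_valid_hostname()` is a hypothetical name.

```python
import re

# A label: 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)


def is_valid_hostname(subject: str, local: bool = False) -> bool:
    """Strict check by default; with local=True, single-label names are accepted."""
    labels = subject.rstrip(".").split(".")
    if not all(LABEL.match(label) for label in labels):
        return False
    if local:
        return True  # e.g. "printer" or "nas" on a local network
    return len(labels) >= 2 and labels[-1].isalpha()
```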

How to use it?¶

Simply change

local:                   False

to

local:                   True

in your personal .PyFunceble.yaml, or use the --local argument from the CLI to activate it.

Mining¶

Warning

This component is not activated by default.

Why do we need it?¶

Sometimes you might, for example, want to get the list of domain(s) / URL(s) in a redirection chain. This feature reveals them.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.engine.mining.Mining()!

We access the given domain/URL and get the redirection history, which we then test once the normal test is finished.
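A minimal sketch of turning a redirection history into new subjects. With requests, such a history could come from the `.url` of each response in `response.history` plus the final `response.url`; here it is simply a list of URLs. `mined_subjects()` and its `as_domains` switch are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def mined_subjects(subject: str, redirection_urls: Iterable[str], as_domains: bool = False) -> List[str]:
    """Return the new subjects revealed by the redirections, in order, without duplicates."""
    seen = {subject}
    mined = []
    for url in redirection_urls:
        candidate = urlsplit(url).hostname if as_domains else url
        if candidate and candidate not in seen:
            seen.add(candidate)
            mined.append(candidate)
    return mined
```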

Note

This component might evolve with time.

How to use it?¶

You can simply change

mining: False

to

mining: True

in your personal .PyFunceble.yaml, or use the --mining argument from the CLI to activate it.

Multiprocessing¶

Warning

This component is not activated by default.

Why do we need it?¶

Many people around the web who talked about PyFunceble mentioned one thing: it takes time to run.

This component allows you to use more than one process if your machine has multiple CPUs.

Note

If you use this component you have to consider some limits:

Your connection speed.
Your machine.

You might not even see a speed improvement if either of them is slow or very slow.

The following might not be affected by these limits, but it really depends:

URL availability test.
Syntax test.
Test with DNS lookup only (without WHOIS).

How does it work?¶

We test multiple subjects at the same time over several processes (1 process = 1 subject tested) and generate our results normally.
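A minimal sketch of that distribution with `concurrent.futures.ProcessPoolExecutor`, where each worker tests one subject at a time. `run_tests()` and the `test_subject` callable are hypothetical; `test_subject` must be a picklable, module-level function, and on platforms that spawn processes the call must sit under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List, TypeVar

Result = TypeVar("Result")


def run_tests(subjects: Iterable[str], test_subject: Callable[[str], Result], processes: int) -> List[Result]:
    """Test the subjects over several processes; results keep the input order."""
    subjects = list(subjects)
    if processes < 2:  # no multiprocessing: test sequentially
        return [test_subject(subject) for subject in subjects]
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(test_subject, subjects))
```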

Note

While using the JSON format for the database, you might have to wait a bit at the very end, as we need to merge all the data generated across all the processes we created.

Therefore, we recommend using the MySQL/MariaDB format which will get rid of that as everything is saved/synchronized at an almost real-time scale.

How to use it?¶

Activation¶

You can simply change

multiprocess: False

to

multiprocess: True

Number of processes to create¶

Simply update the default value of

maximal_processes: 25

Warning

If you do not explicitly set the --processes argument, we overwrite the default with the number of available CPUs.

Warning

If this value is less than 2, the system will automatically deactivate the multiprocessing.
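The two warnings above can be sketched as one small rule; `resolve_processes()` is a hypothetical name, and `cpu_count` can be injected to make the behavior easy to check.

```python
import os
from typing import Optional, Tuple


def resolve_processes(cli_processes: Optional[int], cpu_count: Optional[int] = None) -> Tuple[bool, int]:
    """Return (multiprocessing_enabled, number_of_processes)."""
    if cli_processes is not None:  # --processes was given explicitly
        processes = cli_processes
    else:  # otherwise use the number of available CPUs
        processes = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return processes >= 2, processes
```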

Merging mode¶

Two cross-process (data) merging modes are available:

end

live

With the end mode, we merge all data at the very end of the current instance. With the live mode, we merge all data while testing.

Simply update the default value of

multiprocess_merging_mode: end

to the mode you want.

Output Files¶

Note

This section does not cover the log files.

Why do we need it?¶

We need a way to deliver our results.

How does it work?¶

After testing a given subject, we generate its output file based on what’s needed.

Host format¶

This is the default output file.

A line is formatted like 0.0.0.0 example.org.

Note

A custom IP can be set with the help of the custom_ip index or the -ip argument from the CLI.

Don’t need it? Simply change

generate_hosts: True

to

generate_hosts: False

in your personal .PyFunceble.yaml, or use the --hosts argument from the CLI to deactivate it.

Plain format¶

A line is formatted like example.org. Need it? Simply change

plain_list_domain: False

to

plain_list_domain: True

in your personal .PyFunceble.yaml, or use the --plain argument from the CLI to activate it.

JSON format¶

Need it? Simply change

generate_json: False

to

generate_json: True

in your personal .PyFunceble.yaml, or use the --json argument from the CLI to activate it.
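The three formats can be sketched as follows. The hosts and plain lines follow the formats described above; the JSON layout (a sorted list of subjects) is an assumption, and the function names are hypothetical.

```python
import json
from typing import Iterable


def hosts_line(subject: str, custom_ip: str = "0.0.0.0") -> str:
    """Host format: "<ip> <subject>", 0.0.0.0 unless a custom IP is set."""
    return f"{custom_ip} {subject}"


def plain_line(subject: str) -> str:
    """Plain format: the subject alone."""
    return subject


def json_output(subjects: Iterable[str]) -> str:
    """Hypothetical JSON layout: a sorted list of subjects."""
    return json.dumps(sorted(subjects), indent=4)
```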

Percentage¶

Warning

This component is activated by default while testing files.

Note

By design, the percentage doesn’t show up while testing single domains (using --domain).

Why do we need it?¶

We need it in order to get information about the amount of data we just tested.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.output.percentage.Percentage()!

Regularly, or at the very end of a test, we get the number of subjects for each status along with the number of tested subjects. We then calculate the percentages and print them on the screen (stdout) and into output/logs/percentage/percentage.txt.
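The calculation itself is simple. This sketch rounds to two decimals, which is an assumption about the display; `percentages()` is a hypothetical name.

```python
from typing import Dict


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of tested subjects per status, in percent."""
    total = sum(counts.values())  # number of tested subjects
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: round(count * 100 / total, 2) for status, count in counts.items()}
```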

How to use it?¶

It is activated by default. If it is not, please update

show_percentage: False

to

show_percentage: True

in your personal .PyFunceble.yaml, or use the --percentage argument from the CLI to reactivate it.

The Public Suffix List¶

Why do we need it?¶

We use it in the process of checking the validity of domains.

How does it work?¶

Note

Want to read the parser code? It’s here: PyFunceble.lookup.publicsuffix.PublicSuffix()!

The copy of the public suffix list we use is saved into the public-suffix.json file. It is formatted like below and is automatically merged for the end-user before each test run.

{
    "extension": [
        "suffix1.extension",
        "suffix2.extension",
        "suffix3.extension"
    ]
}

In-app, while testing for domain(s), we use it in order to know if we are checking for a subdomain or not.
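A simplified sketch of the subdomain check against a dictionary shaped like the JSON above: find the longest known suffix the domain ends with (the bare extension otherwise), and call it a subdomain if more than one label precedes it. Wildcard and exception rules of the real Public Suffix List are ignored; `is_subdomain()` is a hypothetical name.

```python
from typing import Dict, List


def is_subdomain(domain: str, public_suffixes: Dict[str, List[str]]) -> bool:
    """True when more than one label precedes the longest matching public suffix."""
    labels = domain.lower().rstrip(".").split(".")
    suffix_length = 1  # at least the extension itself
    for suffix in public_suffixes.get(labels[-1], []):
        suffix_labels = suffix.split(".")
        if len(suffix_labels) < len(labels) and labels[-len(suffix_labels):] == suffix_labels:
            suffix_length = max(suffix_length, len(suffix_labels))
    return len(labels) > suffix_length + 1
```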

How to generate it manually?¶

You can’t, and you shouldn’t, as we automatically generate it every 24 hours. However, the --public-suffix argument will generate it on demand.

Sorting¶

Note

While using the multiprocessing option, the data are tested as given.

Why do we need it?¶

Because sorted is better, we sort by default!

How does it work?¶

Note

Want to read the code? It’s here: Sort()!

Alphabetically¶

This is the default one. Python’s built-in sorted() function is used for that purpose.

Hierarchically¶

The objective of this is to provide sorting by service/domains.

Note

This is a simplified version of what we do.

Let’s say we have aaa.bbb.ccc.tld.

Note

The TLD part is determined by looking first at the IANA Root Zone Database, then at the Public Suffix List.

Split at the points. We get the list [aaa, bbb, ccc, tld].
Put the TLD first. This gives us [tld, aaa, bbb, ccc].
Reverse everything after the TLD. This gives us [tld, ccc, bbb, aaa].
Join everything back with points to get the string used for sorting: tld.ccc.bbb.aaa.
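The steps above translate into a sort key. In this sketch, `known_suffixes` is a hypothetical stand-in for the IANA and Public Suffix List lookups: the longest known suffix is taken as the TLD part, and the bare last label is used otherwise. `hierarchical_key()` is a hypothetical name.

```python
from typing import Iterable


def hierarchical_key(subject: str, known_suffixes: Iterable[str]) -> str:
    """Build the "tld.ccc.bbb.aaa" style key used for hierarchical sorting."""
    suffixes = set(known_suffixes)
    labels = subject.split(".")
    tld_length = 1
    for length in range(len(labels) - 1, 0, -1):  # longest candidate suffix first
        if ".".join(labels[-length:]) in suffixes:
            tld_length = length
            break
    tld = ".".join(labels[-tld_length:])
    rest = labels[:-tld_length]
    return ".".join([tld, *reversed(rest)])  # TLD first, the rest reversed
```

Usage: `sorted(subjects, key=lambda s: hierarchical_key(s, {"com", "org"}))` groups every subject under its TLD and then its domain.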

How to activate the hierarchical sorting?¶

Simply change

hierarchical_sorting: False

to

hierarchical_sorting: True

in your personal .PyFunceble.yaml, or use the --hierarchical argument from the CLI to activate it.

Whois Lookup¶

Note

While testing with PyFunceble, subdomains, IPv4 and IPv6 addresses are not checked against our WHOIS lookup logic.

Why do we need it?¶

As our main purpose is to check the availability of the given subjects, we make a WHOIS lookup (if authorized) to determine it.

How does it work?¶

Note

Want to read the code? It’s here: PyFunceble.lookup.whois.WhoisLookup()!

For us, the only relevant part is the extraction of the expiration date. Indeed, as it indicates whether a domain is still owned by someone, we use it first to determine the availability of domains.
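A minimal sketch of extracting the expiration date from a raw WHOIS record with regular expressions. The pattern list is a small hypothetical subset (real WHOIS servers use many more labels), and date normalization is left out; `extract_expiration_date()` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical subset of the labels WHOIS servers use for the expiration date.
EXPIRATION_PATTERNS = (
    r"Registry Expiry Date:\s*(.+)",
    r"Registrar Registration Expiration Date:\s*(.+)",
    r"Expir(?:y|ation) Date:\s*(.+)",
    r"paid-till:\s*(.+)",
)


def extract_expiration_date(whois_record: str) -> Optional[str]:
    """Return the first expiration date found (raw text), or None."""
    for pattern in EXPIRATION_PATTERNS:
        match = re.search(pattern, whois_record, flags=re.IGNORECASE)
        if match and match.group(1).strip():
            return match.group(1).strip()
    return None
```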

How to use it?¶

It is activated by default. If it is not, simply change

no_whois: True

to

no_whois: False

in your personal .PyFunceble.yaml, or use the --no-whois argument from the CLI to reactivate it.

Event	Shared	URL
No WHOIS server (referer) is found.	The extension of the currently tested domain.	`https://pyfunceble.funilrys.com/api/no-referer`
The expiration date is not correctly formatted.	The extracted expiration date. The currently tested domain. The currently used WHOIS server (DNS) name.	`https://pyfunceble.funilrys.com/api/date-format`
