Commit Briefs
loader: Ensure to fetch latest snapshot produced by a git visit type
The SWH data model allows an origin to have multiple visit types; in particular, a git origin can have both the 'git' and 'git-checkout' visit types. The git loader must retrieve the latest snapshot produced by a 'git' visit type, otherwise incremental loading of an origin having both visit types can break. Indeed, a 'git-checkout' visit type produces a snapshot with a single branch, while a 'git' visit type produces a snapshot containing all branches of the loaded repository. Previously, if the latest snapshot retrieved was produced by a 'git-checkout' visit type, the loader would refetch all branches and their associated git objects even though most of them had already been archived. Related to swh/meta#5092.
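The selection rule described in this commit can be sketched as follows. This is a minimal illustration, not the loader's actual code: the `Visit` dataclass and the `latest_snapshot_for_type` helper are hypothetical names, and the real loader queries the archive storage rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Visit:
    """Hypothetical, simplified view of an origin visit."""

    visit_type: str              # e.g. "git" or "git-checkout"
    date: datetime               # when the visit happened
    snapshot_id: Optional[str]   # None if the visit produced no snapshot


def latest_snapshot_for_type(
    visits: Iterable[Visit], visit_type: str = "git"
) -> Optional[str]:
    """Return the snapshot of the most recent visit of the given type.

    Visits of other types are ignored, so a newer single-branch
    'git-checkout' snapshot cannot shadow an older full 'git' snapshot.
    """
    candidates = [
        v for v in visits
        if v.visit_type == visit_type and v.snapshot_id is not None
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.date).snapshot_id
```

With one 'git' visit followed by a more recent 'git-checkout' visit, calling the helper with its default `visit_type` still returns the snapshot of the 'git' visit, which is the base an incremental load needs.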
requirements-test: Add missing swh.loader.core[testing] dependency
Side effect of swh.loader.core v5.18.0 release.
dumb: Retry HTTP requests in case of throttling or temporary failures
Network issues can occur when loading a git repository using the dumb protocol, so add an HTTP retry feature to the GitObjectsFetcher class.
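A retry policy of this kind can be sketched with the standard library alone. The sketch below is an assumption about the general technique (retry with exponential backoff on throttling or transient server errors), not the code added to GitObjectsFetcher: `HTTPStatusError`, `RETRYABLE_STATUS`, and `fetch_with_retry` are hypothetical names, and the set of retryable status codes is chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of statuses worth retrying: throttling and transient failures.
RETRYABLE_STATUS = frozenset({429, 502, 503, 504})


class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except HTTPStatusError as exc:
            # Give up on permanent errors or once attempts are exhausted.
            if exc.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays; a permanent error such as 404 is re-raised immediately rather than retried.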
tasks: Simplify implementation and make visit_date parameter optional
Recent changes in swh-scheduler add new parameters to the celery tasks produced from swh.scheduler.model.ListedOrigin instances, so handle any new parameters by no longer hardcoding the expected ones in task signatures. Rename the date parameter to visit_date in the from-disk loader tasks and make it optional. Add new tests checking that task parameters produced from ListedOrigin instances do not raise an error when attempting to create a git loader. Related to T4187.
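The idea of not hardcoding parameters in task signatures can be sketched as below. This is a simplified illustration under stated assumptions: `SketchLoaderFromDisk` and `load_git_from_dir` are hypothetical stand-ins (no celery decorator, no real loader), and the assumption that the loader tolerates unknown keyword arguments is made here for the example only.

```python
from datetime import datetime
from typing import Any, Dict, Optional


class SketchLoaderFromDisk:
    """Hypothetical stand-in for a from-disk git loader."""

    def __init__(
        self,
        url: str,
        directory: str,
        visit_date: Optional[datetime] = None,  # optional, formerly "date"
        **extra: Any,  # assumed: parameters added upstream are tolerated
    ) -> None:
        self.url = url
        self.directory = directory
        self.visit_date = visit_date
        self.extra = extra

    def load(self) -> Dict[str, str]:
        # A real loader would ingest the repository here.
        return {"status": "uneventful"}


def load_git_from_dir(**kwargs: Any) -> Dict[str, str]:
    """Task body: forward every keyword argument to the loader.

    Because the signature names no parameter, a new field added to the
    scheduler's task arguments does not require changing the task itself.
    """
    loader = SketchLoaderFromDisk(**kwargs)
    return loader.load()
```

Calling `load_git_from_dir(url=..., directory=...)` without a `visit_date` works, and so does a call carrying an additional keyword argument the task has never heard of.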
Migrate loader tests to use pytest
This simplifies the base class indirection introduced to make the tests run. It keeps the unittest scaffolding so that tests are declared once and reused across the different loader instances (GitLoader, GitLoaderFromDisk, GitLoaderFromArchive). Code coverage increased from 82% to 85%. Related to T2482.
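One common way to declare tests once and run them against several loader classes under pytest is a shared base class whose name does not match pytest's default `Test*` collection pattern. The sketch below illustrates that pattern only; it is not the repository's actual test layout, and `DemoLoader`, `DemoLoaderFromDisk`, and `CommonLoaderTests` are hypothetical names.

```python
from typing import Dict


class DemoLoader:
    """Hypothetical loader used only to illustrate the test layout."""

    def load(self) -> Dict[str, str]:
        return {"status": "eventful"}


class DemoLoaderFromDisk(DemoLoader):
    """Hypothetical second loader sharing the same expected behaviour."""


class CommonLoaderTests:
    """Tests declared once.

    The class name does not start with "Test", so pytest's default
    collection rules skip it; only the subclasses below are collected.
    """

    loader_class = DemoLoader

    def make_loader(self) -> DemoLoader:
        return self.loader_class()

    def test_load_is_eventful(self) -> None:
        assert self.make_loader().load() == {"status": "eventful"}


class TestDemoLoader(CommonLoaderTests):
    loader_class = DemoLoader


class TestDemoLoaderFromDisk(CommonLoaderTests):
    loader_class = DemoLoaderFromDisk
```

Each `Test*` subclass inherits every `test_` method, so adding a loader variant costs one short class rather than a copy of the test suite.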