Commit Briefs

0dd744df1d Antoine Lambert

tests: Fix numerous tests after recent changes in loader-core (master)

BaseLoader.load now returns a dict with an extra error field when loading fails.


48d0acfaa1 Renaud Boyer

Fix use of deprecated Commit.extra


07211b5207 Antoine Lambert

Fix some formatting after black version bump


00cc6147c8 David Douard

Apply swh-py-template v0.2.3


3f289f0189 Antoine Lambert

tests: Fix mocking of sleep calls with tenacity 8.4.2

The latest tenacity release introduced internal changes that broke the mocking of sleep calls in tests. Fix it by mocking time.sleep directly (which did not work previously).
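
The idea of mocking time.sleep directly can be sketched as follows. This is not the project's test code and it does not use tenacity: fetch_with_retries is a hypothetical hand-written retry helper standing in for a retrying wrapper, and the only point shown is that patching "time.sleep" removes the real waiting.

```python
import time
from unittest import mock


def fetch_with_retries(operation, attempts=3, delay=10.0):
    """Hypothetical stand-in for a retrying wrapper: retry on ConnectionError."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            # Looked up on the time module at call time, so a patch applies.
            time.sleep(delay)


def run_without_waiting():
    """Run a flaky operation with time.sleep mocked out.

    Returns the operation result and the number of mocked sleep calls.
    """
    outcomes = iter([ConnectionError(), ConnectionError(), "ok"])

    def flaky():
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    with mock.patch("time.sleep") as sleep_mock:
        result = fetch_with_retries(flaky)
    return result, sleep_mock.call_count
```

Patching the attribute on the time module itself, rather than an internal helper of the retry library, keeps the test independent of how the library organizes its internals.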


495e2cb468 David Douard

Replace usage of (deprecated) dir_filter by path_filter in Directory.from_disk()

as well as in GitCheckoutLoader.


39d38c4e1e Antoine Lambert

test_loader: Fix implementation of test_loader_with_ref_delta_in_pack

The previous implementation built an invalid pack file with REF_DELTA object types, because it used the new object to deltify as the base of the delta. This led to errors and undefined behavior after building an index for such a pack file, as the deltified objects could not be properly resolved by dulwich (observed by stsp while working on git loader improvements). The bases for deltified objects are now objects that were previously loaded into the archive. Tag objects produced in that test are also ensured to be valid.


a8a5077aee Antoine Lambert

loader: Ensure to fetch latest snapshot produced by a git visit type

The SWH data model allows an origin to have multiple visit types; in particular, a git origin can have visit types 'git' and 'git-checkout'. The git loader must retrieve the latest snapshot produced by a 'git' visit type, otherwise incremental loading of a git origin having both visit types can break. Indeed, a 'git-checkout' visit type produces a snapshot with a single branch, while a 'git' visit type produces a snapshot containing all branches of the loaded repository. Previously, if the latest snapshot retrieved was produced by a 'git-checkout' visit type, the loader would refetch all branches and associated git objects even though most of them had already been archived. Related to swh/meta#5092.
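
The selection rule described above can be illustrated with a minimal sketch. VisitRecord and latest_snapshot_for_type are hypothetical names, not the swh storage API; the sketch only assumes that each visit carries a type, a date and an optional snapshot id.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VisitRecord:
    """Hypothetical, simplified view of one visit of an origin."""

    visit_type: str
    date: int  # any comparable timestamp
    snapshot_id: Optional[str]


def latest_snapshot_for_type(
    visits: Iterable[VisitRecord], visit_type: str = "git"
) -> Optional[str]:
    """Return the snapshot of the most recent visit of the given type.

    Visits of other types (e.g. 'git-checkout') and visits without a
    snapshot are ignored, so a later single-branch snapshot cannot hide
    the full snapshot used as the base for incremental loading.
    """
    candidates = [
        visit
        for visit in visits
        if visit.visit_type == visit_type and visit.snapshot_id is not None
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda visit: visit.date).snapshot_id
```

With visits of both types, the function keeps returning the latest 'git' snapshot even when a 'git-checkout' visit happened more recently.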


93d43596df Antoine Lambert

requirements-test: Add missing swh.loader.core[testing] dependency

Side effect of swh.loader.core v5.18.0 release.



b15a37c163 David Douard

Apply swh-py-template v0.2.0


fe21f18737 Antoine Lambert

dumb: Fix typo in URL to check protocol support


ba40d01131 Antoine Lambert

dumb: Handle HEAD file legacy format

Some dumb git servers can send a HEAD file in a legacy format that contains a commit id instead of the string "ref: <ref_name>". Handle that edge case to avoid an error when loading such a repository.
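
A minimal sketch of handling both HEAD formats is given below. parse_head is a hypothetical helper, not the loader's actual function, and it assumes commit ids are 40-character hexadecimal SHA-1 digests.

```python
import re
from typing import Tuple

# Assumption: commit ids are hex-encoded SHA-1 digests (40 characters).
_HEX_ID = re.compile(rb"^[0-9a-f]{40}$")


def parse_head(content: bytes) -> Tuple[str, bytes]:
    """Classify the content of a HEAD file fetched from a dumb git server.

    Returns ("symref", ref_name) for the usual "ref: <ref_name>" form and
    ("commit", commit_id) for the legacy form holding a bare commit id.
    """
    data = content.strip()
    if data.startswith(b"ref:"):
        return "symref", data[len(b"ref:"):].strip()
    if _HEX_ID.match(data):
        return "commit", data
    raise ValueError(f"Unrecognized HEAD file content: {data!r}")
```

For example, parse_head(b"ref: refs/heads/master\n") yields the branch name, while a HEAD file holding only a hex digest is reported as a commit.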


b7d16897c2 Antoine Lambert

dumb: Synchronize fetch_pack behavior with smart loader

As with the smart git loader, restrict the maximum size of a pack file to download. Move the code writing pack data bytes and checking the size into a utility class to avoid code duplication. Add missing tests covering the cases where the pack size limit is reached.
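
A utility class of that kind could look roughly like the sketch below. SizeLimitedWriter and PackSizeExceeded are hypothetical names chosen for illustration; the actual class in the loader may differ in name, interface and error type.

```python
import io
from typing import BinaryIO


class PackSizeExceeded(Exception):
    """Raised when writing more pack data would exceed the configured limit."""


class SizeLimitedWriter:
    """Write pack bytes to a buffer while enforcing a maximum total size."""

    def __init__(self, buffer: BinaryIO, size_limit: int) -> None:
        self.buffer = buffer
        self.size_limit = size_limit
        self.written = 0

    def write(self, data: bytes) -> int:
        # Check before writing so the buffer never grows past the limit.
        if self.written + len(data) > self.size_limit:
            raise PackSizeExceeded(
                f"Pack file too big: limit is {self.size_limit} bytes"
            )
        self.buffer.write(data)
        self.written += len(data)
        return len(data)


# Usage: both loaders can hand the same writer to their fetching code.
pack_buffer = io.BytesIO()
writer = SizeLimitedWriter(pack_buffer, size_limit=8)
writer.write(b"PACK")
```

Keeping the size check in one place means the smart and dumb code paths cannot drift apart on how the limit is applied.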


64ac020485 Antoine Lambert

dumb: Fix streaming of HTTP responses

When using the requests library to perform HTTP requests, responses that need to be streamed require the stream parameter to be set to True so that content is downloaded in chunks. Previously, a whole HTTP response was cached in memory, which could lead to OOM errors when dealing with a repository with large pack files.
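
The streaming pattern can be sketched as follows. download_streamed is a hypothetical helper, not the loader's code; it assumes the session argument behaves like a requests.Session, and relies only on the stream=True parameter, Response.raise_for_status() and Response.iter_content().

```python
def download_streamed(session, url, destination, chunk_size=65536):
    """Copy the body of an HTTP response to destination chunk by chunk.

    session is expected to behave like requests.Session. With stream=True
    the body is not loaded into memory up front; iter_content() then
    yields it in pieces of at most chunk_size bytes.
    """
    total = 0
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            destination.write(chunk)
            total += len(chunk)
    return total
```

Memory use then stays bounded by the chunk size rather than by the size of the pack file being downloaded.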


038c094d28 Antoine Lambert

test_directory: Fix failures after nar extid version bump

Related to swh/devel/swh-loader-core@c9b51f8.


81a9a907eb Antoine Lambert

tox: Bump mypy to 1.8.0

Related to swh/meta#5075.


518af2ad44 Nicolas Dandrimont

Add type hints for urllib3


a0b6043ad8 Nicolas Dandrimont

Add INFO-level logging every few minutes while loading

Git loading tasks can take a pretty long time, and it's not easy to diagnose if it's stuck or if it's just taking a while. Instead of only logging at the end of the task, print a log line after each object type has been fully processed. Also print a log line every 3 minutes while objects are being processed.
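
A minimal sketch of such periodic progress logging is shown below. log_progress, the logger name and the injectable clock parameter are assumptions made for illustration, not the loader's implementation; only the 3-minute interval and the per-object-type summary line come from the description above.

```python
import logging
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("loader.progress")


def log_progress(
    objects: Iterable[T],
    object_type: str,
    interval: float = 180.0,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield objects unchanged, logging at INFO level as they are consumed.

    A line is emitted whenever at least interval seconds have elapsed
    since the previous one, and once more when the iterable is exhausted.
    """
    count = 0
    last_log = clock()
    for obj in objects:
        yield obj
        count += 1
        now = clock()
        if now - last_log >= interval:
            logger.info("Processed %d %s objects so far", count, object_type)
            last_log = now
    logger.info("Finished processing %d %s objects", count, object_type)
```

Wrapping the object iterator this way leaves the processing code untouched; the clock parameter exists only so the behavior can be tested without waiting.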


a742d967b1 Nicolas Dandrimont

loader: add some logging during packfile fetching

The packfile fetching operation can take a long time. Send one log line every minute while it progresses.


4c9b38eda9 Nicolas Dandrimont

loader: Push remote messages to a logger instead of stderr

Instead of dumping the dulwich remote communication stream to stderr, add a separate logger for remote messages, and handle the remote stream as proper log entries.
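
One way to turn a remote byte stream into log entries is sketched below. RemoteMessageLogger and the logger name are hypothetical; the sketch assumes the remote messages arrive as arbitrary byte chunks in which lines end with either a newline or a carriage return (as progress meters commonly do).

```python
import logging

# Hypothetical logger name, for illustration only.
remote_logger = logging.getLogger("loader.remote")


class RemoteMessageLogger:
    """File-like sink turning remote message bytes into log entries."""

    def __init__(self, logger: logging.Logger, level: int = logging.DEBUG) -> None:
        self.logger = logger
        self.level = level
        self._pending = b""

    def write(self, data: bytes) -> None:
        # Treat carriage returns like newlines, then keep the last
        # (possibly incomplete) line for the next call.
        self._pending += data.replace(b"\r", b"\n")
        *lines, self._pending = self._pending.split(b"\n")
        for line in lines:
            self._emit(line)

    def flush(self) -> None:
        # Emit whatever is left once the remote stream is finished.
        line, self._pending = self._pending, b""
        self._emit(line)

    def _emit(self, line: bytes) -> None:
        text = line.decode("utf-8", errors="replace").strip()
        if text:
            self.logger.log(self.level, "remote: %s", text)
```

Because the messages go through a dedicated logger, they can be filtered, leveled or routed separately from the loader's own log lines.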


8cc7eb12ea Nicolas Dandrimont

loader: add option to skip certificate verification

This hooks into the right urllib3 and requests settings for both the smart and dumb loader.


8848bd310b Nicolas Dandrimont

loader: add shortcuts for the connect and read timeouts

This sets the connect and read timeout for both the smart loader (via urllib3/dulwich) and for the dumb loader (via requests).


c6b2b577a7 Nicolas Dandrimont

dumb loader: add support for extra requests kwargs

This is useful to override the default settings of the requests Session, e.g. certificate verification or connect/read timeouts.
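
As an illustration, extra keyword arguments of this kind could be assembled as in the sketch below. build_requests_kwargs and its parameter names are hypothetical and do not reflect the loader's actual option names; the only facts relied upon are that requests accepts verify=False and a (connect, read) tuple for timeout.

```python
from typing import Any, Dict, Optional


def build_requests_kwargs(
    verify_certs: bool = True,
    connect_timeout: Optional[float] = None,
    read_timeout: Optional[float] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments to pass to requests calls.

    Explicit entries in extra take precedence over the shortcuts.
    """
    kwargs: Dict[str, Any] = {}
    if not verify_certs:
        kwargs["verify"] = False
    if connect_timeout is not None or read_timeout is not None:
        # requests accepts a (connect, read) tuple for the timeout.
        kwargs["timeout"] = (connect_timeout, read_timeout)
    kwargs.update(extra or {})
    return kwargs
```

The resulting dict would then be expanded into a call such as session.get(url, **kwargs).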


f51d542ff4 Nicolas Dandrimont

loader: add support for extra urllib3 kwargs

This is useful to override the default settings of the dulwich urllib3 adapter, e.g. certificate verification or connect/read timeouts.