Commits
- Commit:
0dd744df1d1192d4eeb3fbe38c96cce76bda7376
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
tests: Fix numerous tests after recent changes in loader-core
BaseLoader.load now returns a dict with an extra error field when
loading fails.
- Commit:
48d0acfaa13b01fa5b9dd9d9ea42ffdfb7daf86d
- From:
- Renaud Boyer <renaud.boyer@sofwareheritage.org>
- Date:
Fix use of deprecated Commit.extra
- Commit:
07211b52071c4554454e1b4ddfa01e697261c813
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
Fix some formatting after black version bump
- Commit:
00cc6147c8d60a87b626f52cd846048fc4ab6130
- From:
- David Douard <david.douard@sdfa3.org>
- Date:
Apply swh-py-template v0.2.3
- Commit:
3f289f0189fb4857cc9ac9b79be61bc3ba5abcb6
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
tests: Fix mocking of sleep calls with tenacity 8.4.2
Latest tenacity release adds some internal changes that broke the
mocking of sleep calls in tests.
Fix it by directly mocking time.sleep (which did not work previously).
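A minimal sketch of the technique described above, patching time.sleep directly so that retries do not slow the test down. The `retry` helper is a hypothetical stdlib-only stand-in for a tenacity-decorated function; it is not the code from this commit.

```python
import time
from unittest import mock


def retry(func, attempts=3, delay=10.0):
    """Hypothetical retry helper: call func, sleeping between failed attempts."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


def test_retry_does_not_really_sleep():
    calls = {"n": 0}

    def flaky():
        # Fail twice, then succeed.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    # Patch the attribute on the time module itself, so any code that
    # calls time.sleep(...) gets the mock instead of really sleeping.
    with mock.patch("time.sleep") as fake_sleep:
        assert retry(flaky) == "ok"

    # Two failures mean two sleeps were requested, each with the delay.
    assert fake_sleep.call_count == 2
    fake_sleep.assert_called_with(10.0)
```

Patching the module attribute does not depend on how the retry library wires its sleep callable internally, which is presumably why it survives internal changes in that library.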
- Commit:
495e2cb4689a1f122c23ea2bf685610e6c2bf58e
- From:
- David Douard <david.douard@sdfa3.org>
- Date:
Replace usage of (deprecated) dir_filter by path_filter in Directory.from_disk()
as well as in GitCheckoutLoader.
- Commit:
39d38c4e1ee888c5d978dd6b689987dc9ff43740
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
test_loader: Fix implementation of test_loader_with_ref_delta_in_pack
Previous implementation was building an invalid pack file with REF_DELTA
object types as it was using the new object to deltify as the base of the
delta.
This was leading to errors and undefined behavior after building an index
for such a pack file as the deltified objects could not be properly resolved
by dulwich (observed by stsp while working on git loader improvements).
The bases for deltified objects are now objects that were previously loaded
into the archive.
Tag objects produced in that test are also ensured to be valid.
- Commit:
a8a5077aeedfaab85ff962970f06d4fba99da514
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
loader: Ensure the latest snapshot produced by a git visit type is fetched
SWH data model allows an origin to have multiple visit types, in particular
a git origin can have visit types 'git' and 'git-checkout'.
The git loader implementation must retrieve the latest snapshot produced by
a git visit type, as failing to do so can break incremental loading of a git
origin having both visit types mentioned above.
Indeed a 'git-checkout' visit type produces a snapshot with a single branch
while a 'git' visit type produces a snapshot containing all branches of the
loaded repository. Previously, if the latest snapshot retrieved was produced
by a 'git-checkout' visit type, the loader would refetch all branches and
associated git objects while most of them have already been archived.
Related to swh/meta#5092.
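The selection logic described above can be sketched as follows. The `Visit` record and `latest_snapshot_for_type` function are hypothetical simplifications for illustration; they are not the SWH data model classes or the storage API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Visit:
    # Hypothetical, simplified visit record (an assumption for this sketch).
    date: int                   # any sortable timestamp
    visit_type: str             # e.g. "git" or "git-checkout"
    snapshot_id: Optional[str]  # None when the visit produced no snapshot


def latest_snapshot_for_type(
    visits: Iterable[Visit], visit_type: str
) -> Optional[str]:
    """Return the snapshot of the most recent visit of the given type."""
    candidates = [
        v for v in visits
        if v.visit_type == visit_type and v.snapshot_id is not None
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.date).snapshot_id


visits = [
    Visit(date=1, visit_type="git", snapshot_id="snap-all-branches"),
    Visit(date=2, visit_type="git-checkout", snapshot_id="snap-single-branch"),
]
# Without the type filter, the single-branch snapshot would be picked
# as the base for incremental loading.
base = latest_snapshot_for_type(visits, "git")
```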
- Commit:
93d43596df259fc820e027bda298946038243735
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
requirements-test: Add missing swh.loader.core[testing] dependency
Side effect of swh.loader.core v5.18.0 release.
- Commit:
71295d087a44ad0ebfbd78ec4d6d9a8e49a9645b
- From:
- Pierre-Yves David <pierre-yves.david@ens-lyon.org>
- Date:
model: adapt to the renaming of model.TargetType to model.SnapshotTargetType
- Commit:
b15a37c16318dbf1784f9762b4f978431b3eebb9
- From:
- David Douard <david.douard@sdfa3.org>
- Date:
Apply swh-py-template v0.2.0
- Commit:
fe21f18737c77ec9ab00a0c1f022b37de6ee7224
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
dumb: Fix typo in URL to check protocol support
- Commit:
ba40d01131a94614deca71197688d61ec9052529
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
dumb: Handle HEAD file legacy format
Some dumb git servers can send a HEAD file in a legacy format that
contains a commit id instead of the string: "ref: <ref_name>".
Handle that edge case to avoid an error when loading such a repository.
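A minimal sketch of distinguishing the two HEAD formats mentioned above. The `parse_head` function and its return shape are assumptions made for illustration, not the loader's actual code; a 40-character hexadecimal SHA-1 commit id is also assumed.

```python
import re

# Assumption: commit ids are 40-character lowercase hexadecimal SHA-1 digests.
_HEX_SHA1 = re.compile(r"^[0-9a-f]{40}$")


def parse_head(content: bytes):
    """Classify the content of a HEAD file.

    Returns ("ref", ref_name) for the symbolic format "ref: <ref_name>",
    or ("commit", hex_id) for the legacy format holding a bare commit id.
    """
    text = content.decode("ascii").strip()
    if text.startswith("ref: "):
        return ("ref", text[len("ref: "):].strip())
    if _HEX_SHA1.match(text):
        return ("commit", text)
    raise ValueError(f"Unrecognized HEAD content: {text!r}")


symbolic = parse_head(b"ref: refs/heads/master\n")
legacy = parse_head(b"0dd744df1d1192d4eeb3fbe38c96cce76bda7376\n")
```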
- Commit:
b7d16897c24c7d5a0e28d952b027401b37417f92
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
dumb: Synchronize fetch_pack behavior with smart loader
As with the smart git loader, restrict the maximum size of a pack file
to download.
Move the code writing pack data bytes and checking size into a utility
class to avoid code duplication.
Add missing tests covering the cases where the pack size limit is reached.
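The idea of a shared utility that writes pack bytes while enforcing a size limit could look roughly like this. The `SizeLimitedWriter` and `PackTooLarge` names are hypothetical and do not come from the loader's code base.

```python
import io


class PackTooLarge(Exception):
    """Hypothetical error raised when the pack size limit is exceeded."""


class SizeLimitedWriter:
    """Write bytes to a file object, refusing to go past size_limit bytes."""

    def __init__(self, fileobj, size_limit: int):
        self.fileobj = fileobj
        self.size_limit = size_limit
        self.written = 0

    def write(self, data: bytes) -> int:
        # Check before writing so the destination never exceeds the limit.
        if self.written + len(data) > self.size_limit:
            raise PackTooLarge(
                f"Pack file too big: limit is {self.size_limit} bytes"
            )
        self.fileobj.write(data)
        self.written += len(data)
        return len(data)


buffer = io.BytesIO()
writer = SizeLimitedWriter(buffer, size_limit=8)
writer.write(b"PACK")      # 4 bytes, within the limit
writer.write(b"data")      # 8 bytes in total, still within the limit
```

Because the object only exposes `write`, it can be handed to any fetching code that expects a writable callback or file-like sink, which is what would let two loaders share it.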
- Commit:
64ac020485d0968eeeff96292515aa8bb2d858c7
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
dumb: Fix streaming of HTTP responses
When using the requests library to perform HTTP requests, if responses
need to be streamed, the stream parameter must be set to True to ensure
content is downloaded in chunks.
Previously, a whole HTTP response was cached in memory which could lead
to OOM errors when dealing with a repository with large pack files.
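A minimal sketch of chunked downloading with requests, assuming `session` is a `requests.Session` (or any object exposing the same `get` method). The `download_streamed` helper, its chunk size, and the example URL are assumptions for illustration.

```python
def download_streamed(session, url, dest_path, chunk_size=64 * 1024):
    """Download url to dest_path chunk by chunk; return the byte count.

    With stream=True, requests defers fetching the body until it is
    iterated, so only one chunk is held in memory at a time.
    """
    total = 0
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as dest:
            for chunk in response.iter_content(chunk_size=chunk_size):
                dest.write(chunk)
                total += len(chunk)
    return total
```

Typical usage would be `download_streamed(requests.Session(), "https://example.org/repo.git/objects/pack/pack-<id>.pack", "/tmp/pack")`, where the URL is a placeholder.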
- Commit:
038c094d28759500e77961be2c30a9d0d5b3df84
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
test_directory: Fix failures after nar extid version bump
Related to swh/devel/swh-loader-core@c9b51f8.
- Commit:
81a9a907eb5ab1bca0c7e0754267a24134dbab1e
- From:
- Antoine Lambert <anlambert@softwareheritage.org>
- Date:
tox: Bump mypy to 1.8.0
Related to swh/meta#5075.
- Commit:
518af2ad44dcf4e6b431e0bd1d9635effd2c887f
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
Add type hints for urllib3
- Commit:
a0b6043ad8bb7d1709d24648ebe155d125373bf9
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
Add INFO-level logging every few minutes while loading
Git loading tasks can take a pretty long time, and it's not easy to diagnose
whether a task is stuck or just taking a while.
Instead of only logging at the end of the task, print a log line after
each object type has been fully processed. Also print a log line every 3
minutes while objects are being processed.
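A rough sketch of the periodic logging described above, using only the standard library. The `process_objects` function, its parameters, and the message wording are assumptions for illustration; the 180-second default mirrors the 3 minutes mentioned in the commit message.

```python
import logging
import time

logger = logging.getLogger(__name__)


def process_objects(objects, handle, interval=180.0, clock=time.monotonic):
    """Process objects, logging progress at INFO level every interval seconds.

    clock is injectable so the behavior can be tested without waiting.
    """
    last_log = clock()
    count = 0
    for obj in objects:
        handle(obj)
        count += 1
        now = clock()
        if now - last_log >= interval:
            logger.info("Processed %d objects so far", count)
            last_log = now
    # Always emit a final line once this object type is fully processed.
    logger.info("Done: processed %d objects", count)
    return count
```

A monotonic clock is used rather than wall-clock time so that system clock adjustments cannot suppress or duplicate log lines.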
- Commit:
a742d967b190b295de060fea86e73294009c5c5f
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
loader: add some logging during packfile fetching
The packfile fetching operation can take a long time. Send one log line
every minute while it progresses.
- Commit:
4c9b38eda9a90a091d25041dc45c8669adf9639e
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
loader: Push remote messages to a logger instead of stderr
Instead of dumping the dulwich remote communication stream to stderr,
add a separate logger for remote messages, and handle the remote stream
as proper log entries.
- Commit:
8cc7eb12ea0d2cd26be310c5b02e32ac6a47b5a9
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
loader: add option to skip certificate verification
This hooks into the right urllib3 and requests settings for both the
smart and dumb loader.
- Commit:
8848bd310b2841ff7f8c0bde78f1339b09d57c63
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
loader: add shortcuts for the connect and read timeouts
This sets the connect and read timeout for both the smart loader (via
urllib3/dulwich) and for the dumb loader (via requests).
- Commit:
c6b2b577a7ff49c11c5d1e46162c0acba48305d5
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
dumb loader: add support for extra requests kwargs
This is useful to override the default settings of the requests Session,
e.g. certificate verification or connect/read timeouts.
- Commit:
f51d542ff43af954555c33e192f9a496f4fe11d6
- From:
- Nicolas Dandrimont <nicolas@dandrimont.eu>
- Date:
loader: add support for extra urllib3 kwargs
This is useful to override the default settings of the dulwich urllib3
adapter, e.g. certificate verification or connect/read timeouts.