Comparing changes
base repository: skorch-dev/skorch
base: v0.14.0
head repository: skorch-dev/skorch
compare: v0.15.0
- 16 commits
- 31 files changed
- 4 contributors
Commits on Jun 26, 2023
-
Commit: 468a9c6
Commits on Jun 27, 2023
-
This commit fixes the release/deploy script (#987)
Previously, this script could not be executed without errors because the conda update step was broken. I moved the activation of the conda env to `conda run`, which is apparently the recommended way.
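For reference, the `conda run` pattern invoked by the fixed script looks roughly like this (the environment name and build command are examples, not taken from the actual script):

```shell
# 'conda activate' often fails in non-interactive scripts; 'conda run'
# executes a command directly inside the named environment instead.
conda run -n skorch-release python -m build
```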
Commit: 8a1fd6d
-
Instead of 0.15.0.dev0, in case we need a patch release.
Commit: 6e63ed9
Commits on Jun 28, 2023
-
Exclude _version.py from coverage report (#988)
We vendor the code in _version.py from scipy. It is thus not necessary for us to have tests for it. Still, it was included in the coverage report, resulting in artificially lowered test coverage. This fix removes the module from being counted for coverage.
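One common way to exclude a module with coverage.py is an `omit` entry in the coverage configuration (shown here as a `.coveragerc` fragment; the exact config file skorch uses may differ):

```ini
[run]
omit =
    skorch/_version.py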
Commit: c6c2ef5
Commits on Jun 29, 2023
-
Add option to globally override callback caching (#971)
Resolves #957 (see there for motivation).

Description

I went with the option of adding an argument to the NeuralNet class itself that can override the caching behavior of the scoring callbacks. This seemed more straightforward than a context manager, which is not a pattern we typically require of skorch users. The disadvantage is that we have yet one more parameter on the NeuralNet. By default, the caching behavior of the callbacks is not changed, i.e. this should be fully backwards compatible.

Implementation

Ideally, I would have liked to implement this in a way that any new (or user-defined) callback doesn't have to do anything special to honor the parameter. However, this is not really possible. Although I moved the logic of whether to use inference caching inside skorch.callbacks.scoring._cache_net_forward_iter, so that the global override is automatically taken into account there, there are other places in the callbacks whose behavior changes depending on the use of caching (e.g. target extraction). The check for a caching override thus has to be performed multiple times.
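The override precedence described above can be sketched with a small stand-in (the helper name is hypothetical; in skorch the check lives inside the scoring callbacks):

```python
def effective_caching(callback_use_caching, net_use_caching):
    """Return whether a scoring callback should use inference caching.

    Hypothetical helper: a net-level setting of True or False overrides
    the callback's own setting; None (the default) leaves the callback's
    setting untouched, keeping the behavior backwards compatible.
    """
    if net_use_caching is not None:
        return net_use_caching
    return callback_use_caching
```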
Commit: 48cb4a1
Commits on Jul 6, 2023
-
MNT: Update CI, add Python 3.11, fix sklearn issues (#990)
* Update checkout and setup-python GH actions
* Add Python 3.11 to CI (only PyTorch 1.13)
* Add pytest-pretty to CI
* Increase columns in GH logs for better readability
* Change check for sklearn error message
* Make data_from_dataset work with TensorDataset

Details

Something somewhere in sklearn changed so that the classes_ attribute is now required on classifiers when scoring. That attribute is not always present. Now, more of an effort is made to infer the classes when a TensorDataset is passed. An sklearn error message that we checked was changed, causing the corresponding tests to fail. This commit fixes the error message checked in the test (which was already quite defensive beforehand).
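The kind of class inference described can be illustrated with a plain-Python stand-in (skorch's actual `data_from_dataset` operates on the TensorDataset's tensors; this just shows the idea of deriving classes from the targets):

```python
def infer_classes(y):
    """Infer the sorted unique class labels from a target sequence.

    Illustrative only: roughly, classes_ can be derived as the sorted
    unique values found in the extracted targets.
    """
    return sorted(set(y))
```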
Commit: 8c1e66a
Commits on Jul 19, 2023
-
Update docs after accelerate fix was released (#995)
There was an issue with using grid search and other sklearn methods that involve cloning in conjunction with accelerate in a multi-gpu setting. This was due to sklearn creating copies of the Accelerator object. The skorch docs proposed a fix, but this fix is now released in accelerate itself. Therefore, the proposal can be safely removed. Another minor change was to adjust the criterion in an example script. This is not essential for the script to work, but prevents nan losses.
Commit: 549f3e6
Commits on Jul 26, 2023
-
Add support for safetensors (#970)
Description

net.save_params and net.load_params now support passing use_safetensors=True, which results in the underlying state_dict being serialized/deserialized using safetensors instead of torch.save and torch.load (which both rely on pickle). Similarly, Checkpoint, TrainEndCheckpoint, and LoadInitState now support use_safetensors=True. By default, nothing changes for the user; safetensors is opt-in.

Motivation

safetensors has a couple of advantages over pickle, which are described [here](https://github.com/huggingface/safetensors/#yet-another-format-). Most notably, it doesn't have the security issues that pickle has. Recently, it has even been [audited](https://huggingface.co/blog/safetensors-security-audit) and found to be safe. Furthermore, it is growing in popularity, being natively supported by the Hugging Face Hub for instance, and it is compatible with most major frameworks like tensorflow.

Implementation

Using safetensors requires users to install the library. I haven't added it as a default dependency, to keep those lean. I also haven't added any special error message for when safetensors is not installed; I think the standard Python message is sufficient.

The code changes have been mostly very straightforward. A small caveat is that, contrary to using torch.load/torch.save, this does not work for serializing the optimizer. That is because safetensors is only capable of serializing tensors, but the optimizer state_dict contains other objects as well. If users absolutely need to save the optimizer, they have to use pickle for that.

I was entertaining the idea of inferring the serialization format from the file name, enabling something like net.save_params(f_params='module.safetensors', f_optimizer='optimizer.pkl'), i.e. the module stored with safetensors and the optimizer with pickle. But this would add quite some complexity and ambiguity for very little benefit. The user can instead make a separate call to save_params to save the optimizer without safetensors. Similarly, they can add a second Checkpoint callback for the optimizer.

I added a check in the callbacks for whether f_optimizer is set when use_safetensors=True, in which case an error is raised. This seemed necessary because f_optimizer is set there by default and because these callbacks intercept errors very broadly when saving params, so it can be easy for users to miss that the optimizer is not actually being saved (there is a message printed if net.verbose, but this is easy to miss IMO).

Most of the changes were required for testing. There, I had to use quite a few if...else blocks because I needed to exclude the optimizer each time safetensors is used. I also made two unrelated changes:

- For Checkpoint, I moved a docstring argument lower to reflect its order in the argument list.
- I deleted the pickle_dump_mock fixture in TestTrainEndCheckpoint, which wasn't being used.
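The separate-call pattern suggested above might look like this (a sketch, not a runnable snippet: `net` stands for a fitted skorch NeuralNet and the file names are examples):

```python
# Save the module weights with safetensors, the optimizer with pickle.
net.save_params(f_params='weights.safetensors', use_safetensors=True)
net.save_params(f_optimizer='optimizer.pkl')

# Later, restore both onto an initialized net.
new_net.initialize()
new_net.load_params(f_params='weights.safetensors', use_safetensors=True)
new_net.load_params(f_optimizer='optimizer.pkl')
```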
Commit: e83e6a4
Commits on Aug 1, 2023
-
FIX: typo in notebook name (#1001)
Was "tranformer", not "transformer".
Commit: ca1bae6
Commits on Aug 2, 2023
-
Better error message when NeuralNetClassifier misses the classes_ attribute (#1004)
On NeuralNetClassifier, when classes_ is accessed but not known, we try to use classes_inferred_ instead. If that is also not defined, users currently just get an AttributeError saying that classes_inferred_ does not exist, which can be very confusing. This PR intercepts the error, provides a (hopefully) helpful error message, then reraises. Unfortunately, it is not easily possible to deduce what exactly went wrong, so the error message makes an educated guess.
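The intercept-and-reraise pattern is roughly this (a toy stand-in class, not skorch's actual code):

```python
class ToyClassifier:
    """Minimal stand-in showing the intercept-and-reraise pattern."""

    @property
    def classes_(self):
        try:
            return self.classes_inferred_
        except AttributeError as exc:
            # Replace the confusing "no attribute 'classes_inferred_'"
            # error with an educated guess about what went wrong,
            # keeping the original exception as the cause.
            msg = (
                "ToyClassifier could not infer the classes; they are only "
                "known after fitting, or when passed explicitly."
            )
            raise AttributeError(msg) from exc
```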
Commit: e6023a1
Commits on Aug 18, 2023
-
FIX: Issues with saving/loading with accelerate (#1008)
* FIX: Issues with saving/loading with accelerate

Description

There were a few issues with saving and loading parameters for an accelerated net (some only in a multi-GPU setting):

- load_params coerced device=None to CPU, but None needs to be preserved
- saving did not wait for all processes to finish
- all processes saved the parameters, when only the main process should
- an issue with parameter names depending on the wrapping state

Regarding the last point: if the module(s) are wrapped with accelerate, the parameters carry an additional prefix, "module.". So e.g. "dense0.weight" becomes "module.dense0.weight". This could result in a key mismatch and an error when loading a model. The solution is to always unwrap the net before saving and before loading. That way, the extra prefix is never present and there is no mismatch.

A test was added to check this behavior, but since the GitHub CI does not offer multi-GPU support, it does not cover all failure cases. Therefore, I added a script, examples/accelerate-multigpu/run-save-load.py, that can be run on a multi-GPU setup to test the issue. The unit test checks the correct behavior on CPU, iterating through all 4 combinations of wrapping/not wrapping the initial/loaded model.

Implementation

The changes needed were often just a few lines that sync the processes or place a guard to only run on the main process. A few of these were quite unintuitive to me, so I added comments for them. The one big change is that the preparation of the components by the accelerator has been moved to a separate method, _initialize_accelerator. This makes it possible to unwrap and re-wrap the model with a single method call each. Without that change, re-wrapping was only possible by calling net.initialize(), which would re-initialize everything, which is not desired.

This change can be backwards incompatible: if a user saved the parameters of an accelerated net while it was still wrapped (not the default) and tries to load them into a wrapped net, it will no longer work. I think this case is rare enough that we can accept it. In the worst case, the user can still apply the state dict manually to the wrapped net.

I did consider an alternative solution that would inspect the key names in the state dict, try to determine from those whether the loaded/current weights come from a wrapped model, and rename the keys accordingly. However, this method is unreliable and not easy to implement with the current code, so I opted for the solution described above.

Also note that this PR does _not_ fix potential issues that might occur when checkpointing the model while it is training. For this, we need to use accelerator.{save_state,load_state}, see here: https://huggingface.co/docs/accelerate/usage_guides/checkpoint Probably this use case is best served with a separate checkpoint callback.

* Changelog entry, references to the GH PR

* Temporarily set device in load_params if necessary

Reviewer feedback: to prevent a confusing warning message, temporarily set the device to 'cpu' when it was None during load_params. After loading has finished, set it back to the original value.
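To make the key mismatch concrete: the rejected rename-based alternative would have looked roughly like this (skorch instead unwraps the model before saving/loading; this sketch only illustrates why the prefixes clash):

```python
def strip_wrapper_prefix(state_dict, prefix="module."):
    """Remove the wrapper prefix that accelerate adds to parameter names.

    Sketch of the rejected alternative: renaming keys instead of
    unwrapping the model. Unreliable in general, which is why the PR
    unwraps instead.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
```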
Commit: 07fc260
-
Accelerate some adjustment for mixed precision (#1009)
* Use accelerator.autocast when computing loss

According to the accelerate docs, loss computation should be performed within the accelerator.autocast context manager: https://huggingface.co/docs/accelerate/v0.21.0/en/quicktour#mixed-precision-training I tested whether this makes a difference by running the following notebook with fp16 precision: https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb I found no difference at all: the runtime was practically the same and the losses were identical. Still, I think it's better to have this than not, as it is recommended by the accelerate docs.

* Update LR scheduler callback to work w/ accelerate

According to the accelerate docs: https://huggingface.co/docs/accelerate/quicktour#mixed-precision-training the LR scheduler step should sometimes be skipped when using mixed precision training because accelerate may skip update steps internally. Therefore, I updated the LR scheduler callback to check if the net has an accelerator and, if it does, whether a step is necessary. This is actually quite hard to test because the necessity of stepping depends on accelerate's internal logic, which we don't want to test and which might change in the future. Therefore, the added test just runs training with accelerate, mixed precision, and some lr schedulers, verifying that there is no error. When running these tests plus the normal lr scheduler tests locally on a machine that supports fp16, I get 100% line coverage of lr_scheduler.py, which I think is good enough.

* Non-functional clean-ups related to lr schedulers

While working on the fixes in this PR, I also cleaned up some lr scheduler code. These clean-ups are non-functional.

1. We imported CyclicLR as TorchCyclicLR. I'm not sure why, but it is somehow related to very old PyTorch versions we no longer support, so I removed this.
2. Fixed some indentation in conditional checks to improve readability.

* Reviewer comment: simplify conditional code
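The step-skipping logic can be illustrated with a toy stand-in (all names here are hypothetical; in the real callback the check relies on accelerate, per the quicktour linked above):

```python
class ToyAccelerator:
    """Pretends that some optimizer steps are skipped, as can happen
    with mixed precision when gradients overflow."""

    def __init__(self, skipped_steps):
        self._skipped = set(skipped_steps)
        self._step = 0
        self.optimizer_step_was_skipped = False

    def optimizer_step(self):
        self._step += 1
        self.optimizer_step_was_skipped = self._step in self._skipped

lr = 1.0
history = []
accel = ToyAccelerator(skipped_steps={2})
for _ in range(3):
    accel.optimizer_step()
    # Only advance the LR schedule when the optimizer actually stepped.
    if not accel.optimizer_step_was_skipped:
        lr *= 0.5  # stands in for lr_scheduler.step()
    history.append(lr)
```

Skipping the scheduler on skipped steps keeps the learning-rate schedule aligned with the number of real parameter updates.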
Commit: 312daaa
Commits on Aug 28, 2023
-
Commit: b8185cf
Commits on Aug 30, 2023
-
MNT Fix install for testing (#1015)
Installing [testing] now includes all required dev dependencies.
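For a development checkout, that means for example:

```shell
# From a clone of the skorch repository:
python -m pip install -e '.[testing]'
```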
Commit: 3641273
Commits on Sep 2, 2023
-
Commit: 7fd4b6e
Commits on Sep 4, 2023
-
* Update VERSION
* Remove code for backwards compatibility with v0.11
* Update CHANGES.md

Since skorch nets pickled with <= v0.11 can no longer be unpickled, I added an entry on how to transition these pickle files.

Release text: This is a smaller release, but it still contains changes that will be interesting to some of you. We added the possibility to store weights using safetensors, which can have several advantages, listed here. When calling net.save_params and net.load_params, just pass use_safetensors=True to use safetensors instead of pickle. Moreover, there is a new argument on NeuralNet: you can now pass use_caching=False or True to disable or enable caching for all callbacks at once. This is useful if you have a lot of scoring callbacks and don't want to toggle caching on each one individually. Finally, we fixed a few issues related to using skorch with accelerate. Thanks to Zach Mueller (@muellerzr) for his first contribution to skorch.
Commit: 17c7675