
Comparing changes

Choose two branches to see what’s changed or to start a new pull request.

base repository: skorch-dev/skorch
base: v0.14.0
head repository: skorch-dev/skorch
compare: v0.15.0
  • 16 commits
  • 31 files changed
  • 4 contributors

Commits on Jun 26, 2023

  1. Bump dev version

    ottonemo committed Jun 26, 2023 (468a9c6)

Commits on Jun 27, 2023

  1. This commit fixes the release/deploy script (#987)

    Before, it was not possible to execute this script without errors because
    the conda update step was broken. I moved the activation of the conda
    env to `conda run`, which is apparently the recommended way.
    ottonemo authored Jun 27, 2023 (8a1fd6d)
  2. Change version to 0.14.1dev0

    Instead of 0.15.0.dev0, in case we need a patch release.
    BenjaminBossan committed Jun 27, 2023 (6e63ed9)

Commits on Jun 28, 2023

  1. Exclude _version.py from coverage report (#988)

    We vendor the code in _version.py from scipy. It is thus not necessary
    for us to have tests for it. Still, it was included in the coverage
    report, resulting in artificially lowered test coverage. This fix
    removes the module from being counted for coverage.
    BenjaminBossan authored Jun 28, 2023 (c6c2ef5)
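    The exclusion described above usually comes down to an omit rule in the
    coverage configuration. A minimal sketch, assuming a .coveragerc file and
    this module path (skorch's actual config file and path may differ):

    ```ini
    # .coveragerc (or the [coverage:run] section of setup.cfg)
    [run]
    omit =
        skorch/_version.py
    ```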

Commits on Jun 29, 2023

  1. Add option to globally override callback caching (#971)

    Resolves #957 (look there for motivation)
    
    Description
    
    I went with the option to add an argument to the NeuralNet class itself
    that can override the caching behavior of the scoring callbacks. This
    seemed more straightforward than going with a context manager, which is
    not a pattern we typically require of skorch users. The disadvantage is
    that we have yet one more parameter on the NeuralNet.
    
    By default, the caching behavior of the callbacks is not changed, i.e.
    this should be fully backwards compatible.
    
    Implementation
    
    Ideally, I would have liked to implement this in a way that a new (or
    user-defined) callback doesn't have to do anything special to honor the
    parameter. However, this is not really possible. Although I moved the
    logic of whether to use inference caching inside of
    skorch.callbacks.scoring._cache_net_forward_iter, so that the global
    override is automatically taken into account here, there are other
    places in the callbacks whose behavior changes depending on the use of
    caching (e.g. target extraction). The check for whether there is a
    caching override thus has to be performed multiple times.
    BenjaminBossan authored Jun 29, 2023 (48cb4a1)
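    The override semantics described above (the net-level argument takes
    precedence only when explicitly set; the default leaves callbacks
    unchanged) can be sketched as a small helper. The function name is
    hypothetical, not skorch's actual code:

    ```python
    def resolve_caching(callback_setting, net_use_caching=None):
        """Hypothetical sketch of the precedence rule: a net-level
        use_caching value, when set, overrides each scoring callback's
        own caching flag; the default (None) keeps the callback's
        behavior, so existing code is unaffected."""
        if net_use_caching is None:
            return callback_setting
        return net_use_caching
    ```

    For example, a callback that caches by default is forced off when the
    net-level override is False, and untouched when the override is left at None.
    
    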

Commits on Jul 6, 2023

  1. MNT: Update CI, add Python 3.11, fix sklearn issues (#990)

    * Update checkout and setup-python GH actions
    * Add Python 3.11 to CI (only PyTorch 1.13)
    * Add pytest-pretty to CI
    * Increase columns in GH logs for better readability
    * Change check for sklearn error message
    * Make data_from_dataset work with TensorDataset
    
    Details
    
    Something somewhere in sklearn changed that now requires the presence of
    the classes_ attribute on classifiers when scoring. That attribute is
    not always present. Now, more of an effort is made to infer the classes
    when a TensorDataset is passed.
    
    An sklearn error message that we checked was changed, causing the
    corresponding tests to fail. This commit fixes the error message checked
    in the test (which was already quite defensive beforehand).
    BenjaminBossan authored Jul 6, 2023 (8c1e66a)
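    The class-inference change can be pictured with a small stand-in for
    torch's TensorDataset, which keeps its columns in a `.tensors` tuple.
    Both names below are hypothetical illustrations, not skorch's actual
    implementation:

    ```python
    class TensorDatasetLike:
        """Minimal stand-in for torch.utils.data.TensorDataset: the
        dataset's columns are stored in a .tensors tuple."""
        def __init__(self, *tensors):
            self.tensors = tensors

    def infer_classes(dataset):
        # assume the last column holds the targets, as in TensorDataset(X, y),
        # and derive the classes from the unique target values
        y = dataset.tensors[-1]
        return sorted(set(y))

    ds = TensorDatasetLike([0.1, 0.2, 0.3, 0.4], [1, 0, 1, 2])
    ```
    
    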

Commits on Jul 19, 2023

  1. Update docs after accelerate fix was released (#995)

    There was an issue with using grid search and other sklearn methods that
    involve cloning in conjunction with accelerate in a multi-gpu setting.
    This was due to sklearn creating copies of the Accelerator object. The
    skorch docs proposed a fix, but this fix is now released in accelerate
    itself. Therefore, the proposal can be safely removed.
    
    Another minor change was to adjust the criterion in an example script.
    This is not essential for the script to work, but prevents nan losses.
    BenjaminBossan authored Jul 19, 2023 (549f3e6)

Commits on Jul 26, 2023

  1. Add support for safetensors (#970)

    Description
    
    net.save_params and net.load_params now support passing
    use_safetensors=True, which will result in the underlying state_dict
    being serialized/deserialized using safetensors instead of torch.save
    and torch.load (which both rely on pickle).
    
    Similarly, Checkpoint, TrainEndCheckpoint, and LoadInitState now support
    use_safetensors=True.
    
    By default, nothing changes for the user; safetensors is opt-in.
    
    Motivation
    
    safetensors has a couple of advantages over pickle, which are described
    [here](https://github.com/huggingface/safetensors/#yet-another-format-).
    Most notably, it doesn't have the security issues that pickle has.
    Recently, it has even been
    [audited](https://huggingface.co/blog/safetensors-security-audit) and
    found to be safe. Furthermore, it is growing in popularity, being
    natively supported by Hugging Face Hub for instance, and it is
    compatible with most major frameworks, such as TensorFlow.
    
    Implementation
    
    Using safetensors requires users to install the library. I haven't added
    it as a default dependency, to keep those lean. I also haven't added any
    special error messages when safetensors is not installed, I think the
    standard Python message is sufficient.
    
    The code changes have been mostly very straightforward. A small caveat
    is that, contrary to using torch.load/torch.save, this does not work
    for serializing the optimizer. That is because safetensors is only
    capable of serializing tensors, but the optimizer state_dict contains
    other stuff. If users absolutely need to save the optimizer, they have
    to use pickle for that.
    
    I was entertaining the idea of inferring the serialization format by the
    file name, enabling something like:
    
    net = ...
    net.save_params(f_params='module.safetensors', f_optimizer='optimizer.pkl')
    
    i.e. the module to be stored with safetensors and the optimizer with
    pickle. But this would add quite some complexity and ambiguity for very
    little benefit. The user can just make a separate call to `save_params`
    to save the optimizer without safetensors instead. Similarly, they can
    add a second Checkpoint callback for the optimizer.
    
    I added a check in the callbacks to see if f_optimizer is being set when
    use_safetensors=True, in which case I raise an error. This seemed
    necessary to me because it is set there by default and because these
    callbacks intercept errors very broadly when saving params, so it can be
    easy for users to miss when the optimizer is not actually being
    saved (there is a message printed if net.verbose but this is easy to
    miss IMO).
    
    Most of the changes were required for testing. There, I had to use quite
    a few if...else because I needed to exclude the optimizer each time that
    safetensors is used.
    
    I also made two unrelated changes:
    
    - For Checkpoint, I moved a docstring argument lower to reflect its
      order in the argument list.
    - I deleted the pickle_dump_mock fixture in TestTrainEndCheckpoint,
      which wasn't being used.
    BenjaminBossan authored Jul 26, 2023 (e83e6a4)
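    The f_optimizer check described above can be sketched as follows. The
    helper name and error wording are hypothetical, but the rule mirrors the
    one described: safetensors can only serialize tensors, while an optimizer
    state_dict also contains ints, lists, and nested dicts.

    ```python
    def check_save_args(f_optimizer, use_safetensors):
        """Hypothetical sketch of the guard in the checkpoint callbacks:
        reject the combination of saving the optimizer and using
        safetensors, instead of silently skipping the optimizer."""
        if use_safetensors and f_optimizer is not None:
            raise ValueError(
                "Cannot save the optimizer with use_safetensors=True; "
                "save it in a separate call that uses pickle instead."
            )
    ```

    Raising loudly here is the point: the callbacks intercept save errors
    broadly, so a silently unsaved optimizer would be easy to miss.
    
    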

Commits on Aug 1, 2023

  1. FIX: typo in notebook name (#1001)

    Was "tranformer", not "transformer".
    BenjaminBossan authored Aug 1, 2023 (ca1bae6)

Commits on Aug 2, 2023

  1. Better error message when NeuralNetClassifier misses the classes_ attribute (#1004)
    
    On NeuralNetClassifier, when classes_ is accessed but not known, we try
    to use classes_inferred_ instead. If that is also not defined, users
    currently just get an AttributeError that classes_inferred_ does not
    exist. This can be very confusing.
    
    This PR intercepts the error, provides a (hopefully) helpful error
    message, then reraises. It is unfortunately not easily possible to
    deduce what exactly went wrong to get to this, thus the error message
    makes an educated guess.
    BenjaminBossan authored Aug 2, 2023 (e6023a1)
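    The intercept-and-reraise pattern described above can be sketched like
    this. The class and message are a simplified stand-in, not skorch's
    actual implementation:

    ```python
    class ClassesMixinSketch:
        """Hypothetical sketch: intercept the confusing AttributeError
        about classes_inferred_ and re-raise with a helpful message,
        chaining the original exception for debugging."""
        @property
        def classes_(self):
            try:
                return self.classes_inferred_
            except AttributeError as exc:
                msg = (
                    "NeuralNetClassifier could not infer the classes; most "
                    "likely the net was not fitted on data from which they "
                    "can be derived, or they were not set explicitly."
                )
                raise AttributeError(msg) from exc
    ```
    
    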

Commits on Aug 18, 2023

  1. FIX: Issues with saving/loading with accelerate (#1008)

    * FIX: Issues with saving/loading with accelerate
    
    Description
    
    There were a few issues with saving and loading parameters for an
    accelerated net (some only in a multi-GPU setting though):
    
    - load_params set the device if device=None to CPU, but we need None
    - not waiting for all processes to finish before saving parameters
    - all processes saving the parameters, when only main should
    - an issue with parameter names depending on the wrapping state
    
    Regarding the last point, the issue was that if the module(s) are
    wrapped with accelerate, the parameters have an additional prefix,
    "module.". So e.g. "dense0.weight" would become "module.dense0.weight".
    This could result in a key mismatch and error when loading a model.
    
    The solution to this is to always unwrap the net before saving and
    before loading. That way, the extra prefix is never present and there is
    no mismatch.
    
    A test was added to check this behavior, but since the GitHub CI does
    not offer multi-GPU support, it does not test for all failure cases.
    Therefore, I added a script,
    examples/accelerate-multigpu/run-save-load.py, that can be run on a
    multi-GPU setup to test the issue.
    
    This unit test checks the correct behavior on CPU, iterating through all
    4 combinations of wrapping/not wrapping the initial/loaded model.
    
    Implementation
    
    The changes needed were often just a few lines that sync the processes
    or place a guard to only run on the main process. A few of these were
    quite unintuitive to me, so I added a comment for them.
    
    The one big change is that the preparation of the components by the
    accelerator is now moved to a separate method, _initialize_accelerator.
    This way, it is now possible to unwrap and re-wrap the model with a
    single method call each. Without that change, re-wrapping was only
    possible by calling net.initialize(), which would re-initialize
    everything, which is not desired.
    
    This change can be backwards incompatible: If a user saved the
    parameters of an accelerated net while it's still wrapped (not the
    default), and tries to load it into a wrapped net, it will no longer
    work. I think this case is rare enough that we can accept it. In the
    worst case, the user can still apply the state dict manually on the
    wrapped net.
    
    I did consider an alternative solution that would inspect the names of
    the keys in the state dict and try to determine from those if the
    loaded/current weights are from a wrapped model or not, and consequently
    rename the keys of the state dict. However, this method is unreliable
    and also not easy to implement with the current code, so I opted for the
    solution described above.
    
    Also note that this PR does _not_ fix potential issues that might occur
    during checkpointing of the model while it's training. For this, we need
    to use accelerator.{save_state,load_state}, see here:
    
    https://huggingface.co/docs/accelerate/usage_guides/checkpoint
    
    Probably this use case is best served with a separate checkpoint
    callback.
    
    * Changelog entry, references to the GH PR
    
    * Temporarily set device in load_params if necessary
    
    Reviewer feedback: To prevent a confusing warning message, temporarily
    set the device to 'cpu' when it was None during load_params. After
    loading has finished, set it back to the original value.
    BenjaminBossan authored Aug 18, 2023 (07fc260)
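    The sync-then-save pattern described above can be sketched with two real
    accelerate Accelerator APIs, `wait_for_everyone()` and
    `is_main_process`; the helper name and `save_fn` callback are
    hypothetical:

    ```python
    def save_params_multi_gpu(accelerator, save_fn):
        """Hypothetical sketch of the multi-GPU save guard: wait for all
        processes so parameters are in sync, then write from the main
        process only, so files are not clobbered by concurrent writers."""
        accelerator.wait_for_everyone()
        if accelerator.is_main_process:
            save_fn()
    ```
    
    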
  2. Accelerate some adjustment for mixed precision (#1009)

    * Use accelerator.autocast when computing loss
    
    According to the accelerate docs, loss computation should be performed
    within the accelerator.autocast context manager:
    
    https://huggingface.co/docs/accelerate/v0.21.0/en/quicktour#mixed-precision-training
    
    I tested if this makes a difference by running the following notebook
    with fp16 precision:
    
    https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb
    
    I found no difference at all: the runtime was practically the same and
    the losses were identical. Still, I think it's better to have this than
    not, as it is recommended by the accelerate docs.
    
    * Update LR scheduler callback to work w/ accelerate
    
    According to the accelerate docs:
    
    https://huggingface.co/docs/accelerate/quicktour#mixed-precision-training
    
    the LR scheduler step should sometimes be skipped when using mixed
    precision training because accelerate may skip update steps internally.
    Therefore, I updated the LR scheduler callback to check if the net has
    an accelerator and if it does, to check if a step is necessary.
    
    This is actually quite hard to test because the necessity of stepping
    depends on accelerate's internal logic, which we don't want to test, and
    which might change in the future. Therefore, the added test just runs
    training with accelerate, mixed precision, and some lr schedulers,
    verifying that there is no error.
    
    When running these tests + the normal lr scheduler tests locally on a
    machine that supports fp16, I get 100% line coverage of lr_scheduler.py.
    I think this is good enough.
    
    * Non-functional clean ups related to lr schedulers
    
    While working on the fixes in this PR, I also cleaned up some lr
    scheduler code. These clean ups are non-functional.
    
    1. We imported CyclicLR as TorchCyclicLR. I'm not sure why but it is
       somehow related to very old PyTorch versions we no longer support, so I
       removed this.
    2. Fixed some indentations for conditional checks to improve
       readability.
    
    * Reviewer comment: Simplify conditional code
    BenjaminBossan authored Aug 18, 2023 (312daaa)
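    The scheduler check described above can be sketched as follows.
    `optimizer_step_was_skipped` is a real accelerate Accelerator property;
    the helper name is hypothetical, not skorch's actual callback code:

    ```python
    def maybe_step_scheduler(scheduler, accelerator=None):
        """Hypothetical sketch: with mixed precision, accelerate may skip
        optimizer updates internally (e.g. on gradient overflow), and the
        LR scheduler should then skip its step as well to stay in sync."""
        if accelerator is not None and accelerator.optimizer_step_was_skipped:
            return False  # mirror accelerate's skipped optimizer update
        scheduler.step()
        return True
    ```
    
    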

Commits on Aug 28, 2023

  1. b8185cf

Commits on Aug 30, 2023

  1. MNT Fix install for testing (#1015)

    Installing [testing] now includes all required dev dependencies.
    muellerzr authored Aug 30, 2023 (3641273)

Commits on Sep 2, 2023

  1. 7fd4b6e

Commits on Sep 4, 2023

  1. Release 0.15.0 (#1014)

    * Update VERSION
    * Remove code for backwards compatibility with v0.11
    * Update CHANGES.md
    
    Since skorch nets pickled with <= v0.11 can no longer be unpickled, I
    added an entry on how to transition these pickle files.
    
    Release text:
    
    This is a smaller release, but it still contains changes which will be
    interesting to some of you.
    
    We added the possibility to store weights using safetensors. This can have
    several advantages, listed here. When calling net.save_params and
    net.load_params, just pass use_safetensors=True to use safetensors instead of
    pickle.
    
    Moreover, there is a new argument on NeuralNet: You can now pass
    use_caching=False or True to disable or enable caching for all callbacks at
    once. This is useful if you have a lot of scoring callbacks and don't want to
    toggle caching on each individually.
    
    Finally, we fixed a few issues related to using skorch with accelerate.
    
    Thanks Zach Mueller (@muellerzr) for his first contribution to skorch.
    BenjaminBossan authored Sep 4, 2023 (17c7675)