
Comparing changes

Choose two branches to see what’s changed or to start a new pull request.

base repository: skorch-dev/skorch
base: v0.14.0
head repository: skorch-dev/skorch
compare: v0.15.0
  • 16 commits
  • 31 files changed
  • 4 contributors

Commits on Jun 26, 2023

  1. Bump dev version

    ottonemo committed Jun 26, 2023 (468a9c6)

Commits on Jun 27, 2023

  1. This commit fixes the release/deploy script (#987)

    Before, it was not possible to execute this script without errors because
    the conda update step was broken. I moved the activation of the conda
    env to `conda run`, which is apparently the recommended way.
    ottonemo authored Jun 27, 2023 (8a1fd6d)
  2. Change version to 0.14.1dev0

    Instead of 0.15.0.dev0, in case we need a patch release.
    BenjaminBossan committed Jun 27, 2023 (6e63ed9)

Commits on Jun 28, 2023

  1. Exclude _version.py from coverage report (#988)

    We vendor the code in _version.py from scipy. It is thus not necessary
    for us to have tests for it. Still, it was included in the coverage
    report, resulting in artificially lowered test coverage. This fix
    removes the module from being counted for coverage.
    BenjaminBossan authored Jun 28, 2023 (c6c2ef5)
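    The exclusion described above usually comes down to an omit rule in the
    coverage configuration. A minimal sketch, assuming a .coveragerc file and
    this module path (skorch's actual config file and path may differ):

    ```ini
    # .coveragerc (or the [coverage:run] section of setup.cfg)
    [run]
    omit =
        skorch/_version.py
    ```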

Commits on Jun 29, 2023

  1. Add option to globally override callback caching (#971)

    Resolves #957 (look there for motivation)
    
    Description
    
    I went with the option to add an argument to the NeuralNet class itself
    that can override the caching behavior of the scoring callbacks. This
    seemed more straightforward than going with a context manager, which is
    not a pattern we typically require of skorch users. The disadvantage is
    that we have yet one more parameter on the NeuralNet.
    
    By default, the caching behavior of the callbacks is not changed, i.e.
    this should be fully backwards compatible.
    
    Implementation
    
    Ideally, I would have liked to implement this in a way that a new (or
    user-defined) callback doesn't have to do anything special to honor the
    parameter. However, this is not really possible. Although I moved the
    logic of whether to use inference caching inside of
    skorch.callbacks.scoring._cache_net_forward_iter, so that the global
    override is automatically taken into account here, there are other
    places in the callbacks whose behavior changes depending on the use of
    caching (e.g. target extraction). The check for whether there is a
    caching override thus has to be performed multiple times.
    BenjaminBossan authored Jun 29, 2023 (48cb4a1)
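    The override semantics described above (the net-level argument takes
    precedence only when explicitly set; the default leaves callbacks
    unchanged) can be sketched as a small helper. The function name is
    hypothetical, not skorch's actual code:

    ```python
    def resolve_caching(callback_setting, net_use_caching=None):
        """Hypothetical sketch of the precedence rule: a net-level
        use_caching value, when set, overrides each scoring callback's
        own caching flag; the default (None) keeps the callback's
        behavior, so existing code is unaffected."""
        if net_use_caching is None:
            return callback_setting
        return net_use_caching
    ```

    For example, a callback that caches by default is forced off when the
    net-level override is False, and untouched when the override is left at None.
    
    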

Commits on Jul 6, 2023

  1. MNT: Update CI, add Python 3.11, fix sklearn issues (#990)

    * Update checkout and setup-python GH actions
    * Add Python 3.11 to CI (only PyTorch 1.13)
    * Add pytest-pretty to CI
    * Increase columns in GH logs for better readability
    * Change check for sklearn error message
    * Make data_from_dataset work with TensorDataset
    
    Details
    
    Something somewhere in sklearn changed that now requires the presence of
    the classes_ attribute on classifiers when scoring. That attribute is
    not always present. Now, more of an effort is made to infer the classes
    when a TensorDataset is passed.
    
    An sklearn error message that we checked was changed, causing the
    corresponding tests to fail. This commit fixes the error message checked
    in the test (which was already quite defensive beforehand).
    BenjaminBossan authored Jul 6, 2023 (8c1e66a)
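    The class-inference change can be pictured with a small stand-in for
    torch's TensorDataset, which keeps its columns in a `.tensors` tuple.
    Both names below are hypothetical illustrations, not skorch's actual
    implementation:

    ```python
    class TensorDatasetLike:
        """Minimal stand-in for torch.utils.data.TensorDataset: the
        dataset's columns are stored in a .tensors tuple."""
        def __init__(self, *tensors):
            self.tensors = tensors

    def infer_classes(dataset):
        # assume the last column holds the targets, as in TensorDataset(X, y),
        # and derive the classes from the unique target values
        y = dataset.tensors[-1]
        return sorted(set(y))

    ds = TensorDatasetLike([0.1, 0.2, 0.3, 0.4], [1, 0, 1, 2])
    ```
    
    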

Commits on Jul 19, 2023

  1. Update docs after accelerate fix was released (#995)

    There was an issue with using grid search and other sklearn methods that
    involve cloning in conjunction with accelerate in a multi-gpu setting.
    This was due to sklearn creating copies of the Accelerator object. The
    skorch docs proposed a fix, but this fix is now released in accelerate
    itself. Therefore, the proposal can be safely removed.
    
    Another minor change was to adjust the criterion in an example script.
    This is not essential for the script to work, but prevents nan losses.
    BenjaminBossan authored Jul 19, 2023 (549f3e6)

Commits on Jul 26, 2023

  1. Add support for safetensors (#970)

    Description
    
    net.save_params and net.load_params now support passing
    use_safetensors=True, which will result in the underlying state_dict
    being serialized/deserialized using safetensors instead of torch.save
    and torch.load (which both rely on pickle).
    
    Similarly, Checkpoint, TrainEndCheckpoint, and LoadInitState now support
    use_safetensors=True.
    
    By default, nothing changes for the user; safetensors is opt-in.
    
    Motivation
    
    safetensors has a couple of advantages over pickle, which are described
    [here](https://github.com/huggingface/safetensors/#yet-another-format-).
    Most notably, it doesn't have the security issues that pickle has.
    Recently, it has even been
    [audited](https://huggingface.co/blog/safetensors-security-audit) and
    found to be safe. Furthermore, it is growing in popularity, being
    natively supported by Hugging Face Hub for instance, and it is
    compatible with most major frameworks, such as TensorFlow.
    
    Implementation
    
    Using safetensors requires users to install the library. I haven't added
    it as a default dependency, to keep those lean. I also haven't added any
    special error messages when safetensors is not installed, I think the
    standard Python message is sufficient.
    
    The code changes have been mostly very straightforward. A small caveat
    is that, contrary to using torch.load/torch.save, this does not work
    for serializing the optimizer. That is because safetensors is only
    capable of serializing tensors, but the optimizer state_dict contains
    other stuff. If users absolutely need to save the optimizer, they have
    to use pickle for that.
    
    I was entertaining the idea of inferring the serialization format by the
    file name, enabling something like:
    
    net = ...
    net.save_params(f_params='module.safetensors', f_optimizer='optimizer.pkl')
    
    i.e. the module to be stored with safetensors and the optimizer with
    pickle. But this would add quite some complexity and ambiguity for very
    little benefit. The user can just make a separate call to `save_params`
    to save the optimizer without safetensors instead. Similarly, they can
    add a second Checkpoint callback for the optimizer.
    
    I added a check in the callbacks to see if f_optimizer is being set when
    use_safetensors=True, in which case I raise an error. This seemed
    necessary to me because it is set there by default and because these
    callbacks intercept errors very broadly when saving params, so it can be
    easy for users to miss when the optimizer is not actually being
    saved (there is a message printed if net.verbose but this is easy to
    miss IMO).
    
    Most of the changes were required for testing. There, I had to use quite
    a few if...else because I needed to exclude the optimizer each time that
    safetensors is used.
    
    I also made two unrelated changes:
    
    - For Checkpoint, I moved a docstring argument lower to reflect its
      order in the argument list.
    - I deleted the pickle_dump_mock fixture in TestTrainEndCheckpoint,
      which wasn't being used.
    BenjaminBossan authored Jul 26, 2023 (e83e6a4)
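    The f_optimizer check described above can be sketched as follows. The
    helper name and error wording are hypothetical, but the rule mirrors the
    one described: safetensors can only serialize tensors, while an optimizer
    state_dict also contains ints, lists, and nested dicts.

    ```python
    def check_save_args(f_optimizer, use_safetensors):
        """Hypothetical sketch of the guard in the checkpoint callbacks:
        reject the combination of saving the optimizer and using
        safetensors, instead of silently skipping the optimizer."""
        if use_safetensors and f_optimizer is not None:
            raise ValueError(
                "Cannot save the optimizer with use_safetensors=True; "
                "save it in a separate call that uses pickle instead."
            )
    ```

    Raising loudly here is the point: the callbacks intercept save errors
    broadly, so a silently unsaved optimizer would be easy to miss.
    
    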

Commits on Aug 1, 2023

  1. FIX: typo in notebook name (#1001)

    Was "tranformer", not "transformer".
    BenjaminBossan authored Aug 1, 2023 (ca1bae6)

Commits on Aug 2, 2023

  1. Better error message when NeuralNetClassifier misses the classes_ attribute (#1004)
    
    On NeuralNetClassifier, when classes_ is accessed but not known, we try
    to use classes_inferred_ instead. If that is also not defined, users
    currently just get an AttributeError that classes_inferred_ does not
    exist. This can be very confusing.
    
    This PR intercepts the error, provides a (hopefully) helpful error
    message, then reraises. It is unfortunately not easily possible to
    deduce what exactly went wrong to get to this, thus the error message
    makes an educated guess.
    BenjaminBossan authored Aug 2, 2023 (e6023a1)
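    The intercept-and-reraise pattern described above can be sketched like
    this. The class and message are a simplified stand-in, not skorch's
    actual implementation:

    ```python
    class ClassesMixinSketch:
        """Hypothetical sketch: intercept the confusing AttributeError
        about classes_inferred_ and re-raise with a helpful message,
        chaining the original exception for debugging."""
        @property
        def classes_(self):
            try:
                return self.classes_inferred_
            except AttributeError as exc:
                msg = (
                    "NeuralNetClassifier could not infer the classes; most "
                    "likely the net was not fitted on data from which they "
                    "can be derived, or they were not set explicitly."
                )
                raise AttributeError(msg) from exc
    ```
    
    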

Commits on Aug 18, 2023

  1. FIX: Issues with saving/loading with accelerate (#1008)

    * FIX: Issues with saving/loading with accelerate
    
    Description
    
    There were a few issues with saving and loading parameters for an
    accelerated net (some only in a multi-GPU setting though):
    
    - load_params set the device if device=None to CPU, but we need None
    - not waiting for all processes to finish before saving parameters
    - all processes saving the parameters, when only main should
    - an issue with parameter names depending on the wrapping state
    
    Regarding the last point, the issue was that if the module(s) are
    wrapped with accelerate, the parameters have an additional prefix,
    "module.". So e.g. "dense0.weight" would become "module.dense0.weight".
    This could result in a key mismatch and error when loading a model.
    
    The solution to this is to always unwrap the net before saving and
    before loading. That way, the extra prefix is never present and there is
    no mismatch.
    
    A test was added to check this behavior, but since the GitHub CI does
    not offer multi-GPU support, it does not test for all failure cases.
    Therefore, I added a script,
    examples/accelerate-multigpu/run-save-load.py, that can be run on a
    multi-GPU setup to test the issue.
    
    This unit test checks the correct behavior on CPU, iterating through all
    4 combinations of wrapping/not wrapping the initial/loaded model.
    
    Implementation
    
    The changes needed were often just a few lines that sync the processes
    or place a guard to only run on the main process. A few of these were
    quite unintuitive to me, so I added a comment for them.
    
    The one big change is that the preparation of the components by the
    accelerator is now moved to a separate method, _initialize_accelerator.
    This way, it is now possible to unwrap and re-wrap the model with a
    single method call each. Without that change, re-wrapping was only
    possible by calling net.initialize(), which would re-initialize
    everything, which is not desired.
    
    This change can be backwards incompatible: If a user saved the
    parameters of an accelerated net while it's still wrapped (not the
    default), and tries to load it into a wrapped net, it will no longer
    work. I think this case is rare enough that we can accept it. In the
    worst case, the user can still apply the state dict manually on the
    wrapped net.
    
    I did consider an alternative solution that would inspect the names of
    the keys in the state dict and try to determine from those if the
    loaded/current weights are from a wrapped model or not, and consequently
    rename the keys of the state dict. However, this method is unreliable
    and also not easy to implement with the current code, so I opted for the
    solution described above.
    
    Also note that this PR does _not_ fix potential issues that might occur
    during checkpointing of the model while it's training. For this, we need
    to use accelerator.{save_state,load_state}, see here:
    
    https://huggingface.co/docs/accelerate/usage_guides/checkpoint
    
    Probably this use case is best served with a separate checkpoint
    callback.
    
    * Changelog entry, references to the GH PR
    
    * Temporarily set device in load_params if necessary
    
    Reviewer feedback: To prevent a confusing warning message, temporarily
    set the device to 'cpu' when it was None during load_params. After
    loading has finished, set it back to the original value.
    BenjaminBossan authored Aug 18, 2023 (07fc260)
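    The sync-then-save pattern described above can be sketched with two real
    accelerate Accelerator APIs, `wait_for_everyone()` and
    `is_main_process`; the helper name and `save_fn` callback are
    hypothetical:

    ```python
    def save_params_multi_gpu(accelerator, save_fn):
        """Hypothetical sketch of the multi-GPU save guard: wait for all
        processes so parameters are in sync, then write from the main
        process only, so files are not clobbered by concurrent writers."""
        accelerator.wait_for_everyone()
        if accelerator.is_main_process:
            save_fn()
    ```
    
    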
  2. Accelerate some adjustment for mixed precision (#1009)

    * Use accelerator.autocast when computing loss
    
    According to the accelerate docs, loss computation should be performed
    within the accelerator.autocast context manager:
    
    https://huggingface.co/docs/accelerate/v0.21.0/en/quicktour#mixed-precision-training
    
    I tested if this makes a difference by running the following notebook
    with fp16 precision:
    
    https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Finetuning.ipynb
    
    I found no difference at all: the runtime was practically the same and
    the losses were identical. Still, I think it's better to have this than
    not, as it is recommended by the accelerate docs.
    
    * Update LR scheduler callback to work w/ accelerate
    
    According to the accelerate docs:
    
    https://huggingface.co/docs/accelerate/quicktour#mixed-precision-training
    
    the LR scheduler step should sometimes be skipped when using mixed
    precision training because accelerate may skip update steps internally.
    Therefore, I updated the LR scheduler callback to check if the net has
    an accelerator and if it does, to check if a step is necessary.
    
    This is actually quite hard to test because the necessity of stepping
    depends on accelerate's internal logic, which we don't want to test, and
    which might change in the future. Therefore, the added test just runs
    training with accelerate, mixed precision, and some lr schedulers,
    verifying that there is no error.
    
    When running these tests + the normal lr scheduler tests locally on a
    machine that supports fp16, I get 100% line coverage of lr_scheduler.py.
    I think this is good enough.
    
    * Non-functional clean ups related to lr schedulers
    
    While working on the fixes in this PR, I also cleaned up some lr
    scheduler code. These clean ups are non-functional.
    
    1. We imported CyclicLR as TorchCyclicLR. I'm not sure why but it is
       somehow related to very old PyTorch versions we no longer support, so I
       removed this.
    2. Fixed some indentations for conditional checks to improve
       readability.
    
    * Reviewer comment: Simplify conditional code
    BenjaminBossan authored Aug 18, 2023 (312daaa)
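    The scheduler check described above can be sketched as follows.
    `optimizer_step_was_skipped` is a real accelerate Accelerator property;
    the helper name is hypothetical, not skorch's actual callback code:

    ```python
    def maybe_step_scheduler(scheduler, accelerator=None):
        """Hypothetical sketch: with mixed precision, accelerate may skip
        optimizer updates internally (e.g. on gradient overflow), and the
        LR scheduler should then skip its step as well to stay in sync."""
        if accelerator is not None and accelerator.optimizer_step_was_skipped:
            return False  # mirror accelerate's skipped optimizer update
        scheduler.step()
        return True
    ```
    
    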

Commits on Aug 28, 2023

  1. b8185cf

Commits on Aug 30, 2023

  1. MNT Fix install for testing (#1015)

    Installing [testing] now includes all required dev dependencies.
    muellerzr authored Aug 30, 2023 (3641273)

Commits on Sep 2, 2023

  1. 7fd4b6e

Commits on Sep 4, 2023

  1. Release 0.15.0 (#1014)

    * Update VERSION
    * Remove code for backwards compatibility with v0.11
    * Update CHANGES.md
    
    Since skorch nets pickled with <= v0.11 can no longer be unpickled, I
    added an entry on how to transition these pickle files.
    
    Release text:
    
    This is a smaller release, but it still contains changes which will be
    interesting to some of you.
    
    We added the possibility to store weights using safetensors. This can have
    several advantages, listed here. When calling net.save_params and
    net.load_params, just pass use_safetensors=True to use safetensors instead of
    pickle.
    
    Moreover, there is a new argument on NeuralNet: You can now pass
    use_caching=False or True to disable or enable caching for all callbacks at
    once. This is useful if you have a lot of scoring callbacks and don't want to
    toggle caching on each individually.
    
    Finally, we fixed a few issues related to using skorch with accelerate.
    
    Thanks Zach Mueller (@muellerzr) for his first contribution to skorch.
    BenjaminBossan authored Sep 4, 2023 (17c7675)