Add Example for Skorch DataLoader #1105
Conversation
The notebook cell under review:

```python
def __iter__(self):
    for _ in range(self.length):
        X = torch.randn(20, generator=self.rng)
        y = torch.randint(0, 2, (1,), generator=self.rng).item()
```
Just a proposal: When y is not completely random, the net can actually learn something and improve the loss.
Suggested change:

```diff
-y = torch.randint(0, 2, (1,), generator=self.rng).item()
+y = (X.sum() > 10).sum()
```
Hmm, I think that adding a rule-based approach for a single variable is a bit unfavorable, since we are adding a dependency on a specific variable (X).
Do you think that adding controlled noise to synthetic data with deterministic logic could be useful?
A proposed example could be:
```python
class StreamingDataset(IterableDataset):
    def __init__(self, length=1000, seed=42, noise_prob=0.1, threshold=3.0):
        self.length = length
        self.rng = torch.Generator().manual_seed(seed)
        self.noise_prob = noise_prob
        self.threshold = threshold
```
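Completing that sketch, one possible `__iter__` body that combines a deterministic rule with controlled label noise could look like the following (the `__iter__` implementation here is my own assumption for illustration, not code from the PR):

```python
import torch
from torch.utils.data import IterableDataset

class StreamingDataset(IterableDataset):
    def __init__(self, length=1000, seed=42, noise_prob=0.1, threshold=3.0):
        self.length = length
        self.rng = torch.Generator().manual_seed(seed)
        self.noise_prob = noise_prob
        self.threshold = threshold

    def __iter__(self):
        for _ in range(self.length):
            X = torch.randn(20, generator=self.rng)
            # deterministic rule: the label depends on the feature sum
            y = int(X.sum() > self.threshold)
            # controlled noise: flip the label with probability noise_prob
            if torch.rand(1, generator=self.rng).item() < self.noise_prob:
                y = 1 - y
            yield X, y
```

With this, y mostly correlates with X (so the loss can improve during training), while `noise_prob` keeps the data from being perfectly separable.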
Sorry, I don't understand the reply, could you please elaborate? My suggestion was simply to make y correlate with X. That way, when we train the model, we can see the loss improving. When y is completely random, the loss is not improving. For the purpose of this notebook, one could say it doesn't matter, but I can imagine some viewers being confused by the stagnating loss, perhaps assuming there is an error, hence my suggestion.
What I meant was to add controlled noise/entropy to a synthetic dataset instead of having a correlation that could be seen as less realistic. But I agree that the stagnating loss could be seen as an error. I'll update the variables.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Could you please review the changes? cc: @BenjaminBossan
BenjaminBossan left a comment:
Thanks for the updates. I just left 2 small comments, please check, the rest looks good.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@ParagEkbote Great, thanks for the updates. Could you please run the notebook and check it in with the cell outputs, then we should be good to merge.
I have been able to execute the notebook completely, and the final output is shown correctly as well. Could you please review? cc: @BenjaminBossan
BenjaminBossan left a comment:
Thanks for contributing this notebook, all looks good.
# Version 1.2.0

This is a smaller release; most changes concern examples and development and thus don't affect users of skorch.

## Changed

- Loading of skorch nets using pickle: when unpickling a skorch net, you may come across a PyTorch warning that goes: "FutureWarning: You are using torch.load with weights_only=False [...]"; to avoid this warning, pickle the net again and use the new pickle file (#1092)

## Added

- Add Contributing Guidelines for skorch (#1097)
- Add an example of hyper-parameter optimization using [Optuna](https://optuna.org/) [here](https://github.com/skorch-dev/skorch/tree/master/examples/optuna) (#1098)
- Add an example for a streaming dataset (#1105)
- Add pyproject.toml to improve CI/CD and tooling (#1108)

Thanks @raphaelrubrice, @omahs, and @ParagEkbote for their contributions.

**Full Changelog**: v1.1.0...v1.2.0

Release commit specific:

* Bump version to 1.2.0
* Update CHANGES.md
* Remove workarounds that have been fixed in sklearn (only affects tests)
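The re-pickling workaround mentioned in the changelog (#1092) amounts to loading the old pickle once and saving it again; a minimal sketch, using in-memory buffers as stand-ins for the old and new pickle files (the file handling here is illustrative, not code from skorch):

```python
import io
import pickle

# Stand-in for the old pickle file that triggers the FutureWarning on load.
old_file = io.BytesIO(pickle.dumps({"weights": [1, 2, 3]}))

# Load it once (with a real skorch net, this load may still emit the warning),
# then dump the object again; use the new file from now on.
net = pickle.load(old_file)
new_file = io.BytesIO()
pickle.dump(net, new_file)
```

In practice, `old_file` and `new_file` would be real files opened with `open(..., "rb")` and `open(..., "wb")`.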

Refs #82
In this example, we use the `IterableDataset` class from torch for a synthetic streaming dataset. We use a custom callback for validation since `train_split` cannot be used for streaming datasets. I believe I have named the notebook a bit incorrectly and would appreciate feedback on it. Could you please review the changes?
cc: @githubnemo, @BenjaminBossan