
Add Example for Skorch DataLoader#1105

Merged
BenjaminBossan merged 11 commits into skorch-dev:master from ParagEkbote:Example-for-DataLoader
Jun 13, 2025
Conversation

Contributor

@ParagEkbote ParagEkbote commented May 29, 2025

Refs #82

In this example, we use torch's IterableDataset class to build a synthetic streaming dataset. Since train_split cannot be used with streaming datasets, we use a custom callback for validation. I believe I have named the notebook a bit incorrectly and would appreciate feedback on it.

Could you please review the changes?

cc: @githubnemo, @BenjaminBossan
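As a rough, hypothetical sketch of the setup described above (class and parameter names are illustrative, not the notebook's actual code): a synthetic stream built on torch's IterableDataset, which has no length or random access, so skorch's default train_split cannot carve out a validation set from it.

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class SyntheticStream(IterableDataset):
    """Yield a fixed number of random (X, y) pairs, simulating a stream."""
    def __init__(self, length=1000, seed=42):
        self.length = length
        self.seed = seed

    def __iter__(self):
        # Re-seed per iteration so each pass over the stream is reproducible.
        rng = torch.Generator().manual_seed(self.seed)
        for _ in range(self.length):
            X = torch.randn(20, generator=rng)
            y = torch.randint(0, 2, (1,), generator=rng).item()
            yield X, y

# An IterableDataset has no __len__/__getitem__, so skorch's default
# train_split cannot slice it; validation has to happen elsewhere,
# e.g. in a custom callback that iterates a separate held-out stream.
loader = DataLoader(SyntheticStream(length=8), batch_size=4)
batches = list(loader)
```

With a dataset like this, the net would be constructed with `train_split=None` and validation done in a custom callback, which is the approach this PR's notebook takes.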

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ParagEkbote ParagEkbote marked this pull request as ready for review May 31, 2025 14:47
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Thanks for this PR to add an example notebook for streaming data. It is clear and precise, nicely done. I have a few small suggestions for improvement, please check. Also, please run the notebook and check it in with the output cells.

Comment thread notebooks/Streaming_Dataset.ipynb Outdated
Comment thread notebooks/Streaming_Dataset.ipynb Outdated
Comment thread notebooks/Streaming_Dataset.ipynb Outdated
    def __iter__(self):
        for _ in range(self.length):
            X = torch.randn(20, generator=self.rng)
            y = torch.randint(0, 2, (1,), generator=self.rng).item()
Collaborator


Just a proposal: When y is not completely random, the net can actually learn something and improve the loss.

Suggested change:

-            y = torch.randint(0, 2, (1,), generator=self.rng).item()
+            y = (X.sum() > 10).sum()
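For illustration, here is a minimal standalone version of that idea: derive y from X so the model has a learnable signal, instead of an independent random label. The threshold of 0 below is my own illustrative choice to keep the classes roughly balanced; it is not taken from the suggestion above.

```python
import torch

rng = torch.Generator().manual_seed(0)

def sample(rng):
    X = torch.randn(20, generator=rng)
    # Label derived from X: a learnable signal, unlike an independent
    # torch.randint label. Threshold 0 keeps classes roughly balanced,
    # since the sum of 20 standard normals is symmetric around 0.
    y = int(X.sum() > 0)
    return X, y

pairs = [sample(rng) for _ in range(1000)]
positives = sum(y for _, y in pairs)
```

With labels tied to the inputs like this, the training loss can actually decrease, which is the point of the suggestion.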
Contributor Author


Hmm, I think that adding a rule-based approach for a single variable is a bit unfavorable, since it ties the label to a specific variable (X).

Do you think that adding controlled noise to synthetic data with deterministic logic could be useful?

A proposed example could be:

class StreamingDataset(IterableDataset):
    def __init__(self, length=1000, seed=42, noise_prob=0.1, threshold=3.0):
        self.length = length
        self.rng = torch.Generator().manual_seed(seed)
        self.noise_prob = noise_prob
        self.threshold = threshold
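For reference, a hedged sketch of how the proposed class might continue; the exact use of noise_prob and threshold in __iter__ below is my reading of the proposal, not code from the PR.

```python
import torch
from torch.utils.data import IterableDataset

class StreamingDataset(IterableDataset):
    def __init__(self, length=1000, seed=42, noise_prob=0.1, threshold=3.0):
        self.length = length
        self.rng = torch.Generator().manual_seed(seed)
        self.noise_prob = noise_prob
        self.threshold = threshold

    def __iter__(self):
        for _ in range(self.length):
            X = torch.randn(20, generator=self.rng)
            # Deterministic rule: the label depends on X via the threshold...
            y = int(X.sum() > self.threshold)
            # ...plus controlled noise: flip the label with prob noise_prob.
            if torch.rand(1, generator=self.rng).item() < self.noise_prob:
                y = 1 - y
            yield X, y

samples = list(StreamingDataset(length=100, seed=0))
```

Because the generator is seeded in __init__, two instances constructed with the same seed produce identical streams, while the noise keeps the mapping from X to y imperfect.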
Collaborator


Sorry, I don't understand the reply, could you please elaborate? My suggestion was simply to make y correlate with X. That way, when we train the model, we can see the loss improving. When y is completely random, the loss is not improving. For the purpose of this notebook, one could say it doesn't matter, but I can imagine some viewers being confused by the stagnating loss, perhaps assuming there is an error, hence my suggestion.

Contributor Author


What I meant was to add controlled noise/entropy to a synthetic dataset instead of having a correlation that could be seen as less realistic. But I agree that the stagnating loss could be seen as an error. I'll update the variables.

Comment thread notebooks/Streaming_Dataset.ipynb Outdated
ParagEkbote and others added 2 commits June 5, 2025 21:08
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@ParagEkbote
Contributor Author

Could you please review the changes?

cc: @BenjaminBossan

Collaborator

@BenjaminBossan BenjaminBossan left a comment


Thanks for the updates. I just left 2 small comments, please check, the rest looks good.

Comment thread notebooks/Streaming_Dataset.ipynb Outdated
Comment thread notebooks/Streaming_Dataset.ipynb Outdated
ParagEkbote and others added 2 commits June 12, 2025 17:46
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@BenjaminBossan
Collaborator

@ParagEkbote Great, thanks for the updates. Could you please run the notebook and check it in with the cell outputs, then we should be good to merge.

@ParagEkbote
Contributor Author

I have been able to execute the notebook completely, and the final output is shown correctly as well:

[screenshot: final notebook output]

Could you please review?

cc: @BenjaminBossan

Collaborator

@BenjaminBossan BenjaminBossan left a comment


Thanks for contributing this notebook, all looks good.

@BenjaminBossan BenjaminBossan merged commit b40d905 into skorch-dev:master Jun 13, 2025
16 checks passed
@ParagEkbote ParagEkbote deleted the Example-for-DataLoader branch June 13, 2025 14:11
githubnemo pushed a commit that referenced this pull request Aug 8, 2025
# Version 1.2.0

This is a smaller release, most changes concern examples and development and thus don't affect users of skorch.

## Changed

- Loading of skorch nets using pickle: When unpickling a skorch net, you may come across a PyTorch warning that goes: "FutureWarning: You are using torch.load with weights_only=False [...]"; to avoid this warning, pickle the net again and use the new pickle file (#1092)

## Added

- Add Contributing Guidelines for skorch. (#1097)
- Add an example of hyper-parameter optimization using [Optuna](https://optuna.org/) [here](https://github.com/skorch-dev/skorch/tree/master/examples/optuna) (#1098)
- Add Example for Streaming Dataset (#1105)
- Add pyproject.toml to Improve CI/CD and Tooling (#1108)

Thanks @raphaelrubrice, @omahs, and @ParagEkbote for their contributions.

**Full Changelog**: v1.1.0...v1.2.0

Release commit specific:

* Bump version to 1.2.0
* Update CHANGES.md
* Remove workarounds that have been fixed in sklearn (only affects tests)
