Skip to main content

You are not logged in. Your edit will be placed in a queue until it is peer reviewed.

We welcome edits that make the post easier to understand and more valuable for readers. Because community members review edits, please try to make the post substantially better than how you found it, for example, by fixing grammar or adding additional resources and hyperlinks.

2
  • QQ - Why do you say that "The first has a particularly nice feature that it can also work with numeric dimensions as well."? seqnum is a number in both cases. The only diff is that in one case you are (trying to) take a fixed percentage of samples per category, whereas in the 2nd one you are taking (at most) a fixed (and equal) number of samples per category, right? Commented Aug 26, 2020 at 20:26
  • @Josh . . . What I mean is that an nth sample will work if you want to stratify by a numeric columns, for instance row_number() over (order by income) would also work with the modulo approach. Commented Aug 27, 2020 at 0:23