Timeline for answer to "Stratified random sampling with BigQuery?" by Gordon Linoff
Current License: CC BY-SA 4.0
3 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Aug 27, 2020 at 0:23 | comment added | Gordon Linoff | | @Josh . . . What I mean is that an nth sample will work if you want to stratify by a numeric column; for instance, `row_number() over (order by income)` would also work with the modulo approach. |
| Aug 26, 2020 at 20:26 | comment added | Josh | | Quick question: why do you say that "The first has a particularly nice feature that it can also work with numeric dimensions as well."? `seqnum` is a number in both cases. The only difference is that in one case you are (trying to) take a fixed percentage of samples per category, whereas in the second you are taking (at most) a fixed (and equal) number of samples per category, right? |
| Oct 20, 2018 at 12:44 | answered | Gordon Linoff | CC BY-SA 4.0 | |
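The distinction Josh draws between a fixed percentage per category and a fixed count per category can be sketched in plain Python. This assumes the two approaches are (1) numbering all rows with `row_number() over (order by category, rand())` and keeping every n-th row, and (2) numbering rows with `row_number() over (partition by category order by rand())` and keeping `seqnum <= k`. The answer's actual queries are not reproduced here, and the function names and sample data are hypothetical.

```python
import random
from collections import Counter, defaultdict
from operator import itemgetter


def nth_sample_by_category(rows, category, n, rng):
    """Approach 1 (assumed): roughly a fixed fraction, 1/n, of each category.

    Mirrors row_number() over (order by category, rand()) as seqnum,
    keeping rows where (seqnum - 1) is a multiple of n.
    """
    # Sorting by (category, random number) groups each category together and
    # shuffles rows within it; the key is computed once per row.
    ordered = sorted(rows, key=lambda r: (category(r), rng.random()))
    # enumerate(start=1) matches SQL row_number(), which starts at 1.
    return [r for seqnum, r in enumerate(ordered, start=1) if (seqnum - 1) % n == 0]


def top_k_per_category(rows, category, k, rng):
    """Approach 2 (assumed): at most k rows from every category.

    Mirrors row_number() over (partition by category order by rand()) as seqnum,
    keeping rows where seqnum <= k.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[category(r)].append(r)
    sample = []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)         # random order within the partition
        sample.extend(members[:k])   # keep seqnum <= k
    return sample


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: category "A" is ten times larger than category "B".
    rows = [{"id": i, "cat": "A"} for i in range(100)] + \
           [{"id": 100 + i, "cat": "B"} for i in range(10)]
    cat = itemgetter("cat")
    print(Counter(cat(r) for r in nth_sample_by_category(rows, cat, 10, rng)))
    print(Counter(cat(r) for r in top_k_per_category(rows, cat, 5, rng)))
```

With this data the first sample contains 10 rows from "A" and 1 from "B" (proportional to category size), while the second contains 5 from each (equal per category, capped at k). When a category's size is not a multiple of n, the first approach is only approximately proportional, because category boundaries need not line up with the every-n-th positions.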
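Gordon's point about numeric columns can be illustrated the same way: ordering by a numeric column such as `income` and keeping every n-th row spreads the sample evenly across that column's range, which acts as a stratification on it. This is a minimal sketch, assuming rows are dicts with a hypothetical `income` key and that "the modulo approach" means keeping rows whose row number is 1, n + 1, 2n + 1, and so on.

```python
from operator import itemgetter


def nth_sample_by_numeric(rows, key, n):
    """Keep every n-th row after ordering by a numeric column.

    Mirrors row_number() over (order by income) as seqnum, keeping rows where
    (seqnum - 1) is a multiple of n (i.e. seqnum = 1, n + 1, 2n + 1, ...).
    """
    ordered = sorted(rows, key=key)
    # enumerate(start=1) matches SQL row_number(), which starts at 1.
    return [r for seqnum, r in enumerate(ordered, start=1) if (seqnum - 1) % n == 0]


if __name__ == "__main__":
    # Hypothetical data: ten incomes in arbitrary order.
    rows = [{"id": i, "income": inc}
            for i, inc in enumerate([70, 10, 100, 40, 20, 90, 30, 60, 80, 50])]
    sample = nth_sample_by_numeric(rows, itemgetter("income"), 3)
    print([r["income"] for r in sample])  # [10, 40, 70, 100]
```

Unlike a simple random sample, this guarantees that low, middle, and high incomes are all represented. Python's sort is stable, so tied incomes keep their input order; in SQL, rows with tied values may be numbered in any order, so adding a random tie-breaker to the `order by` is one option.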