
Suppose I have a numpy array [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]. How do I take 4 elements every 8 elements? Here is the expected result:

a -> [1,2,3,4, 9,10,11,12]
b -> [5,6,7,8, 13,14,15,16]

My array has hundreds of elements. I went through the numpy array documentation but never managed to perform this computation other than with a loop, which is very slow.

EDIT: The array can have up to 3 interleaved sub-arrays of 4 elements:

4 elt sample0, 4 elt sample1, 4 elt sample2, 4 elt sample0, 4 elt sample1, 4 elt sample2, 4 elt sample0, 4 elt sample1, 4 elt sample2 ...

My array has 499875840 elements !

  • Something similar to b = [a[cut:cut + 4] for cut in range(0, len(a), 8)] Commented Jun 10, 2024 at 17:19
  • Could you please clarify the output if you have more than 16 elements? Commented Jun 10, 2024 at 17:41
  • What's the size of N and M relative to the length of the array? Does the result have to be a list of P arrays (each M long), or will it always be a list of 2 arrays? How about a (P, M)-shaped array instead of a list? Commented Jun 11, 2024 at 0:42
  • @hpaulj: Added clarification about the number of sub-arrays and the array structure Commented Jun 11, 2024 at 5:26
  • @sleli Check out my new answer, it has the fastest code for you Commented Jun 13, 2024 at 22:20

5 Answers


For a generic and pure numpy approach, you could argsort then split:

N = 4 # number of consecutive elements
M = 2 # number of output arrays

idx = np.argsort(np.arange(len(arr))%(N*M)//N, kind='stable')
# array([ 0,  1,  2,  3,  8,  9, 10, 11,  4,  5,  6,  7, 12, 13, 14, 15])

a, b = np.split(arr[idx], M)

As a one liner:

out = np.split(arr[np.argsort(np.arange(len(arr))%(N*M)//N, kind='stable')], M)

Output:

# a / out[0]
array([ 1,  2,  3,  4,  9, 10, 11, 12])

# b / out[1]
array([ 5,  6,  7,  8, 13, 14, 15, 16])

Output with arr = np.arange(32) as input:

# a
array([ 0,  1,  2,  3,  8,  9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27])

# b
array([ 4,  5,  6,  7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31])

Output with arr = np.arange(32), N = 4, M = 4:

(array([ 0,  1,  2,  3, 16, 17, 18, 19]),
 array([ 4,  5,  6,  7, 20, 21, 22, 23]),
 array([ 8,  9, 10, 11, 24, 25, 26, 27]),
 array([12, 13, 14, 15, 28, 29, 30, 31]))

timings

Paul's approach is faster than mine (but is limited to 2 output arrays).
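Paul's boolean-mask idea can itself be extended to M outputs; a minimal sketch (the grouping trick and the names N, M, group are my own assumptions, not taken from his answer):

```python
import numpy as np

arr = np.arange(1, 25)   # 24 elements for a 3-way example
N = 4                    # consecutive elements per chunk
M = 3                    # number of output arrays

# label each element with the output array its chunk belongs to
group = (np.arange(len(arr)) // N) % M
out = [arr[group == k] for k in range(M)]
# out[0] -> array([ 1,  2,  3,  4, 13, 14, 15, 16])
```

Like the 2-array mask version, this makes M passes over the array, so it scales linearly in M.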


generalization

A reshaping approach, as proposed by @hpaulj, can be generalized using:

N = 4 # number of consecutive elements
M = 3 # number of samples/output arrays

out = arr.reshape(-1, M, N).transpose(1, 0, 2).reshape(-1, arr.size//M)

# or to map to individual variables
a,b,c = arr.reshape(-1, M, N).transpose(1, 0, 2).reshape(-1, arr.size//M) 

@U13-Forward's approach only works when M = 2; it can, however, be generalized using a list comprehension:

N = 4 # number of consecutive elements
M = 3 # number of samples/output arrays

reshaped = arr.reshape(-1, N*M)
out = [reshaped[:, n*N:n*N+N].ravel() for n in range(M)]



6 Comments

@U13-Forward I actually used a different logic, which I think should be easier than reshaping ;)
@mozway could you please add my answer to the graph please ?
@VincentBénet can you please check the output of your approach? I think it's incorrect
@mozway Could you please add my reshape?
Thank you, it is working perfectly. I chose mozway's solution as I can have 3 sub-arrays

Another possible solution, which simply uses boolean masking (x is the original array):

i = np.arange(1, len(x)+1) % 8 
m = (i >= 1) & (i <= 4) 
a = x[m]
b = x[~m]
a, b

Output:

(array([ 1,  2,  3,  4,  9, 10, 11, 12]),
 array([ 5,  6,  7,  8, 13, 14, 15, 16]))

4 Comments

That's a nice one (+1) but you might want to use the indices as reference, not the values
Thanks, @mozway! Following your suggestion, I have meanwhile edited my solution to use indices instead of values. Thanks!
I added some timings
Thanks, @mozway, for having added those timings!

Another method using reshape and transpose - U13-Forward uses a 2D reshaping; I'm doing a 3D one:

In [262]: arr = np.arange(1,17); arr
Out[262]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])


In [264]: arr.reshape(-1,2,4)
Out[264]: 
array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8]],

       [[ 9, 10, 11, 12],
        [13, 14, 15, 16]]])

In [265]: arr.reshape(-1,2,4).transpose(1,0,2)
Out[265]: 
array([[[ 1,  2,  3,  4],
        [ 9, 10, 11, 12]],

       [[ 5,  6,  7,  8],
        [13, 14, 15, 16]]])

In [266]: arr.reshape(-1,2,4).transpose(1,0,2).reshape(-1,8)
Out[266]: 
array([[ 1,  2,  3,  4,  9, 10, 11, 12],
       [ 5,  6,  7,  8, 13, 14, 15, 16]])

And to get a list of arrays instead of a 2d array, just use list:

In [268]: list(_266)
Out[268]: 
[array([ 1,  2,  3,  4,  9, 10, 11, 12]),
 array([ 5,  6,  7,  8, 13, 14, 15, 16])]

Individually, reshape and transpose make views, but combined, at least one copy is required.

Quick timing tests show that this performs well for the small example, but it doesn't scale as well. That final list step adds a lot of time for large problems, since it produces thousands of (8,) arrays (as opposed to just 2 arrays of length 4n).
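The view-vs-copy claim above can be checked directly with np.shares_memory (this check is my addition, not part of the original answer):

```python
import numpy as np

arr = np.arange(1, 17)

r = arr.reshape(-1, 2, 4)   # reshape alone: a view of arr
t = r.transpose(1, 0, 2)    # transpose alone: still a view
final = t.reshape(-1, 8)    # t is non-contiguous, so this reshape must copy

print(np.shares_memory(arr, r))      # True
print(np.shares_memory(arr, t))      # True
print(np.shares_memory(arr, final))  # False
```

That single unavoidable copy is where the cost of this approach lives for large arrays.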

2 Comments

I think the final reshape should be .reshape(2, -1).
@ken, now he's saying 'up to 3 interleaving ...'.

TL;DR: for the fastest solution, go to Solution #1.


Solutions:

Input data:

data = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])

Output:

a
array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])
b
array([[ 5,  6,  7,  8],
       [13, 14, 15, 16]])

NumPy:

Solution 1: Fastest one + Generalized:

We reshape the array into ready-made chunks, then use a list comprehension to slice out each sub-array.

I use reshaped[:, i, :] to filter out the specific chunk from the 3D matrix.

Code below:

m = 2  # number of output arrays
n = 4  # elements per chunk

reshaped = data.reshape(-1, m, n)
chunks = [reshaped[:, i, :] for i in range(m)]

For 1D:

reshaped = data.reshape(-1, m, n)
chunks = [reshaped[:, i, :].ravel() for i in range(m)]

Above is the best and cleanest answer!

Solution 2: Slower with array_split

np.array_split splits numpy arrays into chunks, but normally it splits into a given number of chunks, which is not what we want; we want to split the array into chunks containing a given number of items.

I'm using (data.shape[0] + 3) // 4, which takes the length of the array, adds 3 to round the division up (in case of leftover items), and floor-divides by 4 to find how many chunks we need.

Then I use [::2] and [1::2] to split every other element into the two lists.

Code below:

chunks = np.array_split(data, (data.shape[0] + 3) // 4)
a, b = chunks[::2], chunks[1::2]

Solution 3: Another faster one with reshape

I use reshape to calculate how many chunks there are, using floor division of the array's length by 4.

Then I use [::2] and [1::2] to split every other element into the two lists.

chunks = data.reshape((data.shape[0] // 4, 4))
a, b = chunks[::2], chunks[1::2]

I reshape the array into a 2D array with chunks of 4 items.

Solution 4: Reshape and chunking

I use reshape to change the shape of the array, to chunks of 8 using (-1, 8).

If you don't understand what -1 does, check out the post here.

As mentioned in the comments by @BallpointBen:

... When reshaping an array, the new shape must contain the same number of elements as the old shape, meaning the products of the two shapes' dimensions must be equal. When using a -1, the dimension corresponding to the -1 will be the product of the dimensions of the original array divided by the product of the dimensions given to reshape so as to maintain the same number of elements.
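The quoted rule is easy to verify on a small array (this illustration is mine, not part of the original answer):

```python
import numpy as np

data = np.arange(24)  # 24 elements in total

# NumPy infers the -1 dimension so the element count is preserved:
print(data.reshape(-1, 8).shape)  # (3, 8)  -> 24 // 8 = 3 rows inferred
print(data.reshape(4, -1).shape)  # (4, 6)  -> 24 // 4 = 6 columns inferred
```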

I then use numpy slicing to properly slice the matrix by chunks of 4.

Code below:

reshaped = data.reshape(-1, 8)
a = reshaped[:, :4]
b = reshaped[:, 4:]

List Comprehension

Solution 5: Using modulo for conditional slicing in list comprehension

I iterate through the length of the array, then every 4 items, I use slicing to get chunks of 4.

Then I use [::2] and [1::2] to split every other element into the two lists.

Code below:

chunks = [data[i:i+4] for i in range(len(data)) if (i % 4) == 0]
a, b = chunks[::2], chunks[1::2]

Solution 6: Using list comprehension and zip:

I iterate through the length of the array, then every 8 items, I use slicing to get 2 chunks of 4.

After that, I use zip to alternate the items to achieve the desired result.

Code below:

chunks = [[data[i:i+4], data[i+4:i+8]] for i in range(len(data)) if (i % 8) == 0]
a, b = zip(*chunks)

Timings:

I'm testing with an array of exactly 499875840 items, as the OP specified, to determine the speed of each solution:

data = np.arange(1, 499875841)

Code used to test:

import timeit

import numpy as np

def U13_1():
    # Generalized
    m, n = 2, 4
    data = np.arange(1, 499875841)
    reshaped = data.reshape(-1, m, n)
    chunks = [reshaped[:, i, :] for i in range(m)]

def U13_2():
    data = np.arange(1, 499875841)
    chunks = np.array_split(data, (data.shape[0] + 3) // 4)
    a, b = chunks[::2], chunks[1::2]

def U13_3():
    data = np.arange(1, 499875841)
    chunks = data.reshape((data.shape[0] // 4, 4))
    a, b = chunks[::2], chunks[1::2]

def U13_4():
    data = np.arange(1, 499875841)
    reshaped = data.reshape(-1, 8)
    a = reshaped[:, :4]
    b = reshaped[:, 4:]

def U13_5():
    data = np.arange(1, 499875841)
    chunks = [data[i:i+4] for i in range(len(data)) if (i % 4) == 0]
    a, b = chunks[::2], chunks[1::2]

def U13_6():
    data = np.arange(1, 499875841)
    chunks = [[data[i:i+4], data[i+4:i+8]] for i in range(len(data)) if (i % 8) == 0]
    a, b = zip(*chunks)

def hpaulj():
    # Generalized
    N, M = 4, 3
    data = np.arange(1, 499875841)
    a, b, c = data.reshape(-1, M, N).transpose(1, 0, 2).reshape(-1, data.size//M)

def mozway():
    # Generalized
    data = np.arange(1, 499875841)
    out = np.split(data[np.argsort(np.arange(len(data)) % (4*2) // 4, kind='stable')], 2)

def pauls():
    data = np.arange(1, 499875841)
    i = np.arange(1, len(data)+1) % 8 
    m = (i >= 1) & (i <= 4) 
    a = data[m]
    b = data[~m]

U13_1 = timeit.timeit('U13_1()', 'from __main__ import U13_1', number=1)
print('U13_1 Speed:', U13_1)
U13_2 = timeit.timeit('U13_2()', 'from __main__ import U13_2', number=1)
print('U13_2 Speed:', U13_2)
U13_3 = timeit.timeit('U13_3()', 'from __main__ import U13_3', number=1)
print('U13_3 Speed:', U13_3)
U13_4 = timeit.timeit('U13_4()', 'from __main__ import U13_4', number=1)
print('U13_4 Speed:', U13_4)
U13_5 = timeit.timeit('U13_5()', 'from __main__ import U13_5', number=1)
print('U13_5 Speed:', U13_5)
U13_6 = timeit.timeit('U13_6()', 'from __main__ import U13_6', number=1)
print('U13_6 Speed:', U13_6)
hpaulj = timeit.timeit('hpaulj()', 'from __main__ import hpaulj', number=1)
print('hpaulj Speed:', hpaulj)
mozway = timeit.timeit('mozway()', 'from __main__ import mozway', number=1)
print('mozway Speed:', mozway)
pauls = timeit.timeit('pauls()', 'from __main__ import pauls', number=1)
print('pauls Speed:', pauls)

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'U13_1': U13_1, 'U13_2': U13_2, 'U13_3': U13_3, 'U13_4': U13_4, 'U13_5': U13_5, 'U13_6': U13_6, 'hpaulj': hpaulj, 'mozway': mozway, 'pauls': pauls}, index=[0])

plt.plot(df.columns, df.iloc[0])
plt.show()

Output:

U13_1 Speed: 0.45211420000032376
U13_2 Speed: 378.43887300000006
U13_3 Speed: 0.6713313000000198
U13_4 Speed: 0.6900013000004037
U13_5 Speed: 298.6191995999998
U13_6 Speed: 601.4342458000001
hpaulj Speed: 1.2215797999997449
mozway Speed: 21.53476950000004
pauls Speed: 4.3681056999994325

Fastest ones:

  1. My solution #1 Generalized

  2. My solution #3 Not Generalized

  3. My solution #4 Not Generalized

Fastest ones (only including generalized):

  1. My solution #1 Generalized

  2. @hpaulj's solution Generalized

  3. @mozway's solution Generalized

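Note that each timed function above also includes the cost of building the array with np.arange. A sketch of how to measure only the splitting step (the function name is illustrative, and the array is smaller here to keep the demo cheap):

```python
import timeit

import numpy as np

# setup done once, outside the timed call
data = np.arange(1, 4_800_001)

def split_reshape(data, m=2, n=4):
    # same reshape-and-slice technique as Solution 1, returned as 1D arrays
    reshaped = data.reshape(-1, m, n)
    return [reshaped[:, i, :].ravel() for i in range(m)]

t = timeit.timeit(lambda: split_reshape(data), number=10)
print(f"reshape-based split: {t:.4f}s for 10 runs")
```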

5 Comments

Timing on 16 items is most likely not meaningful. If it was, then a better option would be a fixed size indexing. As you can see in my original timings, the fastest approaches for small arrays scale badly. You should time using a wide range of sizes
Also, generalizing to M output arrays (as indicated by the OP, M = 3 in their current case), the "fastest" approach would require M slicing steps; this is not taken into account in these timings.
@mozway Check my newest edit!
OK, so it's more or less back to hpaulj's 3D reshaping (which was also my original approach when the question was not yet clarified) ;) Note that your current updated #1 gives 2D arrays, you need an extra step to get 1D. Also, the repeated slicing starts to be expensive when M is large
@mozway I tested with 499875840 elements, as the OP specified.

Use np.einsum:

import numpy as np

arr = np.arange(1, 17)
gr_size = 8 
half_size = gr_size // 2 

reshaped = arr.reshape(-1,gr_size)
print(reshaped) 
'''
[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16]]
'''
aa = np.einsum('ij -> ji',reshaped[:,:half_size]).T.reshape(-1)

bb = np.einsum('ij -> ji',reshaped[:,half_size:]).T.reshape(-1)

print('aa ->',aa)#[ 1  2  3  4  9 10 11 12]
print('bb ->',bb)#[ 5  6  7  8 13 14 15 16]
'''
aa -> [ 1  2  3  4  9 10 11 12]
bb -> [ 5  6  7  8 13 14 15 16]
'''

Generalized Solution using np.einsum

import numpy as np

def split_and_reshape(arr, group_size):
    # Ensure the array can be reshaped into the desired format
    assert len(arr) % group_size == 0, "Array length must be divisible by the group size"
    
    # Reshape the array into groups of group_size
    reshaped = arr.reshape(-1, group_size)
    
    # Calculate half of the group size for splitting
    half_size = group_size // 2
    
    # Apply einsum and reshape
    a = np.einsum('ij -> ji', reshaped[:, :half_size]).T.reshape(-1)
    b = np.einsum('ij -> ji', reshaped[:, half_size:]).T.reshape(-1)
    
    return a, b


arr = np.arange(1, 17)
a, b = split_and_reshape(arr, 8)

print("a ->", a)
print("b ->", b)
'''
a -> [ 1  2  3  4  9 10 11 12]
b -> [ 5  6  7  8 13 14 15 16]
'''
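A small observation on the code above (my note, not part of the answer): np.einsum('ij -> ji', x) already transposes, so the trailing .T transposes back, and the pair cancels out; the whole expression reduces to a plain slice-and-ravel:

```python
import numpy as np

arr = np.arange(1, 17)
reshaped = arr.reshape(-1, 8)

# einsum('ij -> ji', x).T is a double transpose, i.e. x again,
# so the expression collapses to a simple slice + ravel:
aa = np.einsum('ij -> ji', reshaped[:, :4]).T.reshape(-1)
simple = reshaped[:, :4].reshape(-1)

print(np.array_equal(aa, simple))  # True
```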

Comments
