
I want to select all rows and some columns from a CSR matrix using the following code, but it gives the error: "IndexError: Index dimension must be <= 2".

data = data[:, imp_features_idx]

The variable 'imp_features_idx' is a Python set holding the indices of the columns that I want to select. The full traceback is as follows:

  File "/usr/local/lib64/python3.6/site-packages/scipy/sparse/_index.py", line 33, in __getitem__
    row, col = self._validate_indices(key)
  File "/usr/local/lib64/python3.6/site-packages/scipy/sparse/_index.py", line 147, in _validate_indices
    col = self._asindices(col, N)
  File "/usr/local/lib64/python3.6/site-packages/scipy/sparse/_index.py", line 162, in _asindices
    raise IndexError('Index dimension must be <= 2')
IndexError: Index dimension must be <= 2

1 Answer

Best answer

The error occurs because 'imp_features_idx' is a set. A SciPy sparse matrix can be indexed with a list or an array of column indices, but a set is not a sequence, so it cannot be interpreted as a one-dimensional index. To fix the error, convert the set to a list before indexing.

Change the indexing line in your code to:

data = data[:, list(imp_features_idx)]
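
The difference between a set and a list shows up as soon as NumPy tries to turn each into an index array. The short sketch below assumes that SciPy converts the index with something like np.asarray internally, which is what the '_asindices' frame in the traceback suggests; the variable names are only illustrative.

```python
import numpy as np

# A set is not a sequence, so NumPy wraps it whole as a single Python object.
as_set = np.asarray({0, 2})

# A list is a sequence, so NumPy builds a 1-D array of integers from it.
as_list = np.asarray([0, 2])

print(as_set.ndim, as_set.dtype)   # 0 object
print(as_list.ndim)                # 1
```

A zero-dimensional object array is not a usable column index, which is consistent with the "Index dimension" complaint in the error message.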

For example:

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> row = np.array([0, 0, 1, 2, 2, 2, 3, 4, 4])
>>> col = np.array([0, 2, 2, 0, 1, 2, 2, 0, 2])
>>> data = np.array([1] * len(row))
>>> X = csr_matrix((data, (row, col)), shape=(5, 3))
>>> X
<5x3 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in Compressed Sparse Row format>
>>> X.toarray()
array([[1, 0, 1],
       [0, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [1, 0, 1]])

>>> c = {0, 2}
>>> X[:, list(c)].toarray()
array([[1, 1],
       [0, 1],
       [1, 1],
       [0, 1],
       [1, 1]])
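
One caveat with list(c): Python does not guarantee the iteration order of a set, so the order of the selected columns is not guaranteed either. If the column order matters, sorting the indices is a simple way to make it deterministic. The following is a minimal sketch under that assumption; 'select_columns' is a hypothetical helper name, not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix


def select_columns(matrix, columns):
    """Return all rows and the given columns of a sparse matrix.

    `columns` may be any iterable of integer column indices
    (set, list, tuple). Hypothetical helper for illustration.
    """
    idx = sorted(columns)  # plain list in ascending order
    return matrix[:, idx]


# Same 5x3 matrix as in the example above, built from its dense form.
dense = np.array([[1, 0, 1],
                  [0, 0, 1],
                  [1, 1, 1],
                  [0, 0, 1],
                  [1, 0, 1]])
X = csr_matrix(dense)

subset = select_columns(X, {2, 0})
print(subset.toarray())
```

With the indices sorted, the result always holds column 0 followed by column 2, regardless of how the set happens to iterate.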


...