Summary
The BigQueryReadClient returns multiple 'empty' streams when a read session is instantiated via create_read_session. Attempting to invoke to_dataframe() on the stream reader yields an AttributeError.
Firstly, is this behaviour abnormal? If not, I will just wrap the method in a try/except and plow on.
Environment details
- OS type and version: Linux - WSL - Ubuntu 22.04.3 LTS
- Python version: 3.9.18
- pip version: 23.3.1
- package manager: poetry@1.7.1
- google-cloud-bigquery-storage version: 2.24.0
Steps to reproduce
from google.cloud.bigquery_storage_v1 import BigQueryReadClient, types
client = BigQueryReadClient()
requested_session = types.ReadSession()
requested_session.table = "projects/<project>/datasets/<dataset>/tables/<table>"
requested_session.data_format = types.DataFormat.AVRO
requested_session.read_options.selected_fields = <some_fields>
requested_session.read_options.row_restriction = <some_row_restriction>
parent = "projects/<project_id>"
session = client.create_read_session(
    parent=parent,
    read_session=requested_session,
)

dfs = []
for stream in session.streams:
    reader = client.read_rows(stream.name)
    sub_df = reader.to_dataframe()  # < error raised here, for all but 1 of the streams: 'NoneType' object has no attribute '_parse_avro_schema'
    dfs.append(sub_df)
...
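As a stopgap, the loop above can be made tolerant of the empty streams by wrapping each per-stream read in a try/except. This is only a sketch, under the assumption that the AttributeError fires exclusively for streams that carried no data; the helper name and the pandas concat at the end are mine, not part of the library:

```python
import pandas as pd

def read_all_streams(client, session):
    """Collect one DataFrame per stream, skipping streams whose readers
    raise AttributeError because they never received any data."""
    frames = []
    for stream in session.streams:
        reader = client.read_rows(stream.name)
        try:
            frames.append(reader.to_dataframe())
        except AttributeError:
            # Empty stream: the reader's internal stream parser was never
            # initialized, so to_dataframe() fails as described below.
            continue
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

This silently swallows AttributeError, so it would also mask a genuine parser bug; a stricter variant could re-raise unless the reader was verifiably empty.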
Stack trace
Exception has occurred: AttributeError
'NoneType' object has no attribute '_parse_avro_schema'
File "reader.py", line 424, in to_dataframe
self._stream_parser._parse_avro_schema()
File "reader.py", line 299, in to_dataframe
return self.rows(read_session=read_session).to_dataframe(dtypes=dtypes)
AttributeError: 'NoneType' object has no attribute '_parse_avro_schema'
The relevant line (python-bigquery-storage/google/cloud/bigquery_storage_v1/reader.py, line 422 in fe09e3b):

    self._stream_parser._parse_avro_schema()
So clearly, the object is not being populated as expected. After inspecting the data from the one stream that does yield data, it seems that the remaining streams are empty.
Detail
The emergence of this problem is specific to the table I am accessing, and to the combination of filtering and the type of the requested field. The minimal case where this occurs is when querying a single BYTES-type field. The approximate size of this field is 0.1 MB.
The issue persists when querying one row: I can query a single row of just this BYTES field from the BigQuery table and I will get some 13 empty streams and 1 populated stream.
If I wrap each stream read in a try/except, I am able to successfully grab the data from the one populated stream.
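If the fan-out itself is unwanted, create_read_session also accepts a max_stream_count argument that caps how many streams the server may return. A minimal sketch; the wrapper function below is mine, and max_stream_count is only an upper bound (the server may still return fewer streams):

```python
def create_capped_session(client, requested_session, parent, max_streams=1):
    """Request a read session with at most `max_streams` streams.

    With max_stream_count=1, all rows arrive on a single stream, which
    sidesteps iterating over (possibly empty) parallel streams entirely.
    """
    return client.create_read_session(
        parent=parent,
        read_session=requested_session,
        max_stream_count=max_streams,
    )
```

The trade-off is losing the parallel-read throughput that multiple streams are designed to provide, so this fits small, filtered reads like the one-row case above.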
Am I doing something wrong here, or is this normal?