4,858 questions
Best practices
0
votes
0
replies
48
views
Building a Restart-Safe Kafka to BigQuery Pipeline with Durable Checkpointing Using Apache Beam - Dataflow
Problem Statement:
I needed a robust way to ingest data from Kafka to BigQuery using Apache Beam/Dataflow, with at-least-once delivery, durable checkpointing, and safe offset progression—even when ...
1
vote
3
answers
59
views
Flink and beam pipeline having duplicate messages in kafka consumer
We are running a pipeline on the managed Apache Flink runner on AWS. The Flink version we are using is 1.19 and the Beam version is 2.61.0. First I start the application with ...
Advice
0
votes
1
replies
59
views
Apache Beam update of the source table
I'm new to Apache Beam running on GCP, but my question is more theoretical than practical.
I have a source spanner table and a destination spanner table and I'm fetching data from source table to ...
1
vote
1
answer
61
views
How to access topic/payload in PulsarMessage since it has private getters in apache beam pulsar io connector 2.69.0
I noticed that the class PulsarMessage has private getters in version 2.69.0. Shouldn't they be public in order to access the topic names and/or payload of the message? Artifact link: https://...
1
vote
1
answer
42
views
Apache Beam: yield from works for TaggedOutput, but yield beam.TaggedOutput in except is ignored
I'm working with Apache Beam (2.62) and ran into confusing behavior with DoFn.process() when using yield from and TaggedOutput.
When do_something_second function yields multiple TaggedOutput, ...
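The Beam-specific behavior above is hard to reproduce without a runner, but the underlying Python semantics can be checked in isolation. A minimal stdlib-only sketch (the `process` function and its tuple outputs are hypothetical stand-ins, not real Beam DoFn/TaggedOutput code) showing that a generator *can* yield from inside an `except` block, which suggests the issue lies in how Beam consumes the generator rather than in Python itself:

```python
# Hypothetical stand-in for a DoFn.process generator; no Beam dependency.
# Tuples play the role of TaggedOutput values here.
def process(element):
    try:
        if element < 0:
            raise ValueError("negative element")
        # analogous to `yield from` over multiple TaggedOutputs
        yield from (element, element * 2)
    except ValueError:
        # analogous to `yield beam.pvalue.TaggedOutput(...)` in except
        yield ("error", element)

print(list(process(3)))   # [3, 6]
print(list(process(-1)))  # [('error', -1)]
```

Plain Python delivers the `except`-branch value, so a dropped TaggedOutput in a real pipeline is more likely a question of where the exception is raised relative to the `yield from` consumption.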
-3
votes
1
answer
83
views
Why are there no records read from spanner change stream?
I'm trying to write a Python GCP Dataflow pipeline that processes records from a Spanner change stream and prints them out. I am running it locally and it appears to work, but it prints no records when I update a ...
0
votes
1
answer
121
views
Apache Beam 2.68.0 throws "Using fallback deterministic coder for type" warning
In the latest Apache Beam 2.68.0, they have changed the behavior of Coders for non-primitive objects. (see the changelog here).
Therefore, I get a warning like this on GCP Dataflow.
"Using ...
0
votes
1
answer
63
views
Issue with getSoftDeletePolicy() in google-api-services-storage after Apache Beam Upgrade
I'm currently upgrading the Apache Beam version for my Dataflow application from 2.51.0 to 2.67.0. As part of this process, I'm encountering a compatibility issue with the google-api-services-storage ...
0
votes
1
answer
111
views
Dataflow Python SDK failing to authenticate to Kafka using truststore and keystore JKS files with custom Docker image
I am trying to build a Python-based Apache Beam pipeline which is going to read from Kafka. Kafka requires truststore- and keystore-JKS-file-based authentication.
kafka_consumer_config = {
"...
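The snippet above is cut off, so for context: JKS-based Kafka authentication is normally expressed with the standard Kafka SSL consumer properties. A hedged sketch (all paths, hosts, and passwords below are placeholders, not values from the original question; in a custom Dataflow container the JKS files must exist at these paths inside the worker image):

```python
# Standard Kafka SSL property names for JKS-based client authentication.
# Values are placeholders for illustration only.
kafka_consumer_config = {
    "bootstrap.servers": "broker-1.example.com:9093",
    "security.protocol": "SSL",
    "ssl.truststore.location": "/opt/certs/kafka.truststore.jks",
    "ssl.truststore.password": "changeit",
    "ssl.keystore.location": "/opt/certs/kafka.keystore.jks",
    "ssl.keystore.password": "changeit",
    "ssl.key.password": "changeit",
}

print(sorted(kafka_consumer_config))
```

Because ReadFromKafka executes the consumer in a Java expansion-service worker, the JKS paths must resolve inside that container, not on the machine that launches the pipeline.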
0
votes
0
answers
56
views
Calcite - how to use bigquery dialect with apache beam
I'm currently working on migrating from ZetaSQL to Calcite within an Apache Beam pipeline.
I need to use specific transformations that are only available when the BigQuery dialect is enabled. I ...
0
votes
0
answers
66
views
Solving Version Conflict using Apache beam with ml transforms library
I've been trying for some time to get a Beam pipeline to do data transformations for a fairly simple machine learning transformation, but Apache Beam and TensorFlow Transform won't play nicely ...
0
votes
0
answers
62
views
How to use the ErrorHandler interface in Apache Beam?
I would like to use an ErrorHandler to catch all the errors that happen during my pipeline.
I have seen that there is an interface which allows this: https://beam.apache.org/releases/javadoc/...
0
votes
0
answers
57
views
Creating Global dataset combining multiple regions in BigQuery using Apache Beam
I have four regions (a, b, c, d) and I want to create a single dataset concatenating all four and store it in c. How can this be done? Tried with dbt-Python but had to hard-code a lot; looking for a ...
0
votes
1
answer
56
views
AvroCoder requires default constructor in DirectRunner locally but works on GCP Dataflow - Why?
I'm experiencing inconsistent behavior between Apache Beam's DirectRunner (local) and DataflowRunner (GCP) when using AvroCoder with an immutable class.
Problem
I have an immutable class defined using ...
0
votes
1
answer
80
views
PCollection Objects Format for Apache Beam to write on BigQuery using CDC in Python
I'm trying to write to BigQuery using Apache Beam, in Python.
I want to use the newest CDC features to write to BigQuery.
However, I can't get the correct format of the objects in the ...