Best practices
0 votes
0 replies
48 views

Problem Statement: I needed a robust way to ingest data from Kafka to BigQuery using Apache Beam/Dataflow, with at-least-once delivery, durable checkpointing, and safe offset progression—even when ...
Parag Ghosh
1 vote
3 answers
59 views

We are running a pipeline on the managed Apache Flink runner on AWS. The Flink version we are using is 1.19 and the Beam version is 2.61.0. First I start the application with ...
ranidu harshana
Advice
0 votes
1 reply
59 views

I'm new to Apache Beam running on GCP, but my question is more theoretical than practical. I have a source Spanner table and a destination Spanner table, and I'm fetching data from the source table to ...
otto
1 vote
1 answer
61 views

I noticed that the class PulsarMessage has private getters in version 2.69.0. Shouldn't they be public in order to access the topic names and/or payload of the message? Artifact link: https://...
Vaibhav Chandra
1 vote
1 answer
42 views

I'm working with Apache Beam (2.62) and ran into confusing behavior with DoFn.process() when using yield from and TaggedOutput. When the do_something_second function yields multiple TaggedOutput values, ...
Dogil
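For the yield from question above, the core mechanics can be sketched without Beam at all: yield from delegates to a sub-generator and re-emits each of its yielded values one by one. In this minimal, stdlib-only sketch, TaggedOutput is a stand-in namedtuple for beam.pvalue.TaggedOutput, and do_something_second / process are hypothetical names echoing the question; in a real pipeline the tagged values would be routed by .with_outputs() rather than collected into a list.

```python
from collections import namedtuple

# Stand-in for beam.pvalue.TaggedOutput: just a (tag, value) pair.
TaggedOutput = namedtuple("TaggedOutput", ["tag", "value"])

def do_something_second(element):
    # Helper generator that yields multiple tagged outputs,
    # like the DoFn helper described in the question.
    yield TaggedOutput("doubled", element * 2)
    yield TaggedOutput("incremented", element + 1)

def process(element):
    # `yield from` re-emits every value the delegate generator yields,
    # one at a time, preserving each TaggedOutput wrapper intact.
    yield from do_something_second(element)

for out in process(10):
    print(out.tag, out.value)
```

The point of the sketch: nothing is batched or merged by yield from, so each TaggedOutput arrives at the runner as a separate element, tagged independently.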
-3 votes
1 answer
83 views

I'm trying to write a Python GCP dataflow that processes records from a Spanner change stream and prints them out. I am running it locally and it appears to work but prints no records when I update a ...
Joe P
0 votes
1 answer
121 views

In the latest Apache Beam 2.68.0, the behavior of Coders for non-primitive objects has changed (see the changelog here). As a result, I get a warning like this on GCP Dataflow: "Using ...
Praneeth Peiris
0 votes
1 answer
63 views

I'm currently upgrading the Apache Beam version for my Dataflow application from 2.51.0 to 2.67.0. As part of this process, I'm encountering a compatibility issue with the google-api-services-storage ...
Optimizer
0 votes
1 answer
111 views

I am trying to build a Python-based Apache Beam pipeline which is going to read from Kafka. Kafka requires truststore and keystore JKS file based authentication. kafka_consumer_config = { "...
Bhargav Velisetti
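For the JKS question above, a consumer config along these lines is one plausible shape. The SSL keys are the standard Apache Kafka client configuration properties; the broker address, file paths, and passwords are placeholders. One assumption worth flagging for Beam specifically: ReadFromKafka hands this dict to a Java KafkaConsumer via the cross-language expansion service, so the JKS files must be readable on the workers, not just on the machine launching the pipeline.

```python
# Hypothetical consumer config for Beam's ReadFromKafka. These keys are
# the standard Kafka client SSL properties; all values are placeholders.
kafka_consumer_config = {
    "bootstrap.servers": "broker-1:9093",
    "security.protocol": "SSL",
    "ssl.truststore.location": "/opt/certs/truststore.jks",
    "ssl.truststore.password": "changeit",
    "ssl.keystore.location": "/opt/certs/keystore.jks",
    "ssl.keystore.password": "changeit",
    "ssl.key.password": "changeit",
}
```

If the brokers require mutual TLS, both the truststore (to verify the broker) and the keystore (to present the client certificate) entries are needed; for server-side TLS only, the keystore keys can be dropped.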
0 votes
0 answers
56 views

I'm currently working on migrating from ZetaSQL to Calcite within an Apache Beam pipeline. I need to use specific transformations that are only available when the BigQuery dialect is enabled. I ...
Florian Ferreira
0 votes
0 answers
66 views

I've been trying for some time to get a Beam pipeline to do data transformations for a fairly simple machine learning transformation, but Apache Beam and TensorFlow Transform won't play nicely ...
George Chapman-Brown
0 votes
0 answers
62 views

I would like to use an ErrorHandler to catch all the errors that happen during my pipeline. I have seen that there is an interface which allows this: https://beam.apache.org/releases/javadoc/...
Dev Yns
0 votes
0 answers
57 views

I have four regions (a, b, c, d) and I want to create a single data set concatenating all four and store it in c. How can this be done? I tried with dbt-Python but had to hard-code a lot; looking for a ...
N_epiphany
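For the four-regions question above, one way to avoid hard-coding each SELECT is to generate a UNION ALL query over the per-region datasets. This is a hypothetical, stdlib-only sketch: the project id, the dataset_a..dataset_d naming scheme, and the events table are all invented for illustration. Note also that BigQuery cannot join data across locations in a single query, so if the datasets live in different BigQuery regions the tables must first be copied into one location (e.g. c's).

```python
# Hypothetical sketch: build a BigQuery UNION ALL over the same table in
# four per-region datasets, instead of hard-coding each SELECT by hand.
REGIONS = ["a", "b", "c", "d"]

def union_all_sql(table: str, project: str = "my-project") -> str:
    """Return one UNION ALL query covering every regional dataset."""
    selects = [
        f"SELECT *, '{r}' AS source_region "
        f"FROM `{project}.dataset_{r}.{table}`"
        for r in REGIONS
    ]
    return "\nUNION ALL\n".join(selects)

print(union_all_sql("events"))
```

The generated string could then feed a single dbt model (or a CREATE TABLE ... AS statement targeting the dataset in c), and adding a fifth region becomes a one-line change to the list.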
0 votes
1 answer
56 views

I'm experiencing inconsistent behavior between Apache Beam's DirectRunner (local) and DataflowRunner (GCP) when using AvroCoder with an immutable class. Problem: I have an immutable class defined using ...
Nihal sharma
0 votes
1 answer
80 views

I'm trying to write to BigQuery using Apache Beam in Python, and I want to use the newest CDC features to write to BigQuery. However, I can't get the correct format of the objects in the ...
José Fonseca
