
[Bug]: org.xbill.DNS.spi.DnsjavaInetAddressResolverProvider when running multi-lang Python jobs on Dataflow with Java 21 #33471

Closed
chamikaramj opened this issue Dec 31, 2024 · 5 comments · Fixed by #33472

@chamikaramj
Contributor

chamikaramj commented Dec 31, 2024

What happened?

Some Python pipelines (internal to Google) fail with the following error when using the Java SQL transform through multi-language pipelines.

DEFAULT 2024-12-30T23:37:02.819320530Z Exception in thread "grpc-default-executor-0" java.util.ServiceConfigurationError: java.net.spi.InetAddressResolverProvider: Provider org.xbill.DNS.spi.DnsjavaInetAddressResolverProvider not found
DEFAULT 2024-12-30T23:37:02.819913526Z at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:593)
DEFAULT 2024-12-30T23:37:02.820028778Z at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1219)
DEFAULT 2024-12-30T23:37:02.820176017Z at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1228)
DEFAULT 2024-12-30T23:37:02.820271109Z at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1273)
DEFAULT 2024-12-30T23:37:02.820398047Z at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1309)
DEFAULT 2024-12-30T23:37:02.820531718Z at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1393)
DEFAULT 2024-12-30T23:37:02.820630146Z at java.base/java.util.ServiceLoader.findFirst(ServiceLoader.java:1812)
DEFAULT 2024-12-30T23:37:02.820734542Z at java.base/java.net.InetAddress.loadResolver(InetAddress.java:508)
DEFAULT 2024-12-30T23:37:02.820858964Z at java.base/java.net.InetAddress.resolver(InetAddress.java:488)
DEFAULT 2024-12-30T23:37:02.820959923Z at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1826)
DEFAULT 2024-12-30T23:37:02.821301769Z at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:1139)
DEFAULT 2024-12-30T23:37:02.821810837Z at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1818)
DEFAULT 2024-12-30T23:37:02.821954953Z at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1688)
DEFAULT 2024-12-30T23:37:02.822056706Z at org.apache.beam.vendor.grpc.v1p60p1.io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:632)
DEFAULT 2024-12-30T23:37:02.822204252Z at org.apache.beam.vendor.grpc.v1p60p1.io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:219)
DEFAULT 2024-12-30T23:37:02.822299518Z at org.apache.beam.vendor.grpc.v1p60p1.io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
DEFAULT 2024-12-30T23:37:02.822406484Z at org.apache.beam.vendor.grpc.v1p60p1.io.grpc.grpclb.GrpclbNameResolver.doResolve(GrpclbNameResolver.java:63)
DEFAULT 2024-12-30T23:37:02.822509145Z at org.apache.beam.vendor.grpc.v1p60p1.io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
DEFAULT 2024-12-30T23:37:02.822601931Z at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
DEFAULT 2024-12-30T23:37:02.822688916Z at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
DEFAULT 2024-12-30T23:37:02.822788464Z at java.base/java.lang.Thread.run(Thread.java:1583)

The failure occurs when the Java SDK Harness container starts up on the worker VMs.
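The trace shows `InetAddress.loadResolver` calling `ServiceLoader.findFirst`, which fails because a `META-INF/services/java.net.spi.InetAddressResolverProvider` file on the classpath names a class that cannot be loaded. The sketch below reproduces that mechanism in isolation, assuming a hypothetical `Resolver` interface in place of the JDK's `InetAddressResolverProvider` and a deliberately missing provider class `com.example.MissingResolverProvider`; it is an illustration, not the Beam code path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class StaleServiceFileDemo {

  /** Hypothetical stand-in for java.net.spi.InetAddressResolverProvider. */
  public interface Resolver {
    String name();
  }

  /**
   * Creates a throwaway classpath directory whose service file names a provider class
   * that does not exist, then iterates ServiceLoader over it. Returns the
   * ServiceConfigurationError message, or null if every listed provider loaded.
   */
  static String loadWithStaleServiceFile() {
    try {
      Path root = Files.createTempDirectory("stale-services");
      Path servicesDir = root.resolve("META-INF").resolve("services");
      Files.createDirectories(servicesDir);
      // The service file is named after the interface; its only entry is a class that is
      // deliberately absent, mimicking a jar that kept the service file but not a
      // loadable provider class.
      Files.write(servicesDir.resolve(Resolver.class.getName()),
          "com.example.MissingResolverProvider\n".getBytes(StandardCharsets.UTF_8));

      try (URLClassLoader loader = new URLClassLoader(
          new URL[] {root.toUri().toURL()}, StaleServiceFileDemo.class.getClassLoader())) {
        // Iterating forces ServiceLoader to load each listed class, as findFirst() does.
        for (Resolver r : ServiceLoader.load(Resolver.class, loader)) {
          System.out.println("Loaded provider: " + r.name());
        }
        return null;
      } catch (ServiceConfigurationError e) {
        return e.getMessage();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(loadWithStaleServiceFile());
  }
}
```

The resulting message has the same shape as the one in the log above (`<service interface>: Provider <class> not found`), which suggests that the problem lies in a stale service registration rather than in DNS itself.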

I believe this is due to the recent Hadoop upgrade to 3.4.1: #33312

This probably triggers the following bug:

dnsjava/dnsjava#338

We probably need a fix similar to this: https://github.com/netsec-ethz/scion-java-packet-example/pull/1/files

Assigning to @Abacn to look into the fix, and marking this as a 2.62.0 release blocker.

cc: @kennknowles

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@chamikaramj
Contributor Author

According to https://issues.apache.org/jira/browse/HADOOP-19288, this is fixed in Hadoop 3.4.1, so I wonder whether we are somehow pulling in an older Hadoop version when building the SQL expansion service.
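One way to test the "older Hadoop version" hypothesis is to ask the JVM where a Hadoop class was actually loaded from, and whether several classpath entries carry a copy of it. The sketch below is a generic diagnostic and not part of Beam; `ClassOrigin` is a hypothetical name, and `org.apache.hadoop.util.VersionInfo` is only used as a default example argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class ClassOrigin {

  /**
   * Returns the location the class was loaded from, "(no code source)" for classes
   * without one (for example JDK bootstrap classes), or null if the class is absent.
   */
  static String locate(String className) {
    try {
      // Load without initializing so static initializers are not run.
      Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
      CodeSource src = c.getProtectionDomain().getCodeSource();
      return (src == null || src.getLocation() == null)
          ? "(no code source)"
          : src.getLocation().toString();
    } catch (ClassNotFoundException e) {
      return null;
    }
  }

  /** Lists every classpath entry containing a copy of the class file, to spot duplicates. */
  static List<URL> allCopies(String className) {
    String resource = className.replace('.', '/') + ".class";
    try {
      return Collections.list(ClassOrigin.class.getClassLoader().getResources(resource));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String name = args.length > 0 ? args[0] : "org.apache.hadoop.util.VersionInfo";
    System.out.println("Loaded from: " + locate(name));
    for (URL u : allCopies(name)) {
      System.out.println("Copy at: " + u);
    }
  }
}
```

Run with the expansion service jar on the classpath, this would show which jar supplied the class; it cannot by itself distinguish a Hadoop dependency from a shaded copy, so it is a starting point rather than a conclusion.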

@Abacn
Contributor

Abacn commented Dec 31, 2024

HADOOP-19288 fixes the same issue in the hadoop-client-runtime shadow jar. However, because the Beam expansion service jar depends on Hadoop, it assembles its own shadow jar with all transitive dependencies, and that pulls the service file back in.
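Since the expansion service's shadow jar merges `META-INF/services` files from its transitive dependencies, a quick audit is to list every provider named in the resolver service file and check that each one can actually be loaded. This is a minimal diagnostic sketch under the assumption that the suspect jar is on the classpath; `ServiceFileAudit` and `findMissingProviders` are hypothetical names, and the actual fix in #33472 may work differently (for example, by changing what the shadow jar includes).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ServiceFileAudit {

  static final String RESOLVER_SERVICE =
      "META-INF/services/java.net.spi.InetAddressResolverProvider";

  /** Returns "<service file URL> -> <class>" for each listed provider that cannot be loaded. */
  static List<String> findMissingProviders(String serviceResource, ClassLoader loader) {
    List<String> missing = new ArrayList<>();
    try {
      // Every jar or directory on the classpath may contribute its own copy of the file.
      Enumeration<URL> urls = loader.getResources(serviceResource);
      while (urls.hasMoreElements()) {
        URL url = urls.nextElement();
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
          String line;
          while ((line = in.readLine()) != null) {
            // Service files allow '#' comments and blank lines.
            int hash = line.indexOf('#');
            String className = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (className.isEmpty()) {
              continue;
            }
            try {
              Class.forName(className, false, loader); // load without initializing
            } catch (ClassNotFoundException | LinkageError e) {
              missing.add(url + " -> " + className);
            }
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return missing;
  }

  public static void main(String[] args) {
    List<String> missing =
        findMissingProviders(RESOLVER_SERVICE, ServiceFileAudit.class.getClassLoader());
    if (missing.isEmpty()) {
      System.out.println("No unloadable InetAddressResolverProvider entries found.");
    } else {
      missing.forEach(m -> System.out.println("Unloadable entry: " + m));
    }
  }
}
```

Against a jar affected by this issue, one would expect an entry naming `org.xbill.DNS.spi.DnsjavaInetAddressResolverProvider`.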

@chamikaramj
Contributor Author

Ack.

I can confirm that the pipeline passes when "hadoop_version" is reverted to "2.10.2".

However, we should probably do a forward fix here, since the Hadoop version bump to 3.4.1 fixes a number of critical vulnerabilities for us.

@Abacn
Contributor

Abacn commented Dec 31, 2024

A proposed fix is #33472. Let me try to find a way to test it.

@Abacn
Contributor

Abacn commented Jan 2, 2025

Reopening to track the cherry-pick to the release-2.62.0 branch.

@Abacn Abacn reopened this Jan 2, 2025
@Abacn Abacn modified the milestones: 2.63.0 Release, 2.62.0 Release Jan 2, 2025