Could not write a batch of spans to model table #440

Open
hranitely2k opened this issue Oct 30, 2024 · 5 comments

@hranitely2k

We have a Kubernetes cluster with the SigNoz Helm chart version 0.46.0 (app version 0.50.0) installed.
The ClickHouse instance runs on an external standalone server.

In the stdout logs of the SigNoz OpenTelemetry collector container we see:

"level":"error","ts":1730190923.3544044,"caller":"clickhousetracesexporter/writer.go:411","msg":"Could not write a batch of spans to model table: ","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","stacktrace":"github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).WriteBatchOfSpans\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/writer.go:411\ngithub.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*storage).pushTraceData\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/clickhouse_exporter.go:436\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesRequest).Export\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:59\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:49\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/retry_sender.go:89\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:159\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:99\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43"}
{"level":"info","ts":1730190923.3545008,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","interval":"42.604904216s"}
{"level":"warn","ts":1730190956.3624632,"caller":"clickhousemetricsexporter/exporter.go:279","msg":"Dropped exponential histogram metric with no data points","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","name":"signoz_latency"}
{"level":"warn","ts":1730190957.1311152,"caller":"clickhousemetricsexporter/exporter.go:272","msg":"Dropped cumulative histogram metric","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","name":"signoz_latency"}
{"level":"error","ts":1730190974.8033404,"caller":"clickhousetracesexporter/writer.go:411","msg":"Could not write a batch of spans to model table: ","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","stacktrace":"github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).WriteBatchOfSpans\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/writer.go:411\ngithub.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*storage).pushTraceData\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/clickhouse_exporter.go:436\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesRequest).Export\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:59\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:49\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/retry_sender.go:89\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:159\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:99\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43"}
{"level":"info","ts":1730190974.803478,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","interval":"19.918767688s"}
{"level":"warn","ts":1730190992.2632306,"caller":"clickhousemetricsexporter/exporter.go:279","msg":"Dropped exponential histogram metric with no data points","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","name":"signoz_latency"}
{"level":"warn","ts":1730190993.0498488,"caller":"clickhousemetricsexporter/exporter.go:272","msg":"Dropped cumulative histogram metric","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","name":"signoz_latency"}
{"level":"error","ts":1730191002.9784613,"caller":"clickhousetracesexporter/writer.go:411","msg":"Could not write a batch of spans to model table: ","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","stacktrace":"github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).WriteBatchOfSpans\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/writer.go:411\ngithub.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*storage).pushTraceData\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/clickhouse_exporter.go:436\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesRequest).Export\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:59\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:49\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/retry_sender.go:89\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:159\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:99\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43"}
{"level":"info","ts":1730191002.978587,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","interval":"38.222669725s"}
{"level":"warn","ts":1730191026.4258018,"caller":"clickhousemetricsexporter/exporter.go:279","msg":"Dropped exponential histogram metric with no data points","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","name":"signoz_latency"}
{"level":"warn","ts":1730191027.2593737,"caller":"clickhousemetricsexporter/exporter.go:272","msg":"Dropped cumulative histogram metric","kind":"exporter","data_type":"metrics","name":"clickhousemetricswrite","name":"signoz_latency"}
{"level":"error","ts":1730191048.4586544,"caller":"clickhousetracesexporter/writer.go:411","msg":"Could not write a batch of spans to model table: ","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","stacktrace":"github.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*SpanWriter).WriteBatchOfSpans\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/writer.go:411\ngithub.com/SigNoz/signoz-otel-collector/exporter/clickhousetracesexporter.(*storage).pushTraceData\n\t/home/runner/work/signoz-otel-collector/signoz-otel-collector/exporter/clickhousetracesexporter/clickhouse_exporter.go:436\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesRequest).Export\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:59\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:49\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/retry_sender.go:89\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/traces.go:159\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:37\ngo.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:99\ngo.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:52\ngo.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1\n\t/home/runner/go/pkg/mod/go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43"}
{"level":"info","ts":1730191048.4587796,"caller":"exporterhelper/retry_sender.go:118","msg":"Exporting failed. Will retry the request after interval.","kind":"exporter","data_type":"traces","name":"clickhousetraces","error":"context deadline exceeded","interval":"19.364608697s"}

How can we fix this?
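For context, "context deadline exceeded" means the batched insert into ClickHouse did not finish before the exporter's context timeout expired, after which the exporterhelper queue and retry senders (visible in the stack trace) re-send the batch. A minimal sketch of giving the traces exporter a longer write deadline, assuming the clickhousetraces exporter exposes the standard exporterhelper timeout option (the exact key and the connection placeholders below should be verified against the collector build and environment in use):

exporters:
  clickhousetraces:
    datasource: tcp://<user>:<password>@<clickhouse-host>:9000/<trace-database>
    # Assumed exporterhelper knob: allow slow ClickHouse inserts more time
    # before the context is cancelled (the exporterhelper default is 5s).
    timeout: 30s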

@Bewalticus

We do encounter the same errors.

@srikanthccv
Member

Please share more details. What is your collector config? What is the resource usage for the collector and ClickHouse?

@hranitely2k
Author

hranitely2k commented Nov 7, 2024

collector config:

exporters:
  clickhouselogsexporter:
    dsn: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_LOG_DATABASE}
    timeout: 10s
  clickhousemetricswrite:
    endpoint: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_DATABASE}
    resource_to_telemetry_conversion:
      enabled: true
    timeout: 15s
  clickhousetraces:
    datasource: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_TRACE_DATABASE}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
  prometheus:
    endpoint: 0.0.0.0:8889
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: localhost:1777
  zpages:
    endpoint: localhost:55679
processors:
  batch:
    send_batch_size: 50000
    timeout: 10s
  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
      - k8s.deployment.name
      - k8s.node.name
    filter:
      node_from_env_var: K8S_NODE_NAME
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  resourcedetection:
    detectors:
    - env
    - system
    system:
      hostname_sources:
      - dns
      - os
    timeout: 2s
  signozspanmetrics/cumulative:
    dimensions:
    - default: default
      name: service.namespace
    - default: default
      name: deployment.environment
    - name: signoz.collector.id
    dimensions_cache_size: 100000
    latency_histogram_buckets:
    - 100us
    - 1ms
    - 2ms
    - 6ms
    - 10ms
    - 50ms
    - 100ms
    - 250ms
    - 500ms
    - 1000ms
    - 1400ms
    - 2000ms
    - 5s
    - 10s
    - 20s
    - 40s
    - 60s
    metrics_exporter: clickhousemetricswrite
  signozspanmetrics/delta:
    aggregation_temporality: AGGREGATION_TEMPORALITY_DELTA
    dimensions:
    - default: default
      name: service.namespace
    - default: default
      name: deployment.environment
    - name: signoz.collector.id
    dimensions_cache_size: 100000
    latency_histogram_buckets:
    - 100us
    - 1ms
    - 2ms
    - 6ms
    - 10ms
    - 50ms
    - 100ms
    - 250ms
    - 500ms
    - 1000ms
    - 1400ms
    - 2000ms
    - 5s
    - 10s
    - 20s
    - 40s
    - 60s
    metrics_exporter: clickhousemetricswrite
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      load: {}
      memory: {}
      network: {}
  httplogreceiver/heroku:
    endpoint: 0.0.0.0:8081
    source: heroku
  httplogreceiver/json:
    endpoint: 0.0.0.0:8082
    source: json
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        max_recv_msg_size_mib: 80
      http:
        endpoint: 0.0.0.0:4318
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: localhost:12345
service:
  extensions:
  - health_check
  - zpages
  - pprof
  pipelines:
    logs:
      exporters:
      - clickhouselogsexporter
      processors:
      - batch
      receivers:
      - otlp
      - httplogreceiver/heroku
      - httplogreceiver/json
    metrics:
      exporters:
      - clickhousemetricswrite
      processors:
      - batch
      receivers:
      - otlp
    metrics/internal:
      exporters:
      - clickhousemetricswrite
      processors:
      - resourcedetection
      - k8sattributes
      - batch
      receivers:
      - hostmetrics
    traces:
      exporters:
      - clickhousetraces
      processors:
      - signozspanmetrics/cumulative
      - signozspanmetrics/delta
      - batch
      receivers:
      - otlp
      - jaeger
  telemetry:
    logs:
      encoding: json
    metrics:
      address: 0.0.0.0:8888
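One thing that stands out in this config: clickhouselogsexporter and clickhousemetricswrite set explicit timeouts (10s and 15s), but clickhousetraces does not, and the batch processor sends up to 50000 spans per batch. Since the stack trace shows the exporterhelper timeout, retry, and queue senders in the export path, tuning those settings on the traces exporter may be worth trying. A rough sketch, assuming clickhousetraces wires through the standard exporterhelper options (timeout, retry_on_failure, sending_queue), which should be confirmed against the collector version in use:

exporters:
  clickhousetraces:
    datasource: tcp://${CLICKHOUSE_USER}:${CLICKHOUSE_PASSWORD}@${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT}/${CLICKHOUSE_TRACE_DATABASE}
    low_cardinal_exception_grouping: ${LOW_CARDINAL_EXCEPTION_GROUPING}
    # Assumed exporterhelper options; values are illustrative only.
    timeout: 30s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 60s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000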

@srikanthccv
Member

Please share the resource usage data of the collectors and ClickHouse at the time of the issue.

@hranitely2k
Author

[Screenshot: resource usage metrics, 2024-11-07 at 16:57:48]

Apologies for the delay. The metrics looked like the screenshot above.
We have partially isolated the issue: it occurs when sending large traces (100,000+ spans in a single trace).
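Given that the failures coincide with single traces of 100,000+ spans, one mitigation that may help is capping the size of an individual outgoing batch, so that one huge trace is split across several smaller ClickHouse inserts rather than a single insert that overruns the deadline. The batch processor supports this via send_batch_max_size; a sketch based on the config shared above:

processors:
  batch:
    send_batch_size: 50000
    # Cap a single outgoing batch: requests larger than this (e.g. a 100k+
    # span trace arriving as one request) are split into multiple batches.
    send_batch_max_size: 50000
    timeout: 10s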
