datetime2 format as tracking column with MSSQL server #351

Open
jborovi opened this issue Sep 25, 2019 · 2 comments

Comments

@jborovi

jborovi commented Sep 25, 2019

The bug:
Using a datetime2 column as the tracking_column with the Logstash jdbc-input plugin and MSSQL Server results in duplicated data.
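A minimal sketch of the mismatch (the timestamp values are illustrative, not taken from the attached data): datetime2 stores up to 100 ns precision, but the plugin renders :sql_last_value with only millisecond precision, so a stored value still compares greater than its own truncated form.

-- Illustrative values only: a row stored at full datetime2 precision
-- compares greater than the same instant truncated to milliseconds,
-- so the next scheduled run fetches it again.
SELECT CAST('2019-09-25 10:00:00.5424519' AS datetime2) AS stored_value,
       CAST('2019-09-25 10:00:00.542'     AS datetime2) AS bound_value,
       CASE WHEN CAST('2019-09-25 10:00:00.5424519' AS datetime2)
               > CAST('2019-09-25 10:00:00.542'     AS datetime2)
            THEN 'row is selected again' ELSE 'row is skipped' END AS outcome;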

config:

input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://127.0.0.1\SQLExpress;database=TestDB;user=test;password=Heslo@111;portNumber=1433"
    jdbc_user => nil
    jdbc_validate_connection => true
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mssql-jdbc-7.4.1.jre11.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    schedule => "* * * * *"
    statement => "SELECT id_column FROM changelog WHERE t_datetime2 > :sql_last_value"
    tracking_column => "t_datetime2"
    record_last_run => true
    tracking_column_type => "timestamp"
  }
}

output {
  redis {
    host => "127.0.0.1"
    key => "logstash-test"
    data_type => "list"
  }
}

Screenshots of the database input and the Redis output show 398800 rows in the input and 401791 results in the output:

[screenshot: datetime2_sqlcount]

[screenshot: datetime2redisresult]

Screenshot of the min and max t_datetime2 values:

[screenshot: datetime2_min_max]

Screenshots of the actual queries jdbc-input used to get the data from the database:

[screenshot: datetime2query1]

[screenshot: datetime2query2]

The SQL queries lose precision on the tracking value, and this results in duplicated data.
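Since the query screenshots do not reproduce here, a hedged reconstruction of what they show (the boundary timestamp is illustrative): each scheduled run binds the recorded last value truncated to milliseconds, so the trailing rows of one run satisfy the next run's predicate as well.

-- Run N ends with a max t_datetime2 of 2019-09-25 10:00:00.5424519, but
-- run N+1 binds that value truncated to milliseconds:
SELECT id_column
FROM changelog
WHERE t_datetime2 > '2019-09-25 10:00:00.542'
-- Every row in (10:00:00.542, 10:00:00.5424519] is selected a second time.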

I am using the following Logstash Docker image:
logstash/logstash:7.3.2

MSSQL Server image:
microsoft/mssql-server-linux

I searched the forum for a similar issue and only found this comment describing the same problem:
#140 (comment)

Sample data:

DB definition: dbdefinition.txt

Data import: dataimport.txt

Steps to reproduce:

  1. docker-compose -f docker-compose-config.yml down
  2. docker-compose -f docker-compose-config.yml up --build
  3. docker exec -it rediscontainerid redis-cli llen logstash-test
  4. Observe that more rows are inserted into the output than expected; the number is random, but never less than the row count inserted into the input database.

OS: Debian 9
Logstash version: 7.3.2

Thank you for your response.

@ngwwm

ngwwm commented Jan 8, 2025

I have the same issue. My tracking column is a datetime2 as well. The timestamp in logstash_jdbc_last_run looks good; however, it loses its precision when it is used in the SQL statement.

From Logstash

logstash01  | [2025-01-08T09:35:00,490][INFO ][logstash.inputs.jdbc     ][main][def6a17c9ec1ce2ef62f22c761ddd037290efc702480f407683d4445c1622a65] (0.007329s) SELECT RequestDtm, JsonData
logstash01  |             FROM [User].[RequestAudit]
logstash01  |             WHERE RequestDtm > '2025-01-08T03:39:07.542'
logstash01  |             ORDER BY RequestDtm

Content of the file logstash_jdbc_last_run

$ cat logstash_jdbc_last_run
--- 2025-01-08 03:39:07.542451900 Z

I am using the Logstash Docker image docker.elastic.co/logstash/logstash:8.16.0.
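A quick sketch (using the table and values from the log and file above) of the window that the truncated predicate re-selects on the next run:

-- Rows in this window were already fetched, yet still satisfy the
-- truncated predicate RequestDtm > '2025-01-08T03:39:07.542':
SELECT COUNT(*) AS reselected_rows
FROM [User].[RequestAudit]
WHERE RequestDtm >  '2025-01-08T03:39:07.542'
  AND RequestDtm <= '2025-01-08T03:39:07.5424519'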

@jborovi
Author

jborovi commented Jan 8, 2025

Hi @ngwwm, if I remember correctly after 5 years, we solved the issue in our query so that we no longer receive duplicates: instead of asking for the precise last timestamp value, we decreased the last digit of the timestamp. This resolved the duplicates, but obviously we don't get all of the latest logs until new logs with a greater timestamp arrive (see the sketch below). I hope this helps.
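A hedged sketch of one way to express that coarsened comparison (my reconstruction of the idea, not the exact query from the original deployment): compare the column at the same millisecond precision the plugin uses for :sql_last_value, accepting that rows in the newest millisecond wait until a later timestamp arrives.

-- Reconstruction of the idea, not the original production query: coarsen
-- the column to the precision of the bound value so already-fetched rows
-- cannot satisfy the predicate again. Note that SQL Server rounds rather
-- than truncates on this cast, so the boundary behaviour is approximate;
-- this is a sketch of the approach, not a watertight fix.
SELECT id_column
FROM changelog
WHERE CAST(t_datetime2 AS datetime2(3)) > :sql_last_value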
