[Discussion]: AWS Redshift Data API Ballerina Connector #7446
Comments
Do we need to provide the database config each time we send a request? I'd prefer one config per client, since that is how this would usually be used. We can take the database config in the init method and use it internally when we send a request. |
Agreed. But the Redshift Data API is a bit flexible in that respect: a user can query a different database with each request, which is why we thought of offering the same flexibility in the Ballerina API. IMO we can improve it like this. If users want a single database instance for all queries, they can provide a database config during client initialization, and we use that as the DB config when executing a query. If they do not provide a DB config during client initialization, they can pass it with the client action. I think we can print a warning at runtime if the user provides both, and define a priority in our internal implementation for which DB config to use. @ThisaruGuruge WDYT? |
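The precedence rule proposed above can be sketched as follows. This is a minimal, language-neutral illustration written in Python rather than Ballerina; `DatabaseConfig` and `resolve_database_config` are hypothetical names, and the sketch assumes the per-request config takes priority over the client-level one, which the discussion leaves open.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabaseConfig:
    """Hypothetical mirror of the proposed DatabaseConfig record."""
    cluster_id: str
    database_name: str
    database_user: str


def resolve_database_config(
    client_config: Optional[DatabaseConfig],
    request_config: Optional[DatabaseConfig],
) -> DatabaseConfig:
    """Pick the config for one request; the request-level value wins (assumed priority)."""
    if client_config is None and request_config is None:
        raise ValueError("no database config given at client init or in the request")
    if client_config is not None and request_config is not None:
        # Both were supplied: warn at runtime, as suggested in the comment above.
        warnings.warn("request-level database config overrides the client-level one")
    return request_config if request_config is not None else client_config
```

With this rule a client created with a default config needs no per-request argument, while a request that passes its own config is served from that database and triggers a single warning.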
IMO, we should follow the other DB connectors, where we create a single client for a given connection. Even though the Data API is flexible, I don't think the 80% case would be to connect multiple DBs in a single application. Even if we have multiple DBs in a single application, the more intuitive approach would be to keep a client and use it for a single DB (even though we won't be keeping open connections like the other connectors). @shafreenAnfar @daneshk thoughts? |
What does the API look like in Java, Go, and C#? |
Java
In the Java SDK, executing a query and retrieving the results involves invoking three distinct API operations: ExecuteStatement, DescribeStatement, and GetStatementResult.
Go
// Submit the SQL statement; the response carries the statement ID.
execstmt_req, execstmt_err := redshiftclient.ExecuteStatement(&redshiftdataapiservice.ExecuteStatementInput{
    ClusterIdentifier: aws.String(redshift_cluster_id),
    DbUser:            aws.String(redshift_user),
    Database:          aws.String(redshift_database),
    Sql:               aws.String(query),
})
// Check the execution status of the submitted statement.
descstmt_req, descstmt_err := redshiftclient.DescribeStatement(&redshiftdataapiservice.DescribeStatementInput{
    Id: execstmt_req.Id,
})
// Fetch the result set of the statement.
getresult_req, getresult_err := redshiftclient.GetStatementResult(&redshiftdataapiservice.GetStatementResultInput{
    Id: execstmt_req.Id,
})
For more details, refer to the
C#
There is no SDK available in C# |
I am +1 for sticking with the current SQL connector APIs, with a few minor adjustments. Here are my suggestions:
As we make HTTP calls underneath, we need to think through how to map those HTTP calls to our APIs |
Design Review Notes
|
Redshift Data API Java SDK: Synchronous and Asynchronous Client Comparison |
Client API Design
Client Definition
public type Client distinct client object {
    # Initializes AWS Redshift Data API client.
    #
    # + connectionConfig - Configurations related to the Redshift Data API
    # + return - The `redshiftdata:Client` or `redshiftdata:Error` if the initialization fails
    public isolated function init(*ConnectionConfig connectionConfig) returns Error?;

    # Executes the SQL query.
    #
    # + sqlStatement - The SQL statement to be executed
    # + databaseConfig - The database configurations.
    # + return - The statementId that can be used to retrieve the results or an error
    remote function executeStatement(sql:ParameterizedQuery sqlStatement, DatabaseConfig? databaseConfig = ())
        returns string|Error;

    # Executes the SQL queries in a batch.
    #
    # + sqlStatements - The SQL statements to be executed
    # + databaseConfig - The database configurations.
    # + return - The statementIds that can be used to retrieve the results or an error
    remote function batchExecuteStatement(sql:ParameterizedQuery[] sqlStatements, DatabaseConfig? databaseConfig = ())
        returns string[]|Error;

    # Retrieves the results of a previously executed SQL statement.
    #
    # + statementId - The identifier of the SQL statement
    # + timeout - The timeout in seconds to retrieve the results.
    # + rowTypes - The typedesc of the record to which the result needs to be returned
    # + return - Stream of records in the type of rowTypes or possible error
    remote isolated function getQueryResult(string statementId, decimal? timeout = (),
            typedesc<record {}> rowTypes = <>)
        returns stream<rowTypes, sql:Error?>|Error;

    # Retrieves the execution result of a previously executed SQL statement.
    #
    # + statementId - The identifier of the SQL statement
    # + timeout - The timeout in seconds to retrieve the results.
    # + return - Metadata of the query execution as an sql:ExecutionResult or an error
    remote isolated function getExecutionResult(string statementId, decimal? timeout = ())
        returns sql:ExecutionResult|Error;
};
ConnectionConfig
# Additional configurations related to the Redshift Data API
#
# + region - The AWS region with which the connector should communicate
# + authConfig - The authentication configurations for the Redshift Data API
# + databaseConfig - The database configurations
# This can be overridden in the individual execute and batchExecute requests.
# + timeout - The timeout to be used to get the query results and execution results in `seconds`
# + pollingInterval - The polling interval to be used to get the query results and execution results in `seconds`
public type ConnectionConfig record {|
    Region region;
    AuthConfig authConfig;
    DatabaseConfig databaseConfig;
    decimal timeout = 30;
    decimal pollingInterval = 5;
|};
Region
# An Amazon Web Services region that hosts a set of Amazon services.
public enum Region {
    AF_SOUTH_1 = "af-south-1",
    AP_EAST_1 = "ap-east-1"
    // more regions
}
AuthConfig
# Auth configurations for the Redshift Data API
#
# + awsAccessKeyId - The AWS access key ID
# + awsSecretAccessKey - The AWS secret access key
# + sessionToken - The session token if the credentials are temporary
public type AuthConfig record {|
    string awsAccessKeyId;
    string awsSecretAccessKey;
    string sessionToken?;
|};
DatabaseConfig
# Database configurations
#
# + clusterId - The cluster identifier
# + databaseName - The name of the database
# + databaseUser - The database user
public type DatabaseConfig record {|
    string clusterId;
    string databaseName;
    string databaseUser;
|};
Errors
public type Error distinct error;
Sample Usage
import ballerina/io;
import ballerina/sql;
import ballerinax/redshiftdata;

redshiftdata:AuthConfig authConfig = {
    awsAccessKeyId: "<AWS_ACCESS_KEY_ID>",
    awsSecretAccessKey: "<AWS_SECRET_ACCESS_KEY>"
};

redshiftdata:DatabaseConfig databaseConfig = {
    clusterId: "<CLUSTER_ID>",
    databaseName: "<DATABASE_NAME>",
    databaseUser: "<DATABASE_USER>"
};

redshiftdata:ConnectionConfig connectionConfig = {
    region: "af-south-1",
    authConfig: authConfig,
    databaseConfig: databaseConfig
};

type User record {|
    string name;
    string email;
    string city;
|};

public function main() returns error? {
    redshiftdata:Client redshift = check new (connectionConfig);

    // Insert records (the email addresses are placeholders)
    User[] users = [
        {name: "John Doe", email: "john.doe@example.com", city: "New York"},
        {name: "Jane Doe", email: "jane.doe@example.com", city: "California"}
    ];
    sql:ParameterizedQuery[] insertQueries = from User user in users
        select `INSERT INTO users (name, email, city) VALUES (${user.name}, ${user.email}, ${user.city})`;
    string[] insertRequestIds = check redshift->batchExecuteStatement(insertQueries);

    // Retrieve the execution result of the first insert statement
    sql:ExecutionResult insertResults = check redshift->getExecutionResult(insertRequestIds[0]);

    // Retrieve records, overriding the client-level database config for this request
    string retrieveRequestId = check redshift->executeStatement(`SELECT * FROM users`, databaseConfig);

    // Retrieve the query result with a 40-second timeout
    stream<User, sql:Error?> result = check redshift->getQueryResult(retrieveRequestId, 40);
    check result.forEach(function(User user) {
        io:println(user);
    });
}
Redshift Data API Java SDK: Synchronous vs. Asynchronous Client Comparison
Key Details from the Comparison
Decision
Given the slightly better performance and simpler handling provided by the synchronous client, I decided to proceed with the synchronous client.
Thread Handling in Asynchronous Client
|
@chathushkaayash in the
public type ExecutionConfig record {|
    decimal timeout = 30;
    decimal pollingInterval = 5;
|};

public type Client distinct client object {
    // other methods
    remote isolated function getQueryResult(string statementId, *ExecutionConfig executionConfig,
            typedesc<record {}> rowTypes = <>)
        returns stream<rowTypes, sql:Error?>|Error;

    remote isolated function getExecutionResult(string statementId, *ExecutionConfig executionConfig)
        returns sql:ExecutionResult|Error;
}; |
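How `timeout` and `pollingInterval` might interact while waiting for a statement to finish is sketched below. This is an illustration in Python, not the connector's implementation: `wait_for_statement` and the `describe` callable (standing in for a DescribeStatement request) are hypothetical, and treating FINISHED, FAILED, and ABORTED as the terminal statuses is an assumption based on the Data API's status values.

```python
import time
from typing import Callable

# Statuses after which a statement no longer changes state (assumed terminal set).
TERMINAL_STATUSES = {"FINISHED", "FAILED", "ABORTED"}


def wait_for_statement(
    describe: Callable[[str], str],
    statement_id: str,
    timeout: float = 30.0,
    polling_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `describe` until a terminal status is seen or `timeout` seconds elapse.

    `describe` is a hypothetical callable that takes a statement ID and
    returns the current status string.
    """
    deadline = clock() + timeout
    while True:
        status = describe(statement_id)
        if status in TERMINAL_STATUSES:
            return status
        # Give up if another full polling interval would pass the deadline.
        if clock() + polling_interval > deadline:
            raise TimeoutError(
                f"statement {statement_id} still {status} after {timeout}s"
            )
        sleep(polling_interval)
```

Passing `sleep` and `clock` as parameters is only a testing convenience; with the defaults the function blocks the calling thread, which matches the synchronous-client decision recorded earlier in the thread.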
Batch Statement Execution in Redshift Data API
When executing batch statements using the
|
Revised API design for Ballerina Redshift Data connector
As per the offline discussion we had with @daneshk and @ThisaruGuruge, the following is the revised API design for the Ballerina Redshift Data connector.
1. Configurations
ConnectionConfig
public type ConnectionConfig record {|
    Region region;
    AuthConfig authConfig;
    Cluster|WorkGroup dbAccessConfig?;
|};
Region
public enum Region {
    AF_SOUTH_1 = "af-south-1",
    AP_EAST_1 = "ap-east-1"
    // more regions
}
AuthConfig
public type AuthConfig record {|
    string accessKeyId;
    string secretAccessKey;
    string sessionToken?;
|};
DbAccessConfig
public type Cluster record {|
    @constraint:String {
        minLength: 1,
        maxLength: 63
    }
    string id;
    string database;
    string dbUser?;
    string secretArn?;
    @constraint:Int {
        minValue: 0,
        maxValue: 86400
    }
    int sessionKeepAliveSeconds?;
|};

public type WorkGroup record {|
    string name;
    string database;
    string secretArn?;
    @constraint:Int {
        minValue: 0,
        maxValue: 86400
    }
    int sessionKeepAliveSeconds?;
|};

@constraint:String {
    pattern: re `^[a-z0-9]{8}(-[a-z0-9]{4}){3}-[a-z0-9]{12}(:\d+)?$`
}
public type SessionId string;
2. Client API
public type Client distinct client object {
    public isolated function init(*ConnectionConfig connectionConfig) returns Error?;

    remote isolated function executeStatement(sql:ParameterizedQuery sqlStatement,
            *ExecuteStatementConfig executeStatementConfig)
        returns ExecuteStatementResponse|Error;

    remote isolated function batchExecuteStatement(
            sql:ParameterizedQuery[] sqlStatements,
            *ExecuteStatementConfig batchExecuteStatementConfig)
        returns ExecuteStatementResponse|Error;

    remote isolated function getStatementResult(StatementId statementId,
            typedesc<record {}> rowTypes = <>)
        returns stream<rowTypes, sql:Error?>|Error;

    remote isolated function describeStatement(StatementId statementId)
        returns DescribeStatementResponse|Error;

    remote isolated function close() returns Error?;
};
3. Types related to executeStatement API and batchExecuteStatement API
public type ExecuteStatementConfig record {|
    Cluster|WorkGroup|SessionId dbAccessConfig?;
    string clientToken?;
    @constraint:String {
        minLength: 0,
        maxLength: 500
    }
    string statementName?;
    boolean withEvent?;
|};

public type ExecuteStatementResponse record {|
    time:Utc createdAt;
    string[] dbGroups?;
    StatementId statementId;
    SessionId sessionId?;
|};
4. Types related to describeStatement API
public type DescribeStatementResponse record {|
    *StatementData;
    StatementData[] subStatements?;
    int redshiftPid;
    SessionId sessionId?;
|};

public type StatementData record {|
    StatementId statementId;
    time:Utc createdAt;
    decimal duration; // in seconds
    string 'error?;
    boolean hasResultSet;
    string queryString?;
    int redshiftQueryId;
    int resultRows;
    int resultSize;
    Status status;
    time:Utc updatedAt;
|};

public enum Status {
    SUBMITTED,
    PICKED,
    STARTED,
    FINISHED,
    ABORTED,
    FAILED,
    ALL
}
5. Errors
public type Error distinct error;
For more details and findings, please refer to this Google Doc. |
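What the `@constraint` annotations in the revised design accept and reject can be checked with a small sketch. It is written in Python purely for illustration and is not part of the connector; the helper names are hypothetical, while the regular expression and the numeric bounds are copied from the `SessionId`, `Cluster`, and `WorkGroup` definitions above.

```python
import re

# Same pattern as the SessionId constraint; fullmatch() replaces the ^...$ anchors,
# and re.ASCII keeps \d limited to the digits 0-9.
SESSION_ID_PATTERN = re.compile(
    r"[a-z0-9]{8}(-[a-z0-9]{4}){3}-[a-z0-9]{12}(:\d+)?", re.ASCII
)


def is_valid_session_id(value: str) -> bool:
    """True if the value has the UUID-like shape, with an optional ':<digits>' suffix."""
    return SESSION_ID_PATTERN.fullmatch(value) is not None


def is_valid_cluster_id(value: str) -> bool:
    """Cluster.id must be between 1 and 63 characters long."""
    return 1 <= len(value) <= 63


def is_valid_keep_alive(seconds: int) -> bool:
    """sessionKeepAliveSeconds must lie between 0 and 86400 (24 hours)."""
    return 0 <= seconds <= 86400
```

For example, a lowercase UUID-shaped value passes with or without a numeric suffix, while the same value in uppercase is rejected.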
Summary
This proposal aims to develop a Ballerina connector for the AWS Redshift Data API. The connector will offer a simplified API for executing queries, enhancing Ballerina+Redshift integration capabilities by addressing performance bottlenecks, such as open connections commonly associated with JDBC-based APIs.
Goals
Motivation
While a JDBC connector for Redshift is already available, its performance is limited compared to the Data API. This connector is built to address performance bottlenecks, enabling faster response times and simplifying integration with Redshift databases.
Description
Client API Design
Client Definition
ConnectionConfig
Region
AuthConfig
HttpClientOptions
DatabaseConfig
Sample Usage
Dependencies