PDI /Kettle - Passing data from previous hop to database query
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
add a comment |
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 '18 at 8:46
Which step do you call "Dynamic query". For theDynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.
– AlainD
Nov 20 '18 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 '18 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 '18 at 10:03
Dynamic SQL Row could be an answer. If for example you need toSELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 '18 at 10:29
add a comment |
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
pentaho kettle
asked Nov 20 '18 at 2:17
luchollandlucholland
302311
302311
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 '18 at 8:46
Which step do you call "Dynamic query". For theDynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.
– AlainD
Nov 20 '18 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 '18 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 '18 at 10:03
Dynamic SQL Row could be an answer. If for example you need toSELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 '18 at 10:29
add a comment |
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 '18 at 8:46
Which step do you call "Dynamic query". For theDynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.
– AlainD
Nov 20 '18 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 '18 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 '18 at 10:03
Dynamic SQL Row could be an answer. If for example you need toSELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 '18 at 10:29
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 '18 at 8:46
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 '18 at 8:46
Which step do you call "Dynamic query". For the
Dynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table
you'll loose all the field which are not in the table.– AlainD
Nov 20 '18 at 8:58
Which step do you call "Dynamic query". For the
Dynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table
you'll loose all the field which are not in the table.– AlainD
Nov 20 '18 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 '18 at 9:54
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 '18 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 '18 at 10:03
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 '18 at 10:03
Dynamic SQL Row could be an answer. If for example you need to
SELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.– AlainD
Nov 21 '18 at 10:29
Dynamic SQL Row could be an answer. If for example you need to
SELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.– AlainD
Nov 21 '18 at 10:29
add a comment |
2 Answers
2
active
oldest
votes
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 '18 at 13:40
add a comment |
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385287%2fpdi-kettle-passing-data-from-previous-hop-to-database-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 '18 at 13:40
add a comment |
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 '18 at 13:40
add a comment |
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
answered Nov 21 '18 at 10:48
AlainDAlainD
4,18831126
4,18831126
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 '18 at 13:40
add a comment |
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 '18 at 13:40
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 '18 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.
Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.– AlainD
Nov 21 '18 at 13:40
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.
Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.– AlainD
Nov 21 '18 at 13:40
add a comment |
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
add a comment |
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
add a comment |
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
answered Nov 20 '18 at 9:16
AlainDAlainD
4,18831126
4,18831126
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
add a comment |
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
I'll update once I test this out. Thanks!
– lucholland
Nov 20 '18 at 10:13
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385287%2fpdi-kettle-passing-data-from-previous-hop-to-database-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 '18 at 8:46
Which step do you call "Dynamic query". For the
Dynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.– AlainD
Nov 20 '18 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 '18 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 '18 at 10:03
Dynamic SQL Row could be an answer. If for example you need to
SELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.– AlainD
Nov 21 '18 at 10:29