MiniDFSCluster: HDFS triple slash schema extension wrong FS
I am using the defaultFS setting in the HDFS configuration. I create the configuration and then set it explicitly:
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

val config = new Configuration()
config.set("fs.defaultFS", "hdfs://localhost:8020")
val fs = FileSystem.get(new URI(filePath), config)
The code seems to work fine most of the time, but for a filePath containing a triple slash I get an error, and only on a few machines:
Wrong FS: hdfs:/tmp/hdfstest, expected: hdfs://localhost:8020
The single slash appears only in the exception message; everywhere else in the system I see the triple slash: hdfs:///tmp/hdfstest.
For paths like /tmp/hdfstest, with no scheme at all, defaultFS works perfectly.
I would appreciate any advice. Thank you in advance!
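As a side note on the two URI forms: a minimal sketch using only java.net.URI (the same class passed to FileSystem.get above) shows that the triple-slash form parses with an empty authority, while the fully qualified form carries the host/port. The paths and port here are just the ones from this question.

```scala
import java.net.URI

object UriAuthorityCheck extends App {
  // Triple slash: the scheme is present, but the authority component is
  // empty, so java.net.URI reports it as null.
  val triple = new URI("hdfs:///tmp/hdfstest")
  println(triple.getScheme)    // hdfs
  println(triple.getAuthority) // null
  println(triple.getPath)      // /tmp/hdfstest

  // Fully qualified: the authority carries host and port.
  val full = new URI("hdfs://localhost:8020/tmp/hdfstest")
  println(full.getAuthority)   // localhost:8020

  // A bare path has neither scheme nor authority, which is why it always
  // falls through to fs.defaultFS.
  val bare = new URI("/tmp/hdfstest")
  println(bare.getScheme)      // null
}
```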
UPD: The exception was seen in tests run on a MiniDFSCluster. During the tests I reused the same MiniDFSCluster with different configurations.
scala hadoop hdfs
You should not use a URI scheme if you are passing it in the config. – Sachin Janani, Jan 3 at 12:21
@SachinJanani Oh, I missed the NOT in your comment. Should I just specify localhost:8020? It hasn't made a difference for my test case so far. I am also confused about why it worked for cases without the triple slash; and that is how it is written in core-site.xml. – Valentina, Jan 3 at 12:40
asked Jan 3 at 9:12 by Valentina, edited Jan 7 at 12:01
2 Answers
It turned out that this was not an HDFS issue, but a MiniDFSCluster test problem.
In the test suite, I was creating a test cluster and then checking different defaultFS scenarios on it. MiniDFSCluster has some issues due to sharing of configuration, and certain use cases can produce unexpected results and falsely failing or passing unit tests.
For more info, there is a ticket in Apache.
answered Jan 7 at 12:07 – Valentina
If you want to use fs.defaultFS, you should not specify any scheme or authority, so your paths should look like /path/to/file. Using a URI with a scheme, like hdfs://localhost:port/path/to/file, will ignore the default FS. You shouldn't ever use the HDFS scheme without a host/port, like hdfs:///; instead, either rely on the default FS or explicitly specify a host/port combination.
answered Jan 4 at 20:05 – krog
Do you mean that hdfs:/// shouldn't be used at all? Or in which cases? I am confused, as I have seen the triple slash used (and working) a lot in EMR, Spark, and so on. – Valentina, Jan 5 at 18:51
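For reference, a core-site.xml entry setting the default filesystem might look like the fragment below; hdfs://localhost:8020 is just the address used in this question, so substitute your own NameNode host and port:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```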