Create words' stream using scanner












1















I need to return a stream of all words with three or more letters from a file. Is there a better way than the following, maybe using Stream.iterate?



private Stream<String> getWordsStream(String path) {
    Stream.Builder<String> wordsStream = Stream.builder();
    FileInputStream inputStream = null;
    try {
        inputStream = new FileInputStream(path);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    Scanner s = new Scanner(inputStream);
    s.useDelimiter("([^a-zA-Z])");
    Pattern pattern = Pattern.compile("([a-zA-Z]{3,})");
    while (s.hasNext()) {
        if (s.hasNext(pattern)) {
            wordsStream.add(s.next().toUpperCase());
        } else {
            s.next();
        }
    }
    s.close();
    return wordsStream.build();
}


































  • Which Java version?

    – shmosel
    Nov 19 '18 at 21:48











  • Did you mean to call s.next(pattern)?

    – shmosel
    Nov 19 '18 at 21:53











  • Maybe reading the entire stream as a string, then splitting it with a space (or whatever you're using), then checking each for their length.

    – PhaseRush
    Nov 19 '18 at 22:04











  • Java 9. What I mean is: is it possible to write this method in a more stream-like style, without any while loop at all?

    – a_chubenko
    Nov 19 '18 at 22:06
















Tags: java, loops, java-stream, builder, word














asked Nov 19 '18 at 21:43 by a_chubenko
edited Nov 20 '18 at 15:49 by a_chubenko

























3 Answers


















2














You can use Files.lines() and a Pattern:



private static final Pattern SPACES = Pattern.compile("[^a-zA-Z]+");

public static Stream<String> getWordStream(String path) throws IOException {
    return Files.lines(Paths.get(path))
            .flatMap(SPACES::splitAsStream)
            .filter(word -> word.length() >= 3);
}
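
A minimal, self-contained sketch of how this pipeline might be exercised. The class name FilesLinesWordsDemo, the NON_LETTERS constant, the demo() helper, and the sample text are assumptions made for illustration. The sketch also adds the uppercase conversion from the question, which the answer's version leaves out, and closes the stream explicitly, since Files.lines() keeps the file open until the stream is closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilesLinesWordsDemo {
    // Runs of non-letters act as the separator between words.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Same pipeline as the answer, plus the uppercase conversion from the question.
    static Stream<String> getWordStream(Path path) throws IOException {
        return Files.lines(path)
                .flatMap(NON_LETTERS::splitAsStream)
                .filter(word -> word.length() >= 3)
                .map(String::toUpperCase);
    }

    // Hypothetical helper: writes sample text to a temp file and collects the words.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            try {
                Files.write(file, "I have 100 dollars\nand a cat".getBytes(StandardCharsets.UTF_8));
                // try-with-resources closes the stream and, with it, the underlying file.
                try (Stream<String> words = getWordStream(file)) {
                    return words.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [HAVE, DOLLARS, AND, CAT]
    }
}
```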































  • I tested this on a book with 105K words. This method is the fastest: it took 0.29 s.

    – a_chubenko
    Nov 20 '18 at 12:05






  • @Alex well, this method also skips the conversion to uppercase. Further, your original code was designed to process words consisting of ASCII letters only, whereas this code treats everything separated by a single space character as a word.

    – Holger
    Nov 20 '18 at 12:07













  • Pattern was changed to Pattern SPACES = Pattern.compile("([^a-zA-Z])");

    – a_chubenko
    Nov 20 '18 at 12:10











  • @Alex You should probably use Pattern.compile("[^a-zA-Z]+") (notice the + at the end), so you won't get "empty" words. For example, a text like "I have 100 dollars" would produce the array ["I", "have", "", "", "", "", "dollars"] with the current pattern.

    – Lino
    Nov 20 '18 at 12:19
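
The difference between the two patterns can be checked directly. The following is a small sketch; the class name SplitPatternDemo and the helper splitOn are made up for illustration, and only Pattern.split from the standard library is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SplitPatternDemo {
    // Splits the text on the given regex and returns the pieces as a list.
    static List<String> splitOn(String regex, String text) {
        return Arrays.asList(Pattern.compile(regex).split(text));
    }

    public static void main(String[] args) {
        String text = "I have 100 dollars";
        // Every single non-letter is its own separator, so " 100 " yields four empty strings.
        System.out.println(splitOn("[^a-zA-Z]", text));  // prints [I, have, , , , , dollars]
        // A run of non-letters is one separator, so no empty strings appear.
        System.out.println(splitOn("[^a-zA-Z]+", text)); // prints [I, have, dollars]
    }
}
```

Because the answer filters on word.length() >= 3 afterwards, the empty strings would be dropped either way; the + mainly avoids creating them in the first place.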





















5














The worst part of your code is the following:



FileInputStream inputStream = null;
try {
    inputStream = new FileInputStream(path);
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
Scanner s = new Scanner(inputStream);


So when the file is absent, you will print the FileNotFoundException stack trace and proceed with a null input stream, leading to a NullPointerException. Instead of requiring the caller to deal with a spurious NullPointerException, you should declare the FileNotFoundException in the method signature. Otherwise, return an empty stream in the erroneous case.
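
To make that failure mode concrete, here is a small sketch of what happens once the catch block has swallowed the exception and the variable is still null. The class name NullStreamDemo and the method scannerRejectsNull are assumptions for illustration only.

```java
import java.io.InputStream;
import java.util.Scanner;

class NullStreamDemo {
    // Returns true if constructing a Scanner over a null stream fails with a NullPointerException.
    static boolean scannerRejectsNull() {
        // This is the state the question's code is left in when the file is missing.
        InputStream missing = null;
        try {
            new Scanner(missing).close();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(scannerRejectsNull());
    }
}
```

The caller therefore sees a NullPointerException that says nothing about the missing file, which is why declaring the exception in the signature is the clearer option.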



But you don’t need to construct a FileInputStream at all, as Scanner offers constructors accepting a File or Path. Combine this with the capability of returning a stream of matches (since Java 9) and you get:



private Stream<String> getWordsStream(String path) {
    try {
        Scanner s = new Scanner(Paths.get(path));
        return s.findAll("([a-zA-Z]{3,})").map(mr -> mr.group().toUpperCase());
    } catch(IOException ex) {
        Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        return Stream.empty();
    }
}


or preferably



private Stream<String> getWordsStream(String path) throws IOException {
    Scanner s = new Scanner(Paths.get(path));
    return s.findAll("([a-zA-Z]{3,})").map(mr -> mr.group().toUpperCase());
}


You don’t even need .useDelimiter("([^a-zA-Z])") here, as skipping all nonmatching stuff is the default behavior.



Closing the returned Stream will also close the Scanner.



So the caller should use it like this



try(Stream<String> s = getWordsStream("path/to/file")) {
    s.forEach(System.out::println);
}
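
For anyone who wants to try the Scanner.findAll variant end to end, here is a self-contained sketch. The class name FindAllWordsDemo, the demo() helper, and the sample text are hypothetical; the method takes a Path rather than a String purely to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FindAllWordsDemo {
    // Streams every run of three or more ASCII letters, converted to uppercase (Java 9+).
    static Stream<String> getWordsStream(Path path) throws IOException {
        Scanner s = new Scanner(path);
        return s.findAll("[a-zA-Z]{3,}").map(mr -> mr.group().toUpperCase());
    }

    // Hypothetical helper: writes sample text to a temp file and collects the words.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            try {
                Files.write(file, "I have 100 dollars,\nand a cat!".getBytes(StandardCharsets.US_ASCII));
                // Closing the stream also closes the Scanner and the file behind it.
                try (Stream<String> words = getWordsStream(file)) {
                    return words.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [HAVE, DOLLARS, AND, CAT]
    }
}
```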































  • I tested this on a book with 105K words. This method took about 0.6 s.

    – a_chubenko
    Nov 20 '18 at 12:06



















0














There's a much easier approach: read the words from the file into a Stream and filter it by the required condition (e.g. length >= 3). As with Files.lines(), loading is lazy: the file is not read in full at the beginning; the next word is read only when it is required.



public static void main(String... args) throws IOException {
    // Close the stream (and with it the Scanner) once the words are consumed.
    try (Stream<String> words = getWordsStream(Paths.get("d:/words.txt"))) {
        words.forEach(System.out::println);
    }
}

public static Stream<String> getWordsStream(Path path) throws IOException {
    final Scanner scan = new Scanner(path);

    // DISTINCT is not reported, because a text may contain the same word more than once.
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE,
            Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            while (scan.hasNext()) {
                String word = scan.next();

                // you can use RegExp if you have more complicated condition
                if (word.length() < 3)
                    continue;

                action.accept(word);
                return true;
            }

            return false;
        }
    }, false).onClose(scan::close);
}
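
Since the question targets Java 9, the hand-written Spliterator could probably be replaced by Scanner.tokens(), which returns a lazy stream of the scanner's tokens and closes the scanner when the stream is closed. The following is only a sketch under that assumption; the class name ScannerTokensDemo, the demo() helper, and the sample text are made up for illustration. Like the code above, it splits on whitespace, so a token such as "100" still counts as a word.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ScannerTokensDemo {
    // Lazily streams whitespace-separated tokens of length 3 or more (Java 9+).
    static Stream<String> getWordsStream(Path path) throws IOException {
        Scanner scan = new Scanner(path);
        return scan.tokens().filter(word -> word.length() >= 3);
    }

    // Hypothetical helper: writes sample text to a temp file and collects the tokens.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("words", ".txt");
            try {
                Files.write(file, "I have 100 dollars\nand a cat".getBytes(StandardCharsets.US_ASCII));
                // Closing the stream closes the underlying Scanner as well.
                try (Stream<String> words = getWordsStream(file)) {
                    return words.collect(Collectors.toList());
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [have, 100, dollars, and, cat]
    }
}
```

Calling useDelimiter("[^a-zA-Z]+") on the scanner before tokens() would restrict the tokens to ASCII letters, as in the question.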


























  • and lines() will return a stream of words? would be a very bad name for such method - maybe Java 9, but for sure in Java 11 it is returning a stream of lines

    – Carlos Heuberger
    Nov 19 '18 at 22:17













  • It returns lines that have more than 3 letter symbols, in any Java version. My text is unprepared, so a loop is needed again to find all matches in a line.

    – a_chubenko
    Nov 19 '18 at 22:29








  • well, actually lines() returns all lines, not only the ones with more than 3 letters. It's getWordsStream() that returns the lines with 3 or more letters. But the question is about words with 3 or more letters, not about lines

    – Carlos Heuberger
    Nov 20 '18 at 1:05











  • With Java 11, you need not change the signature of that method and can use Files.lines(Path.of(path)) instead.

    – nullpointer
    Nov 20 '18 at 2:12











  • Fixed. Same lazy-loading approach. Piece of cake.

    – oleg.cherednik
    Nov 20 '18 at 5:22











Files.lines() answer: answered Nov 20 '18 at 7:23 by Lino, edited Nov 20 '18 at 12:43













Scanner.findAll() answer: answered Nov 20 '18 at 8:03 by Holger, edited Nov 20 '18 at 8:19













  • There is tested for a book with 105 K words. This method took about 0.6s.

    – a_chubenko
    Nov 20 '18 at 12:06



















  • There is tested for a book with 105 K words. This method took about 0.6s.

    – a_chubenko
    Nov 20 '18 at 12:06

















There is tested for a book with 105 K words. This method took about 0.6s.

– a_chubenko
Nov 20 '18 at 12:06





There is tested for a book with 105 K words. This method took about 0.6s.

– a_chubenko
Nov 20 '18 at 12:06











0














Thre're much easier approach: read lines from file to the Stream and filter it with required condition (e.g. length >= 3). Files.lines() has lazy loading, so it does not ready all words from the file at the beginning, it does it every time when next word is required



public static void main(String... args) throws IOException {
getWordsStream(Paths.get("d:/words.txt")).forEach(System.out::println);
}

public static Stream<String> getWordsStream(Path path) throws IOException {
final Scanner scan = new Scanner(path);

return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE,
Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
@Override
public boolean tryAdvance(Consumer<? super String> action) {
while (scan.hasNext()) {
String word = scan.next();

// you can use RegExp if you have more complicated condition
if (word.length() < 3)
continue;

action.accept(word);
return true;
}

return false;
}
}, false).onClose(scan::close);
}





share|improve this answer





















  • 1





    and lines() will return a stream of words? would be a very bad name for such method - maybe Java 9, but for sure in Java 11 it is returning a stream of lines

    – Carlos Heuberger
    Nov 19 '18 at 22:17













  • It returns lines that have more then 3 letter symbols in any Java. I have an unprepared text. So needs a loop again for finding all matches in a line.

    – a_chubenko
    Nov 19 '18 at 22:29








  • 1





    well, actually lines() returns all lines, not only the ones with more than 3 letter. It's getWordsStream() that returns the lines with 3 or more letters. But question is about words with 3 or more letters, not about lines

    – Carlos Heuberger
    Nov 20 '18 at 1:05











  • With java-11, you need not to change the signature of that method and can use Files.lines(Path.of(path)) instead.

    – nullpointer
    Nov 20 '18 at 2:12











  • Fixed. Same lazy load approach. Pease a cake.

    – oleg.cherednik
    Nov 20 '18 at 5:22
















0














Thre're much easier approach: read lines from file to the Stream and filter it with required condition (e.g. length >= 3). Files.lines() has lazy loading, so it does not ready all words from the file at the beginning, it does it every time when next word is required



public static void main(String... args) throws IOException {
getWordsStream(Paths.get("d:/words.txt")).forEach(System.out::println);
}

public static Stream<String> getWordsStream(Path path) throws IOException {
final Scanner scan = new Scanner(path);

return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE,
Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
@Override
public boolean tryAdvance(Consumer<? super String> action) {
while (scan.hasNext()) {
String word = scan.next();

// you can use RegExp if you have more complicated condition
if (word.length() < 3)
continue;

action.accept(word);
return true;
}

return false;
}
}, false).onClose(scan::close);
}





share|improve this answer





















  • 1





    and lines() will return a stream of words? would be a very bad name for such method - maybe Java 9, but for sure in Java 11 it is returning a stream of lines

    – Carlos Heuberger
    Nov 19 '18 at 22:17













  • It returns lines that have more then 3 letter symbols in any Java. I have an unprepared text. So needs a loop again for finding all matches in a line.

    – a_chubenko
    Nov 19 '18 at 22:29








  • 1





    well, actually lines() returns all lines, not only the ones with more than 3 letter. It's getWordsStream() that returns the lines with 3 or more letters. But question is about words with 3 or more letters, not about lines

    – Carlos Heuberger
    Nov 20 '18 at 1:05











  • With java-11, you need not change the signature of that method and can use Files.lines(Path.of(path)) instead.

    – nullpointer
    Nov 20 '18 at 2:12











  • Fixed. Same lazy-loading approach. Piece of cake.

    – oleg.cherednik
    Nov 20 '18 at 5:22














edited Nov 20 '18 at 8:34

























answered Nov 19 '18 at 22:04









oleg.cherednik


























