Fetch part of an HTML page in Java












I'm having trouble understanding how to download only part of an HTML page. I tried the traditional way, using URL::openStream and a BufferedReader, but I'm not sure whether this approach forces me to download the whole page.
The problem is: I have quite a big HTML page and I need to parse 2 numbers from it, which update at least once a second. The approach above detects changes only once every 2-3 seconds, and I wonder if there is a way to make it faster. So I thought that fetching only part of the page might help.

java html inputstreamreader

asked Nov 20 '18 at 10:20
Vlad Doronin

  • Perhaps you can try Jsoup?

    – manfromnowhere
    Nov 20 '18 at 10:31

  • It builds a DOM from the whole page. It's quite fast, but not fast enough.

    – Vlad Doronin
    Nov 20 '18 at 10:41

2 Answers
I think you should see how the data is fetched (SSE or WebSocket) and just try to subscribe to that service. If that is impossible, try a more efficient XML parser. I recommend https://vtd-xml.sourceforge.io/; it can be ~10x faster than the DOM parser that comes with the JDK.
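If the page turns out to expose a plain Server-Sent Events endpoint, consuming it can be fairly small. Below is a minimal sketch, not the page's actual protocol: `SseReader` and `readEvents` are names made up for illustration, and it only handles the `data:` field of the `text/event-stream` format. A Lightstreamer or WebSocket feed would need its own client instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper: reads a text/event-stream body and reports each event's data.
final class SseReader {

    static void readEvents(Reader body, Consumer<String> onEvent) {
        try (BufferedReader reader = new BufferedReader(body)) {
            StringBuilder data = new StringBuilder();
            boolean hasData = false;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    // A blank line ends the current event.
                    if (hasData) {
                        onEvent.accept(data.toString());
                    }
                    data.setLength(0);
                    hasData = false;
                } else if (line.startsWith("data:")) {
                    String value = line.substring(5);
                    if (value.startsWith(" ")) {
                        value = value.substring(1); // one optional space after the colon
                    }
                    if (hasData) {
                        data.append('\n'); // several data lines form one payload
                    }
                    data.append(value);
                    hasData = true;
                }
                // Other fields (event:, id:, retry:) and ':' comment lines are ignored in this sketch.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The `Reader` could come from something like `new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)` on the (hypothetical) stream URL; each update is then pushed to the callback without re-downloading the page.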



Also be careful with BufferedReader.readLine(), as there is a hidden allocation cost for strings that you don't really need (this is fairly advanced stuff, as you have to think about CPU memory bandwidth, L1 cache misses, etc.).
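To illustrate the point about avoiding per-line strings, here is a rough sketch that scans raw bytes for a marker and parses the digits right after it, returning as soon as the number is complete. `MarkerScanner`, `findNumberAfter` and the marker are hypothetical names; it assumes the number is plain ASCII digits directly following a byte sequence you know in advance.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: finds a marker in a byte stream and parses the digits after it.
final class MarkerScanner {

    /**
     * Returns the number that directly follows the first occurrence of marker,
     * or -1 if the marker is not found or is not followed by a digit.
     */
    static long findNumberAfter(InputStream in, byte[] marker) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int[] fail = failureTable(marker);
        byte[] buf = new byte[8192]; // reused for every read, no per-line strings
        int matched = 0;             // number of marker bytes matched so far
        boolean inNumber = false;
        boolean sawDigit = false;
        long value = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    byte b = buf[i];
                    if (inNumber) {
                        if (b >= '0' && b <= '9') {
                            value = value * 10 + (b - '0');
                            sawDigit = true;
                        } else {
                            return sawDigit ? value : -1; // stop without reading the rest
                        }
                    } else {
                        // Knuth-Morris-Pratt step: fall back on a mismatch.
                        while (matched > 0 && b != marker[matched]) {
                            matched = fail[matched - 1];
                        }
                        if (b == marker[matched]) {
                            matched++;
                        }
                        if (matched == marker.length) {
                            inNumber = true;
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sawDigit ? value : -1;
    }

    // Standard KMP failure table for the marker.
    private static int[] failureTable(byte[] m) {
        int[] fail = new int[m.length];
        for (int i = 1, k = 0; i < m.length; i++) {
            while (k > 0 && m[i] != m[k]) {
                k = fail[k - 1];
            }
            if (m[i] == m[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }
}
```

You would pass `url.openStream()` and something like `"id=\"price\">".getBytes(StandardCharsets.US_ASCII)` (an invented marker), then close the stream once the value is returned. Whether stopping early actually saves transfer time depends on the server and on how the connection is handled.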



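Example using VTD-XML, the library I mentioned:

    // Needs com.ximpleware.* (VTD-XML) and java.nio.charset.StandardCharsets.
    // parse() and the XPath calls throw checked exceptions that the caller must handle.
    byte[] pageInBytes = readAllBytesFromTheURL();
    VTDGen vg = new VTDGen();
    vg.setDoc(pageInBytes);
    vg.parse(false); // false = not namespace-aware
    VTDNav vn = vg.getNav();

    AutoPilot ap = new AutoPilot(vn);

    // Jump to the section that we want to process
    ap.selectXPath("/html/body/div");
    if (ap.evalXPath() != -1) {
        // getElementFragment() packs the offset (lower 32 bits) and the length (upper 32 bits)
        long fragment = vn.getElementFragment();
        // Assumes the page is UTF-8 encoded
        String fileId = new String(pageInBytes, (int) fragment, (int) (fragment >> 32), StandardCharsets.UTF_8);
    }
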
edited Nov 20 '18 at 11:22
answered Nov 20 '18 at 11:14
piotr szybicki

  • Thanks a lot! By the way, the page uses Lightstreamer to fetch data from their servers. I tried to use it directly, which obviously was not successful.

    – Vlad Doronin
    Nov 20 '18 at 11:35

  • Cool, can you accept my answer? I'm trolling for points on Stack Overflow :)

    – piotr szybicki
    Nov 20 '18 at 12:50

  • Yeah, sure. But VTD didn't work for me. The page has some tokens which VTD cannot parse, so now I'm writing a custom reader. But I tried it on another XML file and it is really fast.

    – Vlad Doronin
    Nov 20 '18 at 12:57

  • Can you share your solution when you are done? I'm curious to see what you come up with.

    – piotr szybicki
    Nov 20 '18 at 14:18

  • Posted my code in the next answer.

    – Vlad Doronin
    Nov 21 '18 at 12:06

I wrote a helper to read the URL content. The parser for the elements is in another class.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Iterator;

    public class HTMLReaderHelper {

        private final URL currentURL;

        HTMLReaderHelper(URL url) {
            currentURL = url;
        }

        // Returns null if the connection cannot be opened.
        public CharIterator charIterator() {
            try {
                return new CharIterator();
            } catch (IOException ex) {
                return null;
            }
        }

        public StringIterator stringIterator() {
            return new StringIterator();
        }

        // Streams the page one character at a time, so the caller can stop early.
        class CharIterator implements Iterator<Character> {

            private final Reader urlReader;

            // -2 = nothing read ahead, -1 = end of stream or I/O error
            private int lookahead = -2;

            private CharIterator() throws IOException {
                // Assumes the page is UTF-8 encoded.
                urlReader = new BufferedReader(
                        new InputStreamReader(currentURL.openStream(), StandardCharsets.UTF_8));
            }

            @Override
            public boolean hasNext() {
                if (lookahead == -2) {
                    try {
                        lookahead = urlReader.read();
                    } catch (IOException ex) {
                        lookahead = -1;
                    }
                }
                return lookahead != -1;
            }

            // Returns null at end of stream or after an I/O error.
            @Override
            public Character next() {
                if (!hasNext()) {
                    return null;
                }
                char c = (char) lookahead;
                lookahead = -2;
                return c;
            }
        }

        // Yields the page line by line, skipping empty lines.
        class StringIterator implements Iterator<String> {

            private final CharIterator charPointer;

            // Line read ahead by hasNext(), or null if none is pending.
            private String nextLine;

            private StringIterator() {
                charPointer = charIterator();
            }

            @Override
            public boolean hasNext() {
                if (nextLine == null) {
                    nextLine = readLine();
                }
                return nextLine != null;
            }

            // Returns null when the page is exhausted.
            @Override
            public String next() {
                if (!hasNext()) {
                    return null;
                }
                String line = nextLine;
                nextLine = null;
                return line;
            }

            private String readLine() {
                if (charPointer == null) {
                    return null; // the connection could not be opened
                }
                Character currentChar = charPointer.next();
                // Skip line terminators left over from the previous line.
                while (currentChar != null && (currentChar == '\n' || currentChar == '\r')) {
                    currentChar = charPointer.next();
                }
                if (currentChar == null) {
                    return null;
                }
                StringBuilder sb = new StringBuilder();
                while (currentChar != null && currentChar != '\n' && currentChar != '\r') {
                    sb.append(currentChar.charValue());
                    currentChar = charPointer.next();
                }
                return sb.toString();
            }
        }
    }

answered Nov 21 '18 at 12:05
Vlad Doronin