What is the optimal way for reading the contents of a webpage into a string in Java?


I have the following Java code to fetch the entire contents of an HTML page at a given URL. Can this be done in a more efficient way? Any improvements are welcome.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public static String getHTML(final String url) throws IOException {
        if (url == null || url.length() == 0) {
            throw new IllegalArgumentException("url cannot be null or empty");
        }

        final HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        final BufferedReader buf = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        final StringBuilder page = new StringBuilder();
        final String lineEnd = System.getProperty("line.separator");
        String line;
        try {
            // Read line by line, re-appending the platform line separator
            while (true) {
                line = buf.readLine();
                if (line == null) {
                    break;
                }
                page.append(line).append(lineEnd);
            }
        } finally {
            buf.close();
        }
        return page.toString();
    }

I can't help but feel that the line-by-line reading is less than optimal. I know that I am probably masking an exception caused by the openConnection call, and I am fine with that.

My function also has the side effect of giving the HTML string the correct line terminators for the current system. This is not a requirement.

I realize that network I/O will probably dwarf the time it takes to read in the HTML, but I would still like to know that this is optimal.

> On a side note: it would be awesome if StringBuilder had a constructor for an open InputStream that simply took all the contents of the InputStream and read them into the StringBuilder.
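
StringBuilder has no such constructor, but the idea in the note can be sketched as a small helper. The following is only a minimal sketch under stated assumptions: the class name `StreamText`, the method `readAll`, and the 8192-character buffer size are all hypothetical choices, and the caller is assumed to know the charset. It copies characters in blocks rather than line by line, so the page's original line terminators are kept instead of being replaced with the system ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of the JDK.
final class StreamText {

    // Drains the stream into a String, decoding with the given charset.
    // Characters are copied unchanged, so line terminators are preserved.
    // The stream is closed when reading finishes or fails.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192]; // buffer size is an arbitrary assumption
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for conn.getInputStream()
        byte[] bytes = "<html>\r\n</html>\n".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.print(text);
    }
}
```

In the function from the question, `conn.getInputStream()` could be passed to such a helper in place of the `readLine` loop. Whether that is measurably faster would need to be checked, since network time is likely to dominate either way.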

As seen in the other answers, there are many different edge cases (HTTP peculiarities, encoding, chunking, etc.) that any robust solution has to account for. So I suggest that in anything other than a toy program you use the de facto standard Java HTTP library: Apache HttpComponents HttpClient.

They provide several samples, for example:

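    // Imports for Apache HttpClient 4.x
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.ResponseHandler;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.BasicResponseHandler;
    import org.apache.http.impl.client.DefaultHttpClient;

    HttpClient httpclient = new DefaultHttpClient();
    HttpGet httpget = new HttpGet("http://www.google.com/");
    ResponseHandler<String> responseHandler = new BasicResponseHandler();
    String responseBody = httpclient.execute(httpget, responseHandler);
    // responseBody now contains the content of the page
    System.out.println(responseBody);
    httpclient.getConnectionManager().shutdown();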
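
To illustrate just one of the edge cases mentioned above (encoding), here is a rough JDK-only sketch of what a hand-rolled solution would have to do before it can decode the response bytes. The class `ContentTypeCharset` and the method `fromContentType` are hypothetical names, and falling back to UTF-8 is an assumption of this sketch, not a rule taken from the answer. It does not look at `<meta>` tags or byte-order marks, which a robust solution would also need to consider.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption of
    // this sketch) when the header is missing or names an unusable charset.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for the "charset=" prefix
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

With `HttpURLConnection`, the header value could be obtained from `conn.getContentType()` and the result passed to `new InputStreamReader(conn.getInputStream(), charset)`. A library such as the one recommended above handles this kind of detail for you.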
