Web scraping to fill out (and retrieve) search forms?


I was wondering whether it is possible to "automate" the task of typing entries into search forms and extracting the results from the returned page. For example, I have a list of journal articles for which I would like to get the DOI (Digital Object Identifier). To do this manually, I would go to the journal's article search page, type in the author/title/volume (etc.), find the article in the list of returned results, and then copy and paste the DOI into my reference list. I regularly use R and Python for data analysis (I was inspired by a post on RCurl), but I don't know much about web protocols. Is this possible (for example, using something like Python's Beautiful Soup)? Are there any good references for doing something remotely similar to this task? I am interested in learning about web scraping in general as much as in accomplishing this particular task. Thank you for your time!

Beautiful Soup is great for parsing webpages, which is half of what you want to do. Python, Perl, and Ruby all have a version of Mechanize, and that covers the other half:
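As a minimal sketch of the parsing half, here is how Beautiful Soup might pull a DOI out of a results page. The HTML structure and class names below are invented for illustration; a real journal site will use different markup, and you would fetch the page over HTTP rather than hard-coding it:

```python
from bs4 import BeautifulSoup

# Mocked search-results page (hypothetical markup for illustration only).
# In practice you would fetch this with urllib or a Mechanize browser.
html = """
<html><body>
  <div class="result">
    <span class="title">An Example Article</span>
    <span class="doi">10.1000/xyz123</span>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the first result and read out its title and DOI.
first_result = soup.find("div", class_="result")
title = first_result.find("span", class_="title").get_text()
doi = first_result.find("span", class_="doi").get_text()
print(title, doi)
```

The pattern is the same on a real site: inspect the results page, find the tags and attributes that wrap the DOI, and use `find`/`find_all` to extract them.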

With Mechanize you control a browser:

  # Follow a link
  browser.follow_link(link_node)

  # Submit a form
  browser.select_form(name="search")
  browser["author"] = ["author #1", "author #2"]
  browser["volume"] = "any"
  search_response = browser.submit()
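If you would rather not depend on Mechanize, the form-submission half can also be sketched with only Python's standard library. The URL and field names below are hypothetical stand-ins for whatever the real search form uses:

```python
import urllib.parse
import urllib.request

# Hypothetical search endpoint and form field names; inspect the real
# form's HTML to find the actual action URL and input names.
url = "https://example.org/search"
form_data = {
    "author": "author #1",
    "volume": "any",
}

# Encode the fields the way a browser would for a POST submission.
encoded = urllib.parse.urlencode(form_data).encode("utf-8")
request = urllib.request.Request(url, data=encoded)  # data= makes it a POST

# Network call left commented out, since the endpoint is made up:
# response = urllib.request.urlopen(request)
# html = response.read().decode("utf-8")
print(encoded)
```

Mechanize is still more convenient for multi-step sessions (it tracks cookies and parses forms for you), but this shows there is no magic involved.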

Firebug is an excellent tool to use alongside Mechanize and Beautiful Soup; it is also the extra tool used in this quick Ruby scraping guide.

Firebug can speed up your construction of XPath expressions, saving you a lot of time.
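For instance, once you have copied an XPath-style selector from Firebug (or a modern browser's developer tools), even the standard library's limited XPath support can apply it to a well-formed page. The markup below is a made-up example:

```python
import xml.etree.ElementTree as ET

# Toy, well-formed results page (hypothetical markup for illustration).
html = """<html><body>
<div class="result"><span class="doi">10.1000/xyz123</span></div>
</body></html>"""

root = ET.fromstring(html)

# Apply an XPath-style selector: any <span> whose class attribute is "doi".
doi = root.find(".//span[@class='doi']").text
print(doi)
```

Note that `xml.etree` requires well-formed XML and supports only a subset of XPath; for messy real-world HTML and full XPath, a parser like lxml (or Beautiful Soup, as above) is the usual choice.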

Good luck!

