Learned Ruby and Hpricot


Hi,

I started learning Ruby, so I asked Thiyagarajan sir to give me a task in Ruby. He gave me one.

The question is:

Go to the webpage: http://www.chennai.stpi.in:8080/stpi/MemberLogin/MemberUnitAdminHome.jsp

There are many company addresses there, listed from A to Z.

What I want is a program in Python or Ruby that reads the website and puts the company names in a text file, like this:

companyname.txt

a…….
a…….

……….
……….
b…….
b……..
……..
etc. up to
z…
z….

So companyname.txt will contain all the software company names from a to z, collected from the web address http://www.chennai.stpi.in:8080/stpi/MemberLogin/MemberUnitAdminHome.jsp
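The file layout asked for here can be sketched in a few lines of Ruby. This is only an illustration: the company names are made up, and `write_grouped` and `companyname_sample.txt` are hypothetical names that are not part of the real script further down.

```ruby
# Hypothetical helper: writes the names to a file, sorted so that all
# the 'a' names come first, then the 'b' names, and so on up to 'z'.
def write_grouped(names, path)
  sorted = names.sort_by { |name| name.downcase }
  File.open(path, "w") do |file|
    sorted.each { |name| file.puts(name) }
  end
  sorted
end

# Made-up sample data, not taken from the STPI site.
sample = ["Zeta Systems", "alpha Soft", "Beta Infotech"]
write_grouped(sample, "companyname_sample.txt")
```

Sorting on the lower-cased name keeps "alpha Soft" ahead of "Beta Infotech" even though the capitalisation differs.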

At first I didn't understand how to do it. After reading about web scraping techniques, I started on the task and completed it 🙂. When I gave it to Thiyagarajan sir, he told me that it was very long and asked me to do it with OOP principles, i.e. with a class. At last I completed that too 🙂

First install RubyGems on your system, and then install Hpricot. Follow these steps to install Hpricot:

[for Ubuntu]

sudo apt-get install rubygems

sudo apt-get install build-essential

sudo apt-get install ruby-dev

sudo gem install hpricot

sudo apt-get install libopenssl-ruby
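To confirm that the install worked, a small check like the following may help. It is a minimal sketch: `library_available?` is a hypothetical helper name, and it only tests whether `require` succeeds.

```ruby
require 'rubygems'

# Hypothetical helper: returns true if the named library can be loaded,
# false if Ruby raises LoadError for it.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

puts "hpricot installed: #{library_available?('hpricot')}"
```

If this prints false, the gem install step above probably did not complete.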

require 'rubygems'
require 'net/https'
require 'hpricot'
require 'open-uri'

# Scrapes company names and websites from the STPI Chennai member listing,
# one letter of the alphabet at a time.
class CompanyNames
  BASE_URL = "http://www.chennai.stpi.in:8080/stpi/MemberLogin/alphaSearchMemberUnits.jsp?uid="

  # Writes the company names for every letter in args to companyname.txt.
  def companyname( args )
    File.open("companyname.txt", "w") do |a|
      a.puts("____________COMPANY NAMES____________")
      args.each do |i|
        puts "fetching company names starting with '#{i}' and storing them in 'companyname.txt'\n\n Loading ..... \n\n\n"
        doc = Hpricot(open("#{BASE_URL}#{i}"))
        # The names sit in the table cells with width 46%; the combined text is split on 'Ltd'.
        items = doc.search('//tr/td[@width = "46%"]/font').inner_text.split('Ltd')
        items.each do |item|
          a.puts("#{i}........ #{item}")
        end
      end
    end
  end

  # Writes the website link of every company to companywebsite.txt.
  def companyweb( args )
    File.open("companywebsite.txt", "w") do |a|
      a.puts("____________COMPANY WEBSITES____________")
      args.each do |i|
        puts "fetching websites for names starting with '#{i}' and storing them in 'companywebsite.txt'\n\n Loading ..... "
        doc = Hpricot(open("#{BASE_URL}#{i}"))
        # Every link inside a table cell is written out as a company website.
        items = doc.search('//tr/td/font/a')
        items.each do |item|
          a.puts("#{i}...... #{item.attributes['href']}")
        end
      end
    end
  end
end

names = CompanyNames.new
letters = ('a'..'z').to_a

names.companyname(letters)
names.companyweb(letters)

The script fetches the names and websites of the companies and saves them in two files named companyname.txt and companywebsite.txt respectively.
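One thing to know about the script is that splitting on 'Ltd' removes the 'Ltd' itself from every name. The sketch below shows the effect on an invented string (assuming the names arrive run together as one piece of text, which is my guess at what `inner_text` returns for this page) and one possible way to keep the suffix using `scan`.

```ruby
# Invented sample text, standing in for the combined text of the table cells.
text = "Alpha Soft Pvt LtdBeta Infotech LtdGamma Labs Ltd"

# What the script does: the separator 'Ltd' is dropped from each piece.
without_suffix = text.split('Ltd')

# An alternative: match everything up to and including each 'Ltd'.
with_suffix = text.scan(/.*?Ltd/).map { |name| name.strip }

puts without_suffix.inspect
puts with_suffix.inspect
```

Note that any company whose name does not contain 'Ltd' would be merged with its neighbour in both versions, so this only suits pages where every name ends that way.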

The next thing I need to do is create a Ruby script which will go into every company website and tell whether a career tag is available or not. I am working on that now and will finish it soon.
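A rough idea of how that check might look is sketched here. It assumes that "career tag" means a link whose address or text mentions "career", and `career_link?` is a hypothetical helper name. It works on an HTML string with a simple regular expression, so it is not a real parser and not the finished script.

```ruby
# Hypothetical helper: true if any <a>...</a> element in the HTML mentions
# "career" in its attributes or its link text (case-insensitive).
def career_link?(html)
  anchors = html.scan(/<a\b[^>]*>.*?<\/a>/im)
  anchors.any? { |anchor| anchor =~ /career/i }
end

# Invented sample pages, not taken from any company website.
puts career_link?('<a href="/careers.html">Jobs</a>')
puts career_link?('<a href="/about.html">About us</a>')
```

For real pages, fetching the HTML and searching the links with Hpricot, as in the script above, would probably be more reliable than a regular expression.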

Yes, I got the output 🙂

Regards

sathia

About sathia

Web developer at cloudmint