Saturday, 26 November 2016

Python API for Assamese newspapers

Today I will scrape news from some Assamese news agencies. With the scripts below you can download the newspapers without visiting the news sites.

URLs
1. http://ganaadhikar.com/
2. http://www.dainikagradoot.com
3. http://www.assamtribune.com
4. http://dainikjanambhumi.co.in
5. http://www.assamiyakhabor.com
6. http://www.asomiyapratidin.in
7. http://www.assamtribune.com

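Overall, I am going to use 7 Assamese newspapers for this experiment. The site assamtribune.com is listed twice because two of the scripts below download from it (the /DA/ path and the /at/ path).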

1. গনঅধিকাৰ (Ganaadhikar):


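import os
# Python 2 script; on Python 3 use input() instead of raw_input().
a=raw_input('Give me the date of the paper, like 22112016: ')
# Start from an empty "gana" directory.
os.system('rm -vfr gana')
os.system('mkdir gana')
os.chdir('gana')
# Download pages 1 to 12 for the chosen date.
for i in range(1,13):
        os.system('wget http://ganaadhikar.com/%s/pages/page%s.gif'%(a,i))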
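
All the scripts in this post follow the same pattern: empty a directory, then fetch a run of numbered pages. Here is a minimal sketch of that pattern as two reusable helpers. It is written for Python 3 (the scripts in the post are Python 2), and the names fresh_dir and page_urls are hypothetical, invented for this illustration. Nothing is downloaded here; the second helper only builds the URLs.

```python
import os
import shutil


def fresh_dir(path):
    """Delete `path` if it exists, then recreate it empty (like rm -fr + mkdir)."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path


def page_urls(template, first, last, **fields):
    """Return one URL per page number from `first` to `last`, inclusive.

    `template` uses str.format placeholders: {page} for the page number,
    plus any extra named fields such as {date}.
    """
    return [template.format(page=n, **fields) for n in range(first, last + 1)]
```

For example, page_urls("http://ganaadhikar.com/{date}/pages/page{page}.gif", 1, 12, date="22112016") gives the same twelve URLs that the Ganaadhikar loop passes to wget.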



2. দৈনিক অগ্ৰদোত (Dainik Agradoot):

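import os
# Start from an empty "agradut" directory.
os.system('rm -vfr agradut')
os.system('mkdir agradut')
os.chdir('agradut')
# No date is needed: pages 1 to 14 are fetched from a fixed URL.
for i in range(1,15):
        os.system('wget http://www.dainikagradoot.com/pages/page%d.pdf'%i)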

3. দৈনিক অসম (Dainik Asom):

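import os
# Python 2 script; on Python 3 use input() instead of raw_input().
a=raw_input('Date format, e.g. nov2216: ')
os.system('rm -vfr dainik_asom')
os.system('mkdir dainik_asom')
os.chdir('dainik_asom')
# Pages 1 to 15. The year 2016 is hard-coded in the URL path.
for i in range(1,16):
        os.system('wget http://www.assamtribune.com/DA/2016/%s/BigPage%d.jpg'%(a,i))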

4. অসমীয়া খবৰ (Assamiya Khabor):

import os
os.system('rm -vfr khabar_gy')
os.system('mkdir khabar_gy')
os.chdir('khabar_gy')
# Try every page as both .png and .jpg; wget simply fails on the
# extension that does not exist for a given page.
for i in range(1,15):
        os.system('wget http://www.assamiyakhabor.com/publishfinal/asset/guwahati/current/pages/ghy%d.png'%i)
        os.system('wget http://www.assamiyakhabor.com/publishfinal/asset/guwahati/current/pages/ghy%d.jpg'%i)


5. প্ৰতিদিন (Pratidin):

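import os, bs4, requests
# Remove the results of earlier runs.
os.system('rm -vfr *_pratidin*')
a1=requests.get('http://www.asomiyapratidin.in')
# Name the parser explicitly so bs4 does not have to guess one.
r=bs4.BeautifulSoup(a1.content,'html.parser')
os.system('clear')
# Python 2 script; on Python 3 use input() instead of raw_input().
a=raw_input('Enter the date of the newspaper you want, e.g. 19-11-2016: ')
os.system('mkdir %s_pratidin'%a)
os.chdir('%s_pratidin'%a)
# One link per page in the thumbnail strip; +1 because range() excludes its end.
c=len(r.find_all("div",{"id":"page-thumbnails"})[0].find_all('a'))+1
for i in range(1,c):
        os.system('wget http://www.asomiyapratidin.in/np-images/medium/ap-%s-%d.jpg'%(a,i))
os.system('clear')
# Shrink each downloaded page to 50% in place. We are already inside the
# download directory, and a literal % must be written as %% in a format string.
for i in range(1,c):
        os.system('convert ap-%s-%d.jpg -resize 50%% ap-%s-%d.jpg'%(a,i,a,i))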
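
The resize step is easy to get wrong when the command is one big string, because the % in "50%" collides with Python's % formatting. One way around that, sketched here for Python 3 as an assumption rather than as part of the original script, is to pass the arguments to ImageMagick as a list through subprocess. The helper names resize_command and resize are hypothetical.

```python
import subprocess


def resize_command(src, dst, percent=50):
    """Build the ImageMagick argument list; nothing is executed here."""
    return ["convert", src, "-resize", "%d%%" % percent, dst]


def resize(src, dst, percent=50):
    """Run convert and return True if it exited with status 0.

    Raises OSError if the convert program is not installed.
    """
    return subprocess.call(resize_command(src, dst, percent)) == 0
```

Because no shell is involved, file names containing spaces or other special characters are passed through unchanged.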

6. Assam Tribune:

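import os
# Python 2 script; on Python 3 use input() instead of raw_input().
a=raw_input('Date format, e.g. nov2216: ')
os.system('rm -vfr tribune')
os.system('mkdir tribune')
os.chdir('tribune')
# Pages 1 to 16. The year 2016 is hard-coded in the URL path.
for i in range(1,17):
        os.system('wget http://www.assamtribune.com/at/2016/%s/BigPage%d.jpg'%(a,i))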
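
The scripts above ask for the date in three different spellings: 22112016, nov2216 and 19-11-2016. The following Python 3 sketch derives all three from a single date object. The function name date_strings and its dictionary keys are hypothetical, and it assumes that single-digit days are zero-padded in the nov2216 style, which the examples in this post do not confirm.

```python
import datetime

# Lower-case English month abbreviations, independent of the system locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def date_strings(day):
    """Return the three date spellings used by the download scripts."""
    return {
        # e.g. 22112016 (Ganaadhikar prompt)
        "ddmmyyyy": day.strftime("%d%m%Y"),
        # e.g. nov2216 (assamtribune.com prompts); zero-padding is an assumption
        "monddyy": "%s%02d%02d" % (MONTHS[day.month - 1], day.day, day.year % 100),
        # e.g. 19-11-2016 (Pratidin prompt)
        "dd-mm-yyyy": day.strftime("%d-%m-%Y"),
    }
```

Calling date_strings(datetime.date.today()) yields the values to paste into the prompts. The assamtribune.com URLs in the scripts also contain a hard-coded 2016, which would need changing for other years.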


Below is a screenshot.




