Today I will scrape news from some news agencies. With these scripts you can download newspapers without visiting the news sites.
URLs
1. http://ganaadhikar.com/
2. http://www.dainikagradoot.com
3. http://www.assamtribune.com (Dainik Asom)
4. http://dainikjanambhumi.co.in
5. http://www.assamiyakhabor.com
6. http://www.asomiyapratidin.in
7. http://www.assamtribune.com (The Assam Tribune)
Overall, I am going to use seven newspapers from Assam for this experiment.
1. গনঅধিকাৰ (Ganaadhikar):
import os
a = input('Date of the paper, like 22112016: ')
os.system('rm -vfr gana')
os.system('mkdir gana')
os.chdir('gana')
# Pages 1 to 12 are published as GIF images
for i in range(1, 13):
    os.system('wget http://ganaadhikar.com/%s/pages/page%d.gif' % (a, i))
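The date has to be typed in exactly the DDMMYYYY form (22112016). As a minimal sketch, assuming the site keeps that folder naming and the 12-page count the script loops over, the date can be validated with `datetime` and the page URLs built in Python. The names `parse_ddmmyyyy` and `ganaadhikar_urls` are my own hypothetical helpers, not anything the site provides.

```python
import datetime


def parse_ddmmyyyy(text):
    """Validate user input such as '22112016'; raises ValueError otherwise."""
    return datetime.datetime.strptime(text, '%d%m%Y').date()


def ganaadhikar_urls(day, pages=12):
    """Build the page-image URLs for one issue.

    Assumes the DDMMYYYY folder name and pageN.gif naming used above.
    """
    stamp = day.strftime('%d%m%Y')  # e.g. 22112016
    return ['http://ganaadhikar.com/%s/pages/page%d.gif' % (stamp, i)
            for i in range(1, pages + 1)]
```

A typo such as `2211-2016` now fails immediately with a `ValueError` instead of producing twelve failed wget calls.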
2. দৈনিক অগ্ৰদোত (Dainik Agradoot):
import os
os.system('rm -vfr agradut')
os.system('mkdir agradut')
os.chdir('agradut')
for i in range(1, 15):
    os.system('wget http://www.dainikagradoot.com/pages/page%d.pdf' % i)
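Shelling out to wget works, but a missing page only shows up as wget error output. Here is a sketch that swaps in Python's own `urllib.request` (my substitution, not what the script above uses) and skips pages that fail, assuming a failed request simply means the issue has fewer pages than the loop asks for. `fetch_bytes` and `download_pages` are hypothetical helper names; the `fetch` parameter lets you plug in a different downloader.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=30):
    """Return the body of url; raises OSError (HTTPError, URLError, timeout) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_pages(urls, dest_dir, fetch=fetch_bytes):
    """Save each URL into dest_dir under its last path component.

    A page that fails to download (for example a 404 because the issue
    has fewer pages than requested) is skipped. Returns the list of
    file paths that were actually written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        name = url.rsplit('/', 1)[-1]
        try:
            data = fetch(url)
        except OSError:  # urllib's HTTPError and URLError are OSError subclasses
            continue
        path = os.path.join(dest_dir, name)
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
    return saved

# Usage against the live site (not run here):
# urls = ['http://www.dainikagradoot.com/pages/page%d.pdf' % i for i in range(1, 15)]
# download_pages(urls, 'agradut')
```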
3. দৈনিক অসম (Dainik Asom):
import os
a = input('Date format, like nov2216: ')
os.system('rm -vfr dainik_asom')
os.system('mkdir dainik_asom')
os.chdir('dainik_asom')
# Note: the year (2016) is hard-coded in the URL path
for i in range(1, 16):
    os.system('wget http://www.assamtribune.com/DA/2016/%s/BigPage%d.jpg' % (a, i))
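The nov2216 date string (lowercase month abbreviation, day, two-digit year) can be produced from a real date instead of typed by hand. A sketch, assuming an English locale for `%b`, zero-padded days, and that the `2016` segment in the URL is the publication year, so it should follow the date rather than stay hard-coded. `tribune_group_date` and `dainik_asom_urls` are hypothetical names.

```python
import datetime


def tribune_group_date(day):
    """Format a date like 'nov2216': month abbreviation, day, 2-digit year."""
    # %b is locale dependent; this assumes an English (e.g. the default C) locale.
    # %d zero-pads the day (5 -> '05'), which is an assumption about the site.
    return day.strftime('%b%d%y').lower()


def dainik_asom_urls(day, pages=15):
    """Page-image URLs for one Dainik Asom issue (BigPage1.jpg .. BigPageN.jpg)."""
    # Assumption: the year folder in the original URL ('2016') is the
    # publication year, so it is taken from the date instead of hard-coded.
    stamp = tribune_group_date(day)
    return ['http://www.assamtribune.com/DA/%d/%s/BigPage%d.jpg' % (day.year, stamp, i)
            for i in range(1, pages + 1)]
```

Swapping `DA` for `at` and using 16 pages would give the Assam Tribune URLs from section 6, under the same assumptions.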
4. অসমীয়া খবৰ (Asomiya Khabor):
import os
os.system('rm -vfr khabar_gy')
os.system('mkdir khabar_gy')
os.chdir('khabar_gy')
# A page may be published as PNG or JPG, so try both; the missing one just fails
for i in range(1, 15):
    os.system('wget http://www.assamiyakhabor.com/publishfinal/asset/guwahati/current/pages/ghy%d.png' % i)
    os.system('wget http://www.assamiyakhabor.com/publishfinal/asset/guwahati/current/pages/ghy%d.jpg' % i)
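The script fires wget at both the .png and the .jpg name for every page, so one request per page is always wasted. A sketch of a fallback that stops at the first extension that downloads, assuming each page exists under only one of the two names. It uses `urllib.request` rather than wget, and `fetch_bytes` and `fetch_first` are hypothetical names.

```python
import urllib.request


def fetch_bytes(url, timeout=30):
    """Return the body of url; raises OSError (e.g. HTTPError) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_first(base_url, extensions=('.png', '.jpg'), fetch=fetch_bytes):
    """Try base_url + ext for each extension in order.

    Returns (url, data) for the first extension that downloads, or None
    if every attempt fails. Later extensions are not requested once one
    succeeds.
    """
    for ext in extensions:
        url = base_url + ext
        try:
            return url, fetch(url)
        except OSError:  # HTTPError/URLError are OSError subclasses
            continue
    return None

# Usage against the live site (not run here):
# base = 'http://www.assamiyakhabor.com/publishfinal/asset/guwahati/current/pages/ghy%d'
# for i in range(1, 15):
#     hit = fetch_first(base % i)
```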
5. প্ৰতিদিন (Asomiya Pratidin):
import os
import bs4
import requests
os.system('rm -vfr *_pratidin*')
a1 = requests.get('http://www.asomiyapratidin.in')
r = bs4.BeautifulSoup(a1.content, 'html.parser')
os.system('clear')
a = input('Enter the date for which you want the newspaper (e.g. 19-11-2016): ')
os.system('mkdir %s_pratidin' % a)
os.chdir('%s_pratidin' % a)
# The number of page thumbnails on the home page gives the page count
c = len(r.find_all("div", {"id": "page-thumbnails"})[0].find_all('a')) + 1
for i in range(1, c):
    os.system('wget http://www.asomiyapratidin.in/np-images/medium/ap-%s-%d.jpg' % (a, i))
os.system('clear')
# Shrink each page to 50% with ImageMagick's convert, overwriting the download.
# We are already inside the *_pratidin folder, so plain file names are used;
# '%%' is a literal '%' in the format string.
for i in range(1, c):
    os.system('convert ap-%s-%d.jpg -resize 50%% ap-%s-%d.jpg' % (a, i, a, i))
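The page count comes from counting the `<a>` links inside `<div id="page-thumbnails">` on the home page. If you would rather not depend on bs4, the same count can be sketched with the standard library's `html.parser`. This is a substitute for the BeautifulSoup lookup, assuming the thumbnails div is unique on the page (as an `id` should be); `ThumbnailCounter` and `count_pages` are my own hypothetical names.

```python
from html.parser import HTMLParser


class ThumbnailCounter(HTMLParser):
    """Count <a> tags anywhere inside <div id="page-thumbnails">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of open <div>s while inside the thumbnails div
        self.count = 0   # <a> tags seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            if self.depth:
                self.depth += 1  # nested div inside the container
            elif dict(attrs).get('id') == 'page-thumbnails':
                self.depth = 1   # entering the container
        elif tag == 'a' and self.depth:
            self.count += 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


def count_pages(html):
    """Return the number of page thumbnails found in the HTML text."""
    parser = ThumbnailCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Feeding it `requests.get('http://www.asomiyapratidin.in').text` should give the same number as the `len(...)` expression in the script, provided the site's markup has not changed.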
6. The Assam Tribune:
import os
a = input('Date format, like nov2216: ')
os.system('rm -vfr tribune')
os.system('mkdir tribune')
os.chdir('tribune')
# Note: the year (2016) is hard-coded in the URL path
for i in range(1, 17):
    os.system('wget http://www.assamtribune.com/at/2016/%s/BigPage%d.jpg' % (a, i))
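Every script starts with the same three steps: `rm -vfr`, `mkdir` and `os.chdir`. A sketch of a portable replacement using `shutil.rmtree` and `os.makedirs`, which also works on systems without an `rm` command. `fresh_dir` is a hypothetical helper name, and returning the folder path (instead of changing the working directory) is my own design choice.

```python
import os
import shutil


def fresh_dir(path):
    """Pure-Python version of `rm -rf path` followed by `mkdir path`.

    Deletes any previous download folder (ignoring the error if it does
    not exist), creates an empty one and returns its absolute path, so
    callers can join file names onto it instead of calling os.chdir.
    """
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return os.path.abspath(path)

# Usage (not run here):
# folder = fresh_dir('tribune')
# page_path = os.path.join(folder, 'BigPage1.jpg')
```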
Below is a screenshot.