IMDb Crawler (PYTHON)

byteeee — Tue, 15 Nov 2016 10:30:57 +0800

So I have already used Scrapy to crawl all the IMDb IDs (something like tt1797700).
Now I want to use the IDs I have crawled to grab movie information from https://www.omdbapi.com/ and store it in a JSON file.
But when I ran my Python script in the command prompt, it didn't appear to do anything.

No error popped up in the command prompt. It simply sat there for over an hour. Is it because there is an error in my script?

Here is my code:

CODE

import requests
import json

def IMDb_query_url(id):
    [B]query_url = 'http://www.omdbapi.com/?i='+id+'&plot=short&r=json' [/B]# id is something like tt78900700, as every movie has its own ID
    return query_url

def get_movie_ids(input_file): # this converts the text file (the IMDb IDs I crawled from the IMDb website) into a list called id_list
    [B]id_list= []
    with open (input_file, 'r') as f:
        for line in f:
            id_list.append(line.strip())
    return id_list[/B]

def get_all_data(in_file, out_file): # this grabs movie info from OMDBAPI.COM using the IDs in id_list
    movie_data_dict = {}
    movie_ids = get_movie_ids(in_file)

    id_counter = 0
    session = requests.Session()

    [B]for id in movie_ids: # this is to catch errors in corrupted JSON files
        url = IMDb_query_url(id)
        try:
            movie_data = session.get(url).json()
        except ValueError , e: # if the JSON is corrupted, just ignore it and move on
            pass[/B]

        movie_data_dict[id_counter] = movie_data
        id_counter += 1
    with open(out_file, 'w+') as f:
        json.dump(movie_data_dict, f)

if __name__ == '__main__':
    # don't change any code below this line
    movie_id_file = r'../IMDbIDCrawler/movie_ids06-15'
    movie_data_file = 'IMDb2006-2015.json'
    get_all_data(movie_id_file, movie_data_file)

The bold code was written by me; the rest was provided by my teacher in the first place.
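
For comparison, here is a minimal sketch (Python 3, not the teacher's code and not a confirmed fix) of the same loop with a per-request timeout and a progress print, so that a slow but working run can be told apart from a real hang. The names build_url, http_fetch, collect and report_every are made up for illustration, the 10-second timeout is an arbitrary assumption, and OMDb may now also need an API key added to the URL.

```python
def build_url(imdb_id):
    # Same URL shape as in the script above.
    return 'http://www.omdbapi.com/?i=' + imdb_id + '&plot=short&r=json'


def http_fetch(url, timeout=10):
    # Hypothetical helper: one GET with a timeout, so a stalled connection
    # raises an exception instead of blocking forever.
    import requests  # imported here so the rest runs without the package
    return requests.get(url, timeout=timeout).json()


def collect(movie_ids, fetch=http_fetch, report_every=100):
    """Fetch data for every ID; return (results, failed_ids)."""
    results = {}
    failed = []
    for count, imdb_id in enumerate(movie_ids, start=1):
        try:
            results[imdb_id] = fetch(build_url(imdb_id))
        except (ValueError, OSError):
            # ValueError: the body was not valid JSON.
            # OSError: requests' timeout and connection errors derive from it.
            failed.append(imdb_id)
        if count % report_every == 0:
            print('processed', count, 'ids,', len(failed), 'failed')
    return results, failed
```

Calling collect(id_list) would use http_fetch by default. A failed ID is recorded in the second return value rather than being stored with the previous movie's data, which is what can happen when the except branch only does pass.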

Your help will be greatly appreciated!!
