<?xml version="1.0" encoding="utf-8"?>
<!-- generator="FeedCreator 1.7.2" -->
<rss version="2.0">
    <channel>
        <title>Lowyat.NET: Latest topics by byteeee</title>
        <description></description>
        <link>http://forum.lowyat.net/</link>
        <lastBuildDate>Sat, 27 Jun 2026 13:43:42 +0800</lastBuildDate>
        <generator>FeedCreator 1.7.2</generator>
        <item>
            <title>IMDb Crawler (PYTHON)</title>
            <link>http://forum.lowyat.net/topic/4113281</link>
            <description>I have already used Scrapy to crawl all the IMDb IDs (something like tt1797700).&lt;br /&gt;
Now I want to use those IDs to fetch movie information from &lt;a href='https://www.omdbapi.com/' target='_blank'&gt;https://www.omdbapi.com/&lt;/a&gt; and store it in a JSON file.&lt;br /&gt;
But when I ran my Python script from the command prompt, nothing seemed to happen.&lt;br /&gt;
&lt;br /&gt;
No error appeared in the command prompt; it just sat there for over an hour. Is it because there is an error in my script?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here is my code:&lt;br /&gt;
&lt;br /&gt;
&lt;!--c1--&gt;&lt;div class='codetop'&gt;CODE&lt;/div&gt;&lt;div class='codemain'&gt;&lt;!--ec1--&gt;import requests&lt;br /&gt;
import json&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def IMDb_query_url&amp;#40;id&amp;#41;&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;query_url = &amp;#39;http&amp;#58;//www.omdbapi.com/?i=&amp;#39; + id + &amp;#39;&amp;amp;plot=short&amp;amp;r=json&amp;#39; &amp;nbsp;# written by me; id is something like tt78900700, as every movie has its own ID&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;return query_url&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def get_movie_ids&amp;#40;input_file&amp;#41;&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;# written by me (whole function body)&amp;#58; read the IMDb IDs I crawled from the IMDb website into a list called id_list&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;id_list = &amp;#91;&amp;#93;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;with open&amp;#40;input_file, &amp;#39;r&amp;#39;&amp;#41; as f&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;for line in f&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;id_list.append&amp;#40;line.strip&amp;#40;&amp;#41;&amp;#41;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;return id_list&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def get_all_data&amp;#40;in_file, out_file&amp;#41;&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;# fetch movie info from omdbapi.com using each ID in the list&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;movie_data_dict = {}&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;movie_ids = get_movie_ids&amp;#40;in_file&amp;#41;&lt;br /&gt;
&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;id_counter = 0&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;session = requests.Session&amp;#40;&amp;#41;&lt;br /&gt;
&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;# written by me (from here to the end of the except block)&amp;#58; catch errors from corrupted JSON responses&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;for id in movie_ids&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;url = IMDb_query_url&amp;#40;id&amp;#41;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;try&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;movie_data = session.get&amp;#40;url&amp;#41;.json&amp;#40;&amp;#41;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;except ValueError&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;# the response is not valid JSON&amp;#58; skip this ID and move on&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;continue&lt;br /&gt;
&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;movie_data_dict&amp;#91;id_counter&amp;#93; = movie_data&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;id_counter += 1&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;with open&amp;#40;out_file, &amp;#39;w+&amp;#39;&amp;#41; as f&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;json.dump&amp;#40;movie_data_dict, f&amp;#41;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;#39;__main__&amp;#39;&amp;#58;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;# don&amp;#39;t change any code below this line&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;movie_id_file = r&amp;#39;../IMDbIDCrawler/movie_ids06-15&amp;#39;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;movie_data_file = &amp;#39;IMDb2006-2015.json&amp;#39;&lt;br /&gt;
 &amp;nbsp; &amp;nbsp;get_all_data&amp;#40;movie_id_file, movie_data_file&amp;#41;&lt;!--c2--&gt;&lt;/div&gt;&lt;!--ec2--&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The parts marked "written by me" in the comments are my own code; the rest was provided by my teacher.&lt;br /&gt;
&lt;br /&gt;
Your help will be greatly appreciated&amp;#33;&amp;#33;&lt;br /&gt;</description>
            <author>byteeee</author>
            <category>Codemasters</category>
            <pubDate>Tue, 15 Nov 2016 10:30:57 +0800</pubDate>
        </item>
    </channel>
</rss>
