Registering for News API and fetching news with Python

Published: 2019/05/30 Updated: 2019/05/30

Image Credit : News API

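Introduction

News API is an API that returns top headlines, or news matching specified conditions, from a range of news sites in JSON format. It covers sites such as BBC News and TechCrunch, and as of May 2019 it supports roughly 30,000 news sites and other sources. Client libraries are available for Python, Ruby, Node.js, and other languages, which make fetching news straightforward. This article walks through the steps up to actually fetching news with News API.

What you will be able to do

The goal is to fetch data like the following from Python. This is one article returned when requesting TechCrunch's top headlines.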

{
    'source': {
        'id': 'techcrunch',
        'name': 'TechCrunch'
    }, 
    'author': 'Jonathan Shieber',
    'title': 'Pitching accuracy rates of over 99% for multiple cancer screens, Thrive launches with $110 million',
    'description': 'For more than 25 years the founders of Thrive Earlier Detection have been researching ways to improve the accuracy of liquid biopsy tests. The fruits of that labor from Dr. Bert Vogelstein, Dr. Kenneth Kinzler and Dr. Nickolas Papadopoulos — all professors an…',
    'url': 'https://techcrunch.com/2019/05/30/pitching-accuracy-rates-of-over-99-for-multiple-cancer-screens-thrive-launches-with-110-million/',
    'urlToImage': 'https://techcrunch.com/wp-content/uploads/2016/09/8696879946_c47d0f98fa_o.jpg?w=601', 
    'publishedAt': '2019-05-30T14:37:22Z', 
    'content': 'For more than 25 years the founders of Thrive Earlier Detection have been researching ways to improve the accuracy of liquid biopsy tests.\r\nThe fruits of that labor from Dr. Bert Vogelstein, Dr. Kenneth Kinzler and Dr. Nickolas Papadopoulos — all professors a… [+4268 chars]'
}

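You can also fetch news that matches conditions such as country, category, news site, and keyword.

About News API

News API is an API for fetching news across roughly 30,000 sources. The number of API calls is limited, but it is free for open-source and personal use. Commercial use requires a paid plan. The official site is linked below.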

newsapi.org

News API

Search worldwide news with code

The details of each plan are described at the link below.

newsapi.org

Pricing

Free for development, options if you're commercial

Prerequisites and environment

The environment used in this article is as follows.

OS: Ubuntu 18.04
Python 3.7

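Getting a News API key

To use News API, you first need to register as a user. The registration page is "Register for API key". For a Developer account, the only information required to register is an email address. Opening the registration page shows a sign-up form.

The fields on the registration page are as follows.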

First name	Your name (user name)
Email address	Your email address
Choose a password	Your password
You are...	Whether you are an individual or a business user. For personal use, select `I am an individual`.
I agree to the terms	Checkbox for agreeing to the terms of service. Read the terms, then check the box.
I promise to add an attribution link on my website or app to NewsAPI.org	A promise to place a link to News API on your app or website, for example a notice such as "Powered by News API". Check the box if you agree.

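After filling in and selecting each item, click the "Submit" button. Your API key is then displayed under "Your API key is:", and this is the key to use.

Clicking "My account" on that page shows your account information, along with details such as your API usage count and the News API service status.

That completes obtaining a News API key. The following sections use News API from Python to actually fetch news.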
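Rather than pasting the key directly into source code, one option is to read it from an environment variable. The following is a minimal sketch of that idea. The variable name NEWS_API_KEY and the helper load_api_key are assumptions made for illustration and are not something News API prescribes.

```python
import os


def load_api_key(env=None, var_name="NEWS_API_KEY"):
    """Return the API key stored in an environment variable.

    NEWS_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set the {} environment variable first".format(var_name))
    return key


# A plain dict stands in for os.environ so the demo runs anywhere.
print(load_api_key({"NEWS_API_KEY": "dummy-key-for-demo"}))
```

In a real script, calling load_api_key() with no arguments reads from the actual environment, and the returned string can be used wherever YOUR_API_KEY appears in this article.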

Fetching news with News API and Python

The News API website provides documentation, which introduces a client library for News API, although it is unofficial. This article uses that library. First, install it as follows.

$ pip install newsapi-python

As a test, the sample code below fetches TechCrunch's top headlines and prints only the first one. Replace YOUR_API_KEY in the code with your own API key.

sample.py

from newsapi import NewsApiClient

# Initialize the client
newsapi = NewsApiClient(api_key='YOUR_API_KEY')

# Fetch top headlines from the news site given in sources
headlines = newsapi.get_top_headlines(sources='techcrunch')

# Print the first of the fetched headlines
print(headlines['articles'][0])

Running the code above produces a result like the following.

{
    'source': {
        'id': 'techcrunch',
        'name': 'TechCrunch'
    }, 
    'author': 'Jonathan Shieber',
    'title': 'Pitching accuracy rates of over 99% for multiple cancer screens, Thrive launches with $110 million',
    'description': 'For more than 25 years the founders of Thrive Earlier Detection have been researching ways to improve the accuracy of liquid biopsy tests. The fruits of that labor from Dr. Bert Vogelstein, Dr. Kenneth Kinzler and Dr. Nickolas Papadopoulos — all professors an…',
    'url': 'https://techcrunch.com/2019/05/30/pitching-accuracy-rates-of-over-99-for-multiple-cancer-screens-thrive-launches-with-110-million/',
    'urlToImage': 'https://techcrunch.com/wp-content/uploads/2016/09/8696879946_c47d0f98fa_o.jpg?w=601', 
    'publishedAt': '2019-05-30T14:37:22Z', 
    'content': 'For more than 25 years the founders of Thrive Earlier Detection have been researching ways to improve the accuracy of liquid biopsy tests.\r\nThe fruits of that labor from Dr. Bert Vogelstein, Dr. Kenneth Kinzler and Dr. Nickolas Papadopoulos — all professors a… [+4268 chars]'
}

The JSON returned by News API has the following format.

{
  "status": "ok",
  "totalResults": 5960, // number of results
  "articles": [array of article objects]
}
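Given that envelope, a response can be checked and reduced to the fields of interest. The following is a minimal sketch, assuming the response has already been decoded into a Python dict. summarize_response is a hypothetical helper, and the sample data is made up for illustration.

```python
def summarize_response(response):
    """Return (totalResults, [(source name, title), ...]) for a response dict."""
    if response.get("status") != "ok":
        raise ValueError("unexpected status: {!r}".format(response.get("status")))
    rows = [
        (article["source"]["name"], article["title"])
        for article in response.get("articles", [])
    ]
    return response.get("totalResults", 0), rows


# Made-up sample shaped like the format shown above.
sample = {
    "status": "ok",
    "totalResults": 1,
    "articles": [
        {"source": {"id": "techcrunch", "name": "TechCrunch"}, "title": "Example title"}
    ],
}
total, rows = summarize_response(sample)
print(total, rows)
```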

Fetching top headlines with conditions

As in the code below, you can specify several conditions and fetch the news that matches them.

from newsapi import NewsApiClient

# Initialize the client
newsapi = NewsApiClient(api_key='YOUR_API_KEY')

# Fetch news with category set to business and country set to jp
headlines = newsapi.get_top_headlines(category='business', country='jp')

if headlines['totalResults'] > 0:
    print(headlines['articles'][0])
else:
    print("No top headlines matched the conditions.")

The values accepted for category are business, entertainment, general, health, science, sports, and technology.

Note that when sources is specified, it cannot be combined with category or country. Other parameters and details are described at the link below.

newsapi.org

Top headlines /v2/top-headlines

This endpoint provides live top and breaking headlines for a country, specific category in a country, single source, or multiple sources. You can also search with keywords. Articles are sorted by the earliest date published first.
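The restriction on sources can be checked before a request is sent. The sketch below is only an illustration: check_top_headline_params is a hypothetical helper, not part of newsapi-python, and it covers just the category list and the sources rule described in this section.

```python
CATEGORIES = {
    "business", "entertainment", "general", "health",
    "science", "sports", "technology",
}


def check_top_headline_params(sources=None, category=None, country=None):
    """Raise ValueError for combinations this article describes as invalid."""
    if sources and (category or country):
        raise ValueError("sources cannot be combined with category or country")
    if category is not None and category not in CATEGORIES:
        raise ValueError("unknown category: {}".format(category))
    candidates = (("sources", sources), ("category", category), ("country", country))
    # Keep only the parameters that were actually given.
    return {name: value for name, value in candidates if value is not None}


print(check_top_headline_params(category="business", country="jp"))
```

The returned dict could then be passed on as newsapi.get_top_headlines(**params), so that an invalid combination fails locally instead of after a round trip to the API.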

Fetching past news

With get_everything you can fetch past news in addition to top headlines. The example below searches for news that matches the keyword google with the source set to techcrunch.

from newsapi import NewsApiClient

# Initialize the client
newsapi = NewsApiClient(api_key='YOUR_API_KEY')

# Search TechCrunch articles for the keyword google
all_articles = newsapi.get_everything(q='google', sources='techcrunch')

if all_articles['totalResults'] > 0:
    print("Number of articles: {}".format(all_articles['totalResults']))
    print(all_articles['articles'][0])
else:
    print("No news matched the conditions.")

The result is as follows.

Number of articles: 383
{
    'source': {
        'id': 'techcrunch',
        'name': 'TechCrunch'
    },
    'author': 'Frederic Lardinois',
    'title': 'Google makes travel planning easier',
    'description': 'Google today announced a major revamp of its travel planning tools on the web. After launching a similar set of tools on mobile last year, the company today announced that google.com/travel on the web will now let you see information about all of your previou…',
    'url': 'https://techcrunch.com/2019/05/14/google-makes-travel-planning-easier/',
    'urlToImage': 'https://techcrunch.com/wp-content/uploads/2019/05/gml-pixelbook7_MWpkfLz.gif?w=711',
    'publishedAt': '2019-05-14T17:18:52Z',
    'content': 'Google today announced a major revamp of its travel planning tools on the web. After launching a similar set of tools on mobile last year, the company today announced that google.com/travel on the web will now let you see information about all of your previou… [+1822 chars]'
}
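The publishedAt field in the result is a UTC timestamp string. If it needs to be compared or reformatted, it can be converted to a datetime. The following is a minimal sketch, assuming the timestamp always has the form seen in the output above. parse_published_at is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_published_at(value):
    """Parse a timestamp such as '2019-05-14T17:18:52Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing Z means UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


published = parse_published_at("2019-05-14T17:18:52Z")
print(published.isoformat())
```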

Fetching past news for a specific period

Using from_param and to as shown below, you can fetch news from a specific period.

from newsapi import NewsApiClient

# Initialize the client
newsapi = NewsApiClient(api_key='YOUR_API_KEY')

# Limit the search to articles published between the two dates
all_articles = newsapi.get_everything(q='google', sources='techcrunch', from_param="2019-05-20", to="2019-05-31")

if all_articles['totalResults'] > 0:
    print("Number of articles: {}".format(all_articles['totalResults']))
    print(all_articles['articles'][0])
else:
    print("No news matched the conditions.")

Note that the official documentation names this parameter from, but that did not work here. The source published on GitHub shows that from_param is accepted, and using it worked.
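One likely reason for the different name is that from is a reserved word in Python, so it cannot be written as a keyword argument. The sketch below checks this with the standard keyword module and shows how a from_param style argument could be sent under the name from. build_query is a hypothetical helper for illustration and is not necessarily how newsapi-python is implemented.

```python
import keyword
from urllib.parse import urlencode

# 'from' is reserved, so get_everything(from="2019-05-20") would be a SyntaxError.
print(keyword.iskeyword("from"))
print(keyword.iskeyword("from_param"))


def build_query(q, from_param=None, to=None):
    """Build a query string in which from_param is sent under the name 'from'."""
    params = {"q": q, "from": from_param, "to": to}
    # Drop parameters that were not given before encoding.
    return urlencode({name: value for name, value in params.items() if value is not None})


print(build_query("google", from_param="2019-05-20", to="2019-05-31"))
```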

Fetching past news in sorted order

Passing the sort_by parameter as shown below changes the sort order. This example sorts by popularity. By default, results are sorted by publication date in descending order.

from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key='YOUR_API_KEY')
all_articles = newsapi.get_everything(q='google', sources='techcrunch', from_param="2019-05-20", to="2019-05-31", sort_by='popularity')

(... rest omitted ...)

Other parameters and details are described at the link below.

newsapi.org

Everything /v2/everything

Search through millions of articles from over 30,000 large and small news sources and blogs. This includes breaking news as well as lesser articles.