この解答を考える際、日本語のRSSフィードにも対応できるようにした。
Python は、内部文字列に unicord文字列をつかっており、外部に渡すときは、バイト文字列に変換しなくてはならない。
searcher = re.compile(args[0].decode('utf-8'), re.IGNORECASE)
ここでは、args[0]を unicord文字列 に変換(decode)して、正規表現オブジェクトにしている。
さらに、この文字列を外部に渡すときは、バイト文字列に変換しなくてはならない。
txt = unicodedata.normalize('NFKC', txt).encode('utf-8', 'ignore')
print(txt)
使い方
$ make
で、コンパイルしたら、newshound ができるので
$ ./newshound '香港|hong kong|韓国|korea'
あるいは、Python2 で直に動作させたい場合は、
$ export RSS_FEED=https://news.yahoo.co.jp/pickup/rss.xml $ /usr/bin/python rssgossip.py [-u|p|h] '香港|hong kong|韓国|korea'
newshound.c
/*
* newshound.c
*
* 参考
* https://www.mkamimura.com/2014/09/c-unistdfork-idpid.html
*/
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
int main(int argc, char *argv[])
{
char *feeds[] = {
"https://news.yahoo.co.jp/pickup/rss.xml",
"https://rss.msn.com/",
"https://news.google.com/rss?ie=UTF-8&oe=UTF-8&topic=h&hl=en-US&gl=US&ceid=US:en"
};
int times = 3;
char *phrase = argv[1];
int i;
const char *PYTHON = "/usr/bin/python";
const char *SCRIPT = "rssgossip.py";
for (i = 0; i < times; i++) {
char var[255];
sprintf(var , "RSS_FEED=%s", feeds[i]);
char *vars[] = {var, NULL};
pid_t pid = fork();
if (pid == -1) {
fprintf(stderr, "プロセスをフォークできません:%s\n", strerror(errno));
return (1);
}
if (!pid) {
if (execle(PYTHON, PYTHON, SCRIPT, phrase, NULL, vars) == -1) {
fprintf(stderr, "スクリプトを実行できません:%s\n", strerror(errno));
return (1);
}
}
}
return (0);
}
Makefile
CC=cc
CFLAGS=-g -Wall
SRC=newshound.c
OBJ=newshound.o
all: newshound
newshound: $(OBJ)
$(CC) $(CFLAGS) $(OBJ) -o newshound
newshound.o: newshound.c
$(CC) $(CFLAGS) -c newshound.c -o newshound.o
clear:
rm -rf newshound $(OBJ)
rssgossip.py
# Copyright (C) 2019 by Seiichi Nukayama
#
# This code works under Python2.
#
# ---------- original copyright --------------------------------------------
# Copyright (C) 2011 by D+D Griffiths
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
import urllib
import os
import re
import sys
import string
import unicodedata
import getopt
from xml.dom import minidom
def usage():
print("Usage:\npython rssgossip.py [-uph] <search-regexp>")
try:
opts, args = getopt.getopt(sys.argv[1:], "uph", ["urls", "prlist","help"])
except getopt.GetoptError, err:
print str(err)
usage()
sys.exit(2)
include_urls = False
prlist = False
for o, a in opts:
if o == "-u":
include_urls = True
elif o == "-p":
prlist = True
elif o in ("-h", "--help"):
usage()
sys.exit()
else:
assert False, "unhandled option"
# print(args[0])
searcher = re.compile(args[0].decode('utf-8'), re.IGNORECASE)
for url in string.split(os.environ['RSS_FEED']):
print("url: %s" % url)
feed = urllib.urlopen(url)
try:
dom = minidom.parse(feed)
forecasts = []
for node in dom.getElementsByTagName('title'):
txt = node.firstChild.wholeText
if prlist:
prtxt = txt.encode('utf-8', 'ignore')
print("> %s" % prtxt)
if searcher.search(txt):
txt = unicodedata.normalize('NFKC', txt).encode('utf-8', 'ignore')
print(txt)
if include_urls:
p = node.parentNode
link = p.getElementsByTagName('link')[0].firstChild.wholeText
print("\t%s" % link.encode('utf-8', 'ignore'))
except:
sys.exit(1)