動かざることバグの如し

近づきたいよ君の理想に

VoTTでラベル付けしたJSONをAWS SageMaker用に変換する

aws python

VoTTとはMSが作った画像系機械学習に使うラベリングをGUIで行えるツール

f:id:thr3a:20180903212547j:plain

これのおかげで~~くっそダルくて発狂しそうな~~ラベル付作業が少しだけ楽になる。神ツール

が、当然マイクロソフト謹製の機械学習用アプリケーション（CNTK）ように作られているので、ライバルのAWSのSagemakerの仕様なんか知ったこっちゃない。だけどどうしてもsagemakerで使いたかったのでCNTK用のJSONをSageMaker用に変換するスクリプトを作った

環境

Python 3.x

コード

import json

vott_json_path = "images.json"
images_path = ""
class_list = {'cat': 0, 'dog': 1, 'bird': 2}

images = []
import os
# 画像ファイル名の配列作成
for item in os.listdir(images_path):
    base, ext = os.path.splitext(item)
    if ext.lower() == '.jpg':
        images.append(item)

# 1つのJSONを1つずつに分割していく
with open(vott_json_path) as f:
    obj = json.load(f)
for index, v in obj['frames'].items():
    item = {}
    item['file'] = images[int(index)]
    item['image_size'] = [{
        'width':int(v[0]['width']),
        'height':int(v[0]['height']),
        'depth':3
    }]
    item['annotations'] = []
    for annotation in v:
        item['annotations'].append(
            {
                'class_id': class_list[annotation['tags'][0]],
                'top': int(annotation['y1']),
                'left': int(annotation['x1']),
                'width': int(annotation['x2'])-int(annotation['x1']),
                'height': int(annotation['y2']-int(annotation['y1']))
            }
        )
    item['categories'] = []
    for name, class_id in class_list.items():
        item['categories'].append(
            {
                'class_id':class_id,
                'name':name
            }
        )
    print(item)
    # 書き込み
    os.makedirs(os.path.join("jsons"), exist_ok=True)
    basename = os.path.splitext(images[int(index)])[0]
    with open("jsons/{}.json".format(basename), mode='w') as f:
        json.dump(item, f)

必ず変更が必要なのは以下

vott_json_path VoTTが出力のJSONパスを指定　相対パス可
images_path 画像があるディレクトリを指定　注意しなきゃいけないのはVoTTのJSONにはファイル名の記述はなく、あくまでファイルのインデックスしか持ってないのでファイル名がズレたりファイルの増減があると死ぬ
class_list 学習に分けたリストを指定　~~JSONから読んでパースできないこともないけど面倒だった~~

あとは実行すれば同一ディレクトリにjsonsができて画像の数だけJSONファイルが生成されているはず！