Training an ML model on the COCO Dataset


My current goal is to train an ML model on the COCO Dataset, and then to be able to generate my own labeled training data and train on that. So far, I have been using Facebook’s maskrcnn-benchmark model and training on the COCO 2014 dataset.

Here is my Jupyter Notebook to go with this blog post.

Okay, here’s an account of the steps I took.

Getting the data

The COCO dataset can be downloaded here.

I am only training on the 2014 dataset.

I’m working with this project:

https://github.com/facebookresearch/maskrcnn-benchmark#perform-training-on-coco-dataset

The dataset must be linked into the directory layout the project expects; the “Perform training on COCO dataset” section of the README linked above describes the setup.
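For reference, here’s a minimal sketch of that setup in Python. The /path/to/coco source directory is a placeholder, and the target layout (datasets/coco/annotations, train2014, val2014) follows the repo’s README.

import os

# Where the downloaded COCO 2014 data lives (adjust this path).
coco_root = '/path/to/coco'

# maskrcnn-benchmark looks for the data under datasets/coco/ inside the repo.
os.makedirs('datasets/coco', exist_ok=True)
for name in ('annotations', 'train2014', 'val2014'):
    os.symlink(os.path.join(coco_root, name), os.path.join('datasets/coco', name))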

Training

Here are some training commands that worked for me. Funnily enough, the first command runs for 720,000 iterations, and it reported that it would take 3+ days to complete on my GTX 1080 Ti. Also, I could only load 2 images at a time, and training took up 10 of the 11 GB of GPU memory. This is a big model!

1-16-2019

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1

1-17-2019

python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 10 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1

Some things that maskrcnn-benchmark does really well

  • everything’s a hyperparameter
  • logging, for lots of feedback
  • a single tqdm output line for training

COCO Dataset format notes

Here are the things I learned about the COCO dataset that will be important later, when I train my own datasets in this format:

Images

Image entries have this format:

{'license': 3,
'file_name': 'COCO_val2014_000000391895.jpg',
'coco_url': 'http://images.cocodataset.org/val2014/COCO_val2014_000000391895.jpg',
'height': 360,
'width': 640,
'date_captured': '2013-11-14 11:18:45',
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895}
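To poke at these entries yourself, you can load the annotation file with plain json (the file path below is an assumption; pycocotools wraps the same data in a richer API):

import json

# Assumed path to the 2014 validation annotations.
with open('annotations/instances_val2014.json') as f:
    coco = json.load(f)

print(coco['images'][0])       # an image entry, like the one above
print(coco['annotations'][0])  # an annotation entry, like the one below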

Annotations

Annotations have this format:

{'segmentation': [[239.97, 260.24, 222.04, 270.49, 199.84, 253.41,
                   213.5, 227.79, 259.62, 200.46, 274.13, 202.17,
                   277.55, 210.71, 249.37, 253.41, 237.41, 264.51,
                   242.54, 261.95, 228.87, 271.34]],
 'area': 2765.1486500000005,
 'iscrowd': 0,
 'image_id': 558840,
 'bbox': [199.84, 200.46, 77.71, 70.88],
 'category_id': 58,
 'id': 156}

segmentation explained:

I was confused about how the segmentation above gets converted to a mask. The segmentation is a flat list of x,y points, in the format [x1, y1, x2, y2, ...]. In this code block, the list of points is reshaped to [(x1, y1), (x2, y2), ...], which matplotlib can then use:

import numpy as np
for seg in ann['segmentation']:
    # flat [x1, y1, x2, y2, ...] -> point pairs [(x1, y1), (x2, y2), ...]
    poly = np.array(seg).reshape((int(len(seg) / 2), 2))

poly becomes an np.ndarray of shape (N, 2) where N is the number of segmentation points.
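As a quick sanity check, here’s a sketch of drawing those polygons with matplotlib. The ann dict is assumed to be loaded already (e.g. from the json snippet above), and in practice the axis limits would come from the matching image entry’s width and height.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

fig, ax = plt.subplots()
for seg in ann['segmentation']:
    poly = np.array(seg).reshape((int(len(seg) / 2), 2))
    # Draw the polygon outline over the axes.
    ax.add_patch(Polygon(poly, fill=False, edgecolor='red'))
ax.set_xlim(0, 640)  # image width
ax.set_ylim(360, 0)  # image height, inverted so y increases downward
plt.show()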

bbox explained:

bbox is in the format [x, y, width, height], not [x1, y1, x2, y2]. The coordinate system starts at the top left of the image as point (0,0), with y increasing downward. The x,y pair is the box’s top-left corner, offset from that (0,0) origin, and width,height extend from that corner. You can verify this against the example above: the segmentation’s minimum x and y are 199.84 and 200.46, and the spans out to its maximum x and y are 77.71 and 70.88.
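So converting a COCO bbox to corner form [x1, y1, x2, y2] is just a matter of adding the width and height (a small sketch, with ann assumed loaded as above):

x, y, w, h = ann['bbox']
x1, y1, x2, y2 = x, y, x + w, y + h  # top-left and bottom-right corners
# e.g. [199.84, 200.46, 77.71, 70.88] -> (199.84, 200.46, 277.55, 271.34)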

Conclusion

I’ve now learned two dataset formats, Pascal VOC and COCO, and I know a little more about why most projects doing image tasks support both.

What’s Next

Next, I want to label my own data and train on it. The last section of the notebook is my attempt at this using RectLabel.

I reviewed 3 different applications for labeling data:

  • Labelbox
  • RectLabel
  • Labelme

My criteria for evaluation: it should be free, and I should be able to run the program locally and label my own data as I please. More user-friendly is better, obviously.

If I can’t find something, then maybe I have to create a simple app for labeling data. Definitely doable, but it’d be a detour.

Labelbox seems like it used to be open source, but they turned it into a SaaS, and I couldn’t get it to run.

RectLabel is $5, which isn’t bad, but it didn’t generate the segmentation data in the format that I need.

Labelme seems like exactly what I’m looking for: it’s open source. There isn’t a script for exporting to the COCO 2014 format, so maybe this is an opportunity to contribute as well :)

So labeling my own data and training on it is the next step. Okay, until next time.

Random… Extra

Some random notes about things learned when doing this.

commands

count files in a dir, to check that the file count matches what’s expected, e.g. when a zip file didn’t fully download:

ls -1 | wc -l

wget in the background with no timeout, so I can start the job from my laptop and the process keeps running on the DL machine:

wget -bqc --timeout=0 url

fastjar

If a zip file didn’t fully download, fastjar can be used to unzip it.

Trying to unzip such a file with unzip gives:

End-of-central-directory signature not found

fastjar can extract it anyway:

sudo apt-get install fastjar
jar xvf something.zip

StackExchange reference

nvidia-smi

Two ways to keep watching GPU usage, roughly the equivalent of “tail -f” for nvidia-smi:

# keeps past traces on screen
nvidia-smi -l 1

# doesn't keep past traces
watch -n0.1 nvidia-smi
