Training an ML model on the COCO Dataset
My current goal is to train an ML model on the COCO dataset, and then to be able to generate my own labeled training data and train on that. So far, I have been using Facebook's maskrcnn-benchmark model and training on the 2014 COCO dataset.
Here is my Jupyter Notebook to go with this blog.
Okay, here's an account of the steps I took.
Getting the data
The COCO dataset can be downloaded here.
I am only training on the 2014 dataset.
I’m working with this project:
https://github.com/facebookresearch/maskrcnn-benchmark#perform-training-on-coco-dataset
The dataset must be linked into the directory layout that project expects, as described in its README.
Training
Here are some training commands that worked. Funnily enough, the first command is for 720,000 iterations, and it reported that it would take 3+ days to complete on my GTX 1080 Ti. I could also only load 2 images at a time, which took up 10 of the 11 GB of GPU memory. This is a big model!
1-16-2019
python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
1-17-2019
python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_101_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 10 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
Some things that maskrcnn-benchmark does really well:
- everything's a hyperparameter
- logging, with lots of feedback
- a single tqdm output line for training
COCO Dataset format notes
Things I learned about the COCO dataset that will be important later, when training my own datasets in this format:
Images
Entries in the images list have this format:
{'license': 3,
'file_name': 'COCO_val2014_000000391895.jpg',
'coco_url': 'http://images.cocodataset.org/val2014/COCO_val2014_000000391895.jpg',
'height': 360,
'width': 640,
'date_captured': '2013-11-14 11:18:45',
'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg',
'id': 391895}
Annotations
Annotations have this format:
{'segmentation': [[239.97, 260.24, 222.04, 270.49, 199.84, 253.41, 213.5,
                   227.79, 259.62, 200.46, 274.13, 202.17, 277.55, 210.71,
                   249.37, 253.41, 237.41, 264.51, 242.54, 261.95, 228.87,
                   271.34]],
'area': 2765.1486500000005,
'iscrowd': 0,
'image_id': 558840,
'bbox': [199.84, 200.46, 77.71, 70.88],
'category_id': 58,
'id': 156}
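Both record types above live in a single JSON file per split (e.g. instances_val2014.json). As a minimal sketch, assuming the annotation zip was extracted into an annotations/ directory, the file can be loaded and indexed with nothing but the standard library:

```python
import json

# Hypothetical path -- adjust to wherever the 2014 annotations were unzipped.
ANN_FILE = "annotations/instances_val2014.json"

def load_coco_index(path):
    """Load a COCO annotation file; index images by id, annotations by image_id."""
    with open(path) as f:
        coco = json.load(f)
    images = {img["id"]: img for img in coco["images"]}
    anns_by_image = {}
    for ann in coco["annotations"]:
        anns_by_image.setdefault(ann["image_id"], []).append(ann)
    return images, anns_by_image
```

With an index like this, images[391895] would return the image record above, and anns_by_image[558840] would return every annotation attached to that image.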
segmentation explained:
I was confused about how the segmentation above gets converted to a mask. The segmentation is a flat list of x,y points, in the format [x1, y1, x2, y2, ...]. In this code block, the segmentation list of points is reshaped to [(x1, y1), (x2, y2), ...] and is then usable by matplotlib:

for seg in ann['segmentation']:
    poly = np.array(seg).reshape((int(len(seg) / 2), 2))

poly becomes an np.ndarray of shape (N, 2), where N is the number of segmentation points.
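To make the polygon-to-mask step concrete, here is a minimal sketch that rasterizes a flat segmentation list into a boolean mask using matplotlib's Path. (pycocotools handles this officially via its RLE mask utilities, so treat this as an illustration rather than the library's method.)

```python
import numpy as np
from matplotlib.path import Path

def polygon_to_mask(seg, height, width):
    """Rasterize a flat COCO segmentation list [x1, y1, x2, y2, ...]
    into a (height, width) boolean mask."""
    poly = np.array(seg).reshape(-1, 2)        # (N, 2) array of (x, y) points
    path = Path(poly)                          # implicitly closed polygon
    # Test every pixel coordinate for containment in the polygon
    ys, xs = np.mgrid[:height, :width]
    points = np.column_stack([xs.ravel(), ys.ravel()])
    return path.contains_points(points).reshape(height, width)
```

The resulting mask is indexed [y, x], matching image conventions, and can be overlaid on the image with plt.imshow.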
bbox explained:
bbox is of format [x, y, width, height], not [x1, y1, x2, y2] as I first assumed. The coordinate system starts at the top left of the image as point (0, 0), with y increasing downward. The x, y values are the offset of the box's top-left corner from that origin, and width, height extend the box from there.
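A small helper makes the corner math explicit. Plugging in the bbox from the annotation above, the computed corners line up with the min/max of its segmentation points:

```python
def bbox_to_corners(bbox):
    """Convert a COCO bbox [x, y, width, height] into (x1, y1, x2, y2) corners."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

# bbox from the annotation above: the corners come out as roughly
# (199.84, 200.46, 277.55, 271.34), matching the extremes of the polygon.
corners = bbox_to_corners([199.84, 200.46, 77.71, 70.88])
```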
Conclusion
I’ve now learned two dataset formats: Pascal VOC and COCO. Now I understand a little better why most projects doing image tasks support both.
What’s Next
Next, I want to label my own data and train on it. The last section of the notebook is my attempt at this using RectLabel.
I reviewed 3 different applications for labeling data:
- Labelbox
- RectLabel
- Labelme
My criteria for evaluating are that it should be free and that I should be able to run the program locally and label my own data as I please. User-friendly is better, obviously.
If I can’t find something, then maybe I have to create a simple app for labeling data. Definitely doable, but it’d be a detour.
Labelbox seems like it used to be open source, but they turned it into a SaaS, and I couldn’t get it to run.
RectLabel is $5 which isn’t bad, but it didn’t generate the segmentation data in the format that I need.
Labelme seems to be exactly what I am looking for. Open source. There isn't a script for exporting to COCO Dataset 2014 format, so maybe this is an opportunity to contribute as well :)
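As a starting point for that export script, here is a hypothetical sketch of flattening one Labelme polygon into a COCO-style annotation. The Labelme field names ("shapes", "points") are assumptions based on its JSON export, so verify them against the version you're running; the area field is also omitted here, since COCO computes it from the mask.

```python
def labelme_shape_to_coco(shape, image_id, ann_id, category_id):
    """Turn one Labelme polygon shape (assumed format) into a COCO-style annotation."""
    # Flatten Labelme's [[x1, y1], [x2, y2], ...] into COCO's [x1, y1, x2, y2, ...]
    seg = [coord for point in shape["points"] for coord in point]
    xs, ys = seg[0::2], seg[1::2]
    x, y = min(xs), min(ys)
    return {
        "segmentation": [seg],
        "iscrowd": 0,
        "image_id": image_id,
        "bbox": [x, y, max(xs) - x, max(ys) - y],  # [x, y, width, height]
        "category_id": category_id,
        "id": ann_id,
    }
```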
So labeling my own data and training on it is the next step. Okay, until next time.
Random… Extra
Some random notes about things learned when doing this.
commands
count files in a dir - to check that the image file count matches what was expected, e.g. when a zip file didn't fully download:
ls -1 | wc -l
wget
Run in the background with no timeout, so I can start the job from my laptop but the process keeps running as a daemon on the DL machine:
wget -bqc --timeout=0 url
fastjar
If a zip file didn't fully download, fastjar can be used to unzip it. Trying to unzip the partial file with unzip gives:
unzip error "End-of-central-directory signature not found"
Use fastjar instead:
sudo apt-get install fastjar
jar xvf something.zip
nvidia-smi
The equivalent of running "tail -f" on nvidia-smi output:
# keeps past traces (prints a new report every second)
nvidia-smi -l 1
# doesn't keep past traces (redraws in place)
watch -n0.1 nvidia-smi