The ability to segment unknown objects in depth images has potential to enhance robot skills in grasping and object tracking. Recent computer vision research has demonstrated that Mask R-CNN can be trained to segment specific categories of objects in RGB images when massive hand labeled datasets are available. As generating these datasets is time-consuming, we instead train with synthetic depth images. Many robots now use depth sensors, and recent results suggest training on synthetic depth data can generalize well to the real world. We present a method for automated dataset generation and rapidly generate a training dataset of 50k depth images and 320k object masks synthetically using simulated scenes of 3D CAD models. We train a variant of Mask R-CNN on the generated dataset to perform category-agnostic instance segmentation without hand-labeled data. We evaluate the trained network, which we refer to as Synthetic Depth (SD) Mask R-CNN, on a set of real, high-resolution images of challenging, densely cluttered bins containing objects with highly-varied geometry. SD Mask R-CNN outperforms point cloud clustering baselines by an absolute 15% in Average Precision and 20% in Average Recall, and achieves performance levels similar to a Mask RCNN trained on a massive, hand-labeled RGB dataset and fine-tuned on real images from the experimental setup. The network also generalizes well to a lower-resolution depth sensor. We deploy the model in an instance-specific grasping pipeline to demonstrate its usefulness in a robotics application.