Presenting new results from two projects using unsupervised deep neural networks to learn about image structure from static or moving visual input. Such networks spontaneously learn to represent underlying scene factors, and can predict human gloss perception and misperception on an image-by-image basis.