A testbed to assess the physical reasoning skills of AI agents