A new benchmark pitting AI against previously unseen maths problems shows that these systems still fall short of top human expertise.