Main Statistics


Comparison with Other Datasets

The question length distribution of ScienceQA is flatter than that of other datasets and spans question lengths more evenly.


Context Distributions

66.11% of the questions in ScienceQA have at least one of an image or a text context, while 30.80% have both.
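
As a rough illustration of how such shares could be tallied, the sketch below takes the first figure to count questions with at least one context type and the second to count questions with both. The record layout and the field names `has_image` and `has_text` are hypothetical and are not the dataset's actual schema; the toy records are made up for the example.

```python
def context_shares(questions):
    """Return the percentage of questions with any context and with both.

    `questions` is an iterable of dicts with the hypothetical boolean keys
    `has_image` and `has_text` (illustrative names only).
    """
    questions = list(questions)
    total = len(questions)
    if total == 0:
        raise ValueError("no questions given")
    # At least one context type present (image, text, or both).
    any_ctx = sum(1 for q in questions if q["has_image"] or q["has_text"])
    # Both context types present at once.
    both = sum(1 for q in questions if q["has_image"] and q["has_text"])
    return {"any": 100.0 * any_ctx / total, "both": 100.0 * both / total}


# Toy records, not taken from ScienceQA.
toy = [
    {"has_image": True, "has_text": True},
    {"has_image": True, "has_text": False},
    {"has_image": False, "has_text": True},
    {"has_image": False, "has_text": False},
]
shares = context_shares(toy)
```

Under this reading, the share with an image context plus the share with a text context, minus the share with both, gives the share with at least one.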


Word Clouds

The word clouds of questions show that ScienceQA covers a wide range of topics.


Choice Distributions

Most choices are short, containing up to five words. However, the distribution has a long tail where about 5% of the choices contain more than 15 words.
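
A minimal sketch of how the short-choice and long-tail shares might be measured, assuming words are counted by splitting on whitespace (the tokenization used for the reported figures is not stated here). The function name and thresholds are illustrative, and the toy choices are invented.

```python
def choice_length_shares(choices, short_max=5, long_over=15):
    """Return the percentage of choices that are short and that are long.

    A choice is "short" if it has at most `short_max` words and "long" if it
    has more than `long_over` words. Words are whitespace-separated tokens,
    which is an assumption of this sketch.
    """
    lengths = [len(choice.split()) for choice in choices]
    total = len(lengths)
    if total == 0:
        raise ValueError("no choices given")
    short = sum(1 for n in lengths if n <= short_max)
    long_tail = sum(1 for n in lengths if n > long_over)
    return {"short": 100.0 * short / total, "long": 100.0 * long_tail / total}


# Toy choices, not taken from ScienceQA: word counts are 2, 5, 16, and 1.
toy_choices = [
    "solid rock",
    "a force that pulls down",
    " ".join(["word"] * 16),
    "yes",
]
length_shares = choice_length_shares(toy_choices)
```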


Grade Distributions

The majority of questions come from the middle-level curriculum (i.e., from grade 3 to grade 8), while around 10% are taken from the high school curriculum (i.e., from grade 9 to grade 12).