Main Statistics

main_statistics.png

Comparison with Other Datasets

The question length distribution of ScienceQA is flatter than that of other datasets and spans question lengths more evenly.

question.png

Context Distributions

66.11% of the questions in ScienceQA have at least one kind of context (an image, a text, or both), while 30.80% have both an image and a text context.

context.png
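Percentages like the ones above can be tallied with a short script. The following is a minimal sketch, not the authors' code: the field names `image` and `hint` are assumptions about how a record stores its image and text context, and the sample records are hypothetical toy data.

```python
def context_breakdown(records):
    """Return the percentage of records with any context and with both contexts.

    Assumes (hypothetically) that each record is a dict whose "image" and
    "hint" fields are empty or None when that context is missing.
    """
    total = len(records)
    if total == 0:
        return {"any": 0.0, "both": 0.0}
    has_any = sum(1 for r in records if r.get("image") or r.get("hint"))
    has_both = sum(1 for r in records if r.get("image") and r.get("hint"))
    return {
        "any": 100.0 * has_any / total,
        "both": 100.0 * has_both / total,
    }


# Hypothetical toy records, not drawn from ScienceQA.
sample = [
    {"image": "image.png", "hint": "A short passage."},
    {"image": "image.png", "hint": ""},
    {"image": None, "hint": "A short passage."},
    {"image": None, "hint": ""},
]

result = context_breakdown(sample)
print(result)
```

Note that "any" counts questions with at least one context, so it already includes the questions counted under "both".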

Word Clouds

The word clouds of questions show that ScienceQA covers a wide range of topics.

word_clouds.png

Choice Distributions

Most choices are short, containing at most five words. However, the distribution has a long tail: about 5% of the choices contain more than 15 words.

choice.png
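A choice length distribution of this kind can be sketched as follows. This is an illustrative example under stated assumptions: word counts come from simple whitespace splitting, which may differ from the tokenization behind the figure, and the function name, thresholds, and sample choices are hypothetical.

```python
from collections import Counter


def choice_length_stats(choices, short_max=5, long_min=15):
    """Summarize word counts of answer choices.

    Words are counted by whitespace splitting (an assumption). A choice is
    "short" if it has at most short_max words and "long" if it has more
    than long_min words.
    """
    lengths = [len(choice.split()) for choice in choices]
    total = len(lengths)
    if total == 0:
        return {"histogram": Counter(), "short_pct": 0.0, "long_pct": 0.0}
    short = sum(1 for n in lengths if n <= short_max)
    long_count = sum(1 for n in lengths if n > long_min)
    return {
        "histogram": Counter(lengths),
        "short_pct": 100.0 * short / total,
        "long_pct": 100.0 * long_count / total,
    }


# Hypothetical toy choices, not drawn from ScienceQA.
sample_choices = [
    "solid",
    "a liquid",
    "the north pole of the magnet",
    "The magnetic force is weaker in the first pair because "
    "the two magnets are placed farther apart",
]

stats = choice_length_stats(sample_choices)
print(stats["short_pct"], stats["long_pct"])
```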

Grade Distributions

The majority of questions come from the middle-level curriculum (grades 3 to 8), while around 10% come from the high school curriculum (grades 9 to 12).

grades.png