On the Math Kangaroo Finland: Homogeneous subgroups, misconceptions and answering choices

Abstract The Math Kangaroo is one of the biggest international mathematics competitions. In Finland, the number of participants has been around 15,000 in recent years. In this research, we considered Finnish fourth to ninth graders’ performance in the Math Kangaroo Finland 2011, 2015 and 2019. We investigated the following problems: What kind of homogeneous subgroups do contestants form based on their answer choices? Can misconceptions or answering choice strategies explain answer choices that were more popular than the correct answer? The subgroups obtained by latent class analysis (LCA) were described by contestants’ general answering strategies and number of correct answers. We also looked at those wrong answer choices which were more popular than the correct one. It turned out that a strategy called estimation was a possible reason for these. Furthermore, misconceptions were more common among fourth to seventh graders than among older students, and they mostly occurred in geometry.


Introduction
Finland has done well in various PISA evaluations. Success in the TIMSS assessment has been much more modest. Andrews (2015) points out that the lack of success in the TIMSS assessment reveals a lack of the kind of competence that is necessary in the university study of mathematical disciplines. This difference between results in TIMSS and PISA has been investigated by Andrews et al. (2014), who concluded that their observations are more consistent with the modest TIMSS success than with the much better PISA results. The students entering elementary school are heterogeneous in their mathematical skills, but these differences disappear during the first 2 years (Metsämuuronen & Tuohilampi, 2014). The mathematical skills of ninth graders have been declining since 2001, and the distribution of achievement has changed from a normal distribution to separate groups of low, average and high performers (Metsämuuronen & Nousiainen, 2021).
Since PISA and TIMSS measure very different aspects of understanding mathematics, and they give quite different information about the skills of Finnish students, it seems reasonable to compare them with the results of yet another widely taken exam. One option is the Math Kangaroo competition. It is the biggest international mathematics competition in the world. For example, in 2019 there were over 6 million participants from 77 countries. According to the website of the international organizer and designer of the Math Kangaroo competition, Kangourou sans Frontières, the competition is "Motivated by the importance of mathematics in the modern world, their passion is to spread the joy of mathematics, support mathematical education in school and promote a positive perception of mathematics in society" (Association Kangourou sans Frontières, 2022a). The Math Kangaroo Finland is organized by Maunula Secondary School and Helsinki School of Mathematics, and the competition has gathered about 15,000 participants (Association Kangourou sans Frontières, 2022b; Kangaroo Finland, 2021a).
The Math Kangaroo Finland is a very suitable choice of another widely taken exam for several reasons. Even though it is a mathematics competition, it gathers about 15,000 participants in Finland a year (Association Kangourou sans Frontières, 2022b), which means that it is not only an elite competition but is taken by a wider population. In our opinion, competition problems are a very nice mix of the skills required in PISA and in TIMSS: while in PISA the emphasis is on mathematical literacy, and in TIMSS the emphasis is on mathematical core knowledge, the problems in Math Kangaroo require both. For example, in 2019 sixth and seventh graders were asked to solve the following problem: "Jere took selfies with his 8 cousins. Each of his cousins is in two or three of these pictures. In each picture there are exactly 5 of his cousins. How many selfies did Jere take? A) 7 B) 6 C) 5 D) 4 E) 3" (Kangaroo Finland, 2021b). One way to solve the problem is to note that the total number of cousins in the pictures (counting with multiplicities) is divisible by five. Then, comparing the possible totals with how many cousins must be in two and how many in three pictures (the total lies between 16 and 24, so it must be 20), it can be deduced that the correct answer is four. Hence, the problem requires the ability to interpret a real-life problem mathematically, that is, mathematical literacy, and the solver benefits from knowing divisibility. Our perception is that the Math Kangaroo exercises are essentially never in the remember class of Bloom's taxonomy (Anderson & Krathwohl, 2001; Bloom et al., 1956), and very rarely in the understand class. Hence, by our consideration, these problems are based on higher classes in the taxonomy.
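The counting argument for the selfie problem can be checked mechanically. The following is only a minimal sketch: the variable and function names are ours, and the check uses the necessary counting condition alone (it does not construct an actual arrangement of cousins), which here happens to leave a single answer choice.

```python
# Each picture shows exactly 5 cousins, so the total number of appearances
# is 5 * selfies. Each of the 8 cousins appears 2 or 3 times, so the total
# must lie between 2 * 8 = 16 and 3 * 8 = 24.
COUSINS = 8
PER_PICTURE = 5
choices = {"A": 7, "B": 6, "C": 5, "D": 4, "E": 3}


def satisfies_counting_condition(selfies):
    """Necessary condition only: total appearances within [16, 24]."""
    total = PER_PICTURE * selfies
    return 2 * COUSINS <= total <= 3 * COUSINS


valid = [label for label, n in choices.items()
         if satisfies_counting_condition(n)]

# With 4 selfies there are 20 appearances; 20 - 2 * 8 = 4 cousins
# appear three times and the remaining 4 appear twice.
in_three = PER_PICTURE * choices["D"] - 2 * COUSINS
in_two = COUSINS - in_three
print(valid, in_two, in_three)
```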
However, as the Math Kangaroo Finland is not a mandatory exam, it is plausible to believe that a large part of participants in Math Kangaroo have teachers who are enthusiastic about mathematics, which possibly affects the outcome.
Typically, Math Kangaroo problems are neither traditional school mathematics nor rigorous competition mathematics, as we can see from the example above. Calculators are not allowed. There are five series in Math Kangaroo: Mini-Ecolier (2nd and 3rd grades), Ecolier (4th and 5th grades), Benjamin (6th and 7th grades), Cadet (8th and 9th grades) and Student (high school). The Math Kangaroo competition is free and open for all schools, and it is sufficient that one teacher enrolls the school in the competition. Math Kangaroo is held annually in March-April, and it is effortless to organize the competition since multiple-choice questions are easy to check (Association Kangourou sans Frontières, 2022a).
In this research, we investigate the fourth to ninth graders in the Math Kangaroo Finland in 2011, 2015 and 2019. First, we consider what kind of homogeneous subgroups can be found based on contestants' answer choices for the problems using a person-centered approach, latent class analysis. Then, we consider problems for which at least two different groups answered incorrectly in at least two different ways, and how misconceptions or ways to answer can describe the differences between the homogeneous groups. We call these problems wrongly answered group characterizing problems. Lastly, we investigate possible misconceptions and answering choice strategies within the answer choices that were more popular than the correct one. In particular, our aim is to understand what kind of misconceptions these students have. We consider only wrongly solved problems since the Math Kangaroo Finland consists of multiple-choice questions and, hence, it is not possible to determine what kind of strategies students have used when they have solved the problem correctly. It is worth noticing that while understanding these misconceptions does not help to arrive at the correct answer without solving the problem, it may help the teacher to better tackle the common misunderstandings.

On person-centered approach
Traditional analytical and variable-centered methods, such as regression analysis and factor analysis, do not always recognize heterogeneity within individuals (Hickendorff et al., 2018). Linear techniques used in these methods concentrate on the relations between variables (Bergman et al., 2003; Collins & Lanza, 2010; Muthén & Muthén, 2000) and assume that the individual differences are homogeneous (Collins & Lanza, 2010). This ignores heterogeneity between learners and is thus restricted to quantitative individual differences and cannot reveal qualitative differences (Hickendorff et al., 2018; Lanza & Cooper, 2016).
Since the aim in learning research is not to describe a single learner but general patterns of learning pathways, it is important to apply methods that identify them (Hickendorff et al., 2018). Person-centered approaches, such as cluster analysis, finite mixture analysis and latent class analysis, are methods that take the individual heterogeneity into account and hence provide a solution to the problem. The idea is to group individuals into categories whose members are similar to each other and differ from members of other categories (Bergman & Magnusson, 1997; Hickendorff et al., 2018; Muthén & Muthén, 2000). Latent class analysis provides a flexible tool "to analyze both qualitative and quantitative inter- and intraindividual differences simultaneously" (Hickendorff et al., 2018).
Person-centered approaches are used to examine individual differences in the learning of mathematics as well as other disciplines. Flunger et al. (2017) investigated students' homework behavior using a person-centered approach. Investing effort and spending a lot of time on homework were found to be associated with high motivation and high conscientiousness. Lin et al. (2018) conducted a latent profile analysis inferring three homogeneous latent classes to measure math self-efficacy. Their person-centered approach showed that students' ethnicity and implicit theories of math ability predict the class membership (mastery/moderate/unconfident). In their case study, Delderfield and McHattie (2018) offered findings through an application of a person-centered approach to delivering mathematics skills development.

On misconceptions
In addition to finding possible subgroups among Finnish students in the Math Kangaroo Finland, we investigate possible misconceptions in wrong answer choices that are more popular than the correct one. This is important because if we understand how misconceptions are related to each other, then we may be able to find different factors that affect students' mathematical development (Bransford et al., 2000).
The definition of a mathematics misconception may vary slightly depending on the reference: Misconceptions are described as conceptual misunderstandings (see, e.g., Durkin & Rittle-Johnson, 2015; Van Dooren et al., 2015), but also as any "student's conceptions that produce a systematic pattern of errors" (Smith et al., 1993, page 119). In this article, a misconception means a conceptual misunderstanding.
Misconceptions have usually been studied through different topics such as algebra and geometry (see Rakes & Ronau, 2019). The results have shown that misconceptions in rational numbers and/or probability may predict those in algebra and geometry (see, e.g., Fuson et al., 2005; Lamon, 2007; Moss, 2005) and that there may not be a hierarchical relationship between misconceptions and topics (see Rakes & Ronau, 2019). However, Rakes and Ronau (2019) tried to find factors which may produce misconceptions, but which do not depend on topics. According to their research, misconceptions behind several knowledge structures (variable, measurement, spatial reasoning, additive/multiplicative structures, absolute/relative comparison) affect misconceptions in several topics, and each topic in their research contains misconceptions from several knowledge structures. It must also be emphasized that correct answers do not necessarily mean that there is no misconception (Lobato et al., 2010).

On answering choice strategy and preferred answering choices
In addition to studying misconceptions in problems where students in at least two different subgroups tend to give different wrong answers or in the wrong answer choices that are more popular than the correct one, we study possible answering choice strategies and preferred answer choices among them. By an answering choice strategy, we mean a way to choose an answer for a question, especially when the way leads to an incorrect answer.
Studying answer choice strategies is important since solving a mathematical problem correctly typically includes selecting and applying suitable strategies flexibly (see, e.g., Siegler & Lemaire, 1997; Siegler & Shipley, 1995). However, it should be noted that the most efficient strategy may also depend on the problem solver, not only on the problem itself (Verschaffel et al., 2009).
Symmetric answer choices have been seen to be preferred in mathematics questions: Reber et al. (2008) found that the symmetrical statements were thought to be true more often than asymmetrical ones, even though they were true equally often.
Different answering choice strategies have also been found in studies of how pressure affects performance. Stress factors or negative feelings have been seen to affect performance in mathematics (Ashcraft & Kirk, 2001; Vukovic et al., 2013). However, the results have not been unambiguous (see, e.g., Ng & Lee, 2010; Sorvo et al., 2017; Vukovic et al., 2013). One stress factor seems to be time pressure (Hunt & Sandhu, 2017; Sussman & Sekuler, 2022). It has been seen to affect the strategy used to solve mathematical problems (Campbell & Austin, 2002; Caviola et al., 2017; Chesney et al., 2013; Luwel & Verschaffel, 2003; McNeil et al., 2010; Schunn et al., 1997). Furthermore, Beilock and DeCaro (2007) studied the performance of students under high social and monetary pressure. They compared how working memory affected the solving strategies. They noticed that the higher a participant's working memory, the less likely they were to use a shortcut in a low-pressure situation, while in a high-pressure situation, they could resort to shortcuts. Using shortcuts can be useful in solving different problems, but sometimes shortcuts lead to wrong conclusions.
Several websites and documents offer guidance and "educated guessing strategies" on how to answer multiple-choice questions. One example is an article by Lindner (2020) on the website of Western Illinois University. Stough (1993) has collected different techniques which are often found helpful. For example, long choices have often been seen to be the correct answer.
The following 10 strategies and signals to correctness are used to solve mathematical problems, and they can also be identified from the answers in multiple-choice tests:
• Add-all (Chesney et al., 2013; McNeil et al., 2010)
• Add-to-Equal Signs (Chesney et al., 2013; McNeil et al., 2010)
• Add-two (Chesney et al., 2013): An equation of type 3 + 5 + 9 = 3 + ? is solved by summing the wrong two numbers. In the example, the solution would be 3 + 5 = 8 or 3 + 9 = 12.
• Carry (Chesney et al., 2013): An equation of type 3 + 5 + 9 = 3 + ? is solved by taking one of those terms from the left-hand side that do not appear on the right-hand side of the equation and claiming it to be the answer. In the example, the solution would be 5 or 9.
• Estimation (or guessing or using heuristics) (Beilock & DeCaro, 2007): The student uses previous associations or heuristics, which may lead to the correct answer. For example, if one needs to decide whether 80 − 14 is divisible by 4, the student may say yes, because all the numbers are even.
• Mid-range answers
• Repeat (Chesney et al., 2013): An equation of type 3 + 5 + 9 = 3 + ? is solved by claiming that the answer is a term on the right-hand side. In the example, the solution would be 3.
• Use symmetry as a hint for a correct answer (Reber et al., 2008): Symmetrically written statements can be seen to be correct more often than asymmetrical ones, even if they are correct equally often.
• Other: strategies other than retrieval/remembering, counting or transformation in Campbell and Austin (2002), or other than Add-all, Add-to-Equal Signs, Add-two, Carry, Repeat or a correct answer in Chesney et al. (2013).
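To illustrate how such strategies can be recognized from a chosen answer, the following minimal sketch computes the answers that the Add-two, Carry and Repeat strategies would predict for the example equation 3 + 5 + 9 = 3 + ?. The function name and the exact encoding of each strategy are our assumptions for illustration; in particular, Add-two is interpreted here as adding the repeated term to one other left-hand term, which reproduces the two values given in the list.

```python
def predicted_answers(left_terms, right_term):
    """Answers predicted for an equation left_terms = right_term + ?.

    Hypothetical encoding of three listed strategies plus the correct answer.
    """
    # Left-hand terms that do not appear on the right-hand side.
    others = [t for t in left_terms if t != right_term]
    return {
        "correct": {sum(left_terms) - right_term},
        # Add-two: sum the wrong two numbers (repeated term + one other).
        "add_two": {right_term + t for t in others},
        # Carry: copy a left-hand term that is missing on the right.
        "carry": set(others),
        # Repeat: give the term already on the right-hand side.
        "repeat": {right_term},
    }


answers = predicted_answers([3, 5, 9], 3)
for strategy, values in answers.items():
    print(strategy, sorted(values))
```

A multiple-choice option can then be matched against these sets, although one option may be compatible with several strategies at once.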
It should be noted that the previous list does not contain strategies that cannot be recognized in multiple-choice questions. For example, the most seriously studied multiple-choice test strategy seems to be that it is beneficial to change answers (see, e.g., Merry et al., 2021). However, in our study, it is not possible to consider this aspect since it is not known whether the final answer is the initial one or not. See Caviola et al. (2017) for a review of different answering choice strategies.
While it is sometimes difficult to recognize the strategy from multiple-choice answers, it is worth noting that even if explanations were required, the strategy being used would not always be understandable or even make sense (Beilock & DeCaro, 2007).

Study aim and research questions
We consider the following three research questions:
(1) Are there similar groups among the students? What kind of groups are there?
The homogeneous groups obtained to answer question (1) lead to the following question:
(2) Do the misconceptions or answering choice strategies described in Sections 2.2 and 2.3 describe the differences between the groups found when we consider wrongly answered group characterizing problems?
(3) Consider misconceptions and answering choice strategies in a more general setting: Can the misconceptions or answering choice strategies described in Sections 2.2 and 2.3 explain answer choices that were more popular than the correct answer?

Participants
We investigated three different series: Ecolier, Benjamin and Cadet. The participants are from the fourth grade to the ninth grade in the Finnish school system. The number of participants varied from 2,923 to 3,340 in Ecolier, from 5,019 to 6,394 in Benjamin and from 5,545 to 6,167 in Cadet (Kangaroo Finland, 2021b).

Data
We used statistics provided by the Math Kangaroo Finland website (Kangaroo Finland, 2021b), which contains information about the competitions from 2011 to 2021. The archive contains each contestant's choices for each question, their scores for each problem and the final score in the competition. In this study, we limited our consideration to the years 2011, 2015 and 2019 since they contained complete data and all of the contestants did the competitions on paper, which allowed them to choose multiple answer choices.

Background on latent class analysis
To answer the first and the second question, we wanted to find homogeneous subgroups of contestants. We used a statistical tool, latent class analysis (LCA) (see, e.g., Goodman, 1974; Lazarsfeld, 1950; Lazarsfeld & Henry, 1968), to find these subgroups. LCA has been applied in several areas of educational research, including mathematics education (see, e.g., Fan et al., 2019; Hickendorff et al., 2009; Linzer & Lewis, 2011; Swanson et al., 2018). It has been argued to be suitable for finding subgroups in this kind of analysis with presumably qualitative differences, data sets consisting of hundreds of participants and categorical variables (Hickendorff et al., 2009, 2018). The analysis can be used to determine subgroups based on the answers to a questionnaire (Collins & Lanza, 2010).
In LCA, homogeneous subgroups are called classes. The classes should be homogeneous and differ from each other as much as possible (Hickendorff et al., 2018; Lanza & Cooper, 2016; Lazarsfeld, 1950). In LCA, this is done with the help of the following function $f(y)$: Let
$$f(y) = \sum_{k=1}^{K} P(k) \prod_{n=1}^{N} P(y_n \mid k),$$
where $y = (y_1, y_2, \ldots, y_N)$ is the data as a vector, $K$ is the number of classes, $P(k)$ is the probability of class $k$, $N$ is the number of items in the data $y$ and $P(y_n \mid k)$ is the conditional probability of the answer choice $y_n$ given class $k$ (Hickendorff et al., 2010). At the beginning of LCA, the class-conditional probabilities, from which the terms $P(y_n \mid k)$ can be computed, are given by the user or randomly generated by the algorithm, based on the user's choice. This model is then estimated using the expectation-maximization (EM) algorithm (Dempster et al., 1977). The goal is to maximize the so-called log-likelihood function, which is computed from the given probabilities. This is done iteratively by replacing the old probabilities with new, better ones. The process is repeated until the change in the log-likelihood function is small enough or the maximum number of iterations is reached. After this, the obtained classes form the desired division into subgroups (Linzer & Lewis, 2011).
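The iterative estimation described above can be sketched in a few lines. This is only an illustrative, pure-Python EM loop for a basic latent class model under our own assumptions (uniform starting class probabilities, random starting conditional probabilities, toy data invented for the example); it is not the poLCA implementation, and all names are hypothetical.

```python
import math
import random


def lca_em(data, n_classes, n_categories, max_iter=1000, tol=1e-10, seed=0):
    """Toy EM for f(y) = sum_k P(k) * prod_n P(y_n | k)."""
    rng = random.Random(seed)
    n_items = len(data[0])
    prior = [1.0 / n_classes] * n_classes          # P(k)
    cond = []                                      # cond[k][n][c] = P(y_n = c | k)
    for _ in range(n_classes):
        rows = []
        for _ in range(n_items):
            w = [rng.random() + 0.1 for _ in range(n_categories)]
            total = sum(w)
            rows.append([x / total for x in w])
        cond.append(rows)
    history = []
    for _ in range(max_iter):
        # E-step: posterior class probabilities and the log-likelihood.
        post, loglik = [], 0.0
        for y in data:
            joint = [prior[k] * math.prod(cond[k][n][y[n]]
                                          for n in range(n_items))
                     for k in range(n_classes)]
            f_y = sum(joint)
            loglik += math.log(f_y)
            post.append([j / f_y for j in joint])
        history.append(loglik)
        # Stop when the log-likelihood no longer improves noticeably.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: replace the old probabilities with re-estimated ones.
        for k in range(n_classes):
            weight = sum(p[k] for p in post)
            prior[k] = weight / len(data)
            for n in range(n_items):
                for c in range(n_categories):
                    num = sum(p[k] for p, y in zip(post, data) if y[n] == c)
                    cond[k][n][c] = num / weight
    return prior, cond, post, history


# Invented toy data: 8 respondents, 3 items, answers coded 0 or 1.
toy = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 1, 1],
       [1, 1, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
prior, cond, post, history = lca_em(toy, n_classes=2, n_categories=2)
print(prior, history[-1])
```

Because EM can stop at a local maximum, a run like this is usually repeated from several random starts (different `seed` values in this sketch) and the run with the highest final log-likelihood is kept.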

The latent class analysis in our setting
In the first research question, we studied hidden homogeneous subgroups of the contestants using LCA. We wanted to find classes based on students' categorical answer choices (empty, A, B, C, D, E, choosing more than one option) to the problems.
The vector $y$ consists of answers to the questions and $N = 21$ is the number of the problems. We used the poLCA package (Linzer & Lewis, 2011, 2013) in R version 4.1.2 to perform the basic LCA. Default parameters for tol, na.rm, probs.start, nrep and calc.se were used, and maxiter was first set to be 1,000 and then 5,000 since a thousand iterations caused the error "MAXIMUM LIKELIHOOD WAS NOT FOUND" in Ecolier 2015. It should be noted that 99.4-100% of the contestants were classified into similar classes with both 1,000 and 5,000 iterations in those competitions where it was possible to perform 1,000 iterations. The parameter nrep was set to be 10 to avoid the problem that poLCA produces a local maximum instead of a global one (Linzer & Lewis, 2011) and because poLCA may produce different results in different runs due to the randomness of the algorithm (Haughton et al., 2009).
We tested which number of classes produced the best possible result by varying their number throughout the interval 1-20. We did not consider a higher number of classes since almost all the suggested models had fewer than 20 classes (see Table 1). Since the Bayesian information criterion (BIC) (Schwarz, 1978) and the Akaike information criterion (AIC) (Akaike, 1973) usually provide suitable estimates of the model in the case of basic LCA (Forster, 2000; Lin & Dayton, 1997; Linzer & Lewis, 2011; Nylund-Gibson et al., 2007), we considered the models with the lowest BIC and AIC.
However, the best possible model does not necessarily mean that the model is good (Oberski, 2016). To determine how well the students belonged to the class to which they had the highest probability of belonging, we computed average posterior probabilities (AvePP) for each class. Nagin (2005) suggests that AvePP should be at least 0.70 for all classes, and hence we compared AvePPs to this value.
After finding the LCA classes, we considered wrongly answered group characterizing problems by first finding them and then considering different ways to get these wrong answer choices. This was our method to investigate the second research question.
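As a reminder of how the two criteria trade off fit against model size, the following minimal sketch uses the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p ln(n). The parameter count assumes an unrestricted latent class model in which every item can take every category; the log-likelihood and sample size in the example are invented for illustration and are not results of this study.

```python
import math


def n_free_parameters(n_classes, n_items, n_categories):
    """Free parameters of an unrestricted latent class model (assumption)."""
    class_probs = n_classes - 1
    conditional_probs = n_classes * n_items * (n_categories - 1)
    return class_probs + conditional_probs


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)


# 21 problems with 7 answer categories (empty, A-E, more than one option).
p4 = n_free_parameters(n_classes=4, n_items=21, n_categories=7)

# Hypothetical log-likelihood and number of contestants.
example_loglik, example_n = -1000.0, 3000
print(p4, aic(example_loglik, p4), bic(example_loglik, p4, example_n))
```

Since ln(n) exceeds 2 for any realistic number of contestants, BIC penalizes each additional class more heavily than AIC, which is in line with BIC suggesting fewer classes.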
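The AvePP check can be sketched as follows. This is a minimal illustration with invented posterior probabilities and hypothetical names: each contestant is assigned to the class with the highest posterior probability, and that probability is averaged within each class.

```python
def average_posterior_probabilities(posteriors):
    """AvePP per class from rows of posterior class probabilities."""
    n_classes = len(posteriors[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for row in posteriors:
        # Modal assignment: the class with the highest posterior probability.
        k = max(range(n_classes), key=lambda j: row[j])
        sums[k] += row[k]
        counts[k] += 1
    # A class with no assigned members has no defined AvePP.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Invented posteriors for four contestants and two classes.
example = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.45, 0.55]]
avepp = average_posterior_probabilities(example)
below_threshold = [k for k, value in enumerate(avepp) if value < 0.70]
print(avepp, below_threshold)
```

In this invented example the second class falls below the 0.70 boundary, which would signal that its members are not clearly separated from the other class.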

Misconceptions and answering choice strategies
We classified various answers students gave in the competitions by trying to analyze which of the possible strategies could have led to the answer. We first went through the problems and tried to solve them wrongly to obtain common wrong answers. In some cases, it was impossible to think how one might even arrive at some answer choices. After this, we classified these problems under different solution strategies based on what kind of strategies could be used in the solutions. All three authors did this classification, and then we accepted the strategy as a possible one only if at least two of us agreed that it was a likely strategy to arrive at a certain answer.
For example, in the following question (see Figure 1), the correct answer was D with 10% of the participants choosing it. Every other choice was more popular. The most popular one was B with 31% of the participants choosing it. Choice B was classified as misconception and estimation, because it could be reached by simplifying the problem to a perhaps more familiar one of only counting squares of different sizes with sides parallel to the sides of the large square, hence estimation. On the other hand, a similar explanation also works for misconception, so that was the other classification.
As another example, in the following question (see Figure 2), answer D was correct, and it was chosen by 19% of the participants. Answer A was more popular with 33% of the participants. Answer A is reached if one computes the fourth figure instead of the fifth one. This mistake was classified as "other", as it was not a consequence of any strategy.

LCA: classes
Using LCA, we divided participants of each competition into different classes. The best possible number of classes varied from 9 to 14 using AIC and from 4 to 7 using BIC (see Table 1). The best possible number of classes for Ecolier tended to be lower than for Benjamin and Cadet by both AIC and BIC. Further, based on BIC, each series had exactly 1 year in which the suggested number of classes was lower than in the other 2 years. Only in the case of Ecolier did the year giving the lowest number of classes by AIC coincide with the year giving the lowest number of classes by BIC.
All the AvePPs were at least 0.70, which is the boundary value suggested by Nagin (2005). Since AvePPs were in general higher for classes with the lowest BIC than with the lowest AIC, and the classes by BIC were clearer to interpret, these classes were looked at closely. The classes obtained by BIC are described based on means, standard deviations and medians, obtained minimum and maximum numbers of correct answers compared to the whole series, percentages of contestants answering some problems correctly, leaving some of them empty and answering some incorrectly, the most common answers for the problems (see Appendix A, Tables A2, A4 and A6) and proportions of each answer choice in the class (see Appendix A, Tables A1, A3 and A5). According to this analysis, the contestants belonged to different classes based on their number of correct answers and their answering strategy. Here answering strategy means how the contestants belonging to the class answered the questions in general, for example, whether they tended to answer all the questions. Types of different classes are given in Table 2 and the abbreviations used are described below. The first part of each abbreviation corresponds to the answering strategy and the second part to the number of correct answers.
It should be noted that the differences between the classes in the same competition can be described by the properties given in Table 2 alone; there is no need to describe them by the misconceptions or answering choice strategies described in Sections 2.2 and 2.3. The wrongly answered group characterizing problems do not generally divide classes that are otherwise similar. The only possible exception is Benjamin 2011 with classes JBs and NBs. In this case, both classes seem to have a misconception on how the number zero in the middle of an integer affects the product of two integers (see problem 11), but the misconception is different: adding the zero at the end of the number (JBs, 11D) or in the middle (NBs, 11C). Further, they also use the estimation strategy in different ways in problem 12, but this is not an important qualitative difference.
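The arithmetic behind Benjamin 2011, problem 11 (an integer multiplied by 31 instead of 301, giving 372) and the two wrong answers mentioned above can be reproduced with a short sketch. The variable names are ours, and the string manipulations are only one plausible way to model "putting the forgotten zero back" at the end or in the middle of the wrong result.

```python
wrong_multiplier = 31
intended_multiplier = 301
wrong_result = 372

# Correct route: recover the integer, then multiply by 301.
number = wrong_result // wrong_multiplier
assert number * wrong_multiplier == wrong_result  # 372 is divisible by 31
correct = number * intended_multiplier            # choice B

# Modeled misconceptions: re-insert the zero into the wrong result.
digits = str(wrong_result)
zero_at_end = int(digits + "0")                       # choice D
zero_in_middle = int(digits[:-1] + "0" + digits[-1])  # choice C

print(number, correct, zero_at_end, zero_in_middle)
```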
No. of classes (Table 2): 6 6 4 6 7 7 7 7 5
Abbreviations for the answering strategies: Mo = chose more than one option (this is the only class applying this), N = did not tend to leave answers empty, J = tended to jump over some questions and then answer some later ones, F = tended to answer the first questions of the exam and then skip the later ones, Mi = did not clearly apply either of the strategies J or F but left some questions empty and answered some questions, Ns = did not tend to apply any of the previous strategies.
Abbreviations for the number of correct answers: Ac/Bc = clearly above/below average, As/Bs = a little bit or somewhat above/below average, A = average, S = similar to the whole group.
"Paul wanted to multiply an integer by 301, but he forgot the zero and multiplied by 31 instead. The result he got was 372. (He did manage to multiply by 31 correctly!) What result was he supposed to get? A) 3010 B) 3612 C) 3702 D) 3720 E) 30720" (Benjamin 2011, problem 11; see Kangaroo Finland, 2021b).
Answering strategy N was the most common strategy except for Benjamin 2011 and Cadet 2011. Further, there was always a class NAc or NAs and a class NBs or NBc. The contestants in Ecolier belonged more often to classes applying answering strategy N (58-75% of participants) than in Benjamin (37-55%) and Cadet (29-54%).
In each competition, strategy J was applied in at least one class. In Cadet, none of these classes performed above average, whereas in Benjamin 2011 and 2015 and Ecolier 2019, this kind of class exists. Furthermore, older contestants were more likely to belong to the classes applying strategies F or Mi than younger ones. In classes applying strategy Mi, the number of correct answers was in general at least on the average level. As we can also see in Table 2, average-level classes are more common in Ecolier (21-54%) than in Benjamin (0-26%) and in Cadet (0-24%).
With the exception of the year 2019, each competition had a class in which contestants chose more than one answer even though this led to a point deduction. These classes consisted of 4-6% of the contestants. In these classes, the diversity of the number of correct answers was often similar to the whole group.

On answer choices that were more common than the correct one
We found 17 answer choices that were more popular than the correct one in Ecolier, 16 in Benjamin and 25 in Cadet (see Table 3).
Out of the misconceptions and strategies described in Section 4.1, we identified the following four among wrong answer choices that were more common than the correct one: misconceptions, estimation, using symmetry as a hint of a correct answer and choosing a mid-range answer. In addition, we identified that some errors were caused by miscalculations and some fell into the category other (mistakes, guessing, not understanding the problem correctly or taking the answer from a picture which is not to scale) (see Figure 3). Except for Cadet 2019 21E, all answer choices belonged to at least one category and some of them to more than one category. As can be seen in Figure 3, estimation was the most common wrong strategy, and misconceptions and the category other occurred more often in younger age groups than among eighth and ninth graders. The misconceptions were mostly related to geometry, such as visualizing 2D or 3D objects (see Table 4). Except for one geometry misconception (area and perimeter), these misconceptions seem to be based on not visualizing geometric objects correctly. However, it should be noted that the same answer choice may be classified into several categories. Miscalculations were more often behind a popular wrong answer choice among fourth and fifth graders than among older students.

Discussion
First, we divided contestants into different classes to investigate general learning patterns and to reveal personal differences (see Hickendorff et al., 2018; Lanza & Cooper, 2016). According to LCA and BIC, the contestants were divided into 4-7 classes based on their answering strategies and the number of correct answers (see Table 2). The classes can be described by the general answering strategy and performance level of the students and, notably, not by the misconceptions or answering choice strategies described in Sections 2.2 and 2.3. One reason may be that estimation was found to be the answering choice strategy used in more than half of the choices that were more common than the correct one (see Figure 3). Hence, different answer choice strategies may not make enough difference between students. However, it should be noted that there may be several different ways to divide contestants meaningfully into different classes, and these classifications may depend on the selected method.
The existence of classes NBs and NBc may be explained by the results that unskilled students are often, but not always, unaware of their lack of skills or lack of knowledge to estimate their performance (see Händel & Dresel, 2018; Kim et al., 2015; Kruger & Dunning, 1999; Urban & Urban, 2021). Hence, they may unwisely answer questions that they are unable to solve.
Further, older students were less likely to belong to a class applying strategy N than the younger ones. In Cadet, the contestants more likely belonged to a class applying strategy F or Mi than in Ecolier or Benjamin, and in most of these classes, the students in general had a higher number of correct answers than the rest of the population. Students' awareness of their own skills may improve when they become older (see Urban & Urban, 2021); thus, students may be more capable of determining whether they are able to solve a problem correctly and hence skip some of the questions. On the other hand, in Cadet there was no class in which the students performed above average and applied strategy J, whereas in Ecolier and Benjamin, this kind of class exists. If awareness of one's own skills rises, this should probably mean that the strongest contestants in particular do not follow the question order when they answer the questions. Hence, they should apply strategy J, which is recommended as a wise use of time (see Scruggs & Mastropieri, 1986). One explanation could be that the contestants in the Ecolier and Benjamin competitions were faster to solve the problems than those in Cadet, and hence there were more classes applying strategy F in Cadet. On the other hand, the average numbers and medians of correct answers were quite similar and at most 12 in each series (see Appendix A, Tables A1, A3 and A5), which does not support the idea that the Cadet competition problems were more difficult.
Older contestants, that is, those in Benjamin and Cadet, do not seem to form average-level classes as often as the youngest ones. This might mean that the differences between ability levels grow during school years, which is supported by the result of Metsämuuronen and Nousiainen (2021) showing that the Finnish ninth graders' achievement level in mathematics is not distributed normally, but is composed of three populations: low, average and high performers. However, our finding is not supported by the study of Metsämuuronen and Tuohilampi (2014), according to which Finnish students' abilities to solve mathematics problems differ a lot when they begin school, but these differences disappear during the first years. It might be that the average students do not answer as similarly in Benjamin and in Cadet as they did in Ecolier and hence do not end up in the same classes. Also, the participating populations may be different, and hence the average classes are not as common in Benjamin and Cadet.
• Sides do not change when we look at a 3D object from the opposite direction (11.18).
• In a big 3 × 3 × 3 cube formed by small white and black cubes, the total number of white cubes is the same as the number of white cubes in the faces, or all layers are similar when they are not (15.12B).
• In a right-angled triangle with hypotenuse 2 and legs of equal lengths, the legs are also 2 (15.10A).
• If one gets minus points for answering a question wrongly and the same number of positive points for answering correctly, answering wrongly means the same number of points as just not getting the positive points (11.15E).
• Combinations are computed by a sum (15.19B).
• It suffices to check one bad case (or some bad cases) to determine when something surely happens (15.11A, 15.11B).
• Adding a line through a square cannot add corners to a new figure (11.2D).
• Product of numbers increases the least when the smallest number increases (11.17A).
• If the product of each of three pairs of numbers is the same, there must be exactly three different numbers (19.16E).
• Not-symmetric situation can be solved by dividing similarly as in a symmetric situation (15.14C).
• When a square sheet is folded in half twice and then cut twice in half, all pieces are squares (19.13B) and the number of pieces is 8 (19.13E).
• Area of a square is 4 times its side (11.15A).
• "Per square meter" depends on the area, rather than meaning that something happens in every square meter (15.14E).
• Not-necessarily-symmetric situation can be solved by dividing similarly as in a symmetric situation (19.18D, 19.21B, 19.21C).
• Sum of two odd numbers is odd (15.20B).
Since the AvePPs obtained by AIC were higher than the suggested value (see Nagin, 2005), it may be interesting to analyse what kind of classes those are.
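To make the AvePP criterion concrete, the sketch below computes the average posterior probability of each class from a matrix of posterior class-membership probabilities. It is a hedged illustration: the function name and the example posteriors are hypothetical, and the threshold of 0.7 is the value commonly attributed to Nagin (2005), stated here as an assumption rather than taken from this study.

```python
def average_posterior_probabilities(posteriors):
    """AvePP per class: mean posterior of the assigned class over its members.

    posteriors: list of rows, one per respondent, each row holding the
    posterior probability of every class (rows are assumed to sum to 1).
    """
    n_classes = len(posteriors[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for row in posteriors:
        # Modal assignment: each respondent goes to the most probable class.
        assigned = max(range(n_classes), key=lambda c: row[c])
        sums[assigned] += row[assigned]
        counts[assigned] += 1
    # Classes without any assigned respondent have no defined AvePP.
    return [sums[c] / counts[c] if counts[c] else None for c in range(n_classes)]


# Hypothetical posteriors for four respondents and two classes.
example = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
avepp = average_posterior_probabilities(example)

ASSUMED_THRESHOLD = 0.7  # assumption: commonly cited cut-off, not from this paper
all_ok = all(v is not None and v >= ASSUMED_THRESHOLD for v in avepp)
print(avepp, all_ok)
```

In this toy example the first class would pass the assumed cut-off while the second would not, so the classification as a whole would be flagged for closer inspection.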
Among answer choices that were more common than the correct one, estimation was most commonly identified to be behind the error (see Figure 3). The phenomenon that estimation is commonly behind errors is in line with the research showing that intuitive approaches are used before more deliberate approaches, especially in timed tests (see, e.g., De Neys, 2006), since estimation does not require thinking about the problem precisely. No answering choice strategies related to equations (add-all, add-to-equal signs, add two, carry, repeat) were found. The problems requiring equation solving often required interpretation of the problem, and the possible answer choices lead us to think that the errors occurred most often in this part of the problem solving, not in solving the actual equation.
Among answer choices that were more common than the correct one, possible misconceptions were found more often among younger students than older ones (see Figure 3). Since out of eight different misconceptions, three (right-angled triangle, negative numbers, combinations) are related to topics that are studied in a more detailed way later in school (see Opetushallitus, 2004, 2014), this partly explains the high number of misconceptions in Ecolier.
Out of 24 misconceptions, 12 were related to geometry (see Table 4). However, according to Rakes and Ronau (2019), misconceptions are generally caused by a misunderstanding of the underlying structure, not the topic itself. In this research, the possible underlying structure among most of the geometry misconceptions can be not perceiving the shapes and objects correctly. Similar misconceptions were also found by Özerem (2012), and the suggestion was to use more visual objects and to place more emphasis on the similarities and differences of the objects.
Contrary to the previous research (see, e.g., Fuson et al., 2005; Lamon, 2007; Moss, 2005), we did not find any evidence that misconceptions in rational numbers or probability cause misconceptions in geometry, neither in the LCA nor in the misconceptions found. This does not necessarily mean that this connection does not exist, but may reflect the lack of probability and rational number problems in the Math Kangaroo Finland competition. More precisely, there were only four problems including rational numbers (Ecolier 2011 problem 13, Benjamin 2015 problem 4 and Cadet 2011 problems 9 and 19) and one probability problem (Benjamin 2015 problem 11). Hence, it would be important to study further Finnish fourth to ninth graders' misconceptions and their connections in rational numbers, probability and geometry.
It should also be noted that since misconceptions are often over- or under-generalizations (see Kalchman & Koedinger, 2005; Van Dooren et al., 2003), it may be possible that some misconceptions were put under the category estimation. Multiple-choice questions possibly also hide some misconceptions. Further, we only studied possible misconceptions in wrong answer choices, so those that occur in correct ones were not observed (see Lobato et al., 2010).

Conclusion
In this paper, we considered homogeneous subgroups of the contestants based on their answer choices in the Math Kangaroo Finland. The grades under consideration were 4-9.
The homogeneous subgroups formed by using LCA can be described using only contestants' general performance level and their answering strategies. All these subgroups, called classes, satisfied the boundary value for a reasonable classification given by Nagin (2005).
We also considered possible misconceptions and answering choice strategies in answer choices that were more common than the correct one. The importance of studying them comes from the idea that by understanding how misconceptions are related to each other, we may be able to find different factors that affect students' mathematical development (Bransford et al., 2000). Since solving a mathematical problem correctly typically includes selecting and applying suitable strategies flexibly (see, e.g., Siegler & Lemaire, 1997; Siegler & Shipley, 1995), studying answering choice strategies also reveals information about an important aspect of mathematical problem solving.
The most common answer choice strategy was estimation. Misconceptions occurred more often in the fourth to seventh graders' age group than in the oldest group. Half of the misconceptions were related to geometry, and those mainly concerned perceiving geometric objects. According to Özerem (2012), using more visual objects and placing more emphasis on the similarities and differences of the objects may help with these. However, it may be possible that there are misconceptions behind correct answers, and hence not all misconceptions may have been found (see Lobato et al., 2010).
This study initiates research on the Math Kangaroo Finland and provides information about contestants' answering strategies. It should be noted that participants may not represent the whole population of fourth to ninth graders equally. More research would be needed, especially considering other possible ways to divide contestants into homogeneous subgroups and possible reasons for misconceptions.

Figure
Figure 3. Proportions of different categories of wrong answer choices that were more common than the correct one (number of answer choices/all answer choices that were more common than the correct one). One answer choice can belong to more than one category.