A data-driven approach to predict first-year students’ academic success in higher education institutions

This study presents a data mining approach to predict the academic success of first-year students. A dataset covering 10 academic years of first-year bachelor's degree students from a Portuguese Higher Institution (N = 9652) was analysed. Feature selection resulted in a characterising set of 68 features, encompassing socio-demographic, social origin, previous education, special statutes and educational path dimensions. We proposed and tested three distinct course stage data models based on entrance date, end of the first curricular semester and end of the second curricular semester. A support vector machines (SVM) model achieved the best overall performance and was selected to conduct a data-based sensitivity analysis. Previous evaluation performance, study gaps and age-related features play a major role in explaining failures at the entrance stage. For subsequent stages, current evaluation performance features unveil their predictive power. Suggested guidelines include providing study support groups for at-risk profiles and creating monitoring frameworks. From a practical standpoint, a data-driven decision-making framework based on these models can be used to promote academic success.


Introduction
Research areas such as higher education are expanding their interest in extracting meaningful and more complex knowledge from their data sources (Koedinger et al. 2008). Recently, a research area that combines Data Mining (DM) and education has emerged and consolidated. Educational Data Mining (EDM) is a field that explores DM applied to different types of educational data (Howard et al. 2016). EDM uses data mainly obtained from educational information systems to unfold knowledge and find answers to questions and problems concerning the education system.
This study aims to apply data mining techniques to an academic data set provided by a Portuguese Higher Institution, and to present meaningful information to increase the academic success rate. The resulting models' performance is evaluated and their suitability to predict potential success and failure cases is scrutinized. To obtain the predictors of academic success in the first year, we implemented the Cross-Industry Standard Process for Data Mining (CRISP-DM). This methodology defines a project as a cyclic process and applies a non-rigid sequence of six main stages (Chapman et al. 2000). At the end of this process, knowledge extraction is conducted, and the collected insights are used to formulate guidelines and suggestions regarding institutional policies and pedagogical approaches to improve academic success. At an institutional and management level, the suggested guidelines are expected to leverage decision-making, optimize the allocation of educational resources and increase overall institutional performance.

Background
The concept of academic success, which is pivotal to an analytical tool for assessing the quality of higher education institutions (HEIs), has several problems in its definition and, consequently, in its operationalization (York et al. 2015). This concept has a myriad of meanings and very diverse uses, depending on the various scientific approaches, but also on its recognition in the various systems and public policies of higher education, and on the practices and cultures prevailing in educational institutions. From the point of view of its observation, our perspective can be placed at several levels of analysis (cf. Costa and Lopes 2011): at the level of the performance of higher education systems (in a macro or structural perspective), at the institutional level (where the present study is located) or at the level of individual paths of success (in a biographical perspective). Success can also be interpreted from the point of view of learning and acquired competences and skills, persistence and the achievement of degrees or certifications (this is the measure of success that we can analyse), students' engagement in academic activities, or even the possibility of having a better and more qualified entry into the labour market (among other social opportunities) (York et al. 2015). The concept of academic success is being applied as a base definition that aggregates multiple student and institutional outcomes for students at all grade levels (e.g. Guo et al. 2018; Pace et al. 2019). Success, in conceptual terms, remains relevant in its appeal and motivation for attainment or achievement of a goal (Hannon et al. 2017).
The Astin model, first proposed in 1991 (Astin 2012), clearly identifies academic success as an outcome of input factors and the environment. The model also suggests that the environment functions as a mediator. However, the relationship between environment and student outcomes cannot be understood without considering student inputs. According to Tinto (2006), students enter HEIs with a variety of abilities, skills, levels of high education preparation and attributes, specifically with differences in social class, age, gender, attitudes, values and knowledge about higher education. At the same time, students participate in external commitments, such as family, work and community. This set of features is used as the basis for correlation and pattern studies regarding academic success. Pascarella and Terenzini (2005) refine Astin's framework by explaining higher education outcomes as functions of three sets of elements: inputs, environment and outputs. The inputs are composed of demographic characteristics, family backgrounds, and academic and social experiences. The environment encompasses the people, programs, policies, cultures, and experiences that students encounter in HEIs.
The first-year student's achievement is predictive for subsequent years, as seen in the study by Brouwer et al. (2016), so students must be supported early. For HEIs to be able to create the most appropriate support for students, it is necessary to understand which factors predict academic success. An approach based on data mining techniques makes it possible to achieve this goal. According to Romero and Ventura (2010), the introduction of DM techniques in academic domains could improve decision-making processes in higher learning institutions. This improvement is expected to promote student retention, transition rate and academic success. EDM is a field that explores data mining approaches and techniques on different types of educational data, aiming at solving problems within the educational context (Baker and Yacef 2009). It is concerned with better understanding students and the settings in which they learn (Baker 2010). Over the years, students' enrolment and practice in HEIs have generated huge sets of student-related data that may reflect the efficiency of the learning process (Koedinger et al. 2008). Converting raw data originated by educational systems into useful information can potentially have a great impact on educational research and practice. Regardless of their origin, all DM techniques show one common characteristic: automated discovery of new relations and dependencies between attributes in the observed data.
Through this research effort it was possible to infer that the modelling of academic success is significantly affected by diverse factors, such as the higher educational context, the educational system and its specificities, and the available data and its quality. Other aspects, such as problem and modelling decisions, lead to distinct operationalizations of success and of how it is measured. Regarding datasets, there is great diversity in terms of source, nature and volume (Khan et al. 2017). The data mostly originate from student surveys and/or from the HEI database. It is possible to label the reviewed features in five distinct groups: socio-demographic features, social origin features, educational path features, previous education features and special statute features. Regarding operationalizations of student success, the following main definitions have been reviewed: passing grade in a specific module or course, passing grade in a specific exam, passing grade point average, student's graduation and student's graduation with no failures. A wide spectrum of relevant explanatory features is observed, as a large number of distinct features are pointed out as the most relevant in the literature, depending on each study's characteristics. There is no standard in the datasets used, as each study relies on distinct sources. Even so, the following feature groups showed great impact in multiple studies: previous education features, educational path features and socio-demographic features.

Methodology and methods
We adopted the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, which is particularly suited for data-driven research projects, as demonstrated by Moro et al. (2011), with several recent data-driven studies adopting it (e.g., Almahadeen et al. 2017). This approach consists of an iterative sequence of six phases (business understanding, data understanding, data preparation, modeling, evaluation, deployment) arranged in a cycle with the goal of tuning the final result, i.e., the data model's capability to adequately model a problem according to evaluation metrics. CRISP-DM is not prescriptive; rather, it suggests a sequence flow. Therefore, the methodology is flexible, although the six phases tend to be followed. In research, the deployment phase is usually replaced by knowledge extraction to understand a given problem (Moro et al. 2011). All the experiments were implemented using the R statistical tool and the "rminer" package (Cortez 2010). In the next subsections, we detail the main tasks implemented to address each CRISP-DM phase.

Business understanding
The studied institution is a public HEI located in Portugal, where socioeconomic status has a high impact on the academic success of students (Mestre and Baptista 2016). This relationship is also verified in other countries, as reported in Sirin (2005), Ingram (2011) and Brouwer et al. (2016), who found that the financial and social capital of families has a high influence on students' academic success, even at college. All the data used was anonymized, and its use for this research was made under a confidentiality agreement signed by the institution's Data Protection Office representative and all the authors.
All data used in our study was extracted from the institution's information system. The earlier in the academic path the DM model can predict failures while avoiding a high level of incorrectly predicted cases (false positives), the better. This study adopts student's graduation with no failures as the operationalization of student success. Thus, the DM goal and the main subject of analysis are devoted to predicting students who would not complete their degree programme within the optimal number of curricular years, in other words, students who fail and/or repeat at least one curricular year. This study follows a classification DM approach, as it builds a predictive model that classifies a data record into one of two predefined classes. The predefined classes used for success are "Failure" and "Success".
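The success operationalization described above can be illustrated with a minimal sketch. The function name, the argument names and the encoding of students who never graduate (None, counted here as "Failure") are assumptions made for illustration and are not taken from the institution's data model.

```python
from typing import Optional


def success_label(nominal_years: int, years_to_graduate: Optional[int]) -> str:
    """Classify one student record as "Success" or "Failure".

    nominal_years: optimal number of curricular years of the programme.
    years_to_graduate: curricular years actually taken, or None when the
    student never graduated (hypothetical encoding, assumed to be a failure).
    """
    if years_to_graduate is None:
        return "Failure"
    # Graduation with no failed or repeated year means finishing on time.
    return "Success" if years_to_graduate <= nominal_years else "Failure"


# Three hypothetical students of a 3-year bachelor's degree.
labels = [success_label(3, years) for years in (3, 4, None)]
```

Under these assumptions, only the first of the three hypothetical students is labelled "Success".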

Data understanding
The original data collected from the institutional database included the records of bachelor students who effectively enrolled in a programme provided by the institution between 2006/2007 and 2015/2016 (a 10-year timeframe). This ensures that the success operationalization requirements are met. An analytical base table (ABT) was created to collect candidate features, originally spread among distinct tables of the relational database model. Because this study concentrates on freshmen, and the first-year student's academic success is predictive for later years (Brouwer et al. 2016), no data collected after the first curricular year were considered for feature gathering purposes. Thus, thirty-two directly extracted features were added to the ABT in the first instance. Additionally, a total of fifty-four derived features were designed for student statutes and social services. Social services features were also split into two categories, accepted and requested, adding extra detail to the analysis. Further data understanding effort exposed potential features based on pre-existing data. According to Barraza et al. (2019), feature engineering is key for data mining. Therefore, eighteen new computed features were designed applying non-straightforward logic, requiring distinct transformation, aggregation and/or calculation processes. For instance, five new computed features were designed for candidacy preference, considering the relationship between the student's preference, the HEI and degree the student ended up registering in, and the average of the entry exam grades. In turn, new computed evaluation-related features were designed, comprising the overall evaluation for each semester of the first curricular year. Thus, six new computed educational path features were designed for student's evaluation. It is important to note that weighted average features were calculated relying on the premise that 30 ECTS is the optimum amount of ECTS to be collected per semester. Additional computed features representing the student's age at entry, the study gap time between the preceding and current educational degree, and the student's residence location were also developed. Table 1 depicts the description and resultant classes of each of the one hundred and four features gathered at this stage.
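A few of the computed features mentioned above can be sketched as follows. The text only states the 30 ECTS premise, so the exact formula is an assumption: here the ECTS-weighted grade sum is divided by 30, which penalises students who collected fewer credits. All function and argument names are hypothetical.

```python
from typing import List, Tuple

OPTIMAL_ECTS_PER_SEMESTER = 30  # premise stated in the text


def weighted_average(passed_modules: List[Tuple[float, float]]) -> float:
    """ECTS-weighted grade for one semester.

    passed_modules holds (grade, ects) pairs. Dividing by the optimal 30 ECTS
    rather than by the ECTS actually collected is an assumed reading.
    """
    total = sum(grade * ects for grade, ects in passed_modules)
    return total / OPTIMAL_ECTS_PER_SEMESTER


def study_gap_years(entry_year: int, previous_degree_year: int) -> int:
    """Years elapsed between the previous degree and the current enrolment."""
    return max(0, entry_year - previous_degree_year)


def age_at_entry(entry_year: int, year_of_birth: int) -> int:
    """Approximate age at enrolment, ignoring the month of birth."""
    return entry_year - year_of_birth


# Same grades, but the second student passed only 18 of the 30 ECTS.
full_semester = weighted_average([(14.0, 6.0)] * 5)
partial_semester = weighted_average([(14.0, 6.0)] * 3)
```

With this reading, a student who passes every module with grade 14 obtains 14.0, whereas passing only three of the five modules with the same grade yields 8.4.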

Data preparation
The data preparation stage requires decisions on the final feature set, establishing the foundation for modelling. Five distinct approaches were applied at this stage. The first approach was based on data generalization (replacing low-level attributes with high-level concepts). A conceptual review process was carried out to design a meaningful higher aggregation level, to deal with the low quality of several features, setting the basis for appropriate modelling. For instance, six distinct classes were designed for parents' occupations, one of the indicators of socioeconomic status (Costa et al. 2002; Smith and Lynch 2004), based on the ESCO (European Skills, Competences, Qualifications and Occupations) multilingual classification of occupations. The second approach consisted in dealing with features with missing data. A hot-deck imputation algorithm (k-nearest neighbour) was applied to some features, while for the remaining features with missing data, a 1% threshold was set up for decision taking (fill in with an "unknown" value or exclude). The third approach consisted in reviewing dependencies between the DM goal and each feature. For instance, part-time students are unable to meet the operationalized success conditions, so all records for which partialTimeStudentAtEntry is true were excluded. The fourth approach consisted in removing single-class features. A clear example is the degreeType feature, which, given the proposed scope, is only represented by a single class: bachelor. The fifth and final approach dealt with features with outliers and conflicting data. At the end of this process, a further CRISP-DM iteration based on feature selection tuning decided on the features with imputed values. Table 2 summarizes the final ABT by feature group, data type and collection time, as the result of the data preparation stage.
Table 1 (fragment): firstChoiceCourse. Description: Was the enrolled degree the first choice? Classes: "True"; "False".
Education and Information Technologies (2021) 26:2165-2190
The population represented in the dataset consists of 48.1% male and 51.9% female students. The average age of the students is 20.1 years, with a standard deviation of 5.3 years. As expected for these ages, most of the students are single (95.8%), with the remaining consisting of 2.4% married, 0.8% divorced, 0.1% widowed, and 0.9% unknown.
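The k-nearest-neighbour hot-deck imputation mentioned in the data preparation stage can be sketched as below. This is not the rminer implementation used in the study: the Euclidean distance, the averaging of the k donor values and all names are assumptions, and the predictor features are assumed to be numeric and complete.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


def knn_hotdeck_impute(rows: Sequence[Row], target: str,
                       predictors: Sequence[str], k: int = 3) -> List[Row]:
    """Fill missing `target` values from the k most similar complete records."""
    donors = [r for r in rows if r[target] is not None]
    if not donors:
        raise ValueError("no complete records to borrow values from")

    def distance(a: Row, b: Row) -> float:
        # Euclidean distance over the predictor features (assumed choice).
        return math.dist([a[p] for p in predictors], [b[p] for p in predictors])

    imputed = []
    for row in rows:
        if row[target] is not None:
            imputed.append(dict(row))  # copy untouched records
            continue
        nearest = sorted(donors, key=lambda d: distance(row, d))[:k]
        value = sum(d[target] for d in nearest) / len(nearest)
        imputed.append({**row, target: value})
    return imputed


# Hypothetical records: the last student has no entry grade.
students = [
    {"ageAtEntry": 20.0, "entryGrade": 14.0},
    {"ageAtEntry": 21.0, "entryGrade": 16.0},
    {"ageAtEntry": 40.0, "entryGrade": 11.0},
    {"ageAtEntry": 20.5, "entryGrade": None},
]
completed = knn_hotdeck_impute(students, "entryGrade", ["ageAtEntry"], k=2)
```

In this toy example the missing grade is borrowed from the two students closest in age, and the input records are left unmodified.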

Modelling
Considering the nature and structure of the final ABT and the techniques that produced the best results in related works, supported by the analysis of Shahiri and Husain (2015), we decided to develop models based on the following four techniques: Decision Trees (DT) (Apté and Weiss 1997), Random Forests (RF) (Breiman 2001), Support Vector Machines (SVM) (Cortes and Vapnik 1995) and Artificial Neural Networks (ANN) (Haykin 1994). Rminer provides the mining function, which we applied using the following setup: RPART (Recursive Partitioning and Regression Trees), DT and CTREE (Conditional Inference Trees) as distinct DT algorithms, RF, SVM, and MLPE (multilayer perceptron) as the ANN representative. The models' training plan is based on the k-fold cross-validation method (Trevor et al. 2009). The k parameter was set to 10 (k = 10), as per the most recent related works' guidelines. Each DM model analysis is submitted to 20 runs in order to enhance the robustness of the results.
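The training plan (10-fold cross-validation repeated over 20 runs) can be sketched as an index generator. The study relied on rminer's mining function for this; the Python generator below, its name and the plain (non-stratified) shuffling are illustrative assumptions only.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_records: int, k: int = 10, runs: int = 20,
                   seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (run, train_indices, test_indices) for every fold of every run."""
    rng = random.Random(seed)
    for run in range(runs):
        indices = list(range(n_records))
        rng.shuffle(indices)  # a fresh random partition for each run
        folds = [indices[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield run, train_idx, test_idx


# Small hypothetical dataset of 50 records, 2 runs of 10 folds each.
splits = list(repeated_kfold(50, k=10, runs=2))
```

Each record is used exactly once for testing within a run, so 20 runs of 10 folds produce 200 train/test splits per model.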

Evaluation
The Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the ROC Curve (AUC) (Bradley 1997), based on the confusion matrix (Kohavi and Provost 1998), were used for measuring purposes. Three main models are evaluated, one for each data collection time: entrance, end of the first curricular semester and end of the second curricular semester. Each model relies on a distinct number of features depending on the collection time it is based on. The 4-year degrees' model is additionally evaluated at this stage.
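As an illustration of the AUC metric, the sketch below uses its rank-based interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. Treating "Failure" as the positive class is an assumption, as are the function name and the toy scores.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[str],
        positive: str = "Failure") -> float:
    """Area under the ROC curve computed from pairwise score comparisons."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute the AUC")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted failure probabilities for four students.
example_auc = auc([0.9, 0.8, 0.3, 0.2],
                  ["Failure", "Success", "Failure", "Success"])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes. The pairwise loop is quadratic and only meant for small examples.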
The first model being evaluated is composed of 30 features collected at entrance. This model is henceforth referred to as DM_Entrance. Table 3 shows SVM as the best predictive model (AUC slightly higher than 0.77). The RF model also demonstrates a considerable predictive result, surpassing 0.76, while the MLPE model almost reaches 0.75. CTREE achieves the best result by far among the decision tree models, although it performs considerably worse than the previous models.
Figure 1 shows the ROC curves for CTREE (as the DT representative), SVM, RF and MLPE. It is possible to observe that the SVM curve achieves higher TPR (True Positive Rate) values along the entire FPR (False Positive Rate) axis. The SVM model proves its higher discriminatory capacity, outperforming the remaining models over the whole range of cut-off probabilities. The points highlighted in the graphic represent a threshold value of 50% for each model's curve.
The DM_EntryYear1Sem model establishes the basis for the succeeding collection time model, summing up to 44 features. Table 5 demonstrates a huge predictive performance boost compared to the DM_Entrance model's results (greater than or equal to 13%). Once more, SVM and RF achieve the best AUC results, surpassing 0.90. Figure 2 shows the ROC curves for the performance analysis of the DM_EntryYear1Sem models. The RF curve clearly intersects the SVM curve at an FPR close to 0.5. SVM achieves slightly better performance for lower values of FPR, while RF is slightly better above that value. Threshold values of 50% and 30% for each model's curve are highlighted in the figure. The 30% threshold represents an optimized TPR/FPR trade-off.
Table 6 details 30% threshold analysis through confusion matrices and resulting sensitivity and 1-specificity values for DM_EntryYear1Sem model.
The DM_EntryYear2Sem model relies on the whole set of features collected by the end of the second curricular semester (68 features). Table 7 shows the DM_EntryYear2Sem model's increased predictive performance results. Newly included features allowed the SVM and RF models to reach approximately 0.94. The discriminatory capacity of the whole feature set, at this point, is so robust that the distinct models' performance results tend to converge.
Figure 3 shows the ROC analysis for the DM_EntryYear2Sem models. SVM achieves the best performance for an FPR below 0.3. RF intersects SVM around that value, outperforming it for higher values. A 20% threshold value was scrutinized, following the same threshold selection reasoning applied previously.
Table 8 details the 20% threshold analysis through confusion matrices for the DM_EntryYear2Sem model. At this point the models' sensitivity is so high that special attention is given to the 1-specificity review. Comparing RF and SVM, RF achieves a slightly better sensitivity, while SVM achieves a lower and considerably better 1-specificity.
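The threshold analyses reported in the evaluation (50%, 30% and 20%) can be sketched as follows. The sketch assumes that "Failure" is the positive class and that a record is predicted as a failure when its predicted probability is greater than or equal to the threshold; the function name and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def confusion_at_threshold(probs: Sequence[float], labels: Sequence[str],
                           threshold: float = 0.5,
                           positive: str = "Failure") -> Dict[str, float]:
    """Confusion matrix counts, sensitivity (TPR) and 1-specificity (FPR)."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == positive:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "one_minus_specificity": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical failure probabilities for two failing and two passing students.
probs = [0.9, 0.4, 0.35, 0.1]
truth = ["Failure", "Failure", "Success", "Success"]
at_50 = confusion_at_threshold(probs, truth, threshold=0.5)
at_30 = confusion_at_threshold(probs, truth, threshold=0.3)
```

In this toy example, lowering the threshold from 50% to 30% raises sensitivity from 0.5 to 1.0 at the cost of raising 1-specificity from 0.0 to 0.5, which is the kind of trade-off examined in the threshold tables.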

Knowledge extraction
The sensitivity analysis (SA) method described by Cortez and Embrechts (2011) was adopted to perform the feature relevance analysis based on the resulting SVM models. Specifically, the data-based sensitivity analysis (DSA) algorithm was selected among others, as it induces several feature values to be changed simultaneously, allowing interactions between input features to be detected. Figure 4 shows the relevance of the most impacting features in the DM_Entrance model. A feature's relevance is measured through its percentage contribution to the output. Each of the illustrated features, 8 out of 30, demonstrates great relevance, above 5%. Their combined contribution to the model surpasses 63%.
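The idea behind a data-based sensitivity analysis can be sketched in a simplified form. This is not the rminer implementation used in the study: the sketch varies one feature at a time over a random sample of real records, summarises the spread of the mean model response with the average absolute deviation from the median, and normalises the spreads into relative relevances. The model callable, the feature levels and all names are assumptions.

```python
import random
from statistics import mean, median
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]


def dsa_relevance(model: Callable[[Record], float], data: Sequence[Record],
                  levels: Dict[str, List[float]], n_samples: int = 50,
                  seed: int = 0) -> Dict[str, float]:
    """Relative relevance of each feature for a fitted model (simplified DSA)."""
    rng = random.Random(seed)
    sample = rng.sample(list(data), min(n_samples, len(data)))
    spread = {}
    for feature, values in levels.items():
        # Mean response when the feature is forced to each level, while the
        # other features keep the values observed in the sampled records.
        responses = [mean(model({**row, feature: value}) for row in sample)
                     for value in values]
        centre = median(responses)
        spread[feature] = mean(abs(r - centre) for r in responses)
    total = sum(spread.values())
    return {f: (s / total if total > 0 else 0.0) for f, s in spread.items()}


def toy_model(record: Record) -> float:
    """Hypothetical failure probability that depends only on the entry grade."""
    return 0.8 if record["entryGrade"] < 13 else 0.2


records = [{"entryGrade": g, "gender": s}
           for g in (10.0, 12.0, 14.0, 16.0) for s in (0.0, 1.0)]
relevance = dsa_relevance(toy_model, records,
                          {"entryGrade": [10.0, 12.0, 14.0, 16.0],
                           "gender": [0.0, 1.0]})
```

Because the toy model ignores gender, all of the relevance is attributed to the entry grade. Using observed records instead of a single baseline vector is what distinguishes a data-based analysis from a one-point sensitivity analysis.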
Reviewing the characteristics of the high-impact features, it is noticeable that all feature groups are represented, except social origin.
Even though it was submitted to an imputation process during the data preparation stage, the entryGradeHotDeck feature keeps its prominent importance and shows the highest relevance. The detailed influence of the entryGradeHotDeck feature is depicted in Fig. 5. This feature quantifies secondary or high school evaluation performance and, as highlighted in Tinto (1999), high school evaluation performance provides insight into the potential academic performance of the freshmen.
The second most impacting feature is studyGapYears. This is quite an interesting finding, since no similar feature is found in related works' models. Figure 6 shows no significant impact for studyGapYears values below 10. Even so, a gap of several years shows a slightly lower impact than no gap at all. For gaps above 10 years, a prominent influence is verified.
The third most impacting feature is yearOfBirth, registering a similar contribution percentage. In order to review its impact, illustrated in Fig. 7, it is important to recall that the original dataset was trimmed to enrolments between 2006/2007 and 2015/2016. In general terms, yearOfBirth shows a considerable to high contribution to failure for values below 1990. This trend demonstrates that failure is higher among older students, as most of these cases represent students who enrolled at later life stages. These findings follow indications presented in Natek and Zwilling (2014), Martins et al. (2017) and Fernandes et al. (2019).
The following DSA is based on the DM_EntryYear1Sem model. Figure 8 shows the relevance of the 8 most impacting features in the model. Several features collected at the end of the first curricular semester showed great impact, placing 4 features among the 8 most relevant. This impact confirms the directions discussed in the model's evaluation. The combined contribution of these 8 most impacting features is close to 65%. The two most relevant features belong to the educational path feature group. In particular, they represent first curricular semester evaluation information, achieving a combined relevance greater than 30%. The relevance of these features supports the findings presented in Martins et al. (2018), who, relying on the same educational system, observed similar results for these features. Other studies, such as Mishra et al. (2014), Slim et al. (2014), Zimmermann et al. (2015) and Asif et al. (2017), demonstrate a similar level of impact for equivalent features in their models. Asif et al. (2017) identify two groups of students according to their performance, high-performing students and low-achieving students, and claim that many students tend to stay in the same group throughout their academic path. This standpoint may provide some insight regarding these features' great impact in the model. Figures 9 and 10 demonstrate that the lower their values (worse evaluation performance), the stronger their contribution to academic failure.
Figure 11 shows the relevance of the 8 most important features in the DM_EntryYear2Sem model. The combined contribution of the 8 most impacting features in the model is approximately 63%.
Following the DM_EntryYear1Sem model's trend, the most recent evaluation-related features are the most important features. These features' relevance is aligned with the insights of Zimmermann et al. (2015) regarding the higher impact of the most recent evaluation performances over the academic path. Despite showing equivalent trend, compared to most recent evaluation-related features in DM_EntryYear1Sem model, a

Discussion
Figure 12 shows a wrapped-up analysis for the three main reviewed models' performance, considering each DM model per features' collection time.
SVM is clearly the best model for DM_Entrance, as it outperforms the other models over the whole threshold range. The DM_Entrance model can be developed to predict student's performance before the beginning of the first curricular semester. This a-priori predictive model shows good evaluation results (AUC = 0.77 for SVM). The DM_EntryYear1Sem model predicts student's performance by the end of the first curricular semester, achieving improved evaluation results (AUC around 0.91 for SVM). The DM_EntryYear2Sem model can be set up by the end of the first curricular year, achieving near perfect performance (AUC around 0.94 for the SVM and RF models). The SVM results can be partially explained by the improved performance of the SVM training algorithm for small-sized datasets. As for the DM_EntryYear2Sem model, both SVM and RF achieved similar performance, with RF even surpassing SVM's results in a few of the evaluation metrics. This RF performance boost can be explained by the algorithm's improved ability to deal with a mixture of numerical and categorical features, bearing in mind that the number of relevant numerical features increased significantly with the inclusion of the first and second semesters' student evaluation features. Although they rely on slightly later stages, reducing the time available for decision-making and for actions to be taken, these models provide an enhanced predictive potential, achieving great performances. These results demonstrate that, by collecting fresh features during the first curricular year, such as student's evaluation performance features, it is possible to enrich the model's ability to predict unsuccessful cases while reducing false positive detections.

Conclusion
The overall success of an EDM project depends largely on providing educational stakeholders, such as deans, coordinators, teachers and managers, with meaningful information when making decisions concerning educational policies, courses offered, etc. (Fernandes et al. 2019). It is therefore useful to underline this type of knowledge as a basis for informed intervention. This may point to clues for the institution's various forms of action. The following are some of these guidelines for the case of the analysed institution, especially regarding the success of its freshman students:
- Providing specific study support groups for students with lower entry grades, from the beginning of the first curricular semester. Some literature suggests that low-performing secondary school students tend to maintain their low performance level in higher education.
- Monitoring the performance evolution of a specific group of students. This group would be gathered using the following criteria: low entry grade (below 13); older students (above 26 years old) and large study gap (above 20 years).
- Identifying students who collect fewer than 18 ECTS or achieve a weighted average grade below 7 at the end of the first curricular semester. Extended institutional support can be provided to these students, such as helping them define an individual study plan for the second curricular semester, clearly identifying effort requirements and work balance for better performance.
- Again, at the end of the second curricular semester, poorly performing students could be identified. Proceeding with pedagogical support is important at this stage.
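The monitoring criteria listed in the guidelines above could be encoded as simple screening rules, as in the following sketch. The thresholds come from the text; the function names and the choice to report each entrance criterion separately (rather than requiring all of them at once) are assumptions, since the guidelines do not state how the criteria are combined.

```python
from typing import List


def entrance_risk_flags(entry_grade: float, age_at_entry: int,
                        study_gap_years: int) -> List[str]:
    """List the entrance-stage monitoring criteria met by one student."""
    flags = []
    if entry_grade < 13:
        flags.append("low entry grade")
    if age_at_entry > 26:
        flags.append("older student")
    if study_gap_years > 20:
        flags.append("large study gap")
    return flags


def needs_first_semester_support(ects_collected: float,
                                 weighted_average: float) -> bool:
    """Flag students with fewer than 18 ECTS or a weighted average below 7."""
    return ects_collected < 18 or weighted_average < 7


# Hypothetical student: entry grade 12.5, aged 30, two-year study gap.
flags = entrance_risk_flags(12.5, 30, 2)
support = needs_first_semester_support(ects_collected=12, weighted_average=11.0)
```

Such rules are only a transparent complement to the predictive models: they make the suggested intervention groups explicit, while the models rank individual students by predicted risk.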
A significant part of this study's effort consisted in data quality tasks. Nevertheless, some predictive potential was lost due to poor-quality data, which is a limitation of this study. Consistent and coherent academic data is easier to analyse and to include in further DM models and frameworks. Specifically, in the data preparation phase, some features were removed because they were single-class features. As programmes have unique resource needs, contact hours, credits per module, prior-entry qualification requirements and laboratory/fieldwork demands, the dataset after the preparation tasks does not reflect important aspects necessary for gauging students' academic success in higher education, which is an important limitation that must be mentioned. Simple processes, such as validation of empty/incomplete fields, could be applied to academic forms in order to reduce inadequate data. Creating a segmented list of answers for each field would enhance the quality of the collected data. These suggestions would facilitate and promote DM applications, as they would potentially reduce the effort of the data preparation, cleansing and quality stages, as well as increase the amount of data and especially the number of candidate features to be included in the model. Ketonen et al. (2016) characterized first-year students through a set of profiles: alienated, engaged, disengaged and undecided. They found that the engaged students performed better in academic achievement and the undecided ones received the lowest grades. For future work, it would be interesting to compare the results presented in our study, mediated by the defined features, with those of Ketonen et al. (2016) to create a more complete model that considers both approaches.
We also propose designing DM models for individual schools based on the presented models, in order to capture each school's specific characteristics; considering additional data sources, such as end-of-semester student satisfaction surveys; scrutinizing the effect of the post-labour feature on academic failure; and extending data quality approaches to social origin, candidacy preference and secondary school related features and revisiting their impact on predicting academic failure. Ultimately, an information system encompassing these models can be used as a data-driven decision-making framework for supporting and optimising institutional policies and actions for academic success, also in other educational systems and social contexts.
This study has important limitations that derive, in part, from its main empirical reference.It deals with institutional data with essentially administrative and educational management functions.However, the proposed model has many advantages for monitoring and defining policies within the framework of this Portuguese university and may be replicable to other institutions.
Despite the limits of the variables available to cover some of the dimensions inscribed in the theoretical models used here (such as those related to learning and skills acquisition processes, among others mentioned by authors such as York et al. 2015, or those related to the engagement and integration of students in the academic environment and their activities, as mentioned by Tinto 2006), the model enables the use, fulfilling all the ethical requirements of data protection, of an information system on students in order to study a relevant range of attributes and factors involved in academic success. In general, these information systems have, among others, attributes of sociodemographic characterization, family background, previous academic paths and sometimes some indicators of student satisfaction that can be related to variables of educational outcomes (which allow approaching a reading of academic success). This type of exercise enables the adaptation of theoretical models to the conditions of availability of existing information, while still adding causal and relational knowledge about the factors of success in higher education, which is particularly useful because it can provide relevant knowledge to higher education institutions. Although this model has only been tested in one university institution, it can be tested and produce interesting results in other institutional contexts, in Portugal or in other countries, reinforcing the possibilities of monitoring and intervening in advance regarding academic success. Its focus on first-year students allows not only acting on a recognized critical segment, but also intervening early in the sustainability of successful paths (Brouwer et al. 2016).

Fig. 2 ROC curves for DM_EntryYear1Sem model

Previous education features, which give insight into the potential academic performance of the freshmen, are commonly pointed out as relevant predictors of academic success. Related studies, such as Osmanbegović and Suljić (2012), Goker et al. (2013), Trstenjak and Donko (2014) and Asif et al. (2017), present previous evaluation related features as the most impacting features in their models. As per Trstenjak and Donko (2014), a great part of the socio-demographic and social origin features do not change over time, having previously influenced secondary school evaluation performance. This helps to explain the leveraged relevance of the entryGradeHotDeck feature in the model. The initial perception regarding previous student performance is confirmed, as lower entryGradeHotDeck values present a much stronger contribution to failure, especially for entry grade values below 13.

Fig. 12 Wrapped-up analysis of the reviewed models' performance
The final dataset is composed of a total of 9652 records for regular bachelor's degrees and 789 records for 4-year bachelor's degrees. A total of 68 features are represented: 36 special statute features, 12 education path features, 6 previous education features, 10 socio-demographic features and 4 social origin features.

Table 2
Final ABT for DM modelling purposes

Table 4 details the 50% threshold analysis through confusion matrices and the resulting sensitivity and 1-specificity values for the DM_Entrance models.

Table 4
Confusion matrices for DM_Entrance model

Table 5
AUC results for DM_EntryYear1Sem model

Table 6
Confusion matrix for DM_EntryYear1Sem model

Table 8
Confusion matrices for DM_EntryYear2Sem model