1. Introduction
Every year, well over half a million high school students drop out of school in the United States. Few of them return to earn their diplomas or General Educational Development (GED) certificates. While some of these young people will ultimately get their lives back on track and secure stable careers, many of them will become homeless or involved in criminal activity within a few years of dropping out of high school. One alarming study found that high school dropouts commit around 75 percent of all crimes in the United States [1]. The trend is especially strong among minority males living in urban areas. For example, one study found that about 60 percent of Black students in the Bronx who dropped out of high school serve prison time [2].
Completing high school education is a critical step in a person’s life, and failing to do so generally sets them on a dire course for the future. For prior generations of Americans, this was not the case, but in the current economy, it is difficult for anyone to gain employment in a legitimate career path without a high school diploma or GED certificate.
The average dropout has an annual income of only USD 20,241, which is USD 10,386 less than that of a person with a high school diploma or GED [3]. That is the difference between living close to poverty levels by US standards and having a stable lower- to middle-class income. Over a lifetime, dropouts earn on average USD 260,000 less than those who earn a high school diploma or GED. A 2010 National Center for Education Statistics (NCES) study [4] estimated that getting just one student to graduate from high school instead of dropping out would create an average net societal benefit of USD 200,000 over that person’s lifetime. In 2017, high school dropouts also had by far the highest unemployment rate of any educational attainment group, at around 13 percent [5]. For the first time on record, the majority of American dropouts aged 16–24 were not active participants in the labor force, as Figure 1 shows. An increasing number of these people are no longer even employed in minimum wage jobs, as they were in prior decades.
While dropping out of high school is a nationwide problem, it is especially prevalent in the state of Louisiana. A 2019 NCES study [5] found that Louisiana public high schools had by far the highest dropout rate in the nation over the period from 2013 to 2017. A staggering 9.6 percent of Louisiana residents between ages 16 and 24 were high school dropouts, far above the national average of 6.0 percent [3]. Louisiana public high schools consistently have among the highest dropout rates in the nation each year and often the very highest. In the dataset of this project, approximately one in nine of Louisiana’s seventh graders dropped out of public high schools before completing the twelfth grade. About 6,000 students drop out of high school each year in Louisiana. The severity of the problem is a major factor in the state’s high poverty and violent crime rates and in its struggles to keep up economically with neighboring states [6].
This dataset was compiled for a project that studied high school dropouts in the state of Louisiana from geospatial and statistical perspectives, seeking clear patterns in which factors cause the dropout rate to rise or fall. Despite dropouts being such a prevalent and persistent problem in the state, there has been a minuscule amount of publicly available research on high school dropout rates in Louisiana [7]. What little exists tends to be a brief mention in national studies or a short annual update on school performance metrics, sometimes including dropout statistics, in a local paper [7,8,9]. The project aimed to help bridge that gap in the established research by performing a comprehensive analysis of the problem. The full project report, which also includes a spatial analysis of the dropout rates and further literature to explore, is available in [10]. Beyond that project, this dataset may be useful to researchers and analysts who are interested in studying other aspects of school-level data.
The paper is organized as follows.
Section 2 describes in detail the characteristics of the study area to allow for a better understanding of the data. It explains which types of schools are included in or omitted from the dataset and provides background for readers who may not be familiar with the structure of American high schools.
Section 3 outlines the 86 school-level variables that constitute the dataset. Further information about some variables is provided in the appendices. The paper ends with a brief summary.
3. Dataset Description
There are 86 variables in this dataset. Variables that measure a similar component of the school are color-coded for convenience and grouped together. The following paragraphs in this section outline the variables that constitute this dataset.
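Because the fields are labeled with spreadsheet-style column letters, it can help to convert them to positions when indexing the data programmatically. The sketch below assumes standard Excel-style lettering (A = 1, Z = 26, AA = 27); `field_index`, `FIELD_GROUPS`, and `check_groups` are hypothetical names, and the group labels are paraphrased from the descriptions in this section rather than taken from the file itself.

```python
def field_index(letters: str) -> int:
    """Convert a spreadsheet-style field letter (A, ..., Z, AA, ..., CH)
    to its 1-based column position."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid field letter: {letters!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


# (first field, last field) for each color-coded group described below.
FIELD_GROUPS = {
    "identifiers (gray)": ("A", "C"),
    "dropout counts and rates (yellow)": ("D", "V"),
    "enrollment totals (green)": ("W", "Y"),
    "demographics (green)": ("Z", "AK"),
    "grade enrollment (light blue)": ("AL", "AR"),
    "grade composition (green)": ("AS", "AU"),
    "class sizes (blue)": ("AV", "AY"),
    "suspensions (orange)": ("AZ", "BA"),
    "attendance and truancy (red)": ("BB", "BC"),
    "retention (purple)": ("BD", "BI"),
    "ACT score (pink)": ("BJ", "BJ"),
    "expense and salaries (green)": ("BK", "BN"),
    "teachers (blue)": ("BO", "BU"),
    "administrators (red)": ("BV", "CB"),
    "all staff (green)": ("CC", "CH"),
}


def check_groups(groups: dict) -> int:
    """Verify the groups are contiguous and return the total field count."""
    expected_start = 1
    for name, (first, last) in groups.items():
        if field_index(first) != expected_start:
            raise ValueError(f"gap or overlap before group {name!r}")
        expected_start = field_index(last) + 1
    return expected_start - 1


print(field_index("CH"))           # 86
print(check_groups(FIELD_GROUPS))  # 86
```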
The first three fields of the dataset, shown in gray, indicate the academic year (Field A), the school district (Field B), and the high school (Field C) to which the rest of the data in each row correspond. The data are organized chronologically, from the oldest data (2014–2015 academic year) to the most recent data (2018–2019 academic year). Within each year, the rows are arranged in the order in which they appeared in the source records from which they were compiled. This generally means the schools are arranged in alphabetical order by parish (the administrative equivalent of a county in most of the US) and then alphabetized by school name within each parish. However, schools not under the jurisdiction of a parish school board were often recorded in an arbitrary order from year to year. These schools are found at the bottom of the list in each academic year.
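A minimal sketch of checking and grouping this chronological ordering, assuming each row has been loaded as a dictionary with hypothetical keys `year` and `school`, and that academic years are written like `2014-2015` with either a hyphen or an en dash. The actual column headers in the released file may differ.

```python
import re
from itertools import groupby


def start_year(academic_year: str) -> int:
    """Return the first calendar year of a label such as '2014-2015'
    (hyphen or en dash separator)."""
    match = re.fullmatch(r"\s*(\d{4})\s*[-\u2013]\s*(\d{4})\s*", academic_year)
    if match is None:
        raise ValueError(f"unrecognized academic year: {academic_year!r}")
    first, second = int(match.group(1)), int(match.group(2))
    if second != first + 1:
        raise ValueError(f"years are not consecutive: {academic_year!r}")
    return first


def rows_by_year(rows: list) -> dict:
    """Group school names by academic year, assuming rows are already
    in chronological order as described above."""
    years = [start_year(r["year"]) for r in rows]
    if years != sorted(years):
        raise ValueError("rows are not in chronological order")
    return {
        year: [r["school"] for r in group]
        for year, group in groupby(rows, key=lambda r: start_year(r["year"]))
    }


# Hypothetical rows; real school names and headers will differ.
rows = [
    {"year": "2014-2015", "school": "School A"},
    {"year": "2014-2015", "school": "School B"},
    {"year": "2015-2016", "school": "School A"},
]
print(rows_by_year(rows))  # {2014: ['School A', 'School B'], 2015: ['School A']}
```

Because rows within a year keep their source order, grouping (rather than re-sorting) preserves the parish-then-school arrangement and leaves non-parish schools at the end of each year.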
The next nineteen fields of the dataset, shown in yellow, measure the number and rate of dropouts in each high school during a given academic year. Field D represents the total number of dropouts in each school, while Field E represents the number of dropouts in the 7th and 8th grades combined, and Field F represents the number of dropouts from the 9th through 12th grades combined. The next seven variables represent the number of dropouts in the 7th grade (Field G), the 8th grade (Field H), the transitional 9th grade (Field I), the 9th grade (Field J), the 10th grade (Field K), the 11th grade (Field L), and the 12th grade (Field M), respectively. It is important to note that some schools do not have students enrolled in all of the grades listed above, often the 7th and 8th grades. Those data points are marked as not applicable (N/A) rather than zero.
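Because grades a school does not offer are marked N/A rather than zero, a parser should keep that distinction. The sketch below reads the dropout-count cells and runs a consistency check. It assumes, without confirmation from the source, that Field E = G + H, Field F = I + J + K + L + M (counting the transitional 9th grade among the 9th through 12th grades), and Field D = E + F; it also assumes cells are keyed by field letter, which is a hypothetical layout.

```python
from typing import Optional


def parse_count(cell: str) -> Optional[int]:
    """Convert a dropout-count cell to an int, keeping N/A (grade not
    offered) as None instead of coercing it to zero."""
    cell = cell.strip()
    if cell.upper() == "N/A":
        return None
    return int(cell)


def sum_present(values) -> Optional[int]:
    """Sum the non-None values; return None if every value is None."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def check_dropout_counts(row: dict) -> list:
    """Compare the combined counts (D, E, F) with the per-grade counts
    (G through M) under the assumed relationships."""
    counts = {field: parse_count(row[field]) for field in "DEFGHIJKLM"}
    expected = {
        "E": sum_present([counts["G"], counts["H"]]),
        "F": sum_present([counts[f] for f in "IJKLM"]),
        "D": sum_present([counts["E"], counts["F"]]),
    }
    return [
        f"Field {field} is {counts[field]}, expected {value}"
        for field, value in expected.items()
        if counts[field] != value
    ]


# Hypothetical 9th-12th grade school with no 7th or 8th graders enrolled.
row = {"D": "7", "E": "N/A", "F": "7", "G": "N/A", "H": "N/A",
       "I": "1", "J": "2", "K": "3", "L": "0", "M": "1"}
print(check_dropout_counts(row))  # []
```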
The remaining nine variables represent the dropout rate in the 7th–12th grades combined (Field N), 9th–12th grades combined (Field O), the 7th grade (Field P), the 8th grade (Field Q), the transitional 9th grade (Field R), the 9th grade (Field S), the 10th grade (Field T), the 11th grade (Field U), and the 12th grade (Field V), respectively. As with the dropout numbers, the dropout rates are marked as not applicable (N/A), rather than zero, for grades at a school with no students enrolled.
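The N/A convention matters most when rates are aggregated across schools. The following sketch, using hypothetical Field P values, averages a grade's dropout rate while skipping N/A cells and shows how coercing N/A to zero would bias the result downward. The function names are illustrative only.

```python
from statistics import mean
from typing import Optional


def parse_rate(cell: str) -> Optional[float]:
    """Return a dropout rate as a float, or None for N/A."""
    cell = cell.strip()
    return None if cell.upper() == "N/A" else float(cell)


def mean_rate(cells) -> Optional[float]:
    """Average rate across schools, skipping grades a school does not offer."""
    rates = [r for r in map(parse_rate, cells) if r is not None]
    return mean(rates) if rates else None


# Hypothetical 7th-grade dropout rates (Field P) for four schools.
field_p = ["2.0", "N/A", "4.0", "N/A"]
print(mean_rate(field_p))  # 3.0

# Coercing N/A to zero would wrongly halve the average.
print(mean([parse_rate(c) or 0.0 for c in field_p]))  # 1.5
```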
Sometimes a student transfers from one school to another before or during a school year. Although such students have left their original school, they did so with permission and are not counted as dropouts in these variables. Only students who leave school without approval, for any reason other than death, are counted as dropouts. Unfortunately, detailed information about the number of transferred students entering and leaving each school is not currently made available to the public because of privacy concerns.
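The dropout rule can be written as a small predicate. This is an illustrative sketch only: the `StudentExit` record and its two flags are hypothetical, since student-level exit records are not part of this dataset.

```python
from dataclasses import dataclass


@dataclass
class StudentExit:
    """Hypothetical record of one student leaving a school."""
    approved: bool   # True for an approved transfer or other sanctioned exit
    deceased: bool   # True if the student left because of death


def is_dropout(exit_record: StudentExit) -> bool:
    # A dropout left without approval for any reason other than death.
    return not exit_record.approved and not exit_record.deceased


exits = [
    StudentExit(approved=True, deceased=False),   # transfer: not a dropout
    StudentExit(approved=False, deceased=True),   # death: not a dropout
    StudentExit(approved=False, deceased=False),  # unapproved exit: dropout
]
print(sum(is_dropout(e) for e in exits))  # 1
```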
The next three fields of the dataset, shown in green, show the number of students enrolled in the school, as recorded on October 1st, roughly eight weeks after the start of the school year. Recording this count in October allows the interschool movement of students in the opening weeks of the academic year to settle down, as families confirm their residences for the year. Field W lists the total enrollment of all students in the school from pre-kindergarten through the 12th grade. Field X lists the combined enrollment of the 9th through 12th grades, while Field Y lists the combined enrollment of the 7th through 12th grades. Depending on the nature of the school, these three values may all be identical or may all differ. They would be identical for a large, urban high school with students only in the 9th through 12th grades, but would all differ for a small, rural high school with students from kindergarten through the 12th grade.
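Since Field W covers a superset of the grades in Field Y, which in turn covers a superset of those in Field X, the three counts should always satisfy W ≥ Y ≥ X. A minimal validation sketch with hypothetical enrollment figures (the function name is illustrative):

```python
def enrollment_is_nested(w_total: int, x_grades_9_12: int, y_grades_7_12: int) -> bool:
    """Field W counts pre-K through 12th grade, Field Y counts 7th-12th,
    and Field X counts 9th-12th, so W >= Y >= X >= 0 should hold."""
    return w_total >= y_grades_7_12 >= x_grades_9_12 >= 0


# Hypothetical examples mirroring the two cases described above.
print(enrollment_is_nested(1200, 1200, 1200))  # urban 9-12 school: True
print(enrollment_is_nested(450, 120, 180))     # rural K-12 school: True
print(enrollment_is_nested(450, 200, 180))     # X exceeds Y, inconsistent: False
```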
The next twelve variables, also shown in green, relate to the racial and socioeconomic demographics of the total student body enrolled in each school. Field Z is the percentage of female students, while Field AA is the percentage of male students. The following six variables represent the percentage of Native American students (Field AB), Asian students (Field AC), Black students (Field AD), Hispanic students (Field AE), Hawaiian and Pacific Islander students (Field AF), and White students (Field AG), respectively. Field AH represents the percentage of multiracial students (exclusive of all racial groups listed above). Field AI represents the percentage of minority students (a combination of all non-White student groups listed above). Field AJ represents the percentage of “At-Risk” students, whom the state of Louisiana classifies as Economically Disadvantaged on the basis of various factors. See Appendix A for further information about this. Field AK represents the percentage of students with limited English language proficiency (LEP), which, in Louisiana, are primarily students who speak Spanish or Vietnamese at home as their first language.
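The definitions above imply a few identities that can serve as sanity checks: Fields Z and AA should sum to about 100, Fields AB through AH should sum to about 100, and Field AI should equal the sum of the non-White groups (AB to AF plus AH). The sketch below assumes percentages on a 0–100 scale, a hypothetical dictionary keyed by field letter, and a tolerance of one percentage point for rounding; the tolerance is an assumption, not a documented property of the data.

```python
import math

RACE_FIELDS = ["AB", "AC", "AD", "AE", "AF", "AG", "AH"]
NON_WHITE_FIELDS = [f for f in RACE_FIELDS if f != "AG"]  # AG is White


def check_demographics(row: dict, tol: float = 1.0) -> list:
    """Sanity checks on percentage fields, allowing for rounding."""
    problems = []
    if not math.isclose(row["Z"] + row["AA"], 100.0, abs_tol=tol):
        problems.append("Z + AA (female + male) is not ~100")
    if not math.isclose(sum(row[f] for f in RACE_FIELDS), 100.0, abs_tol=tol):
        problems.append("racial groups AB to AH do not sum to ~100")
    non_white = sum(row[f] for f in NON_WHITE_FIELDS)
    if not math.isclose(row["AI"], non_white, abs_tol=tol):
        problems.append("AI (minority) does not match the non-White groups")
    return problems


# Hypothetical school, percentages only.
row = {"Z": 49.0, "AA": 51.0, "AB": 0.5, "AC": 1.5, "AD": 60.0, "AE": 3.0,
       "AF": 0.0, "AG": 33.0, "AH": 2.0, "AI": 67.0}
print(check_demographics(row))  # []
```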
The next seven fields, shown in light blue, represent the number of students enrolled in the 7th grade (Field AL), the 8th grade (Field AM), the transitional 9th grade (Field AN), the 9th grade (Field AO), the 10th grade (Field AP), the 11th grade (Field AQ), and the 12th grade (Field AR), respectively.
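These per-grade counts can be cross-checked against the enrollment totals in Fields X and Y. The sketch assumes, without confirmation from the source, that Field X equals the sum of Fields AN through AR (transitional 9th through 12th grade), that Field Y additionally includes Fields AL and AM, and that all counts come from the same October 1 snapshot. Grades a school does not offer are represented as `None` in this hypothetical layout.

```python
GRADES_9_12 = ["AN", "AO", "AP", "AQ", "AR"]  # T9, 9th, 10th, 11th, 12th
GRADES_7_8 = ["AL", "AM"]                     # 7th, 8th


def total(row: dict, fields) -> int:
    # Grades the school does not offer are stored as None and add nothing.
    return sum(row[f] or 0 for f in fields)


def check_grade_enrollment(row: dict) -> list:
    problems = []
    x = total(row, GRADES_9_12)
    y = x + total(row, GRADES_7_8)
    if row["X"] != x:
        problems.append(f"Field X is {row['X']}, grade counts sum to {x}")
    if row["Y"] != y:
        problems.append(f"Field Y is {row['Y']}, grade counts sum to {y}")
    return problems


# Hypothetical high school with a T9 program and no 7th or 8th grade.
row = {"AL": None, "AM": None, "AN": 20, "AO": 180, "AP": 170, "AQ": 150,
       "AR": 140, "X": 660, "Y": 660}
print(check_grade_enrollment(row))  # []
```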
The next three fields of the dataset, shown in green, further cover the grade composition and nature of each school. Field AS represents the percentage of 9th graders in the school who were in the transitional 9th grade (T9). See Appendix B for more information about the T9 program. Field AT represents the starting grade of the school during that academic year, from the 1st grade at the earliest to the 12th grade at the latest. For simplicity, schools that start with kindergarten or earlier are recorded as starting in the 1st grade, while schools that start with the transitional 9th grade are recorded as starting in the 9th grade. Field AU denotes the school type for comparative purposes.
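Both Field AS and Field AT can be derived from grade-level information. In the sketch below, Field AS is assumed to equal the transitional 9th-grade enrollment divided by the combined transitional and regular 9th-grade enrollment, times 100; that formula is an interpretation of the description, not a documented definition. The grade labels `PK`, `K`, and `T9` are hypothetical.

```python
from typing import Optional


def t9_share(t9_enrolled: int, grade9_enrolled: int) -> Optional[float]:
    """Field AS, assuming it equals T9 / (T9 + regular 9th grade) * 100."""
    ninth_graders = t9_enrolled + grade9_enrolled
    if ninth_graders == 0:
        return None
    return 100.0 * t9_enrolled / ninth_graders


# Hypothetical grade labels, ordered from lowest to highest.
GRADE_ORDER = ["PK", "K"] + [str(g) for g in range(1, 9)] + ["T9", "9", "10", "11", "12"]


def starting_grade(grades_offered) -> int:
    """Field AT: kindergarten or earlier is recorded as 1, T9 as 9."""
    lowest = min(grades_offered, key=GRADE_ORDER.index)
    if lowest in ("PK", "K"):
        return 1
    if lowest == "T9":
        return 9
    return int(lowest)


print(t9_share(20, 180))                                 # 10.0
print(starting_grade(["K", "1", "2", "12"]))              # 1
print(starting_grade(["T9", "9", "10", "11", "12"]))      # 9
print(starting_grade(["7", "8", "9", "10", "11", "12"]))  # 7
```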
The next four fields of the dataset, shown in blue, relate to class sizes in each school. These variables are recorded as the percentage of classes with 20 or fewer students (Field AV), classes of 21 to 26 students (Field AW), classes of 27 to 33 students (Field AX), and classes with 34 or more students (Field AY).
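The four class-size fields partition classes into non-overlapping bins, so their percentages should sum to 100. The sketch below bins a hypothetical list of class sizes using the thresholds stated above; the list-of-sizes input is an assumption, since the dataset reports only the resulting percentages.

```python
from collections import Counter

# (field, smallest class size, largest class size or None for no upper bound)
CLASS_SIZE_BINS = [("AV", 0, 20), ("AW", 21, 26), ("AX", 27, 33), ("AY", 34, None)]


def class_size_field(size: int) -> str:
    """Return the field whose bin contains a class of this size."""
    for field, low, high in CLASS_SIZE_BINS:
        if size >= low and (high is None or size <= high):
            return field
    raise ValueError(f"invalid class size: {size}")


def class_size_percentages(sizes) -> dict:
    """Percentage of classes falling in each class-size field."""
    if not sizes:
        raise ValueError("need at least one class")
    counts = Counter(class_size_field(s) for s in sizes)
    return {field: 100.0 * counts[field] / len(sizes) for field, _, _ in CLASS_SIZE_BINS}


sizes = [15, 20, 21, 26, 27, 34, 40, 18]  # hypothetical classes at one school
print(class_size_percentages(sizes))
# {'AV': 37.5, 'AW': 25.0, 'AX': 12.5, 'AY': 25.0}
```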
The next two fields of the dataset, shown in orange, relate to the suspension rates at each school. Field AZ represents the percentage of students issued an in-school suspension, while Field BA represents the percentage of students issued an out-of-school suspension. A suspension is a disciplinary measure in American schools issued for serious offenses. An in-school suspension might be issued for an infraction such as cheating and restricts the privileges of the student while they are still required to attend school. An out-of-school suspension prohibits the student from attending school for a brief time and might be issued for an infraction such as fighting.
The next two fields of the dataset, shown in red, represent the attendance rates and truancy rates at each school across the entire student body. Field BB represents the percentage of students attending class each day, while Field BC represents the percentage of students who had one or more unexcused absences during the school year (truants). Schools want attendance rates to be high and truancy rates to be low.
The next six fields of the dataset, shown in purple, represent retention rates. These variables are the percentage of students held back from completing the 7th grade (Field BD), the 8th grade (Field BE), the 9th grade (Field BF), the 10th grade (Field BG), the 11th grade (Field BH), and the 12th grade (Field BI), respectively. This is a case where differences in terminology are very important. From the college perspective, retention is a good thing, as it refers to keeping college students at the university and on the path to graduating. In the context of high school students, however, retention means something quite different. When a high school student does not fulfill the requirements to advance to the next grade by the end of the academic year, they are retained, meaning they must repeat that grade the following year. High schools want their retention rates to be as low as possible. Sometimes, schools get into trouble for passing students on to the next grade level who have not actually met the requirements, even falsifying test scores to do so. A few schools in this dataset, such as John F. Kennedy High School and Tallulah Charter School, have been closed down for such practices.
Field BJ, shown in pink, is the average ACT (American College Testing) score of students at a high school. Though far from perfect, the ACT is a widely used standardized assessment of how prepared a high school student is for college, and it is taken by millions of American students each year. It tests reading comprehension, as well as basic knowledge of mathematics (algebra, geometry, and trigonometry) and English that students should have learned by the 11th grade. Scores range from 1 to 36, and the average ACT score for students in this study is 19.0.
The next four variables, shown in green, relate to the expense per student and staff salaries at each school. Field BK shows the average expenditure per student, in US dollars, across the entire student body. Field BL shows the average teacher salary, while Field BM shows the average salary of administrators. Field BN shows the average salary across all school staff, including those who are neither teachers nor administrators.
The next seven fields, shown in blue, relate to the teachers at each school. The first five variables cover the highest educational attainment of the teachers. They represent the percentage of teachers: with less than a Bachelor’s degree (Field BO), with a Bachelor’s degree (Field BP), with a Master’s degree (Field BQ), who are Education Specialists (Field BR), and with a Doctoral degree (Field BS). Field BT represents the average teacher experience in years. Field BU represents the percentage of staff who are teachers.
The next seven fields, shown in red, relate to the administrators at each school. The first five variables cover the highest educational attainment of the administrators. They represent the percentage of administrators: with less than a Bachelor’s degree (Field BV), with a Bachelor’s degree (Field BW), with a Master’s degree (Field BX), who are Education Specialists (Field BY), and with a Doctoral degree (Field BZ). Field CA represents the average administrator experience in years. Field CB represents the percentage of staff who are administrators.
The final six fields, shown in green, relate to the entire staff at each school, including those who are neither teachers nor administrators. The first five variables cover the highest educational attainment of the staff. They represent the percentage of staff: with less than a Bachelor’s degree (Field CC), with a Bachelor’s degree (Field CD), with a Master’s degree (Field CE), who are Education Specialists (Field CF), and with a Doctoral degree (Field CG). Field CH represents the average staff experience in years.
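Each of the three five-field educational attainment groups (BO to BS, BV to BZ, and CC to CG) describes the highest degree held, so each group's percentages should sum to about 100. If teachers and administrators are disjoint groups of staff, which is an assumption here, Fields BU and CB should also sum to at most 100. A minimal sketch with a hypothetical row keyed by field letter and an assumed rounding tolerance of one percentage point:

```python
import math

ATTAINMENT_FIELDS = {
    "teachers": ["BO", "BP", "BQ", "BR", "BS"],
    "administrators": ["BV", "BW", "BX", "BY", "BZ"],
    "all staff": ["CC", "CD", "CE", "CF", "CG"],
}


def check_staff(row: dict, tol: float = 1.0) -> list:
    problems = []
    for group, fields in ATTAINMENT_FIELDS.items():
        share = sum(row[f] for f in fields)
        if not math.isclose(share, 100.0, abs_tol=tol):
            problems.append(f"{group}: attainment shares sum to {share:.1f}")
    # Assumes no staff member is counted as both a teacher and an administrator.
    if row["BU"] + row["CB"] > 100.0 + tol:
        problems.append("BU + CB (teachers + administrators) exceeds 100% of staff")
    return problems


# Hypothetical school, percentages only.
row = {"BO": 1.0, "BP": 60.0, "BQ": 35.0, "BR": 3.0, "BS": 1.0,
       "BV": 0.0, "BW": 10.0, "BX": 70.0, "BY": 10.0, "BZ": 10.0,
       "CC": 20.0, "CD": 45.0, "CE": 30.0, "CF": 3.0, "CG": 2.0,
       "BU": 70.0, "CB": 8.0}
print(check_staff(row))  # []
```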