Data SGP is a large database project that aims to assemble and generate multi-proxy sedimentary geochemical data for every Paleozoic epoch and for roughly equivalent Neoproterozoic time slices. The effort has required coordination among multiple institutions, the generation of new data, and the development of a suite of analytical tools for interpretation.
The database is built on a relational schema in SQLite. The schema is flexible and can be modified to accommodate different data formats and queries. The resulting system can be used to analyze the geological history of a region and to identify the most likely candidate sites for further exploration.
Because the database structure is flexible, the sgpData table can be expanded to include additional columns. This matters because an analysis of student growth often needs many different variables in a single query. Splitting those variables across several tables or databases would make queries more complex and slow the return of results.
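The kind of expansion described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's actual schema: the columns ID, YEAR, CONTENT_AREA, SCALE_SCORE, and GRADE, and the helper add_column_if_missing, are assumptions made up for the example.

```python
import sqlite3


def add_column_if_missing(conn, table, column, decl_type):
    """Add a column to an existing table unless it is already present.

    Returns the table's column names after the change.
    Identifiers are interpolated directly, so pass only trusted names.
    """
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl_type}")
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Hypothetical, simplified sgpData table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sgpData "
    "(ID INTEGER, YEAR INTEGER, CONTENT_AREA TEXT, SCALE_SCORE REAL)"
)
conn.execute("INSERT INTO sgpData VALUES (1, 2023, 'MATHEMATICS', 512.0)")

# Expand the table with one more variable; existing rows get NULL for it.
columns = add_column_if_missing(conn, "sgpData", "GRADE", "INTEGER")
rows = conn.execute("SELECT ID, GRADE FROM sgpData").fetchall()
```

Calling the helper a second time with the same column name leaves the table unchanged, so a setup script can be rerun safely.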
One key feature of the database is the sgpData_INSTRUCTOR_NUMBER column, which links each student test record to a teacher. This information is useful when interpreting student growth. For example, a student with a high assessment score could still show low growth if the teacher who administered the test was a poor match for that student’s academic needs. Conversely, a student with a low assessment score could show high growth if that teacher had a strong academic connection to the student.
A second key feature of the sgpData table is the sgpData_PERCENTILES column, which contains the percentiles for each content area and year. These percentiles are the basis for comparing each student’s growth with that of their peers, so they must be accurate and calculated consistently across years. Besides tracking student growth over time, the percentiles can also be used to identify teachers and schools that may need additional instructional support.
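As a rough illustration of ranking students against peers within one content area and year, the sketch below computes a plain mid-rank percentile for each score. It is an assumption-laden simplification: the record fields (content_area, year, score) and the function percentile_ranks are hypothetical, and this is an ordinary percentile rank on current scores, not the procedure used to produce the stored percentiles.

```python
from collections import defaultdict


def percentile_ranks(records):
    """Attach a mid-rank percentile to each record within its peer group.

    A peer group is every record sharing the same (content_area, year).
    percentile = 100 * (scores below + half of tied scores) / group size
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["content_area"], rec["year"])].append(rec["score"])

    ranked = []
    for rec in records:
        peers = groups[(rec["content_area"], rec["year"])]
        below = sum(1 for s in peers if s < rec["score"])
        ties = sum(1 for s in peers if s == rec["score"])  # includes the record itself
        pct = 100.0 * (below + 0.5 * ties) / len(peers)
        ranked.append({**rec, "percentile": pct})
    return ranked


# Hypothetical records: four mathematics scores and one reading score.
records = [
    {"id": 1, "content_area": "MATHEMATICS", "year": 2023, "score": 400},
    {"id": 2, "content_area": "MATHEMATICS", "year": 2023, "score": 500},
    {"id": 3, "content_area": "MATHEMATICS", "year": 2023, "score": 600},
    {"id": 4, "content_area": "MATHEMATICS", "year": 2023, "score": 500},
    {"id": 5, "content_area": "READING", "year": 2023, "score": 450},
]
ranked = percentile_ranks(records)
```

Using the same formula for every group and every year is what keeps the resulting values comparable across years.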
Although “big data” has become a buzzword in business and science, the amount of data involved in Data SGP is small compared with, for instance, analyses of global Facebook interactions. It is nevertheless large enough to require a substantial investment in hardware and software for storage and management.
Depending on the type of analysis, users may find it easier to format the data as WIDE or as LONG. The lower-level functions, studentGrowthPercentiles and studentGrowthProjections, use the WIDE format, while the higher-level wrapper functions (for example, abcSGP, prepareSGP, and analyzeSGP) use the LONG format. In general, LONG is the better choice for all but the simplest one-off analyses, because it is easier to prepare and store than WIDE.
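The difference between the two layouts can be sketched in plain Python. The example assumes a hypothetical WIDE layout with one row per student and one SS_<year> column per year; the column naming and the wide_to_long helper are illustrative only and are not part of the SGP functions named above.

```python
def wide_to_long(wide_rows, id_key="ID", prefix="SS_"):
    """Reshape WIDE rows (one per student) into LONG rows (one per student-year).

    Columns named prefix + <year> are treated as yearly scale scores.
    Missing scores (None) produce no LONG row.
    """
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            if key.startswith(prefix) and value is not None:
                long_rows.append(
                    {
                        id_key: row[id_key],
                        "YEAR": int(key[len(prefix):]),
                        "SCALE_SCORE": value,
                    }
                )
    # Sort by student, then year, for a stable and readable order.
    long_rows.sort(key=lambda r: (r[id_key], r["YEAR"]))
    return long_rows


# Hypothetical WIDE data: student 2 has no 2022 score.
wide = [
    {"ID": 1, "SS_2022": 480, "SS_2023": 510},
    {"ID": 2, "SS_2022": None, "SS_2023": 455},
]
long_rows = wide_to_long(wide)
```

In the LONG layout, a new year of data is appended as extra rows rather than extra columns, and missing scores take up no space. That is one reason LONG tends to be easier to prepare and store.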