Project 1 Sprint 3

Summary: In Sprint 3 of your first project, you will read data out of an Excel file and add it to your database.

Tools:

We will continue to use the same version control, continuous integration tools and programming language requirements that we put together for parts one & two.

Sprint 3: Getting some additional data:

Due: Wednesday, Feb 24th at 11:59pm. However, as per Slack, submissions will be accepted as on time until 11:59pm Sunday, Feb 28.

Use the same project that you used last time. I will just do a git pull on the project and get the updates.
Get the Excel file for wage and job data by state. Either get the local copy that I downloaded here, or get the original online; you want the state-by-state data. The most recent data is from May 2019. It seems to come out each May, and I'm guessing they didn't post it last year because the Covid effects would be too much of an issue; either that, or it takes a really long time to get the data out.
Create at least one more table in your database with state-by-state employment data (it must be able to store the data from step 4 below).
Get the following data from the Excel sheet and store it in your database:

for each state (I don't care if you include territories or not), get every major employment category and record (as a DB row):

the state
the occupation major title
the total employment in that field in that state
the 25th percentile salary for that field, both hourly and annual (let's assume that most college grads earn in the lower 25%)
the occupation code
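
A minimal sketch of a table that could hold the fields listed above, assuming your database is SQLite and you are working in Python (neither is required by this sprint). The table name state_employment, the column names, and both function names are made up for illustration; adapt them to your own schema.

```python
import sqlite3

# Hypothetical table and column names; rename them to match your project.
CREATE_STATE_EMPLOYMENT = """
CREATE TABLE IF NOT EXISTS state_employment (
    state            TEXT NOT NULL,
    occupation_code  TEXT NOT NULL,
    occupation_title TEXT NOT NULL,
    total_employment INTEGER,
    hourly_pct25     REAL,
    annual_pct25     REAL,
    PRIMARY KEY (state, occupation_code)
);
"""


def create_state_employment_table(conn: sqlite3.Connection) -> None:
    """Create the new table if it does not already exist."""
    conn.execute(CREATE_STATE_EMPLOYMENT)
    conn.commit()


def save_state_employment(conn: sqlite3.Connection, rows) -> None:
    """Insert 6-tuples given in the same order as the table columns.

    INSERT OR REPLACE means re-running the import overwrites a row that has
    the same (state, occupation_code) instead of raising an error.
    """
    conn.executemany(
        "INSERT OR REPLACE INTO state_employment VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
```

The composite primary key is one possible design choice: it gives one row per state and major occupation group, which matches "record (as a DB row)" above. The salary columns are left nullable in case a value is missing in the spreadsheet.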

Write automated tests

write tests to ensure your method to read from the xlsx file works properly

e.g., make sure you get data from all 50 states from the original data
maybe create a test xlsx with a limited number of major occupational groups along with some other stuff, then make sure you get the right number of major occupational groups

write a test to make sure the new table is there
write a test to make sure the old table is still there, if that test is not already there from the previous sprint
write a test to make sure that your new write to table works (unless you found a way to make the old write to table work and the old test covers it all).
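
A rough sketch of what a couple of these tests might look like in pytest style, assuming SQLite and Python. The helper names table_exists and distinct_count, the table name state_employment, and the sample records are all invented for illustration; in your project the setup line would call your own table-creation code instead of the stand-in CREATE TABLE.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with this name is listed in sqlite_master."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def distinct_count(records, index: int) -> int:
    """Count the distinct values found at one position of each record."""
    return len({record[index] for record in records})


def test_new_table_exists():
    conn = sqlite3.connect(":memory:")
    # Stand-in for your own setup function (hypothetical schema).
    conn.execute("CREATE TABLE state_employment (state TEXT, occupation_code TEXT)")
    assert table_exists(conn, "state_employment")
    assert not table_exists(conn, "no_such_table")


def test_state_and_group_counts():
    # Made-up records shaped like (state, occupation_code).
    records = [
        ("Alabama", "11-0000"),
        ("Alabama", "13-0000"),
        ("Alaska", "11-0000"),
    ]
    assert distinct_count(records, 0) == 2  # two states
    assert distinct_count(records, 1) == 2  # two major groups
```

Against the real spreadsheet, the same distinct_count idea could check that at least 50 distinct states come back; against a small hand-made test xlsx, it could check for the exact number of major groups you put in.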

make sure your DevOps stuff all works (tests are run on GitHub and formatting checks still work)
update your readme and requirements (or equivalent go.mod or gradle or whatever you need for your language.)
commit and push often so you lose nothing if your computer dies.

Some hints for working with Excel files:

If you are working with Python, I've had really good luck with openpyxl (https://realpython.com/openpyxl-excel-spreadsheets-python/), though if you have lots of experience with pandas, that library can also easily read in Excel files.
If you are working with Go, I've used excelize and enjoyed working with it: https://github.com/avelino/awesome-go#microsoft-excel I haven't tried the other options there.
If you are using Java, I've never read in Excel files with Java, but the internet all seems to agree that Apache POI is the way to go (https://poi.apache.org/).
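
For the Python route, here is a hedged sketch of reading the sheet with openpyxl and keeping only the major occupation groups. The header names (area_title, o_group, occ_code, occ_title, tot_emp, h_pct25, a_pct25) and the "major" marker are assumptions about the workbook layout, so check them against the first row of your copy. The sketch also assumes some cells may hold non-numeric placeholders such as '*' or '#' and stores those as None. All function names are invented for illustration.

```python
from typing import Iterable, Iterator

# Assumed header names; verify them against the header row of your workbook.
STATE, GROUP, CODE, TITLE = "area_title", "o_group", "occ_code", "occ_title"
TOTAL_EMP, HOURLY_25, ANNUAL_25 = "tot_emp", "h_pct25", "a_pct25"


def read_rows(path: str) -> Iterator[dict]:
    """Yield each data row of the first worksheet as a dict keyed by header."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook.worksheets[0]
        rows = sheet.iter_rows(values_only=True)
        # Lowercase the headers so the lookup does not depend on their case.
        header = [str(cell).strip().lower() for cell in next(rows)]
        for values in rows:
            yield dict(zip(header, values))
    finally:
        workbook.close()


def to_number(value):
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def major_group_records(rows: Iterable[dict]) -> Iterator[tuple]:
    """Keep only rows marked as major groups and pick the fields to store."""
    for row in rows:
        if str(row.get(GROUP, "")).strip().lower() != "major":
            continue
        yield (
            row[STATE],
            row[CODE],
            row[TITLE],
            to_number(row[TOTAL_EMP]),
            to_number(row[HOURLY_25]),
            to_number(row[ANNUAL_25]),
        )
```

Usage would look something like list(major_group_records(read_rows("your_file.xlsx"))), where the file name is a placeholder. Keeping the filter separate from the file reading means it can be tested with plain dicts, without needing a spreadsheet at all.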