Initial data collection

Motivation#

This is the first task outlined on the Gantt chart following the project specification submission. This task involves collecting, formatting, and cleaning the data necessary for the foundational model.

16/10/23#

Action points#

  • Selected GSP ABHA1, Latitude: 50.47108, Longitude: -3.72977, PVLive ID: 1
  • Collected NSRDB data with corresponding coordinates, temporal accuracy of 30 minutes
  • Excel data actually says 50.49, -3.74
  • NSRDB data will be split 80:10:10 for training, validating, and testing
    • 17520 total data entries
    • 14,016 for training
    • 1,752 each for validating and testing
    • Dates are: 2019-01-01 00:00 to 2019-12-31 @ 23:30
  • Getting PV data for corresponding GSP between date and time specified
    • The time ordering of the data is inconsistent, but can be sorted with a simple A to Z Excel sort

TODO#

  • Get tensorflow locally
  • See how to provide test data to tensorflow model, and how to add labels
    • Do I want the NSRDB and PVLive in separate or combined CSV?

17/10/23#

Action points#

  • Downloaded tensorflow locally
    • Will my laptop be powerful enough? Should I use DCS instead? I never mentioned this in my specification but it was considered. I don’t think for the foundational model it will be required but once Satellite imagery is involved it may become more intensive
  • The data can stay in separate files then be accessed separately within the code
    • It can then be merged easily in code if necessary
  • Data formatted and is accessible from the code for use in the model
  • Trying to find how to provide the test data which is leading to looking into ANNs
    • Perceptron models -> receive multiple numerical inputs to produce a single numerical output (output is binary)

18/10/23#

Action points#

  • Since I’ve got all the data I’m slightly ahead of schedule and am looking into models and trying to build the foundational model
  • First attempt with mean squared error loss of nan, accuracy increases slightly once then maintains constant value so errors
    • Since I had to gather sections of the data in the code, the ordering is different again so times aren’t matching up
  • Sort data within the code so its always consistent
    • Data now looks correct and is temporally matching
  • Same issue with the model itself

TODO#

  • Potentially normalise the values for input and labels
    • NSRDB currently has large ranges and for each measurement the range is different
    • PVLive has large range between ~50 to e-5
  • Which activation function should I be using, since sigmoid is for classification so that doesn’t apply here
    • I think it is linear

19/10/23#

  • Attended webinar
    • Not too applicable, interesting to hear about how the system works in California
  • Normalised the data
    • Still not improved the model
  • I think the abundance of all zero rows is causing issues
    • I need to remove all 0s rows between both NSRDB and PVLive

TODO#

  • Correct bug for removing 0 rows