Further data collection (satellite imagery)


The main chunk of work for the second term will involve incorporating satellite imagery into the foundational model so that it can react to cloud coverage.


Action points

  • Over Christmas break I worked on accessing the data for satellite imagery
    • This involved numerous calls and e-mails due to the lack of up-to-date documentation


  • My term 2 Gantt chart is still relevant, but I will be ahead of schedule because I had not assumed that I would get the satellite imagery over Christmas
  • I’ll have more time to improve the foundational model, as well as study different ways of collecting the data for the foundational model
  • I’ll also have more time to develop the model with satellite imagery
  • If I remain ahead of schedule I will then look into further improvements to the model
    • Looking at data regarding aerosols
    • Not dealing with live data since this will be a major challenge with little reward for the report


Action points

  • Deciding which satellite datasets to use from the ones I have available
  • Working on finely selecting the data to access around the key GSP I’m considering
  • E-mailed for support
  • Made a kanban board for coding tasks to keep better track of them


Action points

  • Refined selection of data
  • Collecting multiple time samples at once
  • Started looking into which model to use for just satellite -> PV


  • Model is giving NaN, which likely means there is missing or erroneous data
    • More pre-processing required
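
One way to locate the samples behind the NaN output is to scan the inputs and targets for non-finite values before training. This is a minimal sketch, assuming the images and PV targets are held as NumPy arrays; the function name `find_bad_samples` and the toy arrays are hypothetical and not taken from the project code.

```python
import numpy as np

def find_bad_samples(images, targets):
    """Return indices of samples whose image or target contains NaN or inf."""
    images = np.asarray(images, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Flatten each sample so that one boolean per sample can be computed.
    bad_image = ~np.isfinite(images.reshape(len(images), -1)).all(axis=1)
    bad_target = ~np.isfinite(targets.reshape(len(targets), -1)).all(axis=1)
    return np.flatnonzero(bad_image | bad_target)

# Toy data (assumption): four 2x2 "images", one NaN pixel and one NaN target.
images = np.ones((4, 2, 2))
images[1, 0, 0] = np.nan
targets = np.array([1.0, 2.0, 3.0, np.nan])

bad = find_bad_samples(images, targets)
print(bad)  # [1 3]
```

Samples flagged this way can then be dropped or repaired during pre-processing.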


Action points

  • Additional pre-processing

  • Model runs

  • Actual output vs prediction (Model results 1) depicts:

    • Predicts generation during night-time since the clouds are necessarily different, but the model is unaware of the sun
      • We can deal with this for this model, but if we merge with NWP that will be handled automatically
    • The predictions are massively off, but there is a general correspondence for each day, which is promising
  • Using less data seems to recover the near-zero PV generation (Model results 2)

    • There may just be an issue with some of the data and how it matches up to the PV generation data
  • Remember that I may end up overfitting because I haven’t split the dataset into training/validation

    • For now just trying to find problematic data
  • This is what problematic data does: typically we have 30 min intervals, but if there is missing data then we skip further ahead

  • This shifts all subsequent timings, so that all later data no longer matches temporally

  • Removed times that don’t have ‘00’ or ‘30’ in the minutes column


  • Continue working on pre-processing
  • Will have to check in the end that all the data is temporally matching
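
The timestamp filtering and the final temporal check can be combined by joining the two datasets on their timestamps rather than on their position in the array. This is a minimal sketch, assuming both the satellite data and the PV data are pandas DataFrames with a `DatetimeIndex`; the function name `align_on_half_hours` and the column names are hypothetical.

```python
import pandas as pd

def align_on_half_hours(sat, pv):
    """Keep satellite rows on the 30-minute grid and match them to PV rows by time."""
    # Drop timestamps whose minutes are not 00 or 30.
    on_grid = sat.index.minute.isin([0, 30]) & (sat.index.second == 0)
    sat = sat[on_grid]
    # An inner join keeps only timestamps present in both datasets,
    # so a missing image can no longer shift every later pairing.
    return sat.join(pv, how="inner")

# Toy data (assumption): the 01:00 image is missing and 00:45 is off-grid.
sat = pd.DataFrame(
    {"image_id": [0, 1, 2, 3]},
    index=pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 00:30", "2021-01-01 00:45", "2021-01-01 01:30"]
    ),
)
pv = pd.DataFrame(
    {"generation_mw": [0.0, 0.1, 0.2, 0.3]},
    index=pd.date_range("2021-01-01 00:00", periods=4, freq="30min"),
)

aligned = align_on_half_hours(sat, pv)
print(aligned)
```

Because the pairing is done by timestamp, the aligned frame has one row per time that exists in both sources, and its length can be compared with the original lengths to see how much data was dropped.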


Action points

  • 5000 images - model seems fine except around July, where additional small peaks are predicted between two consecutive days
  • 7500 images - prediction is shifted up by ~5MW
  • 6000 images - still shifted
  • 5500 images - prediction is shifted down by ~2MW
  • 5250 images - prediction is shifted up by ~1.8MW
  • 5100 images - model fine, peaks start from around sample 3900
  • 5175 images - prediction is shifted down minimally
  • 5150 images - prediction shifted up
  • 5125 images - prediction shifted up
  • 5110 images - prediction shifted up
  • 5105 images - tiny shift
  • 5103 images - large shift ~ 4.5MW
  • 5012 images - shift below
  • 5011 images - large shift ~ 4.5MW
  • Never mind, 5100 is shifted

  • 4000 images - fine; 4500 - bad; 4250 - fine; 4400 - small shift

  • There doesn’t seem to be an issue with the timing of the satellite data, so the cause may be the content of the images, or the GSP data?


Action points

  • Satellite data - timing correct but jumps from 16th to 18th @ index 739
  • Automatically detecting further discrepancies
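
Jumps like the one at index 739 can be found automatically by comparing each timestamp with the previous one. This is a minimal sketch, assuming the satellite timestamps are available as a list or index and that the expected spacing is 30 minutes; the function name `find_gaps` and the example dates are hypothetical.

```python
import pandas as pd

def find_gaps(timestamps, expected="30min"):
    """Return the positions where the step from the previous timestamp is not `expected`."""
    ts = pd.Series(pd.to_datetime(timestamps)).reset_index(drop=True)
    step = ts.diff()
    mask = step != pd.Timedelta(expected)
    mask.iloc[0] = False  # the first sample has no predecessor
    report = pd.DataFrame({"previous": ts.shift(1), "current": ts, "step": step})
    return report[mask]

# Toy data (assumption): a jump from the 16th straight to the 18th.
times = list(pd.date_range("2021-06-16 22:00", periods=4, freq="30min"))
times += list(pd.date_range("2021-06-18 00:00", periods=2, freq="30min"))

gaps = find_gaps(times)
print(gaps)
```

The row labels of the returned frame are the positions of the first sample after each gap, which makes them directly comparable with an index such as 739.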


Action points

  • (Missed a few days of logging progress)
  • All data collection complete
  • Refactored repository for improved code readability
  • Attempting to produce a hybrid model


Action points

  • Still working on the hybrid model
  • Looking into how the CNN works for satellite images


  • Perhaps the data can be merged before the models?


Action points

  • Just using the NWP data in the hybrid file isn’t giving the expected results
  • Satellite prediction was also shifted up, so I am going to need to check for NaNs again
  • Warning skipping variable loading…?


Action points

  • Training and testing data split by months, with the testing months spread periodically over the year to represent all the seasons
  • NWP and satellite in a single file with the hybrid to avoid loading models - positive results so far
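
A month-based split of this kind can be sketched as follows, assuming the samples are in a pandas DataFrame with a `DatetimeIndex`. The choice of test months (one per season) and the name `split_by_month` are assumptions for illustration, not the months actually used.

```python
import pandas as pd

# Assumption: one held-out month per season.
TEST_MONTHS = (3, 6, 9, 12)

def split_by_month(df, test_months=TEST_MONTHS):
    """Split a time-indexed frame into (train, test) by calendar month."""
    is_test = df.index.month.isin(list(test_months))
    return df[~is_test], df[is_test]

# Toy data (assumption): one sample per day for a non-leap year.
df = pd.DataFrame(
    {"generation_mw": range(365)},
    index=pd.date_range("2021-01-01", periods=365, freq="D"),
)

train, test = split_by_month(df)
print(len(train), len(test))  # 243 122
```

Splitting by whole months keeps adjacent, highly correlated half-hours out of both sets at once, while still testing on every season.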


Action points

  • Drew an approximate grid system over the UK to replicate the NSRDB satellite resolution
  • Collected additional points for surrounding grids
    • Issue: not all of the additional points produce a new CSV file
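
A grid of cell centres and a check for points that did not produce a CSV file could look like the sketch below. The bounding box, the cell size, the `<lat>_<lon>.csv` naming scheme and both function names are assumptions for illustration; the real resolution and file names should be substituted.

```python
import tempfile
from pathlib import Path

import numpy as np

def grid_centres(lat_min, lat_max, lon_min, lon_max, step):
    """Return (lat, lon) centres of square cells of size `step` degrees."""
    lats = np.arange(lat_min + step / 2, lat_max, step)
    lons = np.arange(lon_min + step / 2, lon_max, step)
    return [(round(float(lat), 4), round(float(lon), 4)) for lat in lats for lon in lons]

def missing_csvs(points, folder):
    """Return the grid points that have no '<lat>_<lon>.csv' file in `folder`."""
    folder = Path(folder)
    return [p for p in points if not (folder / f"{p[0]}_{p[1]}.csv").exists()]

# Toy example (assumption): a 2x2 degree box with 1 degree cells.
points = grid_centres(50.0, 52.0, -2.0, 0.0, step=1.0)

with tempfile.TemporaryDirectory() as tmp:
    # Pretend only the first point was downloaded successfully.
    (Path(tmp) / "50.5_-1.5.csv").write_text("placeholder\n")
    missing = missing_csvs(points, tmp)

print(len(points), len(missing))  # 4 3
```

Listing the missing points explicitly makes it possible to retry only those requests instead of re-collecting the whole grid.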