Goldwind (金风) Wind Speed Forecasting
========================================

I. Business Understanding
~~~~~~~~~~~~~~~~~~~~~~~~~

**Wind speed forecasting:** Wind power is fed directly into the national grid, so power output must be forecast accurately to keep the voltage stable and to **reduce wasted energy**. Large forecast errors also incur financial **penalties**.

.. image:: output_3_0.jpg
   :width: 320px

II. Data Understanding
~~~~~~~~~~~~~~~~~~~~~~

1. Historical observations of average wind speed at the wind farms

   - Guangxi: a little over one year of data at 15-minute intervals
   - Xinjiang: a little over four years of data at 15-minute intervals

2. Numerical weather forecasts for the wind farms

   - Guangxi and Xinjiang
   - Four different weather-source datasets for each site
   - Taking the "Guangxi EC weather source" dataset as an example: 67,105 rows in total, each with 36 features including the timestamp

.. code:: python

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Load the Guangxi wind speed observations, using the first column (timestamp) as the index
    df = pd.read_csv("/Users/phamourair/devyhu/anylearn-resources/datasets/jinfeng_guangxi/wind_guangxi.csv", index_col=0)
    df.plot(title="GuangXi wind speed", figsize=(16, 5))

.. image:: output_5_1.png

III. Data Preparation
~~~~~~~~~~~~~~~~~~~~~

1. Format handling: convert the timestamps to the format expected by standard time-series libraries and write the result to a CSV file. This CSV file is the main input currently used for time-series forecasting.
2. Data cleaning: drop the records in the CSV file that contain empty values. While keeping the series strictly ordered in time, keep the records as evenly spaced as possible.
3. Data merging: merge each site's wind speed records with its weather-source data. While keeping the timestamps exactly aligned and strictly sorted, retain as many evenly spaced records as possible. In addition, because the four weather-source forecasts for the same site disagree with each other, we tried averaging the four forecast values first, then merging the average with the wind speed and writing the final CSV file. (This file is not currently used.)

IV. Model Building
~~~~~~~~~~~~~~~~~~

Transformer time-series forecasting, using:

- a raw local algorithm without any packaging
- a local dataset

.. image:: output_8_0.jpg

1 - Initialize the SDK connection to the Anylearn backend engine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    from anylearn.config import init_sdk

    # Backend URL, username and password
    init_sdk('http://anylearn.back', 'user', 'pass123')

2 - Quick training via the SDK
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    from anylearn.applications.quickstart import quick_train

    train_task, algo, dset, project = quick_train(
        algorithm_dir="/path/to/algo/transformer",
        dataset_dir="/path/to/datasets/jinfeng_guangxi",
        entrypoint="python -u main_informer.py",
        output="output_model",
        dataset_hyperparam_name="data_path",
        hyperparams={
            'is_training': '1',
            'root_path': '.',
            'data_path': 'wind_guangxi.csv',
            'model_id': 'wind_guangxi_96',
            'model': 'transformer',
            'data': 'custom',
            'features': 'M',
            'seq_len': '336',
            'label_len': '198',
            'pred_len': '96',  # 96 steps at 15-minute intervals = 24 hours
            'enc_in': ' 1',
            'dec_in': ' 1',
            'c_out': ' 1',
            'e_layers': '3',
            'd_layers': '2',
            'attn': 'prob',
            'des': 'Exp',
            'itr': '1',
        },
    )
    train_task

.. parsed-literal::

    TrainTask(name='7m3lt5u4', description='', state=0, visibility=1, creator_id='USERfb6c6d2111eaadda13fd17feeac7', owner=['USERfb6c6d2111eaadda13fd17feeac7'], project_id=None, algorithm_id='ALGO6f46a33311eb872cb22c75dd00d0', train_params='{"data_path": "wind_guangxi.csv", "is_training": "1", "root_path": ".", "model_id": "wind_guangxi_96", "model": "transformer", "data": "custom", "features": "M", "seq_len": "336", "label_len": "198", "pred_len": "96", "enc_in": " 1", "dec_in": " 1", "c_out": " 1", "e_layers": "3", "d_layers": "2", "attn": "prob", "des": "Exp", "itr": "1"}', files='DSET4caea33311eb872cb22c75dd00d0', results_id='FILE5e38a33311eb872cb22c75dd00d0', secret_key='TKEYb77ea33311eb872cb22c75dd00d0', create_time='2021-04-22 14:23:48', finish_time='', envs='', hpo=False, hpo_search_space=None, final_metric=None, id='TRAI3346a33311eb872cb22c75dd00d0')

3 - Track training progress
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    import time

    # Wait until the backend starts reporting a state for the task
    status = train_task.get_status()
    while 'state' not in status:
        print("Waiting...")
        time.sleep(120)
        status = train_task.get_status()

    # Poll every five minutes until the task succeeds or fails
    while status['state'] not in ["success", "fail"]:
        if 'process' in status:
            print(f"Progress: {int(100 * float(status['process']))}%")
        else:
            print(status['state'])
        try:
            print(f"-- last metric: {train_task.get_intermediate_metric()}")
        except Exception:
            print("-- no metric yet")
        time.sleep(300)
        status = train_task.get_status()
        print(status)

    print(f"Final metric: {train_task.get_final_metric()}")
    status['state']

.. parsed-literal::

    Waiting...
    working
    -- no metric yet
    {'ip': '10.244.2.245', 'secret_key': 'TKEYb77ea33311eb872cb22c75dd00d0', 'state': 'working'}
    working
    -- last metric: {'id': 'METRcbfea33511eb872cb22c75dd00d0', 'metric': -1.5579650402069092, 'train_task_id': 'TRAI3346a33311eb872cb22c75dd00d0'}
    {'ip': '10.244.2.245', 'secret_key': 'TKEYb77ea33311eb872cb22c75dd00d0', 'state': 'working'}
    working
    -- last metric: {'id': 'METRcbfea33511eb872cb22c75dd00d0', 'metric': -1.5579650402069092, 'train_task_id': 'TRAI3346a33311eb872cb22c75dd00d0'}
    {'ip': '10.244.2.245', 'secret_key': 'TKEYb77ea33311eb872cb22c75dd00d0', 'state': 'working'}
    working
    -- last metric: {'id': 'METRd33ea33611eb872cb22c75dd00d0', 'metric': -1.646214485168457, 'train_task_id': 'TRAI3346a33311eb872cb22c75dd00d0'}
    {'ip': '10.244.2.245', 'secret_key': 'TKEYb77ea33311eb872cb22c75dd00d0', 'state': 'success'}
    Final metric: {'final_metric': -1.6970630884170532, 'id': 'TRAI3346a33311eb872cb22c75dd00d0', 'name': '7m3lt5u4'}

.. parsed-literal::

    'success'

V. Model Validation
~~~~~~~~~~~~~~~~~~~

1 - Export the training results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    from pathlib import Path
    import shutil

    from anylearn.interfaces.resource import SyncResourceDownloader

    train_task.get_detail()

    # Download the result archive and unpack it into a folder named after the task ID
    workdir = Path("/path/to/models/")
    downloader = SyncResourceDownloader()
    res = train_task.download_results(save_path=workdir, downloader=downloader)
    shutil.unpack_archive(workdir / res, workdir / train_task.id, format="zip")

2 - Quick validation via the SDK
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No longer applicable. Version 0.13.0 removed all logic related to validation tasks.

3 - Track validation results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No longer applicable. Version 0.13.0 removed all logic related to validation tasks.

4 - Sample prediction results (1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    # Load the ground truth and the predictions written by the training run
    truth = np.load(workdir / train_task.id / "output_model/results/true.npy")
    pred = np.load(workdir / train_task.id / "output_model/results/pred.npy")

    # Compare the first sample: 96 predicted steps against 96 observed steps
    df_t = pd.DataFrame(truth[0])
    df_p = pd.DataFrame(pred[0])
    print(df_t.shape, df_p.shape)
    df = pd.concat({"Ground Truth": df_t, "Prediction": df_p}, axis=1)
    df.plot()

.. parsed-literal::

    (96, 1) (96, 1)

.. image:: output_23_2.png

5 - Sample prediction results (2, static image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: output_25_0.png
   :width: 360px

VI. Model Deployment
~~~~~~~~~~~~~~~~~~~~

*Online service deployment of non-standard, unpackaged algorithms/models is not yet supported. Design and development are ongoing.*
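
As a supplement to the data preparation steps described earlier (dropping empty records, averaging the four weather-source forecasts, and merging them with the observed wind speed on strictly sorted, exactly matching timestamps), the logic could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual preprocessing code: the helper functions, the column names ``wind_speed`` and ``forecast_speed``, and the toy data are all hypothetical, and each frame is assumed to be indexed by timestamp.

```python
import pandas as pd


def average_sources(sources):
    """Average forecast frames per timestamp, keeping only timestamps
    that appear in every source (assumes no duplicates within a source)."""
    grouped = pd.concat(sources).groupby(level=0)
    mean = grouped.mean()
    complete = grouped.size() == len(sources)
    return mean[complete]


def merge_observed_and_forecast(observed, sources):
    """Drop empty records, average the sources, and inner-join on the
    timestamp index so both sides share exactly the same timestamps."""
    observed = observed.dropna().sort_index()
    observed = observed[~observed.index.duplicated(keep="first")]
    forecast = average_sources([s.dropna() for s in sources])
    return observed.join(forecast, how="inner").sort_index()


def evenly_spaced_fraction(frame, step="15min"):
    """Fraction of consecutive rows that are exactly one step apart."""
    gaps = frame.index.to_series().diff().dropna()
    return float((gaps == pd.Timedelta(step)).mean())


# Hypothetical toy data: four timestamps, one missing observation,
# and four forecast sources that disagree by a constant offset.
idx = pd.date_range("2021-01-01 00:00", periods=4, freq="15min")
observed = pd.DataFrame({"wind_speed": [5.0, None, 6.0, 7.0]}, index=idx)
sources = [
    pd.DataFrame({"forecast_speed": [4.0 + k, 5.0 + k, 6.0 + k, 7.0 + k]}, index=idx)
    for k in range(4)
]

merged = merge_observed_and_forecast(observed, sources)
print(merged)
print("evenly spaced fraction:", evenly_spaced_fraction(merged))
```

The inner join is what enforces identical timestamps on both sides, at the cost of dropping any row that is missing from either side. The ``evenly_spaced_fraction`` helper is one possible way to check how much of the merged series still follows the 15-minute spacing after cleaning.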