When I try to run tsfresh's extract_features in Microsoft Fabric notebooks, I keep running into the same set of errors.
Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction: 0%| | 0/20 [00:00<?, ?it/s]2024-03-26:07:49:13,37 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,36 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,95 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,103 ERROR [synapse_mlflow_utils.py:348] 'c'
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 345, in set_envs
config = MLConfig(sc)
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 128, in __init__
self.env_configs = self.get_mlflow_configs()
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 163, in get_mlflow_configs
region = self._get_spark_config("spark.cluster.region")
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 135, in _get_spark_config
value = self.sc.getConf().get(key, "")
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 2375, in getConf
conf.setAll(self._conf.getAll())
File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in getAll
return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in <listcomp>
return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py", line 342, in get_return_value
return OUTPUT_CONVERTER[type](answer[2:], gateway_client)
KeyError: 'c'
2024-03-26:07:49:13,192 ERROR [synapse_mlflow_utils.py:349] ## Not In PBI Synapse Platform ##
2024-03-26:07:49:13,336 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,341 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,342 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,344 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,346 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,347 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,350 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,351 ERROR [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,357 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,348 ERROR [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,360 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,361 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,364 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,343 ERROR [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,371 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 WARNING [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 ERROR [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,372 ERROR [tracking_store.py:67] get_host_credentials fatal error
The last few errors just keep repeating if I let the program run.
It also keeps referring to MLflow, a package that I know is integrated into Fabric notebooks but that I am not actively calling. I tried using set_mlflow_env_config as the error message suggests, but could not find anything like it in the documentation.
The sample code below reproduces my exact problem (from: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html):
import pandas as pd
import numpy as np
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# Example dataset that ships with tsfresh
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures

download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()

# Extract features, following the documentation: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
impute(extracted_features)
features_filtered = select_features(extracted_features, y)
How do I fix this and keep MLflow from interfering?
I have also already tried importing MLflow and creating an experiment to see whether that would solve the problem, but it did not help. It created a bunch of ML models from data I could not trace back, and it still did not extract any of my features.
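For reference, that attempt looked roughly like the sketch below (the experiment name is just a placeholder I chose, nothing Fabric-specific):

import mlflow
from tsfresh import extract_features

# Illustrative only: explicitly create/select an experiment and open a run,
# hoping the Fabric MLflow plugin would then stop complaining.
mlflow.set_experiment("tsfresh_feature_extraction")
with mlflow.start_run():
    extracted_features = extract_features(timeseries, column_id="id", column_sort="time")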
At this point my assumption is that tsfresh uses scikit-learn or something similar to fit its features, and MLflow decides it should be tracking those fits.
As it turns out, this code apparently does just run. MLflow throwing constant errors does not seem to affect tsfresh's ability to perform the feature extraction. My dataset was simply so large that the extraction took a while, and all the errors were hiding the progress bar.
However, if you want to disable MLflow (which I strongly recommend), the following works (Source):
import mlflow
mlflow.autolog(disable=True)
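In my notebook I put the disable call in the very first cell, before any tsfresh imports, so autologging never gets a chance to hook into scikit-learn. A minimal sketch using the same quick-start data as above:

import mlflow

# Disable MLflow autologging before tsfresh (and, indirectly, scikit-learn) is used
mlflow.autolog(disable=True)

from tsfresh import extract_features
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures

download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")

If you would rather silence only the scikit-learn hook instead of all autologging, mlflow.sklearn.autolog(disable=True) is the narrower variant.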