Collector can't find reference data in 0.4.39 #1369

Open
versus666jzx opened this issue Nov 19, 2024 · 1 comment

Comments

@versus666jzx

The collector creates the reference data file but cannot find it when I send data to it, so creating the test suite or report crashes.

The Evidently UI, Collector, and Python package are all version 0.4.39.
The same setup works correctly in version 0.4.25, for example.

  • Docker Compose file to bring up the UI and Collector:
version: "3.9"

services:
  ui:
    image: evidently/evidently-service:0.4.39
    ports:
      - 8880:8000
    command: ["--workspace", "/data", "--host", "0.0.0.0"]
    volumes:
      - evidently_data:/data

  collector:
    image: evidently/evidently-service:0.4.39
    entrypoint: ["evidently", "collector", "--host", "0.0.0.0"]
    command: ["--config-path", "/config/collector.json"]
    ports:
      - 8001:8001
    links:
      - ui
    volumes:
      - collector_data:/data
      - ./config:/config

volumes:
  evidently_data:
  collector_data:
  • Collector config after client.set_reference(COLLECTOR_TEST_ID, reference_data). Here we see that the path to the reference is set: "reference_path":"default_test_reference.parquet"
{"id":"default_test","trigger":{"type":"evidently:collector_trigger:IntervalTrigger","interval":5.0,"last_triggered":1732020013.5912094},"report_config":{"metrics":[],"tests":[{"lt":0.3,"type":"evidently:test:TestShareOfDriftedColumns","is_critical":true,"feature_importance":false},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V8","display_name":"V8","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V2","display_name":"V2","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V14","display_name":"V14","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V7","display_name":"V7","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V11","display_name":"V11","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V15","display_name":"V15","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V5","display_name":"V5","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V4","display_name":"V4","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V9","display_name":"V9","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"Class","display_name":"Class","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V10","display_name":"V10","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V16","display_name":"V16","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V3","display_name":"V3","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V1","display_name":"V1","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V13","display_name":"V13","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V12","display_name":"V12","dataset":"main"},"stattest":null,"stattest_threshold":null},{"type":"evidently:test:TestColumnDrift","is_critical":true,"column_name":{"type":"evidently:base:ColumnName","name":"V6","display_name":"V6","dataset":"main"},"stattest":null,"stattest_threshold":null}],"options":{"color":null,"render":null,"data_definition":null},"metadata":{"test_presets":["DataDriftTestPreset"]},"tags":[]},"reference_path":"default_test_reference.parquet","project_id":"0dcbd8a2-1f7c-45a6-a81e-8144e81fdff2","api_url":"http://ui:8000/","api_secret":null,"cache_reference":true,"is_cloud":null,"save_datasets":false}

The parquet file does exist in the container (screenshot in the original issue).

  • Collector log after sending data

evidently-collector-1  | INFO:     Started server process [1]
evidently-collector-1  | INFO:     Waiting for application startup.
evidently-collector-1  | INFO:     Application startup complete.
evidently-collector-1  | INFO:     Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
evidently-collector-1  | INFO:     172.25.0.1:57866 - "GET /default_test HTTP/1.1" 404 Not Found
evidently-collector-1  | INFO:     172.25.0.1:57866 - "GET / HTTP/1.1" 404 Not Found
evidently-collector-1  | INFO:     172.25.0.1:57866 - "GET / HTTP/1.1" 404 Not Found
evidently-collector-1  | INFO:     172.25.0.1:57908 - "GET / HTTP/1.1" 404 Not Found
evidently-collector-1  | INFO:     172.25.0.1:57916 - "POST /default_test HTTP/1.1" 201 Created
evidently-collector-1  | INFO:     172.25.0.1:57912 - "GET / HTTP/1.1" 404 Not Found
evidently-collector-1  | INFO:     172.25.0.1:57912 - "GET /default_test HTTP/1.1" 200 OK
evidently-collector-1  | INFO:     172.25.0.1:57912 - "GET /default_test HTTP/1.1" 200 OK
evidently-collector-1  | INFO:     172.25.0.1:57924 - "POST /default_test/reference HTTP/1.1" 201 Created
evidently-collector-1  | INFO:     172.25.0.1:57928 - "GET /default_test HTTP/1.1" 200 OK
evidently-collector-1  | INFO:     172.25.0.1:57936 - "GET /favicon.ico HTTP/1.1" 404 Not Found
evidently-collector-1  | INFO:     172.25.0.1:57982 - "POST /default_test/data HTTP/1.1" 201 Created
evidently-collector-1  | ERROR - 2024-11-19 12:43:31,656 - evidently.collector.app - app - Error running report: [Errno 2] No such file or directory: 'default_test_reference.parquet'
evidently-collector-1  | Traceback (most recent call last):
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/app.py", line 145, in create_snapshot
evidently-collector-1  |     report.run, reference_data=collector.reference, current_data=current, column_mapping=ColumnMapping()
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/config.py", line 184, in reference
evidently-collector-1  |     self._reference = self._read_reference()
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/config.py", line 174, in _read_reference
evidently-collector-1  |     return pd.read_parquet(self.reference_path)
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 667, in read_parquet
evidently-collector-1  |     return impl.read(
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 267, in read
evidently-collector-1  |     path_or_handle, handles, filesystem = _get_path_or_handle(
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 140, in _get_path_or_handle
evidently-collector-1  |     handles = get_handle(
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/common.py", line 882, in get_handle
evidently-collector-1  |     handle = open(handle, ioargs.mode)
evidently-collector-1  | FileNotFoundError: [Errno 2] No such file or directory: 'default_test_reference.parquet'
evidently-collector-1  | ERROR - 2024-11-19 12:43:31,656 - evidently.collector.app - app - Check snapshots factory error: 'TestSuite' object has no attribute 'id'
evidently-collector-1  | Traceback (most recent call last):
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/app.py", line 145, in create_snapshot
evidently-collector-1  |     report.run, reference_data=collector.reference, current_data=current, column_mapping=ColumnMapping()
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/config.py", line 184, in reference
evidently-collector-1  |     self._reference = self._read_reference()
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/config.py", line 174, in _read_reference
evidently-collector-1  |     return pd.read_parquet(self.reference_path)
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 667, in read_parquet
evidently-collector-1  |     return impl.read(
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 267, in read
evidently-collector-1  |     path_or_handle, handles, filesystem = _get_path_or_handle(
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/parquet.py", line 140, in _get_path_or_handle
evidently-collector-1  |     handles = get_handle(
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/pandas/io/common.py", line 882, in get_handle
evidently-collector-1  |     handle = open(handle, ioargs.mode)
evidently-collector-1  | FileNotFoundError: [Errno 2] No such file or directory: 'default_test_reference.parquet'
evidently-collector-1  |
evidently-collector-1  |
evidently-collector-1  | During handling of the above exception, another exception occurred:
evidently-collector-1  |
evidently-collector-1  |
evidently-collector-1  | Traceback (most recent call last):
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/app.py", line 232, in check_service_snapshots_periodically
evidently-collector-1  |     await check_snapshots_factory(service, service.storage)
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/app.py", line 131, in check_snapshots_factory
evidently-collector-1  |     await create_snapshot(collector, storage)
evidently-collector-1  |   File "/usr/local/lib/python3.10/site-packages/evidently/collector/app.py", line 153, in create_snapshot
evidently-collector-1  |     report_id=str(report.id),
evidently-collector-1  | AttributeError: 'TestSuite' object has no attribute 'id'
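The log above contains two errors, and the second seems to hide the first. The FileNotFoundError is the root cause, while the AttributeError on `report.id` is raised while that first error is being handled. Below is a minimal Python sketch of the pattern, not Evidently code: `FakeSuite`, `load_reference`, `create_snapshot`, and `root_cause` are hypothetical stand-ins used only for illustration.

```python
class FakeSuite:
    """Hypothetical stand-in for an object that has no `id` attribute."""


def load_reference(path):
    # Raises FileNotFoundError when the path does not exist.
    with open(path, "rb") as handle:
        return handle.read()


def create_snapshot(suite, reference_path):
    try:
        return load_reference(reference_path)
    except Exception:
        # The handler reads a missing attribute, so an AttributeError is
        # raised here and becomes the error reported last in the log.
        raise RuntimeError(f"snapshot failed for {suite.id}")


def root_cause(exc):
    """Follow implicit exception chaining back to the first error."""
    while exc.__context__ is not None:
        exc = exc.__context__
    return exc


if __name__ == "__main__":
    try:
        create_snapshot(FakeSuite(), "missing_dir/default_test_reference.parquet")
    except AttributeError as err:
        print(type(err).__name__, "masks", type(root_cause(err)).__name__)
```

When reading such a log, the traceback printed before "During handling of the above exception" is usually the one to fix first.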
  • Code to reproduce
import time

from requests.exceptions import RequestException
from sklearn import datasets

from evidently.collector.client import CollectorClient
from evidently.collector.config import CollectorConfig, IntervalTrigger, ReportConfig

from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset

from evidently.ui.dashboards import DashboardPanelTestSuite
from evidently.ui.dashboards import ReportFilter
from evidently.ui.dashboards import TestSuitePanelType
from evidently.ui.workspace import RemoteWorkspace
from evidently.renderers.html_widgets import WidgetSize


COLLECTOR_TEST_ID = "default_test"

PROJECT_NAME = "Bank Marketing: online service "
WORKSPACE_PATH = "bank_data"

# Ports match the host-side mappings in the compose file above.
client = CollectorClient("http://localhost:8001")
workspace = RemoteWorkspace("http://localhost:8880")
project = workspace.create_project(PROJECT_NAME)


bank_marketing = datasets.fetch_openml(name="bank-marketing", as_frame="auto")
bank_marketing_data = bank_marketing.frame
reference_data = bank_marketing_data[5000:5500]
prod_simulation_data = bank_marketing_data[7000:]
mini_batch_size = 50


def setup_test_suite():
    # Run the suite once so the report config can be derived from it.
    suite = TestSuite(tests=[DataDriftTestPreset()], tags=[])
    suite.run(reference_data=reference_data, current_data=prod_simulation_data[:mini_batch_size])
    return ReportConfig.from_test_suite(suite)


def workspace_setup():
    project.dashboard.add_panel(
        DashboardPanelTestSuite(
            title="Data Drift Tests",
            filter=ReportFilter(metadata_values={}, tag_values=[], include_test_suites=True),
            size=WidgetSize.HALF,
        )
    )
    project.dashboard.add_panel(
        DashboardPanelTestSuite(
            title="Data Drift Tests",
            filter=ReportFilter(metadata_values={}, tag_values=[], include_test_suites=True),
            size=WidgetSize.HALF,
            panel_type=TestSuitePanelType.DETAILED,
        )
    )
    project.save()


def setup_config():
    # api_url is the UI address as seen from inside the collector container.
    test_conf = CollectorConfig(
        trigger=IntervalTrigger(interval=5),
        report_config=setup_test_suite(),
        project_id=str(project.id),
        api_url="http://ui:8000/",
    )

    client.create_collector(COLLECTOR_TEST_ID, test_conf)
    client.set_reference(COLLECTOR_TEST_ID, reference_data)


def send_data():
    print("Start sending data")
    for i in range(1):
        try:
            data = prod_simulation_data[i * mini_batch_size : (i + 1) * mini_batch_size]
            client.send_data(COLLECTOR_TEST_ID, data)
            print("sent")
        except RequestException as e:
            print(f"collector service is not available: {e.__class__.__name__}")
        time.sleep(1)


def main():
    workspace_setup()
    setup_config()
    send_data()


if __name__ == "__main__":
    main()
@Mohammadabd

Add this to the collector service in your compose file:

working_dir: /config

The Evidently container starts in /app, and the reference file path is stored as a relative path, so it resolves against /app. Maybe the Docker image can be fixed, @emeli-dral.
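Why the working directory matters can be shown with a small, self-contained Python sketch. It assumes, as the reply suggests, that the reference file sits in one directory while the process runs from another. The temporary directories stand in for /config and /app, and `resolve_reference_path` is a hypothetical helper, not part of Evidently.

```python
import os
import tempfile
from pathlib import Path


def resolve_reference_path(reference_path: str, base_dir: str) -> Path:
    """Keep absolute paths as-is; anchor relative ones at base_dir."""
    path = Path(reference_path)
    return path if path.is_absolute() else Path(base_dir) / path


def demo() -> tuple:
    """Return (found via bare relative path, found via anchored path)."""
    name = "default_test_reference.parquet"
    with tempfile.TemporaryDirectory() as config_dir, \
            tempfile.TemporaryDirectory() as app_dir:
        # The file lives in the "config" directory ...
        (Path(config_dir) / name).write_bytes(b"")
        previous = os.getcwd()
        # ... but the process runs from the "app" directory.
        os.chdir(app_dir)
        try:
            found_relative = Path(name).exists()
            found_anchored = resolve_reference_path(name, config_dir).exists()
        finally:
            os.chdir(previous)
    return found_relative, found_anchored


if __name__ == "__main__":
    print(demo())
```

A bare relative path is looked up in the current working directory, so setting `working_dir` in the compose file and anchoring the path in code are two ways to make the two locations agree.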
