Skip to content

环境说明

shell
[root@keeper01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

Python环境准备

下载&安装脚本

shell
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --no-check-certificate
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
# 接下来按照操作提示进行,可以另外指令安装路径

配置环境变量

修改/etc/profile.d/env.sh文件,内容如下

shell
export CONDA_HOME=/root/miniconda3
export PATH=$PATH:$CONDA_HOME/bin

保存完成执行如下:

shell
source /etc/profile
# 预期输出 /root/miniconda3/bin/conda
which conda

配置pypi环境

shell
conda config --set auto_activate_base false
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda config --set show_channel_urls yes

# 验证配置
[root@keeper01 superset]# cat ~/.condarc
channels:
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - defaults
show_channel_urls: true
auto_activate_base: false

pypi安装Superset

创建python环境

shell
mkdir -p ~/superset && cd ~/superset

conda create -n superset python=3.10
conda init
conda activate superset
# 预期是Python 3.10.x
python3 -V

(superset) [root@keeper01 superset]# python3 -V
Python 3.10.14

⚠️ 【问题】执行conda activate superset 出现错误:CondaError: Run 'conda init' before 'conda activate'

【解决】在执行了conda init,但是还是会报一样的错误。需要退出终端再进入就好了。

安装基础依赖

shell
yum install -y python-setuptools

yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel

安装SuperSet

使用pip安装

shell
pip install apache-superset -i https://mirrors.huaweicloud.com/repository/pypi/simple

最后会打印如下日志,就表示安装成功:

shell
Successfully installed Babel-2.15.0 Flask-Babel-2.0.0 Flask-JWT-Extended-4.6.0 Flask-Limiter-3.7.0 Flask-SQLAlchemy-2.5.1 Jinja2-3.1.4 Mako-1.3.5 MarkupSafe-2.1.5 PyJWT-2.8.0 Werkzeug-3.0.3 alembic-1.13.1 amqp-5.2.0 apache-superset-4.0.1 apispec-6.6.1 apsw-3.46.0.0 async-timeout-4.0.3 attrs-23.2.0 backoff-2.2.1 bcrypt-4.1.3 billiard-4.2.0 blinker-1.8.2 bottleneck-1.3.8 brotli-1.1.0 cachelib-0.9.0 cachetools-5.3.3 cattrs-23.2.3 celery-5.4.0 certifi-2024.2.2 cffi-1.16.0 charset-normalizer-3.3.2 click-8.1.7 click-didyoumean-0.3.1 click-option-group-0.5.6 click-plugins-1.1.1 click-repl-0.3.0 colorama-0.4.6 cron-descriptor-1.4.3 croniter-2.0.5 cryptography-42.0.7 deprecated-1.2.14 deprecation-2.1.0 dnspython-2.6.1 email-validator-2.1.1 exceptiongroup-1.2.1 flask-2.3.3 flask-appbuilder-4.5.0 flask-caching-2.3.0 flask-compress-1.15 flask-login-0.6.3 flask-migrate-3.1.0 flask-session-0.8.0 flask-talisman-1.1.0 flask-wtf-1.2.1 func_timeout-4.3.5 geographiclib-2.0 geopy-2.4.1 google-auth-2.29.0 greenlet-3.0.3 gunicorn-21.2.0 h11-0.14.0 hashids-1.3.1 holidays-0.25 humanize-4.9.0 idna-3.7 importlib-resources-6.4.0 importlib_metadata-7.1.0 isodate-0.6.1 itsdangerous-2.2.0 jsonschema-4.22.0 jsonschema-specifications-2023.12.1 kombu-5.3.7 korean-lunar-calendar-0.3.1 limits-3.12.0 llvmlite-0.42.0 markdown-3.6 markdown-it-py-3.0.0 marshmallow-3.21.2 marshmallow-sqlalchemy-0.28.2 mdurl-0.1.2 msgpack-1.0.8 msgspec-0.18.6 nh3-0.2.17 numba-0.59.1 numexpr-2.10.0 numpy-1.23.5 ordered-set-4.1.0 outcome-1.3.0.post0 packaging-24.0 pandas-2.0.3 paramiko-3.4.0 parsedatetime-2.6 pgsanity-0.2.9 platformdirs-4.2.2 polyline-2.0.2 prison-0.2.1 prompt-toolkit-3.0.43 pyarrow-14.0.2 pyasn1-0.6.0 pyasn1-modules-0.4.0 pycparser-2.22 pygments-2.18.0 pynacl-1.5.0 pyopenssl-24.1.0 pyparsing-3.1.2 pysocks-1.7.1 python-dateutil-2.9.0.post0 python-dotenv-1.0.1 python-geohash-0.8.5 pytz-2024.1 pyyaml-6.0.1 redis-4.6.0 referencing-0.35.1 requests-2.32.2 requests-cache-1.2.0 rich-13.7.1 rpds-py-0.18.1 rsa-4.9 selenium-4.9.1 shillelagh-1.2.19 shortid-0.1.2 simplejson-3.19.2 six-1.16.0 slack_sdk-3.27.2 sniffio-1.3.1 sortedcontainers-2.4.0 sqlalchemy-1.4.52 sqlalchemy-utils-0.38.3 sqlglot-23.17.0 sqlparse-0.4.4 sshtunnel-0.4.0 tabulate-0.8.10 trio-0.25.1 trio-websocket-0.11.1 typing-extensions-4.12.0 tzdata-2024.1 url-normalize-1.4.3 urllib3-2.2.1 vine-5.1.0 wcwidth-0.2.13 wrapt-1.16.0 wsproto-1.2.0 wtforms-3.1.2 wtforms-json-0.3.5 xlsxwriter-3.0.9 zipp-3.18.2 zstandard-0.22.0

验证

shell
(superset) [root@keeper01 superset]# which superset
/root/miniconda3/envs/superset/bin/superset

初始化Superset数据库

shell
superset db upgrade

小贴士

Superset采用的数据库是sqllite数据库,是一个轻量级的数据量。

错误及解决

📢 如果没有这一步,启动没有问题。但浏览器登录后访问会报错500,日志会提示sqlite3.OperationalError: no such table: user_attribute。参考:https://github.com/apache/superset/issues/7354

创建管理员用户

给superset创建管理员用户

shell
export FLASK_APP=superset
 export SUPERSET_SECRET_KEY="oh-so-secret"

在里面可以设置用户名与密码,其他的可以不用设置,直接enter就行

(superset) [root@keeper01 ~]# export FLASK_APP=superset
(superset) [root@keeper01 ~]# export SUPERSET_SECRET_KEY="oh-so-secret"
(superset) [root@keeper01 ~]# flask fab create-admin
Username [admin]: superset
User first name [admin]:
User last name [user]:
Email [admin@fab.org]:
Password:
Repeat for confirmation:
logging was configured successfully
2024-05-26 12:12:30,324:INFO:superset.utils.logging_configurator:logging was configured successfully
2024-05-26 12:12:30,339:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
  warnings.warn(
No PIL installation found
2024-05-26 12:12:30,784:INFO:superset.utils.screenshots:No PIL installation found
Recognized Database Authentications.
Admin User superset created.

错误及解决

【问题】Refusing to start due to insecure SECRET_KEY

【解决】 export SUPERSET_SECRET_KEY="oh-so-secret",参考 :Apache Superset: Refusing to start due to insecure SECRET_KEY

初始化

superset init

(superset) [root@keeper01 ~]# superset init
logging was configured successfully
2024-05-26 12:14:21,843:INFO:superset.utils.logging_configurator:logging was configured successfully
2024-05-26 12:14:21,850:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
  warnings.warn(
No PIL installation found
2024-05-26 12:14:22,101:INFO:superset.utils.screenshots:No PIL installation found
Syncing role definition
2024-05-26 12:14:27,503:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2024-05-26 12:14:27,652:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2024-05-26 12:14:27,668:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2024-05-26 12:14:27,867:INFO:superset.security.manager:Syncing Gamma perms
Syncing sql_lab perms
2024-05-26 12:14:28,064:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2024-05-26 12:14:28,245:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2024-05-26 12:14:28,252:INFO:superset.security.manager:Creating missing datasource permissions.
Traceback (most recent call last):
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: tables

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/superset/bin/superset", line 8, in <module>
    sys.exit(superset())
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 358, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 358, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/cli/main.py", line 65, in init
    security_manager.sync_role_definitions()
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/security/manager.py", line 891, in sync_role_definitions
    self.create_missing_perms()
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/security/manager.py", line 838, in create_missing_perms
    datasources = SqlaTable.get_all_datasources()
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/connectors/sqla/models.py", line 1954, in get_all_datasources
    return qry.all()
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2773, in all
    return self._iter().all()
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: tables
[SQL: SELECT tables.uuid AS tables_uuid, tables.created_on AS tables_created_on, tables.changed_on AS tables_changed_on, tables.id AS tables_id, tables.description AS tables_description, tables.default_endpoint AS tables_default_endpoint, tables.is_featured AS tables_is_featured, tables.filter_select_enabled AS tables_filter_select_enabled, tables."offset" AS tables_offset, tables.cache_timeout AS tables_cache_timeout, tables.params AS tables_params, tables.perm AS tables_perm, tables.schema_perm AS tables_schema_perm, tables.is_managed_externally AS tables_is_managed_externally, tables.external_url AS tables_external_url, tables.table_name AS tables_table_name, tables.main_dttm_col AS tables_main_dttm_col, tables.database_id AS tables_database_id, tables.fetch_values_predicate AS tables_fetch_values_predicate, tables.schema AS tables_schema, tables.sql AS tables_sql, tables.is_sqllab_view AS tables_is_sqllab_view, tables.template_params AS tables_template_params, tables.extra AS tables_extra, tables.normalize_columns AS tables_normalize_columns, tables.always_filter_main_dttm AS tables_always_filter_main_dttm, tables.created_by_fk AS tables_created_by_fk, tables.changed_by_fk AS tables_changed_by_fk
FROM tables
WHERE tables.is_sqllab_view = 0]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

启动Superset

安装gunicorn

shell
pip install gunicorn -i https://mirrors.huaweicloud.com/repository/pypi/simple

gunicorn是一个Python Web Server,可以和java中的Tomcat类比

启动Superset

shell
# gunicorn --workers 5 --timeout 120 --bind `hostname`:8787 superset:app --daemon
gunicorn -w 5 -t 120 -b `hostname`:8787 "superset.app:create_app()" --daemon
验证
shell
(superset) [root@keeper01 superset]# netstat -lunpt | grep 8787
tcp        0      0 192.168.107.12:8787     0.0.0.0:*               LISTEN      41511/python

访问

浏览器访问 http://192.168.107.12:8787/,输入账号密码:`superset/superset`

关闭

shell
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9

开始探索MySQL数据

安装mysql数据库驱动

参考 https://superset.apache.org/docs/configuration/databases

shell
conda activate superset
conda install mysqlclient

重启superset

shell
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
gunicorn -w 5 -t 120 -b `hostname`:8787 "superset.app:create_app()" --daemon

在宿主机MacOS上部署好MySQL

数据源配置

参考官网 https://superset.apache.org/docs/using-superset/creating-your-first-dashboard

开始探索ClickHouse数据

安装ClickHouse数据库驱动

shell
pip install clickhouse-connect -i https://mirrors.huaweicloud.com/repository/pypi/simple

重启superset

shell
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
gunicorn -w 5 -t 120 -b `hostname`:8787 "superset.app:create_app()" --daemon
netstat -lunpt | grep 8787

准备ClickHouse数据

参考 https://clickhouse.com/docs/zh/getting-started/example-datasets/uk-price-paid

ClickHouse 环境见 https://yeyouluo.github.io/big-data/ck/ck-install-cluster-with-keeper.html

  • 由于外网网速较慢,因此将数据集文件下载到本地,并启动一个HTTP服务:python3 -m http.server

  • keeper01 主机上操作:

    shell
    [root@ck01 ~]# clickhouse-client -m
    # 输入密码: p@$$W0rd
  • 执行下列SQL:

    sql
    CREATE DATABASE IF NOT EXISTS sample_db ON CLUSTER cluster1;
    
    show databases;
    
    use sample_db;
    CREATE TABLE sample_db.uk_price_paid ON CLUSTER cluster1
    (
        price UInt32,
        date Date,
        postcode1 LowCardinality(String),
        postcode2 LowCardinality(String),
        type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
        is_new UInt8,
        duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
        addr1 String,
        addr2 String,
        street LowCardinality(String),
        locality LowCardinality(String),
        town LowCardinality(String),
        district LowCardinality(String),
        county LowCardinality(String)
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/uk_price_paid', '{replica}')
    ORDER BY (postcode1, postcode2, addr1, addr2);
    
    INSERT INTO sample_db.uk_price_paid
    WITH
       splitByChar(' ', postcode) AS p
    SELECT
        toUInt32(price_string) AS price,
        parseDateTimeBestEffortUS(time) AS date,
        p[1] AS postcode1,
        p[2] AS postcode2,
        transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
        b = 'Y' AS is_new,
        transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
        addr1,
        addr2,
        street,
        locality,
        town,
        district,
        county
    FROM url(
        'http://192.168.107.1:8000/pp-complete.csv',
        'CSV',
        'uuid_string String,
        price_string String,
        time String,
        postcode String,
        a String,
        b String,
        c String,
        addr1 String,
        addr2 String,
        street String,
        locality String,
        town String,
        district String,
        county String,
        d String,
        e String'
    ) SETTINGS max_http_get_redirects=10;
    
    -- 查看行数
    SELECT count() FROM uk_price_paid;
    -- 查看表大小
    SELECT formatReadableSize(total_bytes)
    FROM system.tables
    WHERE name = 'uk_price_paid';
    
    -- 3 个查询 -- 
    -- 查询 1. 每年平均价格
    SELECT
       toYear(date) AS year,
       round(avg(price)) AS price,
       bar(price, 0, 1000000, 80
    )
    FROM uk_price_paid
    GROUP BY year
    ORDER BY year;
    ---30 rows in set. Elapsed: 1.038 sec. Processed 29.07 million rows, 174.42 MB (28.01 million rows/s., 168.08 MB/s.)
    -- Peak memory usage: 1.01 MiB.
    
    -- 查询 2. 伦敦每年的平均价格
    SELECT
       toYear(date) AS year,
       round(avg(price)) AS price,
       bar(price, 0, 2000000, 100
    )
    FROM uk_price_paid
    WHERE town = 'LONDON'
    GROUP BY year
    ORDER BY year;
    -- 30 rows in set. Elapsed: 0.273 sec. Processed 29.07 million rows, 72.58 MB (106.36 million rows/s., 265.57 MB/s.)
    -- Peak memory usage: 1000.13 KiB.
    
    -- 查询 3. 最昂贵的社区
    SELECT
        town,
        district,
        count() AS c,
        round(avg(price)) AS price,
        bar(price, 0, 5000000, 100)
    FROM uk_price_paid
    WHERE date >= '2020-01-01'
    GROUP BY
        town,
        district
    HAVING c >= 100
    ORDER BY price DESC
    LIMIT 100;
    -- 100 rows in set. Elapsed: 1.054 sec. Processed 29.07 million rows, 290.70 MB (27.59 million rows/s., 275.90 MB/s.)
    -- Peak memory usage: 2.04 MiB.
    
    -- 添加投影
    ALTER TABLE sample_db.uk_price_paid on cluster cluster1 
        ADD PROJECTION projection_by_year_district_town
        (
            SELECT
                toYear(date),
                district,
                town,
                avg(price),
                sum(price),
                count()
            GROUP BY
                toYear(date),
                district,
                town
        );
        
    -- 填充现有数据(如果不进行 materialize 操作,则 ClickHouse 只会为新插入的数据创建 Projection)
    ALTER TABLE sample_db.uk_price_paid
        MATERIALIZE PROJECTION projection_by_year_district_town
    SETTINGS mutations_sync = 1;
    
    -- 再次执行上面3个查询,比对时间

配置仪表板

参考 https://clickhouse.com/docs/en/integrations/superset

该仪表板配置就是使用上面的数据集。

主要步骤:

  • 配置 Database Connection
  • 添加DataSets,选中 uk_price_paid
  • 添加Charts,按照需求配置图形
  • 添加Dashboards,将Charts加入到DashBoard,发布

最终的效果:

image-20240526184332487

基于 知识共享 CC BY-NC-SA 许可发布