环境说明
[root@keeper01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
Python环境准备
下载&安装脚本
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --no-check-certificate
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
# 接下来按照操作提示进行,可以另外指令安装路径
配置环境变量
修改/etc/profile.d/env.sh文件,内容如下
export CONDA_HOME=/root/miniconda3
export PATH=$PATH:$CONDA_HOME/bin
保存完成执行如下:
source /etc/profile
# 预期输出 /root/miniconda3/bin/conda
which conda
配置pypi环境
conda config --set auto_activate_base false
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda config --set show_channel_urls yes
# 验证配置
[root@keeper01 superset]# cat ~/.condarc
channels:
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
- defaults
show_channel_urls: true
auto_activate_base: false
pypi安装Superset
官方文档docker-compose 安装 superset 有不少坑,遂放弃。
创建python环境
mkdir -p ~/superset && cd ~/superset
conda create -n superset python=3.10
conda init
conda activate superset
# 预期是Python 3.10.x
python3 -V
(superset) [root@keeper01 superset]# python3 -V
Python 3.10.14
⚠️ 【问题】执行
conda activate superset
出现错误:CondaError: Run 'conda init' before 'conda activate'
【解决】在执行了conda init,但是还是会报一样的错误。需要退出终端再进入就好了。
安装基础依赖
yum install -y python-setuptools
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
安装SuperSet
使用pip安装
pip install apache-superset -i https://mirrors.huaweicloud.com/repository/pypi/simple
最后会打印如下日志,就表示安装成功:
shellSuccessfully installed Babel-2.15.0 Flask-Babel-2.0.0 Flask-JWT-Extended-4.6.0 Flask-Limiter-3.7.0 Flask-SQLAlchemy-2.5.1 Jinja2-3.1.4 Mako-1.3.5 MarkupSafe-2.1.5 PyJWT-2.8.0 Werkzeug-3.0.3 alembic-1.13.1 amqp-5.2.0 apache-superset-4.0.1 apispec-6.6.1 apsw-3.46.0.0 async-timeout-4.0.3 attrs-23.2.0 backoff-2.2.1 bcrypt-4.1.3 billiard-4.2.0 blinker-1.8.2 bottleneck-1.3.8 brotli-1.1.0 cachelib-0.9.0 cachetools-5.3.3 cattrs-23.2.3 celery-5.4.0 certifi-2024.2.2 cffi-1.16.0 charset-normalizer-3.3.2 click-8.1.7 click-didyoumean-0.3.1 click-option-group-0.5.6 click-plugins-1.1.1 click-repl-0.3.0 colorama-0.4.6 cron-descriptor-1.4.3 croniter-2.0.5 cryptography-42.0.7 deprecated-1.2.14 deprecation-2.1.0 dnspython-2.6.1 email-validator-2.1.1 exceptiongroup-1.2.1 flask-2.3.3 flask-appbuilder-4.5.0 flask-caching-2.3.0 flask-compress-1.15 flask-login-0.6.3 flask-migrate-3.1.0 flask-session-0.8.0 flask-talisman-1.1.0 flask-wtf-1.2.1 func_timeout-4.3.5 geographiclib-2.0 geopy-2.4.1 google-auth-2.29.0 greenlet-3.0.3 gunicorn-21.2.0 h11-0.14.0 hashids-1.3.1 holidays-0.25 humanize-4.9.0 idna-3.7 importlib-resources-6.4.0 importlib_metadata-7.1.0 isodate-0.6.1 itsdangerous-2.2.0 jsonschema-4.22.0 jsonschema-specifications-2023.12.1 kombu-5.3.7 korean-lunar-calendar-0.3.1 limits-3.12.0 llvmlite-0.42.0 markdown-3.6 markdown-it-py-3.0.0 marshmallow-3.21.2 marshmallow-sqlalchemy-0.28.2 mdurl-0.1.2 msgpack-1.0.8 msgspec-0.18.6 nh3-0.2.17 numba-0.59.1 numexpr-2.10.0 numpy-1.23.5 ordered-set-4.1.0 outcome-1.3.0.post0 packaging-24.0 pandas-2.0.3 paramiko-3.4.0 parsedatetime-2.6 pgsanity-0.2.9 platformdirs-4.2.2 polyline-2.0.2 prison-0.2.1 prompt-toolkit-3.0.43 pyarrow-14.0.2 pyasn1-0.6.0 pyasn1-modules-0.4.0 pycparser-2.22 pygments-2.18.0 pynacl-1.5.0 pyopenssl-24.1.0 pyparsing-3.1.2 pysocks-1.7.1 python-dateutil-2.9.0.post0 python-dotenv-1.0.1 python-geohash-0.8.5 pytz-2024.1 pyyaml-6.0.1 redis-4.6.0 referencing-0.35.1 requests-2.32.2 requests-cache-1.2.0 rich-13.7.1 rpds-py-0.18.1 rsa-4.9 selenium-4.9.1 shillelagh-1.2.19 shortid-0.1.2 simplejson-3.19.2 six-1.16.0 slack_sdk-3.27.2 sniffio-1.3.1 sortedcontainers-2.4.0 sqlalchemy-1.4.52 sqlalchemy-utils-0.38.3 sqlglot-23.17.0 sqlparse-0.4.4 sshtunnel-0.4.0 tabulate-0.8.10 trio-0.25.1 trio-websocket-0.11.1 typing-extensions-4.12.0 tzdata-2024.1 url-normalize-1.4.3 urllib3-2.2.1 vine-5.1.0 wcwidth-0.2.13 wrapt-1.16.0 wsproto-1.2.0 wtforms-3.1.2 wtforms-json-0.3.5 xlsxwriter-3.0.9 zipp-3.18.2 zstandard-0.22.0
验证
(superset) [root@keeper01 superset]# which superset
/root/miniconda3/envs/superset/bin/superset
初始化Superset数据库
superset db upgrade
小贴士
Superset采用的数据库是sqllite数据库,是一个轻量级的数据量。
错误及解决
📢 如果没有这一步,启动没有问题。但浏览器登录后访问会报错500,日志会提示
sqlite3.OperationalError: no such table: user_attribute
。参考:https://github.com/apache/superset/issues/7354
创建管理员用户
给superset创建管理员用户
export FLASK_APP=superset
export SUPERSET_SECRET_KEY="oh-so-secret"
在里面可以设置用户名与密码,其他的可以不用设置,直接enter就行
(superset) [root@keeper01 ~]# export FLASK_APP=superset
(superset) [root@keeper01 ~]# export SUPERSET_SECRET_KEY="oh-so-secret"
(superset) [root@keeper01 ~]# flask fab create-admin
Username [admin]: superset
User first name [admin]:
User last name [user]:
Email [admin@fab.org]:
Password:
Repeat for confirmation:
logging was configured successfully
2024-05-26 12:12:30,324:INFO:superset.utils.logging_configurator:logging was configured successfully
2024-05-26 12:12:30,339:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
warnings.warn(
No PIL installation found
2024-05-26 12:12:30,784:INFO:superset.utils.screenshots:No PIL installation found
Recognized Database Authentications.
Admin User superset created.
错误及解决
【问题】Refusing to start due to insecure SECRET_KEY
【解决】 export SUPERSET_SECRET_KEY="oh-so-secret"
,参考 :Apache Superset: Refusing to start due to insecure SECRET_KEY
初始化
superset init
(superset) [root@keeper01 ~]# superset init
logging was configured successfully
2024-05-26 12:14:21,843:INFO:superset.utils.logging_configurator:logging was configured successfully
2024-05-26 12:14:21,850:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask_limiter/extension.py:333: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
warnings.warn(
No PIL installation found
2024-05-26 12:14:22,101:INFO:superset.utils.screenshots:No PIL installation found
Syncing role definition
2024-05-26 12:14:27,503:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2024-05-26 12:14:27,652:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2024-05-26 12:14:27,668:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2024-05-26 12:14:27,867:INFO:superset.security.manager:Syncing Gamma perms
Syncing sql_lab perms
2024-05-26 12:14:28,064:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2024-05-26 12:14:28,245:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2024-05-26 12:14:28,252:INFO:superset.security.manager:Creating missing datasource permissions.
Traceback (most recent call last):
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: tables
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/envs/superset/bin/superset", line 8, in <module>
sys.exit(superset())
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 358, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/flask/cli.py", line 358, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/cli/main.py", line 65, in init
security_manager.sync_role_definitions()
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/security/manager.py", line 891, in sync_role_definitions
self.create_missing_perms()
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/security/manager.py", line 838, in create_missing_perms
datasources = SqlaTable.get_all_datasources()
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/superset/connectors/sqla/models.py", line 1954, in get_all_datasources
return qry.all()
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2773, in all
return self._iter().all()
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
result = self.session.execute(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
result = conn._execute_20(statement, params or {}, execution_options)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
ret = self._execute_context(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
self._handle_dbapi_exception(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
util.raise_(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File "/root/miniconda3/envs/superset/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: tables
[SQL: SELECT tables.uuid AS tables_uuid, tables.created_on AS tables_created_on, tables.changed_on AS tables_changed_on, tables.id AS tables_id, tables.description AS tables_description, tables.default_endpoint AS tables_default_endpoint, tables.is_featured AS tables_is_featured, tables.filter_select_enabled AS tables_filter_select_enabled, tables."offset" AS tables_offset, tables.cache_timeout AS tables_cache_timeout, tables.params AS tables_params, tables.perm AS tables_perm, tables.schema_perm AS tables_schema_perm, tables.is_managed_externally AS tables_is_managed_externally, tables.external_url AS tables_external_url, tables.table_name AS tables_table_name, tables.main_dttm_col AS tables_main_dttm_col, tables.database_id AS tables_database_id, tables.fetch_values_predicate AS tables_fetch_values_predicate, tables.schema AS tables_schema, tables.sql AS tables_sql, tables.is_sqllab_view AS tables_is_sqllab_view, tables.template_params AS tables_template_params, tables.extra AS tables_extra, tables.normalize_columns AS tables_normalize_columns, tables.always_filter_main_dttm AS tables_always_filter_main_dttm, tables.created_by_fk AS tables_created_by_fk, tables.changed_by_fk AS tables_changed_by_fk
FROM tables
WHERE tables.is_sqllab_view = 0]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
启动Superset
安装gunicorn
pip install gunicorn -i https://mirrors.huaweicloud.com/repository/pypi/simple
gunicorn是一个Python Web Server,可以和java中的Tomcat类比
启动Superset
# gunicorn --workers 5 --timeout 120 --bind `hostname`:8787 superset:app --daemon
gunicorn -w 5 -t 120 -b `hostname`:8787 "superset.app:create_app()" --daemon
验证
(superset) [root@keeper01 superset]# netstat -lunpt | grep 8787
tcp 0 0 192.168.107.12:8787 0.0.0.0:* LISTEN 41511/python
访问
浏览器访问 http://192.168.107.12:8787/,输入账号密码:`superset/superset`
关闭
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
开始探索MySQL数据
安装mysql数据库驱动
conda activate superset
conda install mysqlclient
重启superset
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
gunicorn -w 5 -t 120 -b `hostname`:8787 "superset.app:create_app()" --daemon
在宿主机MacOS上部署好MySQL
略
数据源配置
参考官网 https://superset.apache.org/docs/using-superset/creating-your-first-dashboard
开始探索ClickHouse数据
安装ClickHouse数据库驱动
pip install clickhouse-connect -i https://mirrors.huaweicloud.com/repository/pypi/simple
重启superset
ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
gunicorn -w 5 -t 120 -b `hostname`:8787 "superset.app:create_app()" --daemon
netstat -lunpt | grep 8787
准备ClickHouse数据
参考 https://clickhouse.com/docs/zh/getting-started/example-datasets/uk-price-paid
ClickHouse 环境见 https://yeyouluo.github.io/big-data/ck/ck-install-cluster-with-keeper.html
由于外网网速较慢,因此将数据集文件下载到本地,并启动一个HTTP服务:
python3 -m http.server
在
keeper01
主机上操作:shell[root@ck01 ~]# clickhouse-client -m # 输入密码: p@$$W0rd
执行下列SQL:
sqlCREATE DATABASE IF NOT EXISTS sample_db ON CLUSTER cluster1; show databases; use sample_db; CREATE TABLE sample_db.uk_price_paid ON CLUSTER cluster1 ( price UInt32, date Date, postcode1 LowCardinality(String), postcode2 LowCardinality(String), type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0), is_new UInt8, duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0), addr1 String, addr2 String, street LowCardinality(String), locality LowCardinality(String), town LowCardinality(String), district LowCardinality(String), county LowCardinality(String) ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/uk_price_paid', '{replica}') ORDER BY (postcode1, postcode2, addr1, addr2); INSERT INTO sample_db.uk_price_paid WITH splitByChar(' ', postcode) AS p SELECT toUInt32(price_string) AS price, parseDateTimeBestEffortUS(time) AS date, p[1] AS postcode1, p[2] AS postcode2, transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type, b = 'Y' AS is_new, transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county FROM url( 'http://192.168.107.1:8000/pp-complete.csv', 'CSV', 'uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String' ) SETTINGS max_http_get_redirects=10; -- 查看行数 SELECT count() FROM uk_price_paid; -- 查看表大小 SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid'; -- 3 个查询 -- -- 查询 1. 每年平均价格 SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80 ) FROM uk_price_paid GROUP BY year ORDER BY year; ---30 rows in set. Elapsed: 1.038 sec. Processed 29.07 million rows, 174.42 MB (28.01 million rows/s., 168.08 MB/s.) -- Peak memory usage: 1.01 MiB. -- 查询 2. 伦敦每年的平均价格 SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 2000000, 100 ) FROM uk_price_paid WHERE town = 'LONDON' GROUP BY year ORDER BY year; -- 30 rows in set. Elapsed: 0.273 sec. Processed 29.07 million rows, 72.58 MB (106.36 million rows/s., 265.57 MB/s.) -- Peak memory usage: 1000.13 KiB. -- 查询 3. 最昂贵的社区 SELECT town, district, count() AS c, round(avg(price)) AS price, bar(price, 0, 5000000, 100) FROM uk_price_paid WHERE date >= '2020-01-01' GROUP BY town, district HAVING c >= 100 ORDER BY price DESC LIMIT 100; -- 100 rows in set. Elapsed: 1.054 sec. Processed 29.07 million rows, 290.70 MB (27.59 million rows/s., 275.90 MB/s.) -- Peak memory usage: 2.04 MiB. -- 添加投影 ALTER TABLE sample_db.uk_price_paid on cluster cluster1 ADD PROJECTION projection_by_year_district_town ( SELECT toYear(date), district, town, avg(price), sum(price), count() GROUP BY toYear(date), district, town ); -- 填充现有数据(如果不进行 materialize 操作,则 ClickHouse 只会为新插入的数据创建 Projection) ALTER TABLE sample_db.uk_price_paid MATERIALIZE PROJECTION projection_by_year_district_town SETTINGS mutations_sync = 1; -- 再次执行上面3个查询,比对时间
配置仪表板
参考 https://clickhouse.com/docs/en/integrations/superset
该仪表板配置就是使用上面的数据集。
主要步骤:
- 配置 Database Connection
- 添加DataSets,选中
uk_price_paid
表 - 添加Charts,按照需求配置图形
- 添加Dashboards,将Charts加入到DashBoard,发布
最终的效果: