Sign In
Sign In

Introduction to ClickHouse: Features and Getting Started

Introduction to ClickHouse: Features and Getting Started
Hostman Team
Technical writer
ClickHouse
18.09.2024
Reading time: 6 min

In this beginner’s guide we will talk about ClickHouse, describe its advantages and use cases, and explain how to get started with ClickHouse. However, before we start, we should answer the most important question: what is ClickHouse?

ClickHouse is a columnar database management system (DBMS) designed for processing analytical queries with high performance. It is optimized for handling large volumes of data, providing fast aggregation, filtering, and sorting queries. 

Unlike traditional relational databases that store data in rows, ClickHouse organizes information by columns, making it highly efficient for tasks like data aggregation, filtering, and sorting. It was developed to handle massive datasets, enabling companies to perform complex analytics in record time.

ClickHouse is open-source, scalable, and versatile as it supports multiple data formats and can be used for handling almost everything, from financial analysis to blockchain monitoring.

Features and Benefits of ClickHouse

ClickHouse has several key features that make it an attractive option for working with large datasets. Here are some of its main benefits:

  • Columnar Data Storage: This allows for efficient data compression and faster query execution by operating on columns. It makes ClickHouse ideal for analytical tasks like data aggregation, report generation, time-series analysis, and other complex queries.

  • High Performance: The speed of queries related to aggregation, filtering, and sorting makes ClickHouse a perfect system for analytical tasks.

  • Scalability: ClickHouse has a scalable architecture that allows data and queries to be distributed across multiple nodes, ensuring system availability without performance loss under high loads.

  • Support for Various Data Formats: ClickHouse supports CSV, TSV, JSON, and several other formats, making it a versatile tool for handling various data types.

  • Open Source and Free: As open-source software, ClickHouse allows customization and flexible configuration to suit specific needs.

Thanks to these features, ClickHouse is widely used in large-scale projects to process large volumes of data and complex analytical tasks.

ClickHouse Use Cases

Here are the main types of tasks where the free ClickHouse performs as well as, or even better than, many paid competitors:

  • Web Application Analytics: ClickHouse is used to store and process large volumes of data about user activity on websites, their preferences, time spent on pages, and other metrics. The DBMS enables complex analytical queries to identify trends, optimize user experience, and support business decisions.

  • Digital Advertising Optimization and Management: ClickHouse stores data on ad campaigns, their effectiveness, target audience, and other parameters. Analyzing this data helps optimize ad spend, improve efficiency, and increase conversion rates.

  • Operational Log Analysis from Multiple Sources: In this case, ClickHouse is used to collect, store, and analyze data about system performance, errors, and events. This helps quickly identify issues and improve system reliability.

  • Security Log Monitoring: ClickHouse is ideal for storing event logs and security audits. Comprehensive log data analysis helps detect potential threats, respond to incidents promptly, and ensure information system security. It works effectively with AI that detects patterns in the data, which humans might struggle to find, indicating possible network intrusions. These features are used by CloudFlare developers, who rely on ClickHouse to store data on traffic, requests, blocks, and other network parameters.

  • Financial Analysis: ClickHouse is used to store financial data, reports, transactions, and other company operations. Analyzing this data helps make informed financial strategies, investments, and budgeting decisions.

  • Product Quality Analysis Based on Incoming Data: When manufacturing complex electronic components and other high-tech devices, ClickHouse is invaluable because it can simultaneously receive and process information on thousands of parameters affecting the quality of produced components.

  • Blockchain Analytics: In this field, ClickHouse is used to store blockchain blocks, transactions, contracts, and other blockchain system data. Analyzing this data helps track transactions, verify the integrity of the blockchain, and ensure network security.

This list shows that ClickHouse is a universal DBMS that can be used to solve a wide variety of tasks.

Installing ClickHouse

ClickHouse is initially designed to run on Linux, FreeBSD, and macOS. You can install ClickHouse on your local machine or a cloud server.

Quick Deployment

The easiest way to quickly deploy ClickHouse is to run the following command, which will determine if your operating system is supported, and then download the appropriate ClickHouse file:

curl https://clickhouse.com/ | sh

If there are no conflicts with your system, you can proceed to start the server using the following command:

./clickhouse server

This command will first create all necessary directories and files, after which the server will start. Connect to the server by opening a new terminal and running:

./clickhouse client

The system will return information about the client version and the connection status via localhost (the status should be "connected"). You can now start working with the database by sending SQL queries.

Installing ClickHouse on Linux Ubuntu

To install ClickHouse on Ubuntu or Debian using deb packages via sudo, follow these sequential commands in the terminal:

apt-get install -y apt-transport-https ca-certificates dirmngrapt-get install -y apt-transport-https ca-certificates dirmngr
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
apt-get update
apt-get install -y clickhouse-server clickhouse-client
service clickhouse-server start
clickhouse-client

Once the system confirms the installation, you can start working with databases offline.

The Easiest Way to Install ClickHouse

If you don't want to spend time entering commands for installation and want to start working with ClickHouse right away, Hostman offers a great solution. Register on the website or log in to your Hostman account and deploy ClickHouse in the cloud with a few clicks.

Here's how to do it:

  1. Select Databases from the left menu in the control panel and click Create database.

  2. Select ClickHouse and scroll down to choose parameters like region, pricing plan, network, and additional services (such as creating a backup). Then click Order.

91bd8b4e 59fd 4a67 9fc3 A6319038ad8e

Your ClickHouse database will be deployed and ready to use in a few minutes.

That's it! Now you know all the ways to install ClickHouse for remote work. For more information on how to work with this database, check out ClickHouse docs.

ClickHouse
18.09.2024
Reading time: 6 min

Similar

ClickHouse

Creating Tables in ClickHouse

Like other database management systems, ClickHouse offers many functions for working with tables.  But before working with tables, we must create a database where these tables will be stored. Creating a Database in ClickHouse To create a database in ClickHouse, use the CREATE DATABASE command, which creates a new database in ClickHouse's storage. To select an already created database, use the USE command. These commands look like this (database names are just examples): CREATE DATABASE my_new_databaseUSE my_old_database In the Hostman panel, you can create a ClickHouse database in just a few clicks: Go to the "Databases" page and click the "Create" button. On the next page, select ClickHouse in the "Database Type" section, fill in the required parameters below, and click the "Order" button on the right. Now, we can proceed to work with tables. How to Create Tables in ClickHouse To create tables in ClickHouse, use the CREATE TABLE command, followed by the table name and the column parameters. Let's jump into an example and explain other related commands. Example 1 We'll start with an example of creating a table to store data about the models of a car brand: CREATE TABLE autobrand_models ( model_id UInt32, model_name String, release_year UInt16, engine_type String, horsepower UInt16, price Decimal(10, 2) ) ENGINE = MergeTree() ORDER BY model_id; In this example, we create the table autobrand_models with the following fields: model_id: The model identifier, 32-bit unsigned integer (UInt32). model_name: The name of the model (String). release_year: The year the model was released, 16-bit unsigned integer (UInt16). engine_type: The type of engine (String). horsepower: The engine's horsepower, 16-bit unsigned integer (UInt16). price: The price, decimal number with up to 10 digits and 2 decimal places (Decimal(10, 2)). This table allows storing information about car models, including their ID, name, release year, engine type, horsepower, and price. Notice the following two lines: ENGINE = MergeTree(): We're using the MergeTree engine, one of the most commonly used engines for data storage in ClickHouse. The engine determines how and where the data is stored, which queries are supported, and whether data replication is enabled. While ClickHouse supports many engines, MergeTree is the obvious choice for simple tables like this one. ORDER BY model_id: This specifies that the data will be ordered by the model_id column. Example 2 Now let's talk about how to use primary keys. These can be specified in different ways: within or outside the column list. In the first case, primary keys are written like this: CREATE TABLE db.new_table ( name01 type01, name02 type02, ..., PRIMARY KEY(expr01[, expr02,...]) ) If you want to specify them outside the table definition, it looks like this: CREATE TABLE db.new_table ( name01 type01, name02 type02, ... ) PRIMARY KEY(expr01[, expr02,...]); As we can see, the difference in the code is the position of the closing parenthesis: in the second case, the primary keys are placed outside the table definition. Let's look at a specific example of primary key usage: CREATE TABLE shop.sales_by_category ( category_id UInt32, product_name String, sale_date Date, quantity UInt16, price Decimal(10, 2), revenue Decimal(10, 2), PRIMARY KEY(category_id, sale_date) ) ENGINE = MergeTree() ORDER BY (category_id, sale_date); Here: category_id: The category ID, a 32-bit unsigned integer (UInt32). product_name: The product name (String). sale_date: The sale date (Date). quantity: The quantity sold, 16-bit unsigned integer (UInt16). price: The price, decimal with 2 decimal places (Decimal(10, 2)). revenue: The total revenue, also in decimal format (Decimal(10, 2)). PRIMARY KEY: The primary keys are category_id and sale_date, defined within the column list. ORDER BY (category_id, sale_date): This defines the data sorting order in the table. Adding Comments to Tables You can add comments to tables using the COMMENT command. Here's an example of adding a comment and then displaying it. To add a comment, you can do something like this: CREATE TABLE countdown (x String) ENGINE = Memory COMMENT 'Temporary table for report'; To display the comment, use this command: SELECT name, comment FROM system.tables WHERE name = 'countdown'; The result will look like this: name | comment -------------|------------------------------------------------ countdown | Temporary table for report Temporary Tables in ClickHouse ClickHouse allows you to create temporary tables that exist only for the current query and are created outside of a database. These tables are automatically deleted after the session ends or the connection is closed. Use the CREATE TEMPORARY TABLE command to create a temporary table in ClickHouse. Here's the simplest structure for such a table: CREATE TEMPORARY TABLE temp_table ( id UInt32, name String ) ENGINE = Memory; In this example, we create the temporary table temp_table with fields id and name. The engine is set to Memory, meaning the data will be stored in RAM. Temporary tables are useful when you need to quickly store and use data within the current session without keeping it afterward. Example: Analyzing Sales Data Let's assume we have a database for an online store and want to analyze sales for a specific period. To do this, we may need to create a temporary table to collect data from multiple tables and perform analytical queries. Here's how it can be done: CREATE TEMPORARY TABLE temp_sales_data AS SELECT products.product_id, products.product_name, sales.sale_date, sales.sale_amount FROM products JOIN sales ON products.product_id = sales.product_id WHERE sales.sale_date BETWEEN '2024-08-01' AND '2024-08-31'; In this example, we create a temporary table, temp_sales_data, to place data about products and sales for August 2024. We join the products and sales tables by product ID and select only those records where the sale date falls within the specified period. After creating the temp_sales_data table, we can perform various analytical queries on this data, such as calculating the total sales for the month or finding the most popular products. Note that the temporary table will automatically be deleted when the session ends, which is convenient for analytics since it doesn't clutter the database with temporary reports. Modifying Table Structure ClickHouse allows changing table structures using the REPLACE command. However, keep in mind that this command applies only to existing tables; otherwise, the system will return an error. Suppose we have a table myBigData with data and want to replace all records where CounterID is less than a certain value. In ClickHouse, this can be done in a single line: REPLACE TABLE myBigData SELECT * FROM myBigData WHERE CounterID < 5356; This query replaces all records in the myBigData table where CounterID < 5356 with the result of the selection from the same table. This way, the REPLACE command efficiently updates data in a table based on a query from the same table. Conclusion We've explored creating regular and temporary tables in ClickHouse and learned how to modify data and add comments to tables. For more advanced ClickHouse table features, check out the documentation.
23 September 2024 · 6 min to read
ClickHouse

User Management in ClickHouse

In this article, we'll describe how to create a ClickHouse user and configure its settings: add a password and grant permissions. Creating a User Let's say we need to create a new user named hostman_admin. This is done using the ClickHouse command CREATE USER: CREATE USER hostman_admin IDENTIFIED BY 'password' We created a user and set a password.  However, you can also set a password in another way. The main ClickHouse user settings can be found in the users.xml configuration file (this file is located in the /etc/clickhouse-server/ directory on the ClickHouse server) in the users section. Open this file and add the following lines: <users> <hostman_admin> <password></password> <!-- Or --> <password_sha256_hex></password_sha256_hex> </hostman_admin> </users> This means that we can assign either a regular password or one in hexadecimal format, leaving one of the values empty (default). If we are creating a database for tests or it contains no sensitive information, a simple password can be used, for example: <users> <hostman_admin> <password>simplepass</password> </hostman_admin> </users> Otherwise, you can input the password in hexadecimal format (0-f), leaving the first value empty, for example: <users> <hostman_admin> <password></password> <!-- Or --> <password_sha256_hex>76f95cf446430c895d5923a7860af00476f95cf446430c895d5923a7860af004</password_sha256_hex> </hostman_admin> </users> You can get the hash for a specific string as follows: echo -n 'secretpass' | sha256sum If you plan to work with one of the MySQL clients, you'll need to create a SHA1-type password, which is added using the following tag: <password_double_sha1_hex>76f95cf446430c895d5923a7860af00476f95cf4</password_double_sha1_hex> You can get the hash similarly: echo -n 'secretpass' | sha1sum Configuring User Permissions in ClickHouse At this point, our created user does not have any special privileges. Let's assign them the admin role using the GRANT command. This way we grant all privileges to the user: GRANT ALL ON my_database.my_table TO hostman_admin WITH GRANT OPTION; In the users.xml file, pay attention to the <access_management> tag, which can take values 0 and 1. If you set the value to 1, it will turn on access control via SQL: <access_management>1</access_management> Now let's configure the network through which we will access ClickHouse. To do this, find the <networks> block in the configuration file and specify the necessary IP addresses in any suitable format: IPv4 or IPv6. Here's an example configuration: <networks>`211.182.215.2`, `2b03:6d9::2`, `3b13:7c9::3/64`</networks> To specify local access, you can add the following (replace with the necessary IP): <ip>::1</ip><ip>127.0.0.1</ip> Restricting Data Access Sometimes, it's useful to restrict a user by giving them access to only certain parts of the database. Let's create a new user and ensure they can only access specific rows in the database table. First, create the user: CREATE USER hostman_new IDENTIFIED BY 'password' Next, we'll need to create a view that filters data based on certain conditions and then grant access to this view to the user, limiting their data visibility. The view is created like this (replace original_table, some_column, and certain_value with the appropriate values): CREATE VIEW limited_view AS SELECT * FROM original_table WHERE some_column = 'certain_value' Now, grant our user access to this view: GRANT SELECT ON limited_view TO hostman_new We need to verify that we assigned the permissions correctly. For this, we can query system tables like system.grants and system.privileges, for example: SELECT * FROM system.grants WHERE user_name = 'hostman_new' AND table = 'limited_view' If you see a row with SELECT permissions, then the permissions were granted correctly. Deleting a User Account If the user is no longer needed, you can delete it. This is done using the DROP command: DROP USER hostman_admin   That's all. We've learned how to manage user accounts in ClickHouse: create, configure permissions, and delete them.
19 September 2024 · 4 min to read
ClickHouse

How to Install and Use ClickHouse on Ubuntu

ClickHouse is an open-source column-oriented database management system developed by Yandex, designed for real-time analytical processing of large volumes of data. It is specifically optimized for online analytical processing (OLAP) workloads, making it ideal for applications requiring fast analytical queries on vast datasets. System Requirements Before proceeding with the installation, ensure that your system meets the following requirements: Ubuntu operating system (latest LTS version recommended for stability) Minimum 2GB RAM (4GB or more recommended) Adequate disk space for data storage Update Package Lists Open a terminal window and update the package lists by running the following command: sudo apt update Install ClickHouse Use the following command to install ClickHouse on your Ubuntu system: sudo apt-get install -y apt-transport-https ca-certificates curl gnupgcurl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpgecho "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee \    /etc/apt/sources.list.d/clickhouse.listsudo apt-get update Start ClickHouse Server Once the installation is complete, start the ClickHouse server with the following command: sudo apt-get install -y clickhouse-server clickhouse-client To verify that ClickHouse has been installed successfully, check its status by executing: sudo service clickhouse-server start Configuration and Setup Launch the ClickHouse client by typing the following command in your terminal: clickhouse-client Create a Database Inside the ClickHouse client, create a new database using the following SQL command: CREATE DATABASE my_database; Then use the command use database: USE my_database; This command selects my_database as the current database to execute further commands within. Create a Table Here's an example of how to create a table within the database. Suppose we want to create a table named my_table with the following columns: id (UInt32), name (String), and age (UInt8). CREATE TABLE my_table(    id UInt32,    name String,    age UInt8)ENGINE = MergeTree()ORDER BY id; This command creates a table named my_table with three columns (id, name, and age). The MergeTree engine is used, which is suitable for large datasets. The ORDER BY id clause specifies that the data will be ordered by the id column, which is often used to optimize query performance. Basic Usage and Commands Insert Data To insert data into a ClickHouse table, use the INSERT INTO SQL statement. For example: INSERT INTO my_table (column1, column2) VALUES (1, 'value1'); Query Data Retrieve data from a table using the SELECT SQL statement. Here's an example: SELECT * FROM my_table; Deleting Data To delete rows from a ClickHouse table, use the DELETE SQL statement. When using the DELETE statement in ClickHouse, you need to specify conditions that determine which rows will be deleted. These conditions can be based on various comparisons and logical operations. Here are examples of different types of conditions and the corresponding SQL commands to delete rows from my_table. Delete Rows Based on Equality Delete rows where column1 is equal to 1. ALTER TABLE my_table DELETE WHERE column1 = 1; Delete Rows Based on Inequality Delete rows where column1 is not equal to 1. ALTER TABLE my_table DELETE WHERE column1 != 1; Delete Rows Based on Greater Than Delete rows where column1 is greater than 2. ALTER TABLE my_table DELETE WHERE column1 > 2; Delete Rows Based on Less Than Delete rows where column1 is less than 3. ALTER TABLE my_table DELETE WHERE column1 < 3; Delete Rows Based on Greater Than or Equal To Delete rows where column1 is greater than or equal to 3. ALTER TABLE my_table DELETE WHERE column1 >= 3; Delete Rows Based on Less Than or Equal To Delete rows where column1 is less than or equal to 2. ALTER TABLE my_table DELETE WHERE column1 <= 2; Delete Rows Based on String Matching Delete rows where column2 is equal to value1. ALTER TABLE my_table DELETE WHERE column2 = 'value1'; Delete Rows Based on a Range Delete rows where column1 is between 2 and 4. ALTER TABLE my_table DELETE WHERE column1 BETWEEN 2 AND 4; Delete Rows Based on Logical AND Delete rows where column1 is greater than 2 and column2 is value3. ALTER TABLE my_table DELETE WHERE column1 > 2 AND column2 = 'value3'; Delete Rows Based on Logical OR Delete rows where column1 is less than 2 or column2 is value5. ALTER TABLE my_table DELETE WHERE column1 < 2 OR column2 = 'value5'; Delete Rows Based on NULL Delete rows where column2 is NULL (assuming column2 can be NULL). ALTER TABLE my_table DELETE WHERE column2 IS NULL; Modifying Data Update existing records in a ClickHouse table with the UPDATE SQL statement: UPDATE my_table SET column1 = value1 WHERE condition; Advanced Features and Functions Let's expand on each section with detailed explanations and error-free code snippets. Materialized Views Implement materialized views for efficient data aggregation. Create a materialized view with the following SQL command: CREATE MATERIALIZED VIEW my_materialized_view ENGINE = MergeTree() AS SELECT * FROM my_table; Partitioning Utilize partitioning to manage and query large datasets effectively. Create a partitioned table using the following SQL syntax: CREATE TABLE my_partitioned_table(    column1 Date,    column2 String)ENGINE = MergeTree()PARTITION BY toYYYYMM(column1)ORDER BY column1; Monitoring and Maintenance System Monitoring Monitor ClickHouse performance using built-in metrics and tools. Access the ClickHouse system tables to view performance metrics: SELECT * FROM system.metrics; Backup and Restore To back up ClickHouse data to a local disk, follow these steps: 1. Configure Backup Destination First, you need to set up the backup destination by creating a configuration file /etc/clickhouse-server/config.d/backup_disk.xml. This file specifies the local storage path for backups. Paste the following into the backup_disk.xml file:   <clickhouse>       <storage_configuration>           <disks>               <backups>                   <type>local</type>                   <path>/backups/</path>               </backups>           </disks>       </storage_configuration>       <backups>           <allowed_disk>backups</allowed_disk>           <allowed_path>/backups/</allowed_path>       </backups>  </clickhouse> 2. Backup Data Use the BACKUP SQL statement to back up a table. Specify the destination as a local disk. BACKUP TABLE my_table TO Disk('backups', 'backup_name.zip'); 3. Restore Data To restore a table from a backup, use the RESTORE SQL statement. RESTORE TABLE my_table FROM Disk('backups', 'backup_name.zip');SETTINGS allow_non_empty_tables=true In case if the empty tables are needed to be restored then the line SETTINGS allow_non_empty_tables=true is not needed in the command. 4. Incremental Backup Incremental backups can be performed to save only the changes since the last backup. BACKUP TABLE my_table TO Disk('backups', 'incremental.zip')SETTINGS base_backup = Disk('backups', 'base_backup.zip'); 5. Password Protection Protect your backups with a password for added security. BACKUP TABLE my_table TO Disk('backups', 'protected_backup.zip')SETTINGS password='your_password'; These steps ensure a reliable and secure backup and restore process using ClickHouse's native tools. For more detailed information, visit the ClickHouse documentation. Troubleshooting and Issue Resolution Log Analysis Analyze ClickHouse logs to identify and troubleshoot issues. Review ClickHouse log files located in /var/log/clickhouse-server/ for error messages and warnings. Community Support Seek assistance from the ClickHouse community for resolving complex problems. Join ClickHouse forums and mailing lists to ask questions and seek advice from experienced users. And do not forget to check ClickHouse Docs for better understanding.
30 May 2024 · 7 min to read

Do you have questions,
comments, or concerns?

Our professionals are available to assist you at any moment,
whether you need help or are just unsure of where to start.
Email us
Hostman's Support