Fountain Voyage

The long long journey...

Big Data Storage Review Class

Tim published on 2023-02-13 included in Learning Notes Data Science & Machine Learning and Course-Notes

This post covers distributed database design and optimization. It opens with the background and requirements of big data storage systems and the limits of the traditional relational model in horizontal scaling, system reliability, and consistency. It then examines how the client/server model changes across architectures, relating it to shared-nothing, sharded (database and table partitioning), and storage-compute separation architectures, and introduces the schema structure and data transparency of relational distributed database systems. The design section covers sharding principles, query optimization strategies, and access optimization methods, with emphasis on calculating the characteristic parameters of selection, projection, natural join, and semi-join operations. The post then turns to HBase: the HDFS limitations it addresses, what regions are and how they behave, its CRUD operations, and its read and write paths. The data structure section explains how skip lists, LSM trees, and Bloom filters work and where they apply: skip lists support fast writes with low update cost, LSM trees suit sequential writes and random lookups, and Bloom filters efficiently rule out objects that are not present. The section on distributed transactions and consistency covers nested transactions, the consistency levels of distributed databases, the CAP theorem, and BASE, and walks through the two-phase commit protocol and its problems. The post closes with the basics of concurrency control, approaches to distributed concurrency control, and use cases for distributed locks, giving readers an overall picture of distributed databases.
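
As an illustration of the Bloom filter idea mentioned in the summary, here is a minimal Python sketch. It is not taken from the post: the class name, the default sizes, and the use of SHA-256 with a seed prefix are assumptions chosen to keep the example short.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hash choice are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not space-optimal).
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

A negative answer is always correct, which is what makes the structure useful for excluding objects before a more expensive lookup; a positive answer can be a false positive.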

Docker Export and Migration

Tim published on 2023-01-29 included in Computer Technology

This article covers Docker migration and backup, focusing on the difference between the docker save and docker export commands. docker save writes every layer of a Docker image to a tar file, preserving the image's build information and version history; the file is read back with docker load. docker export instead saves the current state of a container as a flattened file system with no build or layer information; the file is read back with docker import. A file produced by docker export is usually smaller and suits publishing an application, while docker save is the better fit when the content will keep being modified and developed. The article walks through a test: modify a container, export it with docker export, then bring it back as an image with docker import. The container started from the imported image is not the same container as the original, but the modifications are retained. This approach gives a quick export and import for a single container; for a group of containers created by docker compose, docker save and docker load are needed instead. Besides transferring files, containers can also be migrated through Docker Hub, a Dockerfile, and similar means.
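
The two command pairs can be sketched as small Python helpers that build the argument lists for the Docker CLI. The helper names, and the container, image, and file names in the comments, are hypothetical; only the docker subcommands and the -o and -i flags come from the Docker CLI itself.

```python
import subprocess


def save_image_cmd(image, tar_path):
    # docker save keeps all image layers and history.
    return ["docker", "save", "-o", tar_path, image]


def load_image_cmd(tar_path):
    # docker load reads back a tar produced by docker save.
    return ["docker", "load", "-i", tar_path]


def export_container_cmd(container, tar_path):
    # docker export flattens a container's file system into one tar.
    return ["docker", "export", "-o", tar_path, container]


def import_image_cmd(tar_path, image):
    # docker import turns an exported tar into a new single-layer image.
    return ["docker", "import", tar_path, image]


def run(cmd):
    # Requires a working Docker installation; raises on a non-zero exit.
    subprocess.run(cmd, check=True)


# Hypothetical usage (names are placeholders):
#   run(export_container_cmd("my-container", "my-container.tar"))
#   run(import_image_cmd("my-container.tar", "my-image:imported"))
```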

Service Listening Address

Tim published on 2023-01-26 included in Network

After MySQL was set up on an ECS instance, remote access still failed even though the firewall and security group were configured correctly. An nmap scan of the server showed port 3306 as closed while other ports such as 22 and 80 were open. Checking port usage on the instance revealed that MySQL was bound to the loopback address 127.0.0.1, so it accepted only local connections. The fix is to change the bind address in the MySQL configuration file from 127.0.0.1 to 0.0.0.0, so that MySQL listens on all IPv4 addresses and accepts remote connections. Note that simply commenting out the bind address may leave MySQL listening only on IPv6 addresses and not on IPv4. This change resolved the remote access problem. More generally, many programs and frameworks bind to 127.0.0.1 by default, and the address has to be changed manually to 0.0.0.0 (IPv4) or :: (IPv6) to allow remote access.
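
The effect of the bind address can be reproduced with plain sockets. The sketch below is an illustration rather than the post's own procedure: the function names are made up, and is_port_open is only a rough stand-in for the nmap check.

```python
import socket


def listen_on(bind_address, port=0):
    """Open a listening IPv4 TCP socket; port 0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_address, port))
    server.listen()
    return server


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

A server created with listen_on("127.0.0.1") is reachable only from the same machine, whereas listen_on("0.0.0.0") accepts connections on every IPv4 interface of the host.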

HSV Debugging Tool

Tim published on 2023-01-17 included in Tools & Applications

In image processing, especially with OpenCV, extracting colors accurately in the HSV color space is a common task, and it gets harder when the image contains several color targets. This article introduces a small tool that sets the lower and upper limits of the three HSV channels with sliders. The mask and the result layer update in real time as the sliders move, so the HSV range of each target, down to a specific value, can be found quickly. The article includes a complete Python example that uses OpenCV to create a window with six sliders for the minimum and maximum hue, saturation, and value. The code also reads an image, converts it to HSV, builds a mask from the selected range, and applies the mask to the original image to show the extracted color regions.
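
The thresholding step behind such a tool can be sketched without OpenCV. The version below is a pure-Python approximation, not the post's code: it uses the standard colorsys module, scales values to OpenCV's 8-bit HSV convention (hue 0 to 179, saturation and value 0 to 255), and applies an inclusive range check per pixel. All function names are hypothetical, and rounding may differ slightly from OpenCV's own conversion.

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to HSV on OpenCV's 8-bit scale (approximate)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


def in_hsv_range(pixel_rgb, lower, upper):
    """True if every HSV channel lies within [lower, upper], inclusive."""
    hsv = rgb_to_opencv_hsv(*pixel_rgb)
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))


def hsv_mask(image_rgb, lower, upper):
    """Build a mask (255 = keep, 0 = drop) for rows of (r, g, b) pixels."""
    return [
        [255 if in_hsv_range(pixel, lower, upper) else 0 for pixel in row]
        for row in image_rgb
    ]
```

For example, with lower = (50, 100, 100) and upper = (70, 255, 255), a pure green pixel falls inside the range while pure red, pure blue, and white fall outside it.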

LaTeX Formula Syntax

Tim published on 2023-01-14 included in Tools & Applications

This post introduces writing mathematical formulas in LaTeX syntax and rendering them with the KaTeX engine. It first shows how to write Greek letters and multiline formulas, with examples of matrices, vectors, overbraces, underlines and hats, radicals, fractions, subscripts, multiplication, inequalities, and products. It then explains how to enable KaTeX support in the Hugo theme so that formulas render automatically. Because escape characters in Markdown can interfere with formula rendering, the post lists replacements for them, such as writing \_ in place of _. It also covers the KaTeX extensions Copy-tex, which keeps the LaTeX source when a formula is copied, and mhchem, which is used for writing chemical equations. Finally, it introduces the extended syntax of the FixIt theme for character annotations or comments and the Markdown extension syntax for fractions.
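
A short multiline formula of the kind described might look like the following. The expressions are illustrative and not taken from the post; they use only constructs that KaTeX supports (the aligned and pmatrix environments, \frac, \sqrt, \bar, \hat).

```latex
\begin{aligned}
\alpha + \beta &= \gamma \\
\bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i \\
\hat{y} &= \sqrt{a^2 + b^2} \\
A &= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\end{aligned}
```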

HDFS WebUI Access Issue

Tim published on 2022-12-19 included in Data Science & Machine Learning

In a Hadoop cluster with 3 datanodes, 1 namenode, and 1 secondary namenode, command-line checks reported everything as normal, yet the HDFS web UI on port 50070 and the file system port 9000 could not be reached. Investigation turned up two issues. First, port 50070 was not in the list of listening services because Hadoop 3.x moved the HDFS web UI to port 9870, so the web UI is reached on 9870 instead. Second, port 9000 was bound to an internal IP address and could not be reached from outside. The fix is to set fs.defaultFS in the configuration file to hdfs://0.0.0.0:9000 so that port 9000 is served normally. With both changes, the web UI and the file system port became accessible and the cluster worked as expected.
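
The fs.defaultFS change can be sketched as a small Python helper that edits Hadoop-style configuration XML. This is an illustration only: the function name and the sample internal address 192.168.0.10 are hypothetical, and it assumes the property lives in core-site.xml in the usual configuration/property/name/value layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical starting point: fs.defaultFS bound to an internal address.
SAMPLE_CORE_SITE = (
    "<configuration>"
    "<property><name>fs.defaultFS</name>"
    "<value>hdfs://192.168.0.10:9000</value></property>"
    "</configuration>"
)


def set_hadoop_property(xml_text, name, value):
    """Return the XML with the named property set, adding it if missing."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling set_hadoop_property(SAMPLE_CORE_SITE, "fs.defaultFS", "hdfs://0.0.0.0:9000") yields the configuration described above; HDFS needs a restart before the new binding takes effect.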