The long long journey...
This article details the application scenarios of fine-tuning, prompt engineering, and chain-of-thought techniques. Using blog post generation as an example, it walks through the specific steps of preparing data, fine-tuning the model, and evaluating the results. By fine-tuning OpenAI's newly released GPT-4o-mini model through the online interface, its performance approaches that of GPT-4 at roughly half the cost of GPT-3.5.
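The data-preparation step above can be sketched as follows. This is a minimal illustration, not the article's exact script: the training records and prompts are invented, but the JSONL chat format (a `messages` list with `system`/`user`/`assistant` roles per line) follows OpenAI's documented fine-tuning format.

```python
# Minimal sketch: turn (topic, finished post) pairs into chat-format JSONL
# suitable for OpenAI fine-tuning. The example data below is illustrative.
import json

examples = [
    {"topic": "stdout vs stderr in Python",
     "post": "Standard output carries results; standard error carries diagnostics..."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "You write technical blog posts."},
                {"role": "user", "content": f"Write a blog post about: {ex['topic']}"},
                {"role": "assistant", "content": ex["post"]},
            ]
        }
        # one JSON object per line, as the fine-tuning endpoint expects
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting `train.jsonl` can then be uploaded through the fine-tuning interface; evaluation is typically done by comparing held-out completions against the base model.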
Standard output (stdout) and standard error (stderr) are two concepts that, while simple, play a core role in log recording, error handling, and data stream management. This article will explore the differences and applications of stdout and stderr, especially how to use them effectively in a Python environment.
Standard Output (stdout) and Standard Error (stderr)
In most operating systems, standard output and standard error are the two main output streams of a process. They provide a mechanism for the process to send information and error messages to a terminal or file. Although these two streams might be physically the same (for example, both displayed on the same terminal interface), they serve different purposes logically:
Standard Output (stdout): Typically used to output the results of program execution or normal runtime information.
Standard Error (stderr): Specifically used for error messages and warnings, which usually should remain visible or be recorded even when standard output is redirected.

print and logging in Python
In Python, the print function sends output to stdout by default, while the logging module sends log messages to stderr by default. This separates normal program output from log output (including error and debug information), making it easier for developers to manage and filter each stream.
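The separation is easy to verify: capture the two streams independently and observe that `print` lands in one and logging output in the other. The snippet below is a small self-contained demonstration (the messages are invented for illustration):

```python
# Demonstrate that print targets stdout while logging targets stderr,
# by capturing each stream separately.
import io
import logging
from contextlib import redirect_stdout, redirect_stderr

out, err = io.StringIO(), io.StringIO()
with redirect_stdout(out), redirect_stderr(err):
    handler = logging.StreamHandler()   # with no argument, defaults to sys.stderr
    log = logging.getLogger("demo")
    log.addHandler(handler)

    print("computation finished")       # goes to stdout
    log.warning("disk almost full")     # goes to stderr

stdout_text = out.getvalue()
stderr_text = err.getvalue()
```

Because the streams are distinct, a shell pipeline such as `python app.py > results.txt` keeps the warnings visible on the terminal while only the results are redirected to the file.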
Background
When web content needs to be shared or analyzed, long screenshots are an extremely practical form, since they can display an entire page. However, handling these long screenshots while maintaining their integrity and readability, and keeping them convenient for subsequent operations, has always been a challenge. For example, at present (early 2024), mainstream AI image models still cannot process very large, complex pictures: if a long screenshot is forced into a model, the quality of the output degrades and many details go unrecognized. To solve this problem, I developed a tool based on OpenCV that simplifies the handling of long screenshots while preserving the integrity and readability of their content.
This project is open-sourced on my GitHub: https://github.com/Tim-Saijun/Web-page-Screenshot-Segmentation
Unlike many existing tools and methods, Web-page Screenshot Segmentation uses OpenCV to automatically identify and follow the natural separation lines of web content, finding the most suitable segmentation points on its own. This means titles, paragraphs, and charts are preserved intact in the segmented images, with no content breaks or omissions.
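The core idea of cutting at natural separation lines can be sketched as follows. This is a simplified illustration, not the project's actual algorithm: it treats horizontal bands of near-uniform color (typically the whitespace between sections) as safe places to cut, using plain NumPy for portability where the real tool uses OpenCV. The function name and thresholds are invented for the example.

```python
# Sketch: find horizontal "blank" bands in a page image and propose cut rows
# in the middle of each band, so no text block is split.
import numpy as np

def find_split_rows(img, min_gap=20, std_thresh=2.0):
    """Return row indices centered in bands of near-uniform rows."""
    gray = img.mean(axis=2)              # cheap grayscale conversion
    row_std = gray.std(axis=1)           # per-row intensity spread
    blank = row_std < std_thresh         # True where the row is (nearly) uniform
    splits, start = [], None
    for y, is_blank in enumerate(blank):
        if is_blank and start is None:
            start = y                    # a blank band begins
        elif not is_blank and start is not None:
            if y - start >= min_gap:     # band is wide enough to cut safely
                splits.append((start + y) // 2)
            start = None
    return splits

# Synthetic "page": two noisy content blocks separated by a white band.
page = np.full((300, 200, 3), 255, np.uint8)
rng = np.random.default_rng(0)
page[20:120] = rng.integers(0, 255, (100, 200, 3), dtype=np.uint8)
page[180:280] = rng.integers(0, 255, (100, 200, 3), dtype=np.uint8)
cuts = find_split_rows(page)
```

On the synthetic page the cut lands inside the white band between the two blocks, which is exactly the behavior you want when slicing a long screenshot for an image model.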
Language models, led by GPT, have completely changed how crawlers are written. Previously, crawling a specific site often required special configuration or processing for that site's unique structure in order to extract the desired information. With GPT, however, a single crawler can extract whatever information it needs from virtually any site. To this end, I wrote a general-purpose crawler that uses GPT to extract information during crawling and open-sourced it on GitHub.
Introduction
GPT-Web-Crawler is a web crawler based on Python and Puppeteer that can crawl web pages and extract their content, including page titles, URLs, keywords, descriptions, all text content, all images, and screenshots. It is very easy to use: just a few lines of code are needed to crawl pages and extract content, which makes it well suited to people who are unfamiliar with web crawling but want to extract content from web pages.
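One part of the extraction listed above (titles, keywords, descriptions) can be done without GPT at all, straight from the page's metadata. The sketch below illustrates that step with Python's standard-library HTML parser; the class name and sample HTML are invented for the example and this is not the project's actual API, which additionally sends page text to GPT for arbitrary information extraction.

```python
# Illustrative sketch of the metadata-extraction step of such a crawler:
# pull <title> and <meta name="keywords"/"description"> from raw HTML.
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.meta = {"title": "", "keywords": "", "description": ""}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and a.get("name") in ("keywords", "description"):
            self.meta[a["name"]] = a.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.meta["title"] += data

html = ('<html><head><title>Demo</title>'
        '<meta name="keywords" content="gpt,crawler"></head></html>')
p = MetaExtractor()
p.feed(html)
```

For fields that have no fixed location in the markup, the crawler's GPT step takes over: the stripped page text is sent to the model with a prompt describing the desired fields.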
Introduction

Definition of the IoT
- Technical understanding: The Internet of Things (IoT) refers to an intelligent network in which information about objects is collected by intelligent sensing devices, transmitted over networks, and processed at designated information centers, ultimately enabling automated information exchange and processing among objects, and between humans and objects.
- Application understanding: The IoT integrates all objects in the world into a single network, which then connects with the existing Internet to integrate human society with physical systems, enabling finer-grained and more dynamic management of production and daily life.
- Common understanding: Combining RFID (Radio-Frequency Identification) and WSN (Wireless Sensor Network) to provide users with services such as monitoring, command dispatch, remote data collection and measurement, and remote diagnosis in production and daily life.

Characteristics of the IoT
- Comprehensive perception: Using RFID, sensors, QR codes, and the like to obtain information about objects anytime, anywhere.
- Reliable transmission: Through the integration of telecommunication networks and the Internet, information about objects is transmitted to users accurately and in real time.
- Intelligent processing: Using computing, data mining, and artificial intelligence techniques such as fuzzy recognition to analyze and process massive amounts of data and to control objects intelligently.

Conceptual Model of the IoT
Perception (sensing layer), Transmission (network layer), and Computing (application layer).
Foreword
The Aurora theme, developed by @San Diamond, is a grand and beautiful Hexo theme. However, as a design intended for a general audience, it leaves some niche needs unmet. I therefore modified it to fit my own needs; the result is Aurora-s. A few things are important to note:
- In my modified version, some text prompts under the loading animations cannot be customized.
- Aurora-s is based on Aurora V2.5.2. I will try to keep up with upstream updates, but full adoption of them cannot be guaranteed.

Aurora is almost impeccable aesthetically, but it still falls short in functionality, particularly the reading experience. The image above, a screenshot of the original page on a laptop's small screen, shows very large spacing between components and very low content density: on small screens you have to scroll constantly, which makes for a poor reading experience (much less noticeable on large screens). The main purpose of my modifications is therefore to increase display density and optimize the reading experience.