Research on Clustering Analysis Algorithm Based on GT4

1 Introduction

The spread of computer network technology has profoundly changed people's lives and, at the same time, has generated large volumes of unorganized data on the network. The development of grid technology and Web technology provides new technical support for finding valuable information in distributed network resources, and has given rise to many grid-based data mining systems. The data mining algorithm is the main factor that determines the performance of a data mining system: the design of any software system is inseparable from its algorithms, and the execution efficiency of data mining depends on the algorithms used. With the development and maturation of database and data mining technology, data mining algorithms such as classification, clustering, decision trees, and association rules have become quite mature. Existing data mining methods, patterns, and processes can therefore be studied and used to build a grid-based data mining system. Taking the clustering analysis algorithm as an example among the many data mining algorithms, the author introduces the design process of a data mining algorithm based on GT4 (Globus Toolkit 4.0, specifically the GT4 Web Service Core).

Clustering analysis is a widely used data mining technique whose theory and practical application are both mature. Applying this mature theory to grid-based distributed systems can greatly improve the efficiency of data mining. This paper studies how to apply two cluster analysis algorithms, CURE (Clustering Using Representatives) and K-means (the K-average method), to a GT4 data mining system.

2 System Structure Design

The data source of the GT4-based data mining system is a distributed data source, that is, a data source system that is physically distributed but logically centralized. Each computer in the grid is a node of the grid, called a grid node. One of these nodes controls and manages the others; it is called the grid center control node, and it is responsible for decision support. To complete a data mining task, the idle grid nodes first carry out the mining task on their own data according to the mining requirements, and the grid center control node then summarizes the mining results of each node. The information managed by a local grid node is limited in scope: the node mainly manages the data of a single node and summarizes that node's mining results. Nevertheless, the data of the local nodes and the global data are related to some extent. It follows that a data mining task on the grid platform is completed jointly by global data mining and local data mining.
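The division of work between the center control node and the local nodes can be illustrated with a minimal sketch. It is written in Python for brevity and does not use GT4; the names `GridNode`, `central_control`, `local_mean`, and `global_mean` are hypothetical, and a simple mean stands in for a real mining task.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class GridNode:
    """Hypothetical stand-in for a local grid node that holds its own data."""
    name: str
    data: Sequence[float]
    idle: bool = True

    def mine(self, task: Callable[[Sequence[float]], dict]) -> dict:
        # Local mining: the node only ever sees its own data.
        result = task(self.data)
        result["node"] = self.name
        return result


def central_control(nodes: List[GridNode], task, summarize):
    """Send the task to every idle node, then summarize the local results."""
    idle_nodes = [n for n in nodes if n.idle]
    with ThreadPoolExecutor() as pool:
        local_results = list(pool.map(lambda n: n.mine(task), idle_nodes))
    return summarize(local_results)


def local_mean(data: Sequence[float]) -> dict:
    # Toy local task: return partial sums instead of raw data.
    return {"count": len(data), "total": sum(data)}


def global_mean(results: List[dict]) -> Optional[float]:
    # Toy global step: combine the partial sums of all nodes.
    count = sum(r["count"] for r in results)
    total = sum(r["total"] for r in results)
    return total / count if count else None
```

Only the partial results travel to the center control node; the raw data stays on the local nodes, which is the property the two-level design relies on.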

3 Algorithm Web Service Design

3.1 Web Service Design of Global Clustering Algorithm

The relationship between the global control node and the local grid nodes in the grid environment can be understood as a relationship between an upper and a lower layer. The bottom-up (agglomerative) approach of hierarchical clustering therefore fits naturally, with the global control node regarded as the top level of the hierarchy. The global clustering algorithm in this work draws on CURE, a traditional clustering algorithm that uses representative points.

The CURE algorithm combines hierarchical and partitioning methods and uses a fixed number of representative points in space to represent a cluster. At the start of the algorithm, each point is a cluster of its own; the nearest clusters are then merged repeatedly until the number of clusters reaches the required K. Here, each data point, that is, each local grid node, is first regarded as a cluster. After each merge, the representative points of the cluster are shrunk toward the cluster center by a specified shrink factor.
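The shrinking of a representative point can be written compactly. The symbols are assumptions introduced here, not notation from the original: C is a cluster, c its center, p one of its representative points, and α the shrink factor with 0 ≤ α ≤ 1.

```latex
c = \frac{1}{|C|} \sum_{x \in C} x,
\qquad
p' = p + \alpha \, (c - p)
```

With α = 0 the representative point stays where it is, and with α = 1 it collapses onto the cluster center; values in between pull the representative points inward, which dampens the influence of outliers.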

The main steps of the CURE algorithm are as follows:

(1) draw a random sample from the objects of the data source to form a sample set S;

(2) divide the sample set S into p partitions, each of size |S|/p;

(3) perform local clustering on each partition;

(4) eliminate outliers, that is, remove clusters that grow too slowly;

(5) cluster the local clusters; the representative points of each newly formed cluster are shrunk (moved) toward the cluster center according to the user-defined shrink factor;

(6) label each data object with the number of the cluster it belongs to.
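The steps above can be sketched in a simplified form. This is an illustration in Python under stated assumptions, not the deployed Java service: partitioning (steps 2 and 3) and outlier elimination (step 4) are omitted, clusters are merged by the minimum distance between their representative points, and the names `cure`, `representatives`, `n_rep`, and `alpha` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(col) / len(points) for col in zip(*points))


def representatives(points: Sequence[Point], n_rep: int, alpha: float) -> List[Point]:
    """Pick up to n_rep well-scattered points and shrink them toward the centroid."""
    center = centroid(points)
    # Start with the point farthest from the centroid, then keep adding the
    # point farthest from the representatives chosen so far.
    chosen = [max(points, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(n_rep, len(points)):
        chosen.append(max(points, key=lambda p: min(math.dist(p, q) for q in chosen)))
    return [tuple(x + alpha * (c - x) for x, c in zip(p, center)) for p in chosen]


def cluster_distance(reps_a: List[Point], reps_b: List[Point]) -> float:
    # Distance between clusters = closest pair of representative points.
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cure(data: Sequence[Point], k: int, sample_size: int,
         n_rep: int = 4, alpha: float = 0.3, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Step (1): random sample of the data source.
    sample = rng.sample(list(data), min(sample_size, len(data)))
    # Every sampled point starts as its own cluster.
    clusters = [[p] for p in sample]
    reps = [representatives(c, n_rep, alpha) for c in clusters]
    # Step (5): merge the two closest clusters until k clusters remain.
    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_distance(reps[ab[0]], reps[ab[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i is still valid
        clusters[i] = merged
        reps[i] = representatives(merged, n_rep, alpha)
    # Step (6): label every object with the cluster of its nearest representative.
    return [
        min(range(len(reps)), key=lambda c: min(math.dist(p, r) for r in reps[c]))
        for p in data
    ]
```

A call such as `cure(points, k=2, sample_size=100)` returns one cluster number per input point. The pairwise search for the closest clusters is quadratic per merge, which is acceptable for a sketch but is the reason the full algorithm samples and partitions the data first.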

With data mining algorithms in place, the data mining tasks can be completed. The main function of the global clustering algorithm is to respond to the user's data mining request, send the corresponding requests to the local grid nodes, and consolidate the mining results returned by the local grid nodes. The Web Service resource of the global clustering algorithm consists of four parts: the algorithm Web Service interface, the algorithm resource property document, the implementation of the algorithm function, and the publishing of the algorithm function.

The most important step in global data mining with the traditional clustering algorithm is deploying the global clustering algorithm to GT4. The Web Service design of the global clustering algorithm goes through the following steps:

Step 1: describe the data mining service interface in WSDL (Web Services Description Language, the XML-based language used to describe Web Services). The service interface can also be defined in Java, in which case a Java-to-WSDL tool converts the Java interface into a WSDL file;

Step 2: write the code of the global clustering algorithm (CURE) in Java;

Step 3: write the deployment files, namely the WSDD configuration file and the JNDI configuration file (GT4's own files);

Step 4: use the Ant tool to package all of the above files into a GAR file;

Step 5: deploy the global data mining service to the Web Service container.

3.2 Web Service Design of Local Clustering Algorithm

The main function of the local clustering algorithm is to complete the data mining task of a local grid node and upload the mining result to the global control node. The data mining task of a local grid node is similar to a traditional single-machine data mining task. The local clustering algorithm uses the traditional K-means (K-average) method, which takes K as a parameter and divides N objects into K clusters so that similarity within a cluster is high and similarity between clusters is low [34]. In this work, the data mining tasks are mainly carried out by the local grid nodes. The main steps of the K-means algorithm are as follows:

(1) Arbitrarily select K objects from the data set as the initial centers of the clusters.

(2) Using the current cluster centers, calculate the distance from every other object to the center of each cluster with a distance formula. (Available distance formulas include the Euclidean distance, the Manhattan distance, and the Minkowski distance.)

(3) Assign each object to the cluster whose center is nearest, according to the distances obtained.

(4) Recompute the center of each cluster.

(5) Check for convergence. If the algorithm has converged, that is, the clusters no longer change, stop; otherwise, repeat steps (2) to (5).
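The five steps above can be sketched as follows. This is a minimal single-machine illustration in Python, not the Java service deployed to GT4; the function names `k_means` and `minkowski` and the parameters `p`, `max_iter`, and `seed` are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minkowski(a: Point, b: Point, p: float = 2.0) -> float:
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def k_means(data: Sequence[Point], k: int, p: float = 2.0,
            max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Step (1): pick K objects as the initial cluster centers.
    centers = rng.sample(list(data), k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Steps (2) and (3): assign each object to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: minkowski(x, centers[c], p)) for x in data
        ]
        # Step (5): stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step (4): recompute each center as the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```

Calling `k_means(points, k=2)` returns the cluster number of every point together with the final centers. The result depends on the initial centers, so in practice the algorithm is often run several times with different seeds.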

K-means is a classic clustering algorithm. Deploying it to GT4 completes the Web Service design of the local clustering algorithm; the deployment procedure is similar to that of the global algorithm.

4 Conclusion

The data mining service resources in the GT4-based data mining system are managed uniformly by the grid center control node (that is, the global node). During local mining, data sets are allocated to the local grid nodes according to their processing capabilities, so that the computing load of the whole system stays relatively balanced. The size of the data mining system can scale dynamically with the number of services: to add a new local mining node, only the local Web Service resources need to be deployed. Applying grid technology to distributed data mining and building grid-based data mining systems should allow such systems to be widely used in many fields.
