Journal of information and communication convergence engineering 2022; 20(2): 103-112

Published online June 30, 2022

https://doi.org/10.6109/jicce.2022.20.2.103

© Korea Institute of Information and Communication Engineering

High Utility Itemset Mining by Using Binary PSO Algorithm with V-shaped Transfer Function and Nonlinear Acceleration Coefficient Strategy

Bodong Tao , Ok Keun Shin , and Hyu Chan Park*

Department of Computer Engineering, Korea Maritime and Ocean University, Busan 49112, Korea

Correspondence to : Hyu Chan Park (E-mail: hcpark@kmou.ac.kr, Tel: +82-51-410-4573)
Department of Computer Engineering, Korea Maritime and Ocean University, Busan 49112, Korea.

Received: March 1, 2022; Revised: May 11, 2022; Accepted: May 12, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The goal of pattern mining is to identify novel patterns in a database. High utility itemset mining (HUIM) is a research direction for pattern mining. This is different from frequent itemset mining (FIM), which additionally considers the quantity and profit of the commodity. Several algorithms have been used to mine high utility itemsets (HUIs). The original BPSO algorithm lacks local search capabilities in the subsequent stage, resulting in insufficient HUIs to be mined. Compared to the transfer function used in the original PSO algorithm, the V-shaped transfer function more sufficiently reflects the probability between the velocity and position change of the particles. Considering the influence of the acceleration factor on the particle motion mode and trajectory, a nonlinear acceleration strategy was used to enhance the search ability of the particles. Experiments show that the number of mined HUIs is 73% higher than that of the original BPSO algorithm, which indicates better performance of the proposed algorithm.

Keywords High utility itemsets, Particle swarm optimization, V-shaped transfer function, Nonlinear acceleration coefficient strategy.

Data mining is the process of searching for information hidden in large amounts of data using algorithms. The searched information can be either nonnumerical or inductive. It can also be used for information management, decision support, and other purposes.

Frequent itemset mining (FIM) is an important data mining technique. Frequent patterns refer to the set of items that appear frequently in a dataset, such as the set of items that appear frequently in transaction data. However, the same item may be purchased more than once in a single transaction, and the profit of each item varies. FIM does not consider these factors. To address these shortcomings of frequent itemsets, high utility itemset mining was proposed.

High-utility itemset mining (HUIM) mines a set of items that satisfy certain conditions using a certain strategy. A mining method is required to mine itemsets with a relatively low frequency and high utility value. For instance, low sales of jewelry, and high profits; high sales of bread and low profits. Therefore, the target to be mined should be defined beforehand and a minimum utility value should be provided as the judgment condition.

The process of finding all itemsets in the database that exceed the minimum utility value is called utility mining, and the identified itemsets are called utility itemsets. In practical application scenarios, the utility of an item set can be measured by a variety of factors, such as weight, profit, or cost, which can be chosen by the users themselves according to their needs [1]. Therefore, utility mining can be more intuitive for decision problems in practical scenarios.

For the traditional HUIM algorithm, when the size of the mined database is large or the number of items included is extremely large, the search space is large and requires a long time. Metaheuristic algorithms can be used to solve such problems.

Metaheuristic algorithm is a modification of the heuristic algorithm that combines a randomized algorithm with a local search algorithm. Kannimuthu developed high utility pattern extraction using a genetic algorithm with ranked mutation of the minimum utility threshold to mine high utility itemsets (HUIs) [2]. This algorithm requires crossover and mutation during the planning process to randomly generate the next solution. However, many calculations are required in the initial stage to find a satisfactory itemset, which requires a long time.

Another metaheuristic algorithm, particle swarm optimization (PSO), is a population-based stochastic optimization algorithm [3]. It is guided by individual and global solutions to determine the optimal solutions by changing the velocity and position of the particles. Compared with genetic algorithms, PSO algorithms save time by eliminating the need to perform crossover and mutation operations.

The original PSO algorithm is primarily used to solve continuous space optimization problems. Generally, it is used to find the optimal solution of a function; however, when mining high utility item sets, each item has only two states, that is, with or without this item. Therefore, a binary PSO (BPSO) algorithm was designed to solve this problem [4].

In BPSO, the discrete problem space is mapped to a continuous particle motion space with appropriate modifications. It retains the velocity position update strategy of classical PSO to find the optimal solution. The updated formula for the particle velocity remains the same; however, the value of an individual position can only be zero or one.

The probability of particle position change is determined by the transfer function. The absolute value of particle velocity should be considered when selecting the transfer function. For larger absolute values of the velocity, the transfer function should provide a higher possibility of position change. Because the velocity of a particle is related to its superiority or inferiority, a larger velocity indicates that it is farther from the optimal solution and vice versa [5].

The original BPSO algorithm uses a transfer function that is a sigmoid function, also known as an S-shaped function. Mirjalili divided the transfer function of the BPSO into S-shaped and V-shaped functions [6]. The experimental analysis of the S-shaped transfer function shows that the function lacks late local search capability. Mirjalili compared the effectiveness of V-shaped and S-shaped transfer functions on benchmark functions in optimization tests and verified the excellent search capability of the V-shaped transfer function on many benchmark functions.

During the particle search process, the variation in the acceleration coefficients of the particles can also affect the way the particles are searched. The acceleration coefficients were divided into linear and nonlinear. According to Sun's experiments, the nonlinear method is more accurate and stable when searching for an optimal solution [7].

Considering the mining ability and accuracy of the algorithm in the process of high utility itemsets mining, we propose a mining method based on the BPSO algorithm with a V-shaped transfer function and nonlinear acceleration coefficient strategy, called VBPSO.

The experiments show that the proposed VBPSO algorithm mines more HUIs than the BPSO. Although the running time of the VBPSO algorithm was longer, the number of HUIs mined in the same unit time was greater than that of the BPSO algorithm.

In this chapter, works related to HUIM and PSO are briefly reviewed.

A. High Utility Itemset Mining

High-utility itemset mining is an evolution of weighted FIM. In weighted FIM, only the number of occurrences of an item and its weight are considered; the number of occurrences of the item in a transaction is not considered. However, in a transaction, the same product may be purchased more than once, and the total profit generated by the transaction cannot be calculated solely based on the profit of a single product. Therefore, to meet the demand of merchants to pursue high sales profit without ignoring profitable but low support items, researchers have proposed high-utility itemset mining. It considers the unit profit (the weight of items) and number (the number of items in the transaction set) of items. Traditional weighted FIM is unsuitable for this scenario.

The problem of mining HUIs was first proposed by Chan [8]. Subsequently, Yao found HUIs in terms of the quantity of an item as an internal utility and its profit per unit as an external utility [9]. To solve this problem, they proposed a mining algorithm. Lin proposed using the PSO algorithm to mine HUIs, and subsequently proposed mining HUIs using the binary PSO algorithm to reduce mining time [1].

Utility is at the core of high utility itemset mining. It is determined by two factors: the external effects between different items in the database (the profit of the item), and the internal effects of the item in the transaction (the number of items). The utility of an item set in the database is the sum of the products of the external and internal utilities of each item in the item set. If the utility of an itemset is not less than the minimum threshold set by the user, the itemset is considered a high utility itemset.

B. Particle Swarm Optimization

The PSO algorithm, proposed by Kennedy and Eberhart in 1995, simulates the behavior of a group of birds searching for food in an area. In this area, there is only one piece of food. None of the birds knew where the food is. However, they know how far their current position is from the food; thus, the optimal strategy to find the food is to search the area around the bird closest to it.

The PSO algorithm is inspired by this biological population behavior property and is used to solve the optimization problem. Particles are used to model the aforementioned bird individuals, and each particle can be considered a search individual in an N-dimensional space. The current position of a particle is a candidate solution to the corresponding optimization problem, and the flight process of the particle is the search process of the individual. The flight velocity of a particle can be dynamically adjusted according to the individual's historical optimal position and the population's historical optimal position (Kennedy and Eberhart 1995).

The steps of PSO can be illustrated as follows:

1. Initialization: First, we set the number of particles and velocity interval, and then we set the maximum number of iterations and position information for the entire search space. The velocity and position of each particle are then randomly initialized within the velocity interval and search space.

2. Iteration: This involves defining the fitness function, the individual optimal solution (pbest) is the optimal solution found by each particle individually, that is, the best result after calculation using the fitness function. The result with the best outcome among these individual optimal solutions is treated as the global optimal solution (gbest). The pbest and gbest values are then compared with the historical values, and the largest values of pbest and gbest are updated and retained. Then, the particle updated its position based on pbest, gbest, and its own inertia. The particle position update method is illustrated in Fig. 1.

Fig. 1. Particle position update method.

3. Termination condition: The global optimal solution of the particle, as well as its position and velocity, are continuously updated until the termination criterion is reached.

During the PSO algorithm update, the corresponding particle position and velocity are described as follows:

vidt+1=w1×vidt+c1×rand×pbestxidt+c2×rand×gbestxidt.

xidt+1=xidt+vidt+1

In Eqs. (1) and (2), t denotes the number of iterations and w1 denotes the inertia factor, which is intended to balance the global and local searches. vid denotes the velocity of the id-th particle, rand() denotes a random number in the range [0,1], xid denotes the position of the id-th particle. Furthermore, c1 and c2 are the individual and social learning factors for each particle, respectively, and are also referred to as cognitive acceleration and social acceleration, respectively. Individual learning and social learning determine the local search ability and global search ability of the particle, respectively. If the value of the individual learning of the particle is larger, the local search ability of the particle is stronger, and if the value of the particle’s social learning is larger, the global search ability of the particle is stronger.

Kennedy and Eberhart suggested that the values of c1 and c2 be set to two to balance the local search and global search ability. However, Ratanweera proposed a linear acceleration strategy in which c1 gradually decreased and c2 gradually increased as the number of iterations increased. This enables the particlés flight velocity to refer more to its own information at the early stage of the algorithm's operation, while focusing more on social information afterward. This allows the final result to be closer to the optimal solution than the PSO algorithm, which uses a constant acceleration value [10].

Subsequently, Feng showed through experimental validation that c1 and c2 provide better results when an asymmetric range of variation (c1 decreases from 2.75 to 1.25, and c2 increases from 0.5 to 2.25) is used [11]. However, this method has the drawback that the particles converge early to the local optimal solution; thus, Chen proposed a nonlinear acceleration adjustment strategy constructed using the arc cos function. Experiments showed that the algorithm improved the convergence of the multipeaked function and enhanced the ability to search for the global optimal solution [12].

Particle swarm algorithms have been widely used in data mining. Jiang proposed a multi-objective particle swarm algorithm to generate association rules for the relationship between the sentiment dimensions and design attributes [13]. Indira proposed an adaptive method for controlling the inertia and learning factors in particle swarm algorithms to mine association rules [14]. Jitendra proposed a method for mining positive and negative association rules. They applied the coding step to affirmative and negative rules separately, in terms of the statistical correlation between transaction databases [15].

However, in practical scenarios, many problems have variables in a discrete variable space, rather than in a continuous interval. Kennedy and Eberhart designed a discrete BPSO to solve the limitations of continuous PSO [4].

The BPSO algorithm maps a discrete problem space to a continuous particle motion space by converting each particle into a set of binary variables. The BPSO requires a transfer function to convert real values to binary values of zero or one. Transfer functions define the possibility of changing the elements of the position vector from zero to one. The original transfer functions were used to perform probabilistic updates using the sigmoid transition functions. The advantage of the V-shaped transfer function over the sigmoid transfer function is that it encourages particles to remain at their current position when the velocity value is low and switch to their complement when the velocity value is high. This allows to enhance the local search capability of the algorithm, thus improving the performance of the BPSO algorithm. Zhang mined frequent itemsets using the BPSO algorithm based on the V-shaped transfer function [16].

High-utility itemset mining is the calculation of the utility value of an itemset from transactions in the database. It differs from FIM in that it considers the quantity and profit of items. The utility value must be calculated using a series of formulae. The problem description and related definitions in high utility itemset mining are described as follows.

Let I = {i1, i2, …, in} be a finite set of items, and item ij (1 ≤ jn) in this set has uniquely determined profit values p(ij). The set of items is called an itemset, and if the itemset contains k items, it is called a k-itemset. The transaction database comprises a transaction table and an external utility table. The transaction table consists of multiple transactions T = {T1, T2, …, Tm}, where each transaction TcT, each transaction Tc has a unique identifier c called TID, and each transaction contains items and their purchases. The minimum utility threshold is then set as min_threshold according to the user’s preference. This will be used as the basis for the evaluation of HUIs.

Table 1 presents an example to demonstrate this thesis. In Table 1, there are ten transactions and six different items, denoted by a to f. Table 2 presents the external utility table (profit table).

Table 1 . Transaction database

TIDTransaction (item, quantity)
T1b:13, c:9, d:12
T2b:5, d:6, e:8, f:15
T3d:15, e:13, f:13
T4b:14, d:7
T5a:6, b:10, d:15, e:10
T6b:6, c:6
T7b:2, d:14
T8a:2, b:6, d:9, e:10
T9b:5, d:13, e:12
T10a:10, b:15, d:11, e:4

Table 2 . Profit table

Itemabcdef
Profit2981057


There are seven definitions for the high utility of itemset calculations as follows.

Definition 1. The utility value of item ij in transaction Tc that contains this item is denoted as u (ij, Tc), which is equal to the number of ij in transaction Tc, denoted as q(ij, Tc), multiplied by the corresponding profit p(ij) which is defined as

uij,Tc=qij,Tc×pij

For example, T1 contains three items b, c, and d, and calculates them separately: u(b, T1) = q(b, T1) × p(b) = 13 × 9 = 117, u(c, T1) = q(c, T1) × p(c) = 9 × 8 = 72, and u(d, T1) = q(d, T1) × p(d) = 12 × 10 = 120.

Definition 2. In transaction Tc, the utility of item set X is denoted as u(X, Tc), which is defined as

uX,Tc= ij xx T c u i j ,Tc

For example, in transaction T1, the utility values of itemsets bc and bcd are calculated as u(bc, T1) = u(b, T1) + u(c, T1) = 117 + 72 = 189 and u(bcd, T1) = u(b, T1) + u(c, T1) + u(d, T1) = 117 + 72 + 120 = 309.

Definition 3. The utility of itemset X in a DB database is denoted as u(X), which is defined as

uX=xTcTcDB uij, T c

For example, in the entire transaction database, the utility of itemsets a and bc are calculated as u(a) = u(a, T5) + u(a, T8) + u(a, T10) = 12 + 4 + 20 = 36 and u(bc) = u(bc, T1) + u(bc, T6) = 117 + 72 + 54 + 48 = 291.

Definition 4. The total utility of a transaction Tc is denoted as tu(Tc) and is defined as

tuTc= x T c ux,Tc

For example, tu(T1) = u(b, T1) + u(c, T1) + u(d, T1) = 117 + 72 + 120 = 309. Similarly, tu(T2) = 250; tu(T3)=306; tu(T4) = 196; tu(T5) = 302; tu(T6) = 102; tu(T7) = 158; tu(T8) = 198; tu(T9) = 235; and tu(T10) = 285.

Definition 5. The total utility of a DB database is denoted as TU and is defined as

TU=TcDBtu T c

The total utility in this example is calculated as TU = 309 + 250 + 306 + 196 + 302 + 102 + 158 + 198 + 235 + 285 = 2341.

Definition 6. A HUI is a set of item sets whose utility is greater than the threshold (TU × min_threshold).

HUI=X|uX>TU×min_threshold

HUIM finds the itemsets with utility values greater than the threshold from the transaction database and the profit table, in which the threshold is set by the user.

In this section, the mining process is described in detail in conjunction with the PSO algorithm.

A. Particle Encoding

The proposed algorithm is based on the BPSO algorithm. The particles are the basic units of the BPSO algorithm, and each particle in the PSO algorithm corresponds to an itemset, that is, a potential HUI. Each position of the item set is composed of either one or zero, which represents the presence or absence of the corresponding item, respectively. As shown in Fig. 2, only the corresponding positions of items a and e are one, and thus the itemset corresponding to particle (10010) is ae.

Fig. 2. Encoding of particle.

B. Binarization

Using the same rule for particle encoding, the items contained in each transaction and the quantities corresponding to each item are converted into a binary matrix and a corresponding quantity matrix according to this rule.

The utility value of the itemset is calculated by scanning the binary matrix and quantity matrix. Binarizing the database can reduce the amount of time spent scanning the database, which must be traversed once each time the utility value of a particle is calculated. The scan time of the binary matrix was shorter than that of the character-based data.

Considering Table 1 as an example, the converted binary item matrix and quantity matrix of this database are shown in Figs. 3 and 4, respectively.

Fig. 3. Converted binary item matrix.
Fig. 4. Converted quantity matrix.

C. Particle Evaluation

In the BPSO algorithm, particle evaluation is based on the results of the fitness function. The fitness value is used to represent the utility value of the item set. The potential solutions generated by each particle are fed into the fitness function, and the merit of the particle is measured according to the magnitude of the fitness value. The best position of an individual is denoted as pbest and the best position in the population is denoted as gbest. They serve as the basis for particle movement. The fitness function in this algorithm can be defined as:

Fitnesspi=uX

Pi is the ith generated particle. X is the set of items in pi which is set to one. The utility value of X was then calculated. If the calculated result is not less than the utility threshold, it is considered to be an element of the HUI and is added to the set of HUIs.

Consider the utility value calculation of item set (ae) as an example. Itemsets (ae) are present in transactions T5, T8, and T10. Therefore, the utility value was calculated as u(ae) = u(ae, T5) + u(ae, T8) + u(ae, T10) = 62 + 54 + 40 = 156.

D. V-shaped Transfer Function

The updated formula for the velocity of the BPSO algorithm is the same as that of the original PSO algorithm, as shown in Eq. (1). The probability of particle position change is based on the transfer function. The transfer function should be the probability of changing each dimension of the position vector from zero to one and vice versa.

The original transfer function for the BPSO position-vector update is a sigmoid function. The sigmoid transfer function is shown in Fig. 5, where X is the particle velocity. The sigmoid function is a monotonically increasing function that does not conform to the rule that the probability of its position change increases with the absolute value of the velocity. The V-shaped function does not force the particle to change its value from zero to one. Alternatively, it updates the position to its complement when the particle is moving extremely fast and keeps the particle at its original position when the velocity is extremely slow. The velocity of the particles in the PSO algorithm represents the quality of the particles. The V-shaped transfer function is shown in Fig. 6.

Fig. 5. S-shaped transfer function.
Fig. 6. V-shaped transfer function.

According to the velocity update formula, when the particle is farther from the optimal solution, the velocity of the particle increases. Therefore, when the velocity of the particle is larger, the particle should have a greater probability of changing its position vector, that is, it should be updated to the complement of that position vector. Therefore, we selected the V-shaped transfer function as the transfer function of the proposed algorithm.

In addition, compared with the sigmoid function, the V-shaped function has more powerful local search capability, which is necessary when mining the HUIs. This is because the resulting result is not only the global optimal solution, but also more potential solutions can be obtained only when it has a stronger local search capability. This is because the more capable the local search is, the wider the search will be.

The V-shaped function used in this study is shown in Eqs. (10) and (11), where xid(t)-1 represents the complement of xid(t) and rand( ) is a random value between zero and one.

Tvidt=|2πarctan2πvidt|

xidt+1=xid t1,ifrand<Tvid txidt,ifrandTvid t.

E. Nonlinear Acceleration Strategy

In Eq. (1), c1 and c2 are the accelerations of the particles. Their values determine how the particle searches; the larger the c1, the stronger the local search ability of the particle, whereas the larger the c2, the stronger the global search ability of the particle. Therefore, the magnitude of their values affects the way particles are searched.

The ideal state in the process of mining HUIs should be to expand the search space of particles at the early stage of the search to obtain diverse particles. With more potential solutions at the early stage, it can provide more references to particles in the search, as well as enhance the ability to search for global optimal solutions and find more HUIs in subsequent stages.

Therefore, considering the impact of acceleration on the BPSO algorithm, Chen used the proposed nonlinear acceleration adjustment strategy constructed by the arc cosine function (Chen, 2017), as shown in Eqs. (12) and (13). The trends of c1 and c2 are shown in Fig. 7. CurrentIterances denotes the current number of iterations, whereas MaxIterances denotes the maximum number of iterations, which is a parameter pre-set by the user before the algorithm is executed. c1min and c2min are the initial values of c1 and c2, respectively, and c1max and c2max are the final values of c1 and c2 iterations, respectively. This strategy brings the search state of the particle closer to the ideal state. Therefore, to increase the diversity of the particles and to find more HUIs, we used a nonlinear acceleration strategy in the velocity update formula of the particles.

Fig. 7. Acceleration coefficients c1 and c2.

c1=c1min+c1maxc1minarcos2×currentIterancesmaxIterancesπ1

c2=c2min+c2maxc2minarcos2×currentIterancesmaxIterancesπ1

F. Mining Process

The designed mining model, based on BPSO, is divided into four steps. The results are shown in Fig. 8.

Fig. 8. Process of mining algorithm.

Pre-processing consists of generating an item binary matrix and a quantity matrix. The preprocessing step focuses on providing the required type of data and facilitating the mining process. Therefore, in this process, the database is converted into a binary matrix and a quantity matrix. We then calculate the total utility value of the database DB.

The initialization process consists of initializing the particles and obtaining pbest and gbest. In the initialization process, the particles are initialized by a binary matrix, and each particle is provided with a random initial velocity; then, the value of each particle is calculated by the fitness function to derive pbest and gbest. Furthermore, in the subsequent iteration process, the particles are updated by performing the corresponding updates for velocity, pbest, and gbest with a V-shaped transfer function and nonlinear acceleration coefficient strategy. During the iteration process, if the utility value of the particle is greater than the set threshold, the corresponding position of the particle is added to the set of utility itemsets at the time of the update. This process is repeated until the upper limit of the number of iterations is reached, and finally, the set of HUIs is obtained.

In this section, the proposed HUIM-VBPSO (v-shaped) algorithm is compared with HUIM-BPSO (s-shaped). All experiments were performed in Microsoft Windows 10, with an Intel i5 3.0 GHz CPU and 16.38 GB RAM. The experiments were conducted simultaneously using three different databases: chess, mushroom, and foodmart datasets, which are commonly used for HUIM experiments. The information on the three databases is presented in Table 3. The number of particles was set to 20, both algorithms were iterated 10,000 times, and the velocity range of the particles was set between [-10, 10].

Table 3 . Dataset

DatasetNumber of itemsTotal number of transactions
Chess763,196
Mushroom1208,124
Foodmart1,55921,557


A. Number of High Utility Itemsets

The percentage of discovered HUIs for HUIM-BPSO and the proposed algorithm HUIM-VBPSO are listed in Table 4. The number of HUIs mined by all algorithms for the three datasets is shown in Figs. 9-11.

Table 4 . Percentage of discovered HUIs

Threshold (%)HUIM-BPSO (%)HUIM-VBPSO (%)
Chess dataset
25.3047.0688.24
25.5057.1492.85
25.7058.6296.55
Mushroom dataset
14.0034.7860.87
14.2538.4666.67
14.5040.0070.00
Foodmart dataset
0.1215.8725.40
0.1318.4228.94
0.1420.0033.44

Fig. 9. Number of HUIs mined from the Chess dataset.
Fig. 10. Number of HUIs mined from the Foodmart dataset.
Fig. 11. Number of HUIs mined from the Foodmart dataset.

The experiments show that the proposed VBPSO algorithm mines more HUIs than the BPSO. According to the experimental data, the proportion of HUIs mined by the BPSO algorithm decreased when the number of items increased, which may be a limitation of the PSO algorithm.

B. Running Time

The running times of the HUIM-BPSO and HUIM-VBPSO are listed in Table 5. The number of HUIs mined by all the algorithms for the three datasets is shown in Figs. 12-14. For the runtime experiments, the thresholds for the three datasets were set to be consistent with those for the HUI mining experiments.

Table 5 . Running time of algorithms (in seconds)

Threshold (%)HUIM-BPSO (s)HUIM-VBPSO (s)
Chess dataset
25.301,0721,475
25.501,0891,496
25.701,0991,483
Mushroom dataset
14.001,8502,451
14.251,7962,475
14.501,8822,468
Foodmart dataset
0.1220,07325,753
0.1320,18625,920
0.1420,12426,968

Fig. 12. Running time of algorithm - Chess dataset.
Fig. 13. Running time of algorithm - Mushroom dataset.
Fig. 14. Running time of algorithm - Foodmart dataset.

The running time of the VBPSO algorithm was longer than that of the BPSO algorithm. This is because, in the early stage, the local search needs to be intensified to expand the search area. This step requires more time than the BPSO. However, the number of HUIs mined in the same unit time was greater than that of the BPSO.

High-utility itemset mining is an important issue in data mining and can replace FIM to reveal high-profit products. To mine such itemsets, this study proposes a new BPSO algorithm called VBPSO, which is an extension of the original BPSO algorithm with a V-shaped transfer function and nonlinear acceleration coefficient strategy. The proposed VBPSO algorithm exhibited better performance. Several experiments showed that although the running time of the proposed VBPSO is 32% longer, the number of mined HUIs is 73% higher compared with the original BPSO. Therefore, it can be used for practical applications that are not real time.

In addition to the PSO algorithm, other metaheuristic algorithms have been proposed in recent years. In the future, we will try to adopt these new algorithms for HUIM to determine whether they can mine more HUIs or reduce the running time.

  1. J. C. -W. Lin, and L. Yang, and P. Fournier-Viger, and T. -P. Hong, A binary PSO approach to mine high-utility itemsets, Soft Comput, vol. 21, no. 17, pp. 5103-5121, Mar, 2016.
    CrossRef
  2. S. Kannimuthu, and K. Premalatha, Discovery of high utility itemsets using genetic algorithm with ranked mutation, Appl Artif Intell, vol. 28, no. 4, pp. 337-359, Apr, 2014.
    CrossRef
  3. J. Kennedy, and R. Eberhart, Particle swarm optimization, in IEEE Int Conf Neural Netw, Perth: WA, Australia, vol. 4, pp. 1942-1948, 1995.
    CrossRef
  4. J. Kennedy, and R. C. Eberhart, A discrete binary version of particle swarm algorithm, in IEEE Int Conf Syst Man Cybern, Orlando: FL, USA, vol. 5, pp. 4104-4108, 1997.
    CrossRef
  5. E. Rashedi and H. Nezamabadi-pour and S. Saryazdi, BGSA: binary gravitational search algorithm, Natural Computing, vol. 9, no. 3, pp. 727-745, Dec, 2009.
    CrossRef
  6. S. Mirjalili, and A. Lewis, S-shaped versus V-shaped transfer functions for binary particle swarm optimization, Swarm and Evolutionary Computation, vol. 9, pp. 1-14, Apr, 2013.
    CrossRef
  7. Sun Xiang, The influence of dynamic nonlinear acceleration coefficients on PSO, Computer Engineering & Science, vol. 33, no. 10, pp. 131-134, Oct, 2011.
  8. R. Chan and Q. Yang and Y. -D. Shen, Mining high utility itemsets, in IEEE Int Conf Data Mining, Melbourne: FL, USA, pp. 19-26, 2003.
    CrossRef
  9. H. Yao and H. J. Hamilton and C. J. Butz, A Foundational Approach to Mining Itemset Utilities from Databases, in Proceedings of the 2004 SIAM International Conference on Data Mining, Lake Buena Vista: FL, USA, pp. 482-486, 2004.
    CrossRef
  10. A. Ratanweera and S. K. Halgamuge and H. C. Waston, Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Transactions on evolutionary computation, vol. 8, no. 3, pp. 240-255, Jun, 2004.
    CrossRef
  11. F. Xiang and C. Guo-long and G. Wen-zhong, Setting and experimental analysis of acceleration factor in particle swarm optimization algorithm, Journal of Jimei University (Natural Science), vol. 11, no. 2, pp. 146-151, 2006.
  12. C. Shui-Ii, and C. Guo-rong, and G. Wen-zhong, and C. Gou-long, Study on the Nonlinear Strategy of Acceleration Coefficient in Particle Swarm Optimization (PSO) Algorithm, Journal of Yangtze University(Natural Science Edition) Sci & Eng, vol. 4, 2007.
  13. H. Jiang, and C. K. Kwong, and W. Y. Park, and K. M. Yu, A multiobjective PSO approach of mining association rules for affective design based on online customer reviews, Journal of Engineering Design, vol. 29, no. 7, pp. 381-403, May, 2018.
    CrossRef
  14. K. Indira, and S. Kanmani, Association rule mining through adaptive parameter control in particle swarm optimization, Comput Stat. Springer-V erlag Berlin, vol. 30, no. 1, pp. 251-277, Mar, 2015.
    CrossRef
  15. J. Agrawal, and S. Agrawal, and A. Singhai, and S. Sharma, SET-PSObased approach for mining positive and negative association rules, Knowl Inf Syst, vol. 45, no. 2, pp. 453-471, Nov, 2014.
    CrossRef
  16. Z. Zhong-jie and J. Huang and Y. Wei, Frequent item sets mining from high-dimensional dataset based on a novel binary particle swarm optimization, Journal of Central South Univ, vol. 23, no. 7, pp. 1700-1708, Jul, 2016.
    CrossRef

Bodong Tao

is a Ph.D. student in the Department of Computer Engineering at Korea Maritime and Ocean University. He received his B.S. degree in software engineering from East China Jiaotong University in 2015 and his M.S. degree in computer engineering from Korea Maritime and Ocean University in 2022. His research areas include data mining and big data analysis.


Ok Keun Shin

is a professor in the Department of Computer Engineering at Korea Maritime and Ocean University. He received his Ph.D. from the Universite de Fanche-Comte in 1995. His research areas include signal processing and embedded systems.


Hyu Chan Park

is a professor in the Department of Computer Engineering at Korea Maritime and Ocean University. He received his Ph.D. in computer engineering from the Korea Advanced Institute of Science and Technology in 1995. His research areas include databases, data mining, big data analysis, and marine information systems.


Article

Journal of information and communication convergence engineering 2022; 20(2): 103-112

Published online June 30, 2022 https://doi.org/10.6109/jicce.2022.20.2.103

Copyright © Korea Institute of Information and Communication Engineering.

High Utility Itemset Mining by Using Binary PSO Algorithm with V-shaped Transfer Function and Nonlinear Acceleration Coefficient Strategy

Bodong Tao , Ok Keun Shin , and Hyu Chan Park*

Department of Computer Engineering, Korea Maritime and Ocean University, Busan 49112, Korea

Correspondence to:Hyu Chan Park (E-mail: hcpark@kmou.ac.kr, Tel: +82-51-410-4573)
Department of Computer Engineering, Korea Maritime and Ocean University, Busan 49112, Korea.

Received: March 1, 2022; Revised: May 11, 2022; Accepted: May 12, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The goal of pattern mining is to identify novel patterns in a database. High utility itemset mining (HUIM) is a research direction for pattern mining. This is different from frequent itemset mining (FIM), which additionally considers the quantity and profit of the commodity. Several algorithms have been used to mine high utility itemsets (HUIs). The original BPSO algorithm lacks local search capabilities in the subsequent stage, resulting in insufficient HUIs to be mined. Compared to the transfer function used in the original PSO algorithm, the V-shaped transfer function more sufficiently reflects the probability between the velocity and position change of the particles. Considering the influence of the acceleration factor on the particle motion mode and trajectory, a nonlinear acceleration strategy was used to enhance the search ability of the particles. Experiments show that the number of mined HUIs is 73% higher than that of the original BPSO algorithm, which indicates better performance of the proposed algorithm.

Keywords: High utility itemsets, Particle swarm optimization, V-shaped transfer function, Nonlinear acceleration coefficient strategy.

I. INTRODUCTION

Data mining is the process of searching for information hidden in large amounts of data using algorithms. The searched information can be either nonnumerical or inductive. It can also be used for information management, decision support, and other purposes.

Frequent itemset mining (FIM) is an important data mining technique. Frequent patterns refer to the set of items that appear frequently in a dataset, such as the set of items that appear frequently in transaction data. However, the same item may be purchased more than once in a single transaction, and the profit of each item varies. FIM does not consider these factors. To address these shortcomings of frequent itemsets, high utility itemset mining was proposed.

High-utility itemset mining (HUIM) mines a set of items that satisfy certain conditions using a certain strategy. A mining method is required to mine itemsets with a relatively low frequency and high utility value. For instance, low sales of jewelry, and high profits; high sales of bread and low profits. Therefore, the target to be mined should be defined beforehand and a minimum utility value should be provided as the judgment condition.

The process of finding all itemsets in the database that exceed the minimum utility value is called utility mining, and the identified itemsets are called utility itemsets. In practical application scenarios, the utility of an item set can be measured by a variety of factors, such as weight, profit, or cost, which can be chosen by the users themselves according to their needs [1]. Therefore, utility mining can be more intuitive for decision problems in practical scenarios.

For the traditional HUIM algorithm, when the size of the mined database is large or the number of items included is extremely large, the search space is large and requires a long time. Metaheuristic algorithms can be used to solve such problems.

Metaheuristic algorithm is a modification of the heuristic algorithm that combines a randomized algorithm with a local search algorithm. Kannimuthu developed high utility pattern extraction using a genetic algorithm with ranked mutation of the minimum utility threshold to mine high utility itemsets (HUIs) [2]. This algorithm requires crossover and mutation during the planning process to randomly generate the next solution. However, many calculations are required in the initial stage to find a satisfactory itemset, which requires a long time.

Another metaheuristic algorithm, particle swarm optimization (PSO), is a population-based stochastic optimization algorithm [3]. It is guided by individual and global solutions to determine the optimal solutions by changing the velocity and position of the particles. Compared with genetic algorithms, PSO algorithms save time by eliminating the need to perform crossover and mutation operations.

The original PSO algorithm is primarily used to solve continuous space optimization problems. Generally, it is used to find the optimal solution of a function; however, when mining high utility item sets, each item has only two states, that is, with or without this item. Therefore, a binary PSO (BPSO) algorithm was designed to solve this problem [4].

In BPSO, the discrete problem space is mapped to a continuous particle motion space with appropriate modifications. It retains the velocity position update strategy of classical PSO to find the optimal solution. The updated formula for the particle velocity remains the same; however, the value of an individual position can only be zero or one.

The probability of particle position change is determined by the transfer function. The absolute value of particle velocity should be considered when selecting the transfer function. For larger absolute values of the velocity, the transfer function should provide a higher possibility of position change. Because the velocity of a particle is related to its superiority or inferiority, a larger velocity indicates that it is farther from the optimal solution and vice versa [5].

The original BPSO algorithm uses a transfer function that is a sigmoid function, also known as an S-shaped function. Mirjalili divided the transfer function of the BPSO into S-shaped and V-shaped functions [6]. The experimental analysis of the S-shaped transfer function shows that the function lacks late local search capability. Mirjalili compared the effectiveness of V-shaped and S-shaped transfer functions on benchmark functions in optimization tests and verified the excellent search capability of the V-shaped transfer function on many benchmark functions.

During the particle search process, the variation in the acceleration coefficients of the particles can also affect the way the particles are searched. The acceleration coefficients were divided into linear and nonlinear. According to Sun's experiments, the nonlinear method is more accurate and stable when searching for an optimal solution [7].

Considering the mining ability and accuracy of the algorithm in the process of high utility itemsets mining, we propose a mining method based on the BPSO algorithm with a V-shaped transfer function and nonlinear acceleration coefficient strategy, called VBPSO.

The experiments show that the proposed VBPSO algorithm mines more HUIs than the BPSO. Although the running time of the VBPSO algorithm was longer, the number of HUIs mined in the same unit time was greater than that of the BPSO algorithm.

II. RELATED WORKS

In this chapter, works related to HUIM and PSO are briefly reviewed.

A. High Utility Itemset Mining

High-utility itemset mining is an evolution of weighted FIM. In weighted FIM, only the number of occurrences of an item and its weight are considered; the number of occurrences of the item in a transaction is not considered. However, in a transaction, the same product may be purchased more than once, and the total profit generated by the transaction cannot be calculated solely based on the profit of a single product. Therefore, to meet the demand of merchants to pursue high sales profit without ignoring profitable but low support items, researchers have proposed high-utility itemset mining. It considers the unit profit (the weight of items) and number (the number of items in the transaction set) of items. Traditional weighted FIM is unsuitable for this scenario.

The problem of mining HUIs was first proposed by Chan [8]. Subsequently, Yao found HUIs in terms of the quantity of an item as an internal utility and its profit per unit as an external utility [9]. To solve this problem, they proposed a mining algorithm. Lin proposed using the PSO algorithm to mine HUIs, and subsequently proposed mining HUIs using the binary PSO algorithm to reduce mining time [1].

Utility is at the core of high utility itemset mining. It is determined by two factors: the external effects between different items in the database (the profit of the item), and the internal effects of the item in the transaction (the number of items). The utility of an item set in the database is the sum of the products of the external and internal utilities of each item in the item set. If the utility of an itemset is not less than the minimum threshold set by the user, the itemset is considered a high utility itemset.

B. Particle Swarm Optimization

The PSO algorithm, proposed by Kennedy and Eberhart in 1995, simulates the behavior of a group of birds searching for food in an area. In this area, there is only one piece of food. None of the birds knew where the food is. However, they know how far their current position is from the food; thus, the optimal strategy to find the food is to search the area around the bird closest to it.

The PSO algorithm is inspired by this biological population behavior property and is used to solve the optimization problem. Particles are used to model the aforementioned bird individuals, and each particle can be considered a search individual in an N-dimensional space. The current position of a particle is a candidate solution to the corresponding optimization problem, and the flight process of the particle is the search process of the individual. The flight velocity of a particle can be dynamically adjusted according to the individual's historical optimal position and the population's historical optimal position (Kennedy and Eberhart 1995).

The steps of PSO can be illustrated as follows:

1. Initialization: First, we set the number of particles and velocity interval, and then we set the maximum number of iterations and position information for the entire search space. The velocity and position of each particle are then randomly initialized within the velocity interval and search space.

2. Iteration: This involves defining the fitness function, the individual optimal solution (pbest) is the optimal solution found by each particle individually, that is, the best result after calculation using the fitness function. The result with the best outcome among these individual optimal solutions is treated as the global optimal solution (gbest). The pbest and gbest values are then compared with the historical values, and the largest values of pbest and gbest are updated and retained. Then, the particle updated its position based on pbest, gbest, and its own inertia. The particle position update method is illustrated in Fig. 1.

Figure 1. Particle position update method.

3. Termination condition: The global optimal solution of the particle, as well as its position and velocity, are continuously updated until the termination criterion is reached.

During the PSO algorithm update, the corresponding particle position and velocity are described as follows:

vidt+1=w1×vidt+c1×rand×pbestxidt+c2×rand×gbestxidt.

xidt+1=xidt+vidt+1

In Eqs. (1) and (2), t denotes the number of iterations and w1 denotes the inertia factor, which is intended to balance the global and local searches. vid denotes the velocity of the id-th particle, rand() denotes a random number in the range [0,1], xid denotes the position of the id-th particle. Furthermore, c1 and c2 are the individual and social learning factors for each particle, respectively, and are also referred to as cognitive acceleration and social acceleration, respectively. Individual learning and social learning determine the local search ability and global search ability of the particle, respectively. If the value of the individual learning of the particle is larger, the local search ability of the particle is stronger, and if the value of the particle’s social learning is larger, the global search ability of the particle is stronger.

Kennedy and Eberhart suggested that the values of c1 and c2 be set to two to balance the local search and global search ability. However, Ratanweera proposed a linear acceleration strategy in which c1 gradually decreased and c2 gradually increased as the number of iterations increased. This enables the particlés flight velocity to refer more to its own information at the early stage of the algorithm's operation, while focusing more on social information afterward. This allows the final result to be closer to the optimal solution than the PSO algorithm, which uses a constant acceleration value [10].

Subsequently, Feng showed through experimental validation that c1 and c2 provide better results when an asymmetric range of variation (c1 decreases from 2.75 to 1.25, and c2 increases from 0.5 to 2.25) is used [11]. However, this method has the drawback that the particles converge early to the local optimal solution; thus, Chen proposed a nonlinear acceleration adjustment strategy constructed using the arc cos function. Experiments showed that the algorithm improved the convergence of the multipeaked function and enhanced the ability to search for the global optimal solution [12].

Particle swarm algorithms have been widely used in data mining. Jiang proposed a multi-objective particle swarm algorithm to generate association rules for the relationship between the sentiment dimensions and design attributes [13]. Indira proposed an adaptive method for controlling the inertia and learning factors in particle swarm algorithms to mine association rules [14]. Jitendra proposed a method for mining positive and negative association rules. They applied the coding step to affirmative and negative rules separately, in terms of the statistical correlation between transaction databases [15].

However, in practical scenarios, many problems have variables in a discrete variable space, rather than in a continuous interval. Kennedy and Eberhart designed a discrete BPSO to solve the limitations of continuous PSO [4].

The BPSO algorithm maps a discrete problem space to a continuous particle motion space by converting each particle into a set of binary variables. The BPSO requires a transfer function to convert real values to binary values of zero or one. Transfer functions define the possibility of changing the elements of the position vector from zero to one. The original transfer functions were used to perform probabilistic updates using the sigmoid transition functions. The advantage of the V-shaped transfer function over the sigmoid transfer function is that it encourages particles to remain at their current position when the velocity value is low and switch to their complement when the velocity value is high. This allows to enhance the local search capability of the algorithm, thus improving the performance of the BPSO algorithm. Zhang mined frequent itemsets using the BPSO algorithm based on the V-shaped transfer function [16].

III. PROBLEM STATEMENT

High-utility itemset mining is the calculation of the utility value of an itemset from transactions in the database. It differs from FIM in that it considers the quantity and profit of items. The utility value must be calculated using a series of formulae. The problem description and related definitions in high utility itemset mining are described as follows.

Let I = {i1, i2, …, in} be a finite set of items, and item ij (1 ≤ jn) in this set has uniquely determined profit values p(ij). The set of items is called an itemset, and if the itemset contains k items, it is called a k-itemset. The transaction database comprises a transaction table and an external utility table. The transaction table consists of multiple transactions T = {T1, T2, …, Tm}, where each transaction TcT, each transaction Tc has a unique identifier c called TID, and each transaction contains items and their purchases. The minimum utility threshold is then set as min_threshold according to the user’s preference. This will be used as the basis for the evaluation of HUIs.

Table 1 presents an example to demonstrate this thesis. In Table 1, there are ten transactions and six different items, denoted by a to f. Table 2 presents the external utility table (profit table).

Table 1 . Transaction database.

TIDTransaction (item, quantity)
T1b:13, c:9, d:12
T2b:5, d:6, e:8, f:15
T3d:15, e:13, f:13
T4b:14, d:7
T5a:6, b:10, d:15, e:10
T6b:6, c:6
T7b:2, d:14
T8a:2, b:6, d:9, e:10
T9b:5, d:13, e:12
T10a:10, b:15, d:11, e:4

Table 2 . Profit table.

Itemabcdef
Profit2981057


There are seven definitions for the high utility of itemset calculations as follows.

Definition 1. The utility value of item ij in transaction Tc that contains this item is denoted as u (ij, Tc), which is equal to the number of ij in transaction Tc, denoted as q(ij, Tc), multiplied by the corresponding profit p(ij) which is defined as

uij,Tc=qij,Tc×pij

For example, T1 contains three items b, c, and d, and calculates them separately: u(b, T1) = q(b, T1) × p(b) = 13 × 9 = 117, u(c, T1) = q(c, T1) × p(c) = 9 × 8 = 72, and u(d, T1) = q(d, T1) × p(d) = 12 × 10 = 120.

Definition 2. In transaction Tc, the utility of item set X is denoted as u(X, Tc), which is defined as

uX,Tc= ij xx T c u i j ,Tc

For example, in transaction T1, the utility values of itemsets bc and bcd are calculated as u(bc, T1) = u(b, T1) + u(c, T1) = 117 + 72 = 189 and u(bcd, T1) = u(b, T1) + u(c, T1) + u(d, T1) = 117 + 72 + 120 = 309.

Definition 3. The utility of itemset X in a DB database is denoted as u(X), which is defined as

uX=xTcTcDB uij, T c

For example, in the entire transaction database, the utility of itemsets a and bc are calculated as u(a) = u(a, T5) + u(a, T8) + u(a, T10) = 12 + 4 + 20 = 36 and u(bc) = u(bc, T1) + u(bc, T6) = 117 + 72 + 54 + 48 = 291.

Definition 4. The total utility of a transaction Tc is denoted as tu(Tc) and is defined as

tuTc= x T c ux,Tc

For example, tu(T1) = u(b, T1) + u(c, T1) + u(d, T1) = 117 + 72 + 120 = 309. Similarly, tu(T2) = 250; tu(T3)=306; tu(T4) = 196; tu(T5) = 302; tu(T6) = 102; tu(T7) = 158; tu(T8) = 198; tu(T9) = 235; and tu(T10) = 285.

Definition 5. The total utility of a DB database is denoted as TU and is defined as

TU=TcDBtu T c

The total utility in this example is calculated as TU = 309 + 250 + 306 + 196 + 302 + 102 + 158 + 198 + 235 + 285 = 2341.

Definition 6. A HUI is a set of item sets whose utility is greater than the threshold (TU × min_threshold).

HUI=X|uX>TU×min_threshold

HUIM finds the itemsets with utility values greater than the threshold from the transaction database and the profit table, in which the threshold is set by the user.

IV. MINING METHODOLOGY AND PROCESS

In this section, the mining process is described in detail in conjunction with the PSO algorithm.

A. Particle Encoding

The proposed algorithm is based on the BPSO algorithm. The particles are the basic units of the BPSO algorithm, and each particle in the PSO algorithm corresponds to an itemset, that is, a potential HUI. Each position of the item set is composed of either one or zero, which represents the presence or absence of the corresponding item, respectively. As shown in Fig. 2, only the corresponding positions of items a and e are one, and thus the itemset corresponding to particle (10010) is ae.

Figure 2. Encoding of particle.

B. Binarization

Using the same rule for particle encoding, the items contained in each transaction and the quantities corresponding to each item are converted into a binary matrix and a corresponding quantity matrix according to this rule.

The utility value of the itemset is calculated by scanning the binary matrix and quantity matrix. Binarizing the database can reduce the amount of time spent scanning the database, which must be traversed once each time the utility value of a particle is calculated. The scan time of the binary matrix was shorter than that of the character-based data.

Considering Table 1 as an example, the converted binary item matrix and quantity matrix of this database are shown in Figs. 3 and 4, respectively.

Figure 3. Converted binary item matrix.
Figure 4. Converted quantity matrix.

C. Particle Evaluation

In the BPSO algorithm, particle evaluation is based on the results of the fitness function. The fitness value is used to represent the utility value of the item set. The potential solutions generated by each particle are fed into the fitness function, and the merit of the particle is measured according to the magnitude of the fitness value. The best position of an individual is denoted as pbest and the best position in the population is denoted as gbest. They serve as the basis for particle movement. The fitness function in this algorithm can be defined as:

Fitnesspi=uX

Pi is the ith generated particle. X is the set of items in pi which is set to one. The utility value of X was then calculated. If the calculated result is not less than the utility threshold, it is considered to be an element of the HUI and is added to the set of HUIs.

Consider the utility value calculation of item set (ae) as an example. Itemsets (ae) are present in transactions T5, T8, and T10. Therefore, the utility value was calculated as u(ae) = u(ae, T5) + u(ae, T8) + u(ae, T10) = 62 + 54 + 40 = 156.

D. V-shaped Transfer Function

The updated formula for the velocity of the BPSO algorithm is the same as that of the original PSO algorithm, as shown in Eq. (1). The probability of particle position change is based on the transfer function. The transfer function should be the probability of changing each dimension of the position vector from zero to one and vice versa.

The original transfer function for the BPSO position-vector update is a sigmoid function. The sigmoid transfer function is shown in Fig. 5, where X is the particle velocity. The sigmoid function is a monotonically increasing function that does not conform to the rule that the probability of its position change increases with the absolute value of the velocity. The V-shaped function does not force the particle to change its value from zero to one. Alternatively, it updates the position to its complement when the particle is moving extremely fast and keeps the particle at its original position when the velocity is extremely slow. The velocity of the particles in the PSO algorithm represents the quality of the particles. The V-shaped transfer function is shown in Fig. 6.

Figure 5. S-shaped transfer function.
Figure 6. V-shaped transfer function.

According to the velocity update formula, when the particle is farther from the optimal solution, the velocity of the particle increases. Therefore, when the velocity of the particle is larger, the particle should have a greater probability of changing its position vector, that is, it should be updated to the complement of that position vector. Therefore, we selected the V-shaped transfer function as the transfer function of the proposed algorithm.

In addition, compared with the sigmoid function, the V-shaped function has more powerful local search capability, which is necessary when mining the HUIs. This is because the resulting result is not only the global optimal solution, but also more potential solutions can be obtained only when it has a stronger local search capability. This is because the more capable the local search is, the wider the search will be.

The V-shaped function used in this study is shown in Eqs. (10) and (11), where xid(t)-1 represents the complement of xid(t) and rand( ) is a random value between zero and one.

Tvidt=|2πarctan2πvidt|

xidt+1=xid t1,ifrand<Tvid txidt,ifrandTvid t.

E. Nonlinear Acceleration Strategy

In Eq. (1), c1 and c2 are the accelerations of the particles. Their values determine how the particle searches; the larger the c1, the stronger the local search ability of the particle, whereas the larger the c2, the stronger the global search ability of the particle. Therefore, the magnitude of their values affects the way particles are searched.

The ideal state in the process of mining HUIs should be to expand the search space of particles at the early stage of the search to obtain diverse particles. With more potential solutions at the early stage, it can provide more references to particles in the search, as well as enhance the ability to search for global optimal solutions and find more HUIs in subsequent stages.

Therefore, considering the impact of acceleration on the BPSO algorithm, Chen used the proposed nonlinear acceleration adjustment strategy constructed by the arc cosine function (Chen, 2017), as shown in Eqs. (12) and (13). The trends of c1 and c2 are shown in Fig. 7. CurrentIterances denotes the current number of iterations, whereas MaxIterances denotes the maximum number of iterations, which is a parameter pre-set by the user before the algorithm is executed. c1min and c2min are the initial values of c1 and c2, respectively, and c1max and c2max are the final values of c1 and c2 iterations, respectively. This strategy brings the search state of the particle closer to the ideal state. Therefore, to increase the diversity of the particles and to find more HUIs, we used a nonlinear acceleration strategy in the velocity update formula of the particles.

Figure 7. Acceleration coefficients c1 and c2.

c1=c1min+c1maxc1minarcos2×currentIterancesmaxIterancesπ1

c2=c2min+c2maxc2minarcos2×currentIterancesmaxIterancesπ1

F. Mining Process

The designed mining model, based on BPSO, is divided into four steps. The results are shown in Fig. 8.

Figure 8. Process of mining algorithm.

Pre-processing consists of generating an item binary matrix and a quantity matrix. The preprocessing step focuses on providing the required type of data and facilitating the mining process. Therefore, in this process, the database is converted into a binary matrix and a quantity matrix. We then calculate the total utility value of the database DB.

The initialization process consists of initializing the particles and obtaining pbest and gbest. In the initialization process, the particles are initialized by a binary matrix, and each particle is provided with a random initial velocity; then, the value of each particle is calculated by the fitness function to derive pbest and gbest. Furthermore, in the subsequent iteration process, the particles are updated by performing the corresponding updates for velocity, pbest, and gbest with a V-shaped transfer function and nonlinear acceleration coefficient strategy. During the iteration process, if the utility value of the particle is greater than the set threshold, the corresponding position of the particle is added to the set of utility itemsets at the time of the update. This process is repeated until the upper limit of the number of iterations is reached, and finally, the set of HUIs is obtained.

V. EXPERIMENTS AND DISCUSSION

In this section, the proposed HUIM-VBPSO (v-shaped) algorithm is compared with HUIM-BPSO (s-shaped). All experiments were performed in Microsoft Windows 10, with an Intel i5 3.0 GHz CPU and 16.38 GB RAM. The experiments were conducted simultaneously using three different databases: chess, mushroom, and foodmart datasets, which are commonly used for HUIM experiments. The information on the three databases is presented in Table 3. The number of particles was set to 20, both algorithms were iterated 10,000 times, and the velocity range of the particles was set between [-10, 10].

Table 3 . Dataset.

DatasetNumber of itemsTotal number of transactions
Chess763,196
Mushroom1208,124
Foodmart1,55921,557


A. Number of High Utility Itemsets

The percentage of discovered HUIs for HUIM-BPSO and the proposed algorithm HUIM-VBPSO are listed in Table 4. The number of HUIs mined by all algorithms for the three datasets is shown in Figs. 9-11.

Table 4 . Percentage of discovered HUIs.

Threshold (%)HUIM-BPSO (%)HUIM-VBPSO (%)
Chess dataset
25.3047.0688.24
25.5057.1492.85
25.7058.6296.55
Mushroom dataset
14.0034.7860.87
14.2538.4666.67
14.5040.0070.00
Foodmart dataset
0.1215.8725.40
0.1318.4228.94
0.1420.0033.44

Figure 9. Number of HUIs mined from the Chess dataset.
Figure 10. Number of HUIs mined from the Foodmart dataset.
Figure 11. Number of HUIs mined from the Foodmart dataset.

The experiments show that the proposed VBPSO algorithm mines more HUIs than the BPSO. According to the experimental data, the proportion of HUIs mined by the BPSO algorithm decreased when the number of items increased, which may be a limitation of the PSO algorithm.

B. Running Time

The running times of the HUIM-BPSO and HUIM-VBPSO are listed in Table 5. The number of HUIs mined by all the algorithms for the three datasets is shown in Figs. 12-14. For the runtime experiments, the thresholds for the three datasets were set to be consistent with those for the HUI mining experiments.

Table 5 . Running time of algorithms (in seconds).

Threshold (%)HUIM-BPSO (s)HUIM-VBPSO (s)
Chess dataset
25.301,0721,475
25.501,0891,496
25.701,0991,483
Mushroom dataset
14.001,8502,451
14.251,7962,475
14.501,8822,468
Foodmart dataset
0.1220,07325,753
0.1320,18625,920
0.1420,12426,968

Figure 12. Running time of algorithm - Chess dataset.
Figure 13. Running time of algorithm - Mushroom dataset.
Figure 14. Running time of algorithm - Foodmart dataset.

The running time of the VBPSO algorithm was longer than that of the BPSO algorithm. This is because, in the early stage, the local search needs to be intensified to expand the search area. This step requires more time than the BPSO. However, the number of HUIs mined in the same unit time was greater than that of the BPSO.

VI. CONCLUSIONS

High-utility itemset mining is an important issue in data mining and can replace FIM to reveal high-profit products. To mine such itemsets, this study proposes a new BPSO algorithm called VBPSO, which is an extension of the original BPSO algorithm with a V-shaped transfer function and nonlinear acceleration coefficient strategy. The proposed VBPSO algorithm exhibited better performance. Several experiments showed that although the running time of the proposed VBPSO is 32% longer, the number of mined HUIs is 73% higher compared with the original BPSO. Therefore, it can be used for practical applications that are not real time.

In addition to the PSO algorithm, other metaheuristic algorithms have been proposed in recent years. In the future, we will try to adopt these new algorithms for HUIM to determine whether they can mine more HUIs or reduce the running time.

Fig 1.

Figure 1.Particle position update method.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 2.

Figure 2.Encoding of particle.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 3.

Figure 3.Converted binary item matrix.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 4.

Figure 4.Converted quantity matrix.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 5.

Figure 5.S-shaped transfer function.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 6.

Figure 6.V-shaped transfer function.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 7.

Figure 7.Acceleration coefficients c1 and c2.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 8.

Figure 8.Process of mining algorithm.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 9.

Figure 9.Number of HUIs mined from the Chess dataset.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 10.

Figure 10.Number of HUIs mined from the Foodmart dataset.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 11.

Figure 11.Number of HUIs mined from the Foodmart dataset.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 12.

Figure 12.Running time of algorithm - Chess dataset.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 13.

Figure 13.Running time of algorithm - Mushroom dataset.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Fig 14.

Figure 14.Running time of algorithm - Foodmart dataset.
Journal of Information and Communication Convergence Engineering 2022; 20: 103-112https://doi.org/10.6109/jicce.2022.20.2.103

Table 1 . Transaction database.

TIDTransaction (item, quantity)
T1b:13, c:9, d:12
T2b:5, d:6, e:8, f:15
T3d:15, e:13, f:13
T4b:14, d:7
T5a:6, b:10, d:15, e:10
T6b:6, c:6
T7b:2, d:14
T8a:2, b:6, d:9, e:10
T9b:5, d:13, e:12
T10a:10, b:15, d:11, e:4

Table 2 . Profit table.

Itemabcdef
Profit2981057

Table 3 . Dataset.

DatasetNumber of itemsTotal number of transactions
Chess763,196
Mushroom1208,124
Foodmart1,55921,557

Table 4 . Percentage of discovered HUIs.

Threshold (%)HUIM-BPSO (%)HUIM-VBPSO (%)
Chess dataset
25.3047.0688.24
25.5057.1492.85
25.7058.6296.55
Mushroom dataset
14.0034.7860.87
14.2538.4666.67
14.5040.0070.00
Foodmart dataset
0.1215.8725.40
0.1318.4228.94
0.1420.0033.44

Table 5 . Running time of algorithms (in seconds).

Threshold (%)HUIM-BPSO (s)HUIM-VBPSO (s)
Chess dataset
25.301,0721,475
25.501,0891,496
25.701,0991,483
Mushroom dataset
14.001,8502,451
14.251,7962,475
14.501,8822,468
Foodmart dataset
0.1220,07325,753
0.1320,18625,920
0.1420,12426,968

References

  1. J. C. -W. Lin, and L. Yang, and P. Fournier-Viger, and T. -P. Hong, A binary PSO approach to mine high-utility itemsets, Soft Comput, vol. 21, no. 17, pp. 5103-5121, Mar, 2016.
    CrossRef
  2. S. Kannimuthu, and K. Premalatha, Discovery of high utility itemsets using genetic algorithm with ranked mutation, Appl Artif Intell, vol. 28, no. 4, pp. 337-359, Apr, 2014.
    CrossRef
  3. J. Kennedy, and R. Eberhart, Particle swarm optimization, in IEEE Int Conf Neural Netw, Perth: WA, Australia, vol. 4, pp. 1942-1948, 1995.
    CrossRef
  4. J. Kennedy, and R. C. Eberhart, A discrete binary version of particle swarm algorithm, in IEEE Int Conf Syst Man Cybern, Orlando: FL, USA, vol. 5, pp. 4104-4108, 1997.
    CrossRef
  5. E. Rashedi and H. Nezamabadi-pour and S. Saryazdi, BGSA: binary gravitational search algorithm, Natural Computing, vol. 9, no. 3, pp. 727-745, Dec, 2009.
    CrossRef
  6. S. Mirjalili, and A. Lewis, S-shaped versus V-shaped transfer functions for binary particle swarm optimization, Swarm and Evolutionary Computation, vol. 9, pp. 1-14, Apr, 2013.
    CrossRef
  7. Sun Xiang, The influence of dynamic nonlinear acceleration coefficients on PSO, Computer Engineering & Science, vol. 33, no. 10, pp. 131-134, Oct, 2011.
  8. R. Chan and Q. Yang and Y. -D. Shen, Mining high utility itemsets, in IEEE Int Conf Data Mining, Melbourne: FL, USA, pp. 19-26, 2003.
    CrossRef
  9. H. Yao and H. J. Hamilton and C. J. Butz, A Foundational Approach to Mining Itemset Utilities from Databases, in Proceedings of the 2004 SIAM International Conference on Data Mining, Lake Buena Vista: FL, USA, pp. 482-486, 2004.
    CrossRef
  10. A. Ratanweera and S. K. Halgamuge and H. C. Waston, Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Transactions on evolutionary computation, vol. 8, no. 3, pp. 240-255, Jun, 2004.
    CrossRef
  11. F. Xiang and C. Guo-long and G. Wen-zhong, Setting and experimental analysis of acceleration factor in particle swarm optimization algorithm, Journal of Jimei University (Natural Science), vol. 11, no. 2, pp. 146-151, 2006.
  12. C. Shui-Ii, and C. Guo-rong, and G. Wen-zhong, and C. Gou-long, Study on the Nonlinear Strategy of Acceleration Coefficient in Particle Swarm Optimization (PSO) Algorithm, Journal of Yangtze University(Natural Science Edition) Sci & Eng, vol. 4, 2007.
  13. H. Jiang, and C. K. Kwong, and W. Y. Park, and K. M. Yu, A multiobjective PSO approach of mining association rules for affective design based on online customer reviews, Journal of Engineering Design, vol. 29, no. 7, pp. 381-403, May, 2018.
    CrossRef
  14. K. Indira, and S. Kanmani, Association rule mining through adaptive parameter control in particle swarm optimization, Comput Stat. Springer-V erlag Berlin, vol. 30, no. 1, pp. 251-277, Mar, 2015.
    CrossRef
  15. J. Agrawal, and S. Agrawal, and A. Singhai, and S. Sharma, SET-PSObased approach for mining positive and negative association rules, Knowl Inf Syst, vol. 45, no. 2, pp. 453-471, Nov, 2014.
    CrossRef
  16. Z. Zhong-jie and J. Huang and Y. Wei, Frequent item sets mining from high-dimensional dataset based on a novel binary particle swarm optimization, Journal of Central South Univ, vol. 23, no. 7, pp. 1700-1708, Jul, 2016.
    CrossRef
JICCE
Sep 30, 2022 Vol.20 No.3, pp. 143~233

Stats or Metrics

Share this article on

  • line
  • mail

Journal of Information and Communication Convergence Engineering Jouranl of information and
communication convergence engineering
(J. Inf. Commun. Converg. Eng.)

eISSN 2234-8883
pISSN 2234-8255