In many application fields, such as sensor networks, network monitoring, anomaly intrusion detection, financial tickers, bioinformatics, telecommunications data management, and web personalization, data now takes the form of continuous data streams rather than finite stored data sets. A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate, and the knowledge embedded in it is likely to change over time. Therefore, to find the knowledge embedded in a data stream efficiently, a novel mining method is required that differs from conventional methods for mining a fixed data set. In particular, because an online data stream changes over time, identifying its recent changes can provide valuable information for data stream analysis. Given these requirements, it is almost impossible to apply conventional data mining methods to mining tasks over an online data stream.
In this research, a method for mining frequent patterns over an online transactional data stream is proposed, together with several optimization techniques that keep its memory usage small. Specifically, the current set of monitored itemsets in an online data stream is minimized by two major operations: delayed insertion and pruning. Delayed insertion postpones the insertion of a new itemset appearing in new transactions until the itemset becomes significant enough to be monitored. Pruning removes a monitored itemset once it turns out to be insignificant. The number of monitored itemsets can be flexibly controlled by the thresholds of these two operations. These operations also minimize the processing time per transaction and the time needed to produce the mining result at any moment.
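The two operations can be illustrated with a minimal Python sketch. It is not the proposed method itself: the class name `ItemsetMonitor`, the thresholds `s_ins` and `s_prn`, the length cap `max_len`, the rule that single items are always monitored, and the choice to initialize a newly inserted itemset with the smallest count among its immediate subsets are all assumptions made for illustration.

```python
from itertools import combinations


class ItemsetMonitor:
    """Toy monitor for itemsets in a transaction stream (illustrative only)."""

    def __init__(self, s_ins, s_prn, max_len=3):
        self.s_ins = s_ins      # hypothetical threshold for delayed insertion
        self.s_prn = s_prn      # hypothetical threshold for pruning
        self.max_len = max_len  # assumed cap on monitored itemset length
        self.counts = {}        # frozenset -> (estimated) occurrence count
        self.n = 0              # number of transactions seen so far

    def support(self, itemset):
        if self.n == 0:
            return 0.0
        return self.counts.get(itemset, 0) / self.n

    def add(self, transaction):
        self.n += 1
        items = sorted(set(transaction))
        for k in range(1, min(self.max_len, len(items)) + 1):
            for combo in combinations(items, k):
                itemset = frozenset(combo)
                if itemset in self.counts:
                    self.counts[itemset] += 1
                elif k == 1:
                    # Assumption: single items are monitored immediately.
                    self.counts[itemset] = 1
                else:
                    # Delayed insertion: start monitoring only when every
                    # (k-1)-subset is monitored and significant enough.
                    subsets = [itemset - {x} for x in itemset]
                    if all(s in self.counts and self.support(s) >= self.s_ins
                           for s in subsets):
                        # Assumption: estimate the missed occurrences by the
                        # smallest subset count (an upper bound).
                        self.counts[itemset] = min(self.counts[s] for s in subsets)
        # Pruning: drop monitored itemsets (length > 1) that became insignificant.
        stale = [s for s in self.counts
                 if len(s) > 1 and self.support(s) < self.s_prn]
        for s in stale:
            del self.counts[s]

    def frequent(self, s_min):
        """Return the monitored itemsets whose support is at least s_min."""
        return {s for s in self.counts if self.support(s) >= s_min}
```

In this sketch, raising `s_ins` or `s_prn` shrinks the set of monitored itemsets at the cost of less accurate counts, which mirrors the trade-off controlled by the thresholds of the two operations.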
In general, most mining algorithms over a data stream do not differentiate the information of recently generated data elements from the obsolete information of old data elements, which may no longer be useful or may even be invalid at present. Therefore, they cannot adaptively extract the recent change of information in a data stream. In this research, to identify the recent change of a data stream, two information differentiation techniques are also proposed: a decay mechanism and a sliding window technique. In a mining method based on a decay mechanism, the effect of old transactions on the current mining result of the data stream is diminished by decaying the old occurrences of each itemset over time. In contrast, in a mining method based on a sliding window technique, the desired lifetime of the information in a newly generated transaction is defined by the size of a sliding window. Consequently, only the recently generated transactions within the range of the window are considered when finding the recently frequent itemsets of a data stream.
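The difference between the two techniques can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the proposed method: it counts single items rather than itemsets, and the class names `DecayedCounter` and `SlidingWindowCounter`, the per-transaction decay factor `decay`, and the parameter `window_size` are hypothetical.

```python
from collections import Counter, deque


class DecayedCounter:
    """Support with exponential decay: old occurrences weigh less over time."""

    def __init__(self, decay):
        self.decay = decay  # assumed per-transaction decay factor in (0, 1]
        self.total = 0.0    # decayed number of transactions
        self.tid = 0        # id of the latest transaction
        self.entries = {}   # item -> (decayed count, tid of last update)

    def add(self, transaction):
        self.tid += 1
        self.total = self.total * self.decay + 1.0
        for item in set(transaction):
            count, last = self.entries.get(item, (0.0, self.tid))
            # Decay lazily: apply the decay accumulated since the last update.
            aged = count * self.decay ** (self.tid - last)
            self.entries[item] = (aged + 1.0, self.tid)

    def support(self, item):
        if item not in self.entries:
            return 0.0
        count, last = self.entries[item]
        return count * self.decay ** (self.tid - last) / self.total


class SlidingWindowCounter:
    """Support over only the most recent window_size transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, transaction):
        items = frozenset(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            # The oldest transaction leaves the window; undo its contribution.
            self.counts.subtract(self.window.popleft())

    def support(self, item):
        if not self.window:
            return 0.0
        return self.counts[item] / len(self.window)
```

With decay, an old transaction never disappears entirely but its weight shrinks geometrically, whereas with a sliding window a transaction contributes fully until it leaves the window and not at all afterwards.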