Recent advances in sensor and wireless network technologies make it possible to overcome the limitations imposed by space and time when collecting and monitoring data in real time for decision making. Because of resource constraints in wireless sensor networks, such as limited built-in memory, microprocessor capacity, battery power, and network bandwidth, it is not easy for a sensor node to send all of its data to other nodes or to collect large volumes of data from them. In addition, communication usually consumes more energy than data processing. Research on stream data processing is therefore very important in environments with limited power and wireless communication.
In this dissertation, we propose several data processing techniques for multivariate stream data collected from a data-centric application area. The proposed techniques comprise multivariate stream data reduction, window-based data classification, and state-based temporal classification. In the study of multivariate stream data reduction, existing data reduction techniques are compared with each other and evaluated on various experimental data sets. The proposed window-based data classification method preprocesses multivariate stream data into strings and then applies document classification algorithms to them. Finally, in the state-based temporal classification method, stream data are segmented into fixed-size windows and transformed into a window-class list. To construct the classification model, sequence-class patterns are extracted from a set of training stream data with the help of the AprioriAll algorithm, which finds maximal sequential patterns. From the sequence-class patterns and the associated timing constraints on state classes, Time Petri Net (TPN) models are constructed, each of which corresponds to a classification model. During classification, sample data are translated into a window-class list, each element of which is fed into the constructed TPN as a token. This dissertation addresses the following subjects in detail.
Analysis of sensor network architectures and the data model: Hierarchical/distributed clustering is a common architecture in sensor networks. In a hierarchical sensor network, the data transmission strategy and the data storage type differ from leaf node to server node, from a spatial or conceptual viewpoint. In this dissertation, we present a general WSN architecture and a data model developed in consideration of sensor data type, storage structure, query type, and application characteristics.
Evaluation of multivariate stream data reduction methods: A typical wireless sensor network, composed of wireless devices and sensors, is restricted in how it can handle stream data by limited network bandwidth, microprocessor capacity, and battery power, and by real-time response requirements. To address these problems, many researchers have studied data cleaning, filtering, approximation, and data reduction in sensor networks. In this dissertation, we evaluate existing multivariate data reduction techniques according to the characteristics of the data in fixed-size windows and present the experimental results.
Window-based multivariate stream data classification: Because stream data in a sensor network are window-based multivariate data in which each attribute has continuous signal values, existing tuple-based classification methods (such as decision trees and Bayesian classifiers) cannot be applied to stream data directly without preprocessing. To handle this problem, we propose a two-step method. In the first step, a continuous sensor stream is transformed into a sequence of symbols based on the changes in the data, and subsequences are then generated from this sequence by the n-gram method. In the second step, text classification algorithms are applied to the collection of generated subsequences in order to classify the stream data.
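The first step of the two-step method can be illustrated with a minimal sketch for a single attribute of one window. The symbol alphabet (`U`, `D`, `S`), the change threshold, the function names, and the sample readings are all assumptions made for illustration; they are not the symbolization scheme or the data used in the dissertation.

```python
from collections import Counter


def symbolize(values, threshold=0.5):
    """Map each change between consecutive readings to a symbol.

    'U' = rise above the threshold, 'D' = fall below the negative
    threshold, 'S' = steady. The alphabet and threshold are hypothetical.
    """
    symbols = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def ngrams(sequence, n=3):
    """Return all contiguous subsequences of length n, in order."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


# Hypothetical readings of one attribute within one window.
window = [20.0, 20.1, 21.5, 23.0, 22.9, 21.0, 21.1]
symbols = symbolize(window)               # one symbol per consecutive pair
terms = Counter(ngrams(symbols, n=3))     # n-gram counts act as "word" counts
```

In such a sketch, each window plays the role of a document and its n-grams play the role of words, so the resulting term counts could be passed to a text classifier in the second step. For multivariate data, one plausible option is to symbolize each attribute separately and prefix its symbols with an attribute identifier, although that choice is also an assumption.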