This study develops an analytical method based on zero-inflated regression models, a special case of mixture models, for extracting meaningful information from network-structured data. We propose extensions of these models, show how to apply them, and give examples that can be used in various fields. As the use of mobile devices has expanded in recent years, more data are being collected and analyzed than ever before. So-called Big Data has become a prominent topic across many fields, attracting the attention of researchers as well as companies and other stakeholders. Big data is generally regarded not merely as a large volume of data but as an opportunity for new development, realized by extracting valuable information from it. Among the analyses that use such data, network analysis is relatively common.
In practice, network analysis is carried out on a network built from only part of the whole data. However, even if only part of the network is connected, the object of the analysis is the whole network, so the sparsity of the connections makes the model difficult to estimate. This study examines that problem. From a statistical point of view, structural zeros are cases in which an observation is not possible at all, whereas sampling zeros are cases in which an observation is possible but did not occur in the collected data.
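The distinction between the two kinds of zeros can be sketched with a zero-inflated Poisson mixture. This is only an illustration: the Poisson count component, the function names, and the parameter values (`pi` for the structural-zero probability, `lam` for the Poisson mean) are assumptions made here, not the model specified in this study.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) under a zero-inflated Poisson mixture (assumed form).

    With probability pi the outcome is a structural zero; otherwise it is
    drawn from Poisson(lam), which can itself produce a sampling zero.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def prob_structural_given_zero(pi: float, lam: float) -> float:
    """Posterior probability that an observed zero is structural."""
    return pi / zip_pmf(0, pi, lam)


# Hypothetical parameter values, chosen only for illustration.
pi, lam = 0.6, 1.5
p_zero = zip_pmf(0, pi, lam)                        # total mass at zero
p_sampling_zero = (1.0 - pi) * math.exp(-lam)       # part due to sampling zeros
p_struct = prob_structural_given_zero(pi, lam)      # share that is structural
```

Under these assumed values, the mass at zero splits into a structural part `pi` and a sampling part `(1 - pi) * exp(-lam)`; an observed zero cannot be labeled with certainty, only assigned a probability of being structural.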
This distinction applies to network analysis: between some pairs of nodes no connection is possible at all, while between other pairs a connection is possible but is absent from the collected network data. A large network of many entities may seem complicated, but when it is represented as an adjacency matrix, connections between entities turn out to be very rare. That is, the adjacency matrix is sparse: most of its entries are zero. Therefore, if we can distinguish sampling zeros from structural zeros among these many zero entries, we can explain the phenomenon by taking into account the potential connectivity implied by the sampling zeros in addition to the observed connectivity.
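As a small illustration of how sparse an adjacency matrix can be, the following sketch builds one for a hypothetical undirected network and computes its density. The path-graph edge list and the helper names are assumptions made for this example, not data or code from the study.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected network."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a


def density(a):
    """Share of node pairs (upper triangle, no self-loops) that are connected."""
    n = len(a)
    ones = sum(a[i][j] for i in range(n) for j in range(i + 1, n))
    return ones / (n * (n - 1) / 2)


# Hypothetical network: 100 nodes joined in a single path.
n = 100
edges = [(i, i + 1) for i in range(n - 1)]
a = adjacency_matrix(n, edges)

d = density(a)        # 99 connected pairs out of 4950 possible pairs
zero_share = 1.0 - d  # share of node pairs recorded as zero
```

Even though every node in this toy network is reachable from every other, about 98% of the node pairs are zeros, which is the situation in which separating structural zeros from sampling zeros matters.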
This study proposes an extended model that combines a zero-inflated mixture distribution with exponential-family random graph models. Most existing network graph models have been used with only binary information on the presence or absence of connections. The model proposed in this study, by contrast, is extended so that it can analyze various types of network data. In particular, the empirical analysis shows that the limitations of the existing exponential random graph model can be overcome and that analyzing various network data in this way can provide useful information not available from existing approaches.