
Abstract – As the population of mobile users grows day by day, the data
generated by mobile cellular networks increases drastically. These data are
high in velocity, variety and value. To use mobile cellular networks
efficiently, these data need to be analyzed by an effective technique. In this
paper we review the various methods of analyzing data generated by mobile
cellular networks. We introduce the general background of such data and review
related technologies. We then discuss various analysis methods and case
studies involving mobile big data.

Keywords – Big Data Analytics, Mobile cellular networks, Hadoop



Recent decades have witnessed a tremendous increase in data in terms of size,
speed, variety, value and veracity (the so-called 5 V’s of big data). The term
big data is mainly used to describe these enormous datasets. Big data comprises
masses of unstructured data, which often require real-time analysis. If these
datasets are effectively organized and managed, much useful, in-depth knowledge
can be obtained, which can lead to solutions for various unsolved issues.

                  MOBILE BIG DATA

Android offers more than 650,000 applications, covering nearly all categories.
Such massive data and abundant applications call for mobile analysis but also
bring a few challenges. Mobile sensing, flexibility of movement, noise, and a
large amount of redundancy are the distinctive characteristics of mobile data.
With their growing number of users and improved performance, mobile phones are
now useful for building and maintaining communities, such as communities based
on geographical location and communities based on shared cultural backgrounds
and interests. Mobile phones support rich interaction anytime and anywhere,
whereas traditional network communities or SNS communities lack online
interaction among members and are active only when members are sitting in
front of computers. A mobile community is defined as a group of individuals
with the same interests (e.g., health, safety, or entertainment) who gather on
networks, agree on a common goal, decide through consultation on measures to
achieve it, and start to implement their plan. Recent research on wireless
sensor networks and mobile phones has led to various mobile applications, such
as real-time health tracking. Medical data from sensors have distinctive
characteristics in terms of attributes, time and space relations, and
physiological relations.

In addition, such datasets involve privacy and safety protection. Garg et al.
introduced a multi-modal transport analysis mechanism of raw data for real-time
health monitoring. For the case where only highly comprehensive health-related
characteristics are available, Park et al. examined approaches to make better
use of them. Researchers at a university in Norway have developed a smartphone
application that analyzes a person's pace while walking and uses this pace
information to unlock the security system.

Apart from online health tracking applications, many other applications can be
developed from the analysis of big data generated by mobile cellular networks.
Much useful information can be obtained by geolocating mobile phones, recording
phone calls and so on, which helps operators provide better performance to
their customers. Many new customer-friendly applications can be developed by
analyzing these enormous datasets.


Wireless cellular networks have witnessed tremendous advances in recent
decades. Due to ever-increasing mobile applications, mobile cellular networks
have become both generators and carriers of massive data. These data are
generated while geolocating mobile devices, recording phone calls, and
capturing mobile applications’ activities. They deserve close attention, both
for efficient use of mobile cellular networks and for increasing the revenue of
mobile cellular operators. Compared with traditional data analytics, big data
analytics is a more efficient method for analyzing such enormous volumes of
unstructured data, and traditional data analytics proves inadequate for the
data involved in mobile cellular networks in several respects. First, big data
analytics deals with both structured and unstructured data, while traditional
data analytics deals only with structured data. Second, traditional data
analytics is inadequate for making real-time decisions. Third, it falls short
in tasks such as improving the performance of mobile cellular networks and
increasing operators' revenue, because it cannot extract the insights that big
data analytics draws from these data. Fourth, in mobile cellular networks a
customer's complete data are scattered across various business departments;
big data analytics can collect these data and extract useful information from
them, while traditional data analysis concentrates on a specific department.
Finally, big data analytics helps mobile operators make dynamic and autonomous
decisions, which traditional data analytics does not.


This study discusses some commonly used algorithms for analyzing wireless
network traffic data. The data are analyzed and exploited by specially designed
learning units (LUs) installed at both the BSs and CUs.

Stochastic modeling: Stochastic modeling methods use
probabilistic models to capture the explicit features and dynamics of the data
traffic [2]. Commonly used stochastic models include the order-K Markov model,
hidden Markov model, geometric model, time series, linear/nonlinear random
dynamic systems, etc. Markov models and Kalman filters are widely used to
predict user mobility and service requirements. The collected user data are
often used for parameter estimation of stochastic models, such as estimating
the transition probability matrix of a Markov chain.
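
As a rough illustration of the parameter estimation mentioned above, the
following sketch estimates a first-order Markov transition matrix from a
mobility log by counting observed cell-to-cell transitions. The cell labels,
the toy log and the function names are hypothetical and chosen only for this
example; a real system would work on much larger logs and might use
higher-order models.

```python
from collections import Counter, defaultdict


def estimate_transition_matrix(trajectory):
    """Maximum-likelihood estimate of first-order Markov transition probabilities.

    `trajectory` is a sequence of visited cell IDs (hypothetical labels).
    Returns {current_cell: {next_cell: probability}}.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(trajectory, trajectory[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        matrix[current] = {cell: n / total for cell, n in nxt_counts.items()}
    return matrix


def predict_next_cell(matrix, current):
    """Return the most probable next cell, or None if `current` was never seen."""
    if current not in matrix:
        return None
    return max(matrix[current], key=matrix[current].get)


# Toy mobility log: cell IDs visited by one user (made-up data).
log = ["A", "B", "C", "A", "B", "A", "B", "C"]
P = estimate_transition_matrix(log)
next_after_b = predict_next_cell(P, "B")
```

Each row of `P` sums to one, so it can be read directly as the estimated
transition probability distribution out of that cell.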


Data mining: Data mining focuses on exploiting the
implicit structures in the mobile data sets. Taking the mobility
prediction problem as an example again, an individual user’s mobility pattern
can be extracted by finding the most frequent trajectory segments in the
mobility log. Prediction can then be made by matching the current
trajectory to the mobility profile. Clustering is another useful technique to
identify the different patterns in the data sets. It is widely used in
context-aware mobile computing, where a mobile user’s context and behavioral
information, such as sleeping and working, are identified from wireless sensing
data for providing context-related services.
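
The trajectory-matching idea can be sketched as follows: count contiguous
segments of a fixed length in a user's mobility log, keep the frequent ones as
a mobility profile, and predict the next cell by matching the most recent cells
against that profile. The segment length, the support threshold, the cell names
and the function names are assumptions made for this illustration only.

```python
from collections import Counter


def frequent_segments(trajectories, length=3, min_support=2):
    """Count contiguous segments of `length` cells; keep those seen >= min_support times."""
    counts = Counter()
    for traj in trajectories:
        for i in range(len(traj) - length + 1):
            counts[tuple(traj[i:i + length])] += 1
    return {seg: n for seg, n in counts.items() if n >= min_support}


def predict_from_profile(profile, recent):
    """Match the last (length - 1) cells against segment prefixes.

    Returns the final cell of the most frequent matching segment, or None.
    """
    recent = tuple(recent)
    candidates = {seg: n for seg, n in profile.items() if seg[:-1] == recent}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best[-1]


# Made-up daily trajectories (sequences of cell IDs) for one user.
days = [
    ["home", "c1", "c2", "office"],
    ["home", "c1", "c2", "office"],
    ["home", "c1", "c3", "gym"],
]
profile = frequent_segments(days, length=3, min_support=2)
```

With this toy log, only the two segments on the home-to-office route reach the
support threshold, so the one-off trip to the gym does not enter the profile.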

Machine learning: The main objective of machine learning
is to establish a functional relationship between input data and output actions,
thus achieving auto-processing capability for unseen patterns of data inputs.
Among the many useful techniques in machine learning applied to wireless
communications, classification (determining the type of input data) and
regression analysis (data fitting) are two common methods, whose applications
include context identification of mobile usage and prediction of traffic levels
(classification), or fitting the distributions of trajectory length, mobile user
location, and channel holding times (regression). Besides, reinforcement
learning, such as Q-learning, is useful for taking proper real-time actions to
maximize certain long-term rewards. A typical example is making the handoff and
admission control decision (action), given the current traffic load (state) and
incoming new requests (event), in which the reward could be evaluated against
the reduction of dropped calls or failed connections.
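
A minimal tabular Q-learning sketch of the admission control example is given
below. Everything about the environment is an assumption made for illustration:
a single cell with three channels, a reward of +1 for a served request, a
penalty of -5 for admitting a request into a full cell, and a deterministic toy
rule that one ongoing call ends whenever no new call is admitted. For
simplicity the learner samples state-action pairs uniformly at random from this
simulator instead of following one live trajectory.

```python
import random

CAPACITY = 3          # hypothetical number of channels in one cell
ACTIONS = (0, 1)      # 0 = reject the incoming request, 1 = admit it


def step(load, action):
    """Toy deterministic cell model (an assumption, not a real traffic model)."""
    if action == 1 and load < CAPACITY:
        return load + 1, 1.0               # request admitted and served
    reward = -5.0 if action == 1 else 0.0  # admitting into a full cell fails
    return max(load - 1, 0), reward        # one ongoing call ends


def q_learning(iterations=20000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning over (load, action) pairs with random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(CAPACITY + 1)]
    for _ in range(iterations):
        load = rng.randint(0, CAPACITY)    # sample a state uniformly
        action = rng.choice(ACTIONS)       # pure random exploration
        nxt, reward = step(load, action)
        target = reward + gamma * max(q[nxt])
        q[load][action] += alpha * (target - q[load][action])
    return q


q_table = q_learning()
# Greedy policy: best action for each load level 0..CAPACITY.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(CAPACITY + 1)]
```

In this toy setting the learned policy admits requests while a channel is free
and rejects them when the cell is full, which is the behavior the reward was
designed to encourage.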

Large-scale data analytics: Wireless big data poses
many challenges to the aforementioned conventional data-analytical methods due
to its high volume, large dimensionality, uneven data qualities, and the
complex features therein. To improve signal processing efficiency, one can
combine the following complexity reduction techniques with the conventional
data analytical tools for large-scale data processing.

Distributed optimization algorithms,
such as primal/dual decomposition and the alternating direction method of
multipliers (ADMM), are very useful for decoupling large-scale statistical
learning problems into small subproblems for parallel computation, so as to
relieve both the computational burden at the CU and the bandwidth pressure on
the fronthaul/backhaul links.
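
To make the decoupling concrete, the sketch below applies consensus ADMM to a
deliberately small problem: several nodes each hold private (a, b) samples and
jointly fit one scalar x that minimizes the sum of squared errors (a*x - b)^2.
Each node solves only its own subproblem and exchanges a single number per
iteration. The data, the penalty parameter `rho` and the function names are
illustrative assumptions.

```python
def local_stats(samples):
    """Sufficient statistics of one node's local least-squares term."""
    a2 = sum(a * a for a, _ in samples)
    ab = sum(a * b for a, b in samples)
    return a2, ab


def consensus_admm(node_samples, rho=5.0, iterations=200):
    """Fit a shared scalar x across nodes with consensus ADMM."""
    stats = [local_stats(s) for s in node_samples]
    n = len(stats)
    x = [0.0] * n      # local estimates, one per node
    u = [0.0] * n      # scaled dual variables, one per node
    z = 0.0            # global consensus value
    for _ in range(iterations):
        # Local step: each node minimizes its own term plus a proximity penalty.
        x = [(ab + rho * (z - u_i)) / (a2 + rho)
             for (a2, ab), u_i in zip(stats, u)]
        # Global step: average the local proposals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each node's disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


# Made-up samples held by three nodes (for example, three base stations).
nodes = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(1.0, 1.9), (3.0, 6.2)],
    [(2.0, 3.9)],
]
x_admm = consensus_admm(nodes)
```

The result can be checked against the centralized least-squares solution, which
needs all samples in one place; ADMM reaches the same value while the raw
samples stay at the nodes.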

Dimension reduction methods are useful
to reduce the data volume to be processed while capturing the key features of
big data. Among various methods, principal component analysis (PCA), along with
its many variants, is the most widely used method today. In addition, tensor
decomposition methods are also popular in mobile data processing, which seek to
approximately represent a high-order multi-way array (tensor) as a linear
combination of outer products of low-order tensors. By doing so, the hardware
requirement and cost for storing the high-order arrays of mobile data could be
reduced.
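
As a small illustration of PCA, the sketch below computes only the first
principal component of a data set, using power iteration on the sample
covariance matrix. Power iteration is a simple stand-in chosen to keep the
example dependency-free; practical systems would normally rely on a linear
algebra library. The data values and the function name are made up.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction, explained variance ratio) of the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    return v, explained


# Two strongly correlated made-up measurements per sample.
samples = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, explained_ratio = first_principal_component(samples)
```

Because the two measurements move together, nearly all of the variance lies
along one direction, so storing the projection onto that direction would keep
most of the information with half the values.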

Other advanced learning methods could be used to handle incomplete or complex
data sets. Interesting examples include active learning, which deals with
partially labeled data sets; online learning, for responding in real time to
sequentially received data; stochastic learning, which makes a decision
periodically in each time interval; and deep learning, for modeling complex
behaviors contained in a data set.



The following case studies illustrate the introduction of big data analytics
into mobile cellular networks. The focus is on improving network performance
and deriving valuable insights. The application areas were chosen to cover
different scenarios, from currently deployed mobile cellular networks to
upcoming 5G, and include network operational optimization.


Due to the widespread use of mobile internet, the volume of traffic data is
increasing at an unprecedented rate. The internet acts as a carrier of the
traffic data. Cellular operators are responsible for managing network resources
appropriately to balance network load and optimize network utilization. Traffic
monitoring and analysis is an elementary but essential part of network
management, enabling performance analysis and prediction, failure detection,
security management, etc. In the context of big traffic data, however,
traditional approaches to monitoring and analyzing traffic data are simplistic
and inadequate. The interrelationship between big data and software-defined
networking (SDN) has also been studied. Liu et al. [3] proposed a novel
large-scale network traffic monitoring and analysis system based on a Hadoop
platform. The system is deployed in a commercial cellular network with an input
volume of 4.2 Tbytes every day. The evaluation results indicate that the
proposed system is capable of processing big network-generated data and
revealing certain traffic and user behavior phenomena. Since understanding
traffic dynamics and usage conditions is significant for improving network
performance, traffic characteristics have become a hot topic. In [5], the
authors investigated three features of network traffic, namely network access
time, traffic volume, and diurnal patterns, from the perspective of device
models. Traffic characteristics from the perspective of service providers were
revealed in [46], and the perspective of operating systems was introduced in
[47]. All of these results help cellular network operators adjust network
capacity management and grow revenue.
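
The kind of aggregation that such a Hadoop-based system performs can be
imitated in a few lines of plain Python. The sketch below is not the Hadoop API
and not the system of Liu et al.; it only mimics the map, shuffle and reduce
phases to total traffic volume per device model and hour of day, which is one
way to expose diurnal patterns. The record fields (`timestamp`, `device_model`,
`bytes`) and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def map_record(record):
    """Map phase: emit ((device_model, hour_of_day), bytes) for one flow record."""
    hour = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc).hour
    yield (record["device_model"], hour), record["bytes"]


def reduce_volume(key, values):
    """Reduce phase: total traffic volume for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups = defaultdict(list)           # stands in for the shuffle stage
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_volume(k, v) for k, v in groups.items())


# Made-up flow records; timestamps are seconds since the epoch (UTC).
records = [
    {"timestamp": 32400, "device_model": "phone-a", "bytes": 1200},
    {"timestamp": 32520, "device_model": "phone-a", "bytes": 800},
    {"timestamp": 75600, "device_model": "phone-a", "bytes": 5000},
    {"timestamp": 75700, "device_model": "tablet-b", "bytes": 300},
]
volume = run_job(records)
```

On a real cluster the map and reduce functions would run in parallel over
partitions of the input, which is what makes daily volumes in the terabyte
range tractable.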


Location data analysis is informative since human activities are based on
locations. Location-based big data, which arises from GPS sensors, WiFi and
Bluetooth on mobile devices, has become a precious strategic resource. Such
location data can support government administration, for example public
facility planning, transportation system construction, demographic trend
analysis, risk warnings for crowds, rapid emergency response, and crime hot
spot analysis. It can also yield valuable business insights, for instance for
mobile advertising and marketing. In one study, an end-to-end Hadoop-based
system was developed with a number of functional algorithms operating on call
detail records (CDRs). With information about subscribers’ habits and
interests, it can provide invaluable information about when, where and how a
category of individuals (e.g., sports fans, music lovers, etc.) move.



Zhang and Qiu [8] used large random matrices as building blocks to model the
big data arising from a 5G massive MIMO system implemented using
software-defined radios. They exploited the fact that all data processing is
done on the CPU, so all the modulated waveforms are stored in RAM or on hard
drives. Big data analytics based on random-matrix theory is then applied to the
data collected from their test bed, where a mobile user communicates with the
massive MIMO base station while moving. The experimental results show that the
analysis can estimate the user’s moving speed: whether the user is motionless,
moving at a nearly constant speed, at a slow speed, or at a higher speed. The
analytics is also used to reflect the correlation residing in the transmitted
signals. These applications validate that a massive MIMO system is not only a
communication system but also a massive data platform that can bring
tremendous value through big data analytics.



In mobile cellular networks, the transmission of voice and data is accompanied
by control messages, which are termed signaling. Signaling works according to
predefined protocols and ensures the security, reliability, regularity and
efficiency of communication. Signaling monitoring plays an important role in
allocating network resources appropriately, improving the quality of network
services, identifying network problems in real time, etc. With the rapid
development of mobile cellular networks, the volume of signaling data has grown
tremendously, and traditional signaling monitoring systems struggle to cope
with it. This discussion describes a signaling data monitoring and analysis
system architecture based on big data analytics. The architecture consists
mainly of three components: data collection, data analysis and applications. In
data collection, messages of various signaling protocols are copied from
multiple network interfaces without interrupting normal operations. These
copies are then gathered and filtered by the protocol processor and sent to the
analyzer. In the analyzer, the data are processed using various algorithms,
such as decomposition and correlation analysis. Finally, the analysis results
can be used by various applications. For example, Celebi et al. analyzed the
BSSAP messages from the A interface on a Hadoop platform to identify handovers
from 3G to 2G. The simulation results show that the identified 3G coverage
holes are consistent with the drive test results.
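
The analysis step of such a pipeline can be sketched as follows, under strong
simplifying assumptions: the signaling messages are taken to be already
collected, filtered and decoded into dictionaries, and the field names
(`type`, `source_rat`, `target_rat`, `source_cell`) as well as the threshold
are hypothetical. Decoding real BSSAP messages is outside the scope of this
sketch. Cells with many 3G-to-2G handovers are flagged as candidate 3G
coverage holes.

```python
from collections import Counter


def count_3g_to_2g_handovers(messages):
    """Count handovers from 3G to 2G per source cell in decoded signaling records."""
    counts = Counter()
    for msg in messages:
        if (msg["type"] == "HANDOVER"
                and msg["source_rat"] == "3G"
                and msg["target_rat"] == "2G"):
            counts[msg["source_cell"]] += 1
    return counts


def candidate_coverage_holes(counts, threshold):
    """Cells whose 3G-to-2G handover count reaches the threshold, sorted by name."""
    return sorted(cell for cell, n in counts.items() if n >= threshold)


# Made-up decoded signaling records.
messages = [
    {"type": "HANDOVER", "source_rat": "3G", "target_rat": "2G", "source_cell": "cell-17"},
    {"type": "HANDOVER", "source_rat": "3G", "target_rat": "2G", "source_cell": "cell-17"},
    {"type": "HANDOVER", "source_rat": "3G", "target_rat": "3G", "source_cell": "cell-17"},
    {"type": "HANDOVER", "source_rat": "3G", "target_rat": "2G", "source_cell": "cell-04"},
    {"type": "PAGING", "source_rat": "3G", "target_rat": "3G", "source_cell": "cell-04"},
]
handovers = count_3g_to_2g_handovers(messages)
holes = candidate_coverage_holes(handovers, threshold=2)
```

In practice the flagged cells would then be compared with independent
measurements, such as drive tests, before being treated as coverage holes.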


One critical task of big data analytics in mobile cellular networks is
integrating very heterogeneous data for correlation mining in massive
databases. Data sources are rich in types, such as data rate, packet drop,
mobility, etc. Different base stations host these data over time, so the data
need to be aggregated across space and time to support big data analytics. For
example, cyber security draws on many heterogeneous sources, such as “numerous
distributed packet sniffers, system log files, SNMP traps and queries, user
profile databases, system messages, and operator commands.” Essentially, data
fusion is a technique for making overall sense of data from different sources
that commonly have different data structures.
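
One simple form of aggregation across space and time is to align measurements
from different sources on a common key, here the pair (cell, time window). The
sketch below merges two made-up sources into one record per key and averages
repeated measurements within a window. The source names, the field names and
the five-minute window are assumptions for illustration; real data fusion would
also have to reconcile differing schemas, units and clocks.

```python
from collections import defaultdict


def fuse(sources, window_seconds=300):
    """Merge per-source measurements into one record per (cell, time window)."""
    fused = defaultdict(dict)
    for name, records in sources.items():
        for rec in records:
            # Round the timestamp down to the start of its window.
            window = rec["timestamp"] // window_seconds * window_seconds
            key = (rec["cell"], window)
            fused[key].setdefault(name, []).append(rec["value"])
    # Average repeated measurements of the same source within one window.
    return {key: {name: sum(vals) / len(vals) for name, vals in per_source.items()}
            for key, per_source in fused.items()}


# Made-up measurements from two heterogeneous sources.
sources = {
    "data_rate_mbps": [
        {"cell": "c1", "timestamp": 10, "value": 20.0},
        {"cell": "c1", "timestamp": 200, "value": 30.0},
    ],
    "packet_drop_pct": [
        {"cell": "c1", "timestamp": 120, "value": 1.5},
        {"cell": "c1", "timestamp": 400, "value": 4.0},
    ],
}
fused = fuse(sources)
```

Once the sources share a key, correlations between them, for example between
data rate and packet drop in the same cell and window, can be examined
directly.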


This survey has explored various methods of big data analytics in mobile
cellular networks. We have studied various case studies involving mobile
cellular networks and the challenges they raise. This study should be useful
for carrying out future work.