Others Interview Questions and Answers (265) - Page 9

What is 'Dataframe' in 'R' language?

A data frame is a structure in R that holds data and is similar to the datasets found in standard statistical packages, e.g., SAS, SPSS, and Stata. The columns are variables and the rows are observations.
A data frame can hold variables of different types (for example, numeric and character) in the same object. Data frames are the main structure R uses to store datasets.
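For illustration, here is a small data frame that mixes numeric and character columns (the variable names and values are made up):

```r
# each column is a variable, each row is an observation
patients <- data.frame(
  id     = c(1, 2, 3),
  age    = c(25, 34, 28),
  status = c("Poor", "Improved", "Excellent")  # character column
)
patients$age     # access a column by name: 25 34 28
nrow(patients)   # number of observations: 3
```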
Explain the concept of 'Vectors' in 'R' language

Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. The combine function c() is used to form a vector:

a <- c(1, 2, 5, 3, 6, -2, 4)

b <- c("one", "two", "three")

c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

Here, a is a numeric vector, b is a character vector, and c is a logical vector.
The data in a vector must all be of one type or mode (numeric, character, or logical); modes can't be mixed in the same vector.

We can refer to elements of a vector using a numeric vector of positions within brackets. For example, a[c(2, 4)] refers to the 2nd and 4th elements of vector a.
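The indexing described above in a short, runnable form:

```r
a <- c(1, 2, 5, 3, 6, -2, 4)
a[3]          # third element: 5
a[c(2, 4)]    # second and fourth elements: 2 3
a[2:6]        # elements two through six: 2 5 3 6 -2
```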
Explain the concept of 'Matrices' in 'R' language

A matrix is a two-dimensional array in which each element has the same mode (numeric, character, or logical). Matrices are created with the matrix() function.


mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns,
                   byrow=logical_value, dimnames=list(char_vector_rownames, char_vector_colnames))

- 'vector' contains the elements for the matrix
- 'nrow' and 'ncol' specify the row and column dimensions
- 'dimnames' contains optional row and column labels stored in character vectors.
- 'byrow' indicates whether the matrix should be filled in by row (byrow=TRUE) or by column (byrow=FALSE). The default is by column.

Below, we create a 5x4 matrix:

> y <- matrix(1:20, nrow=5, ncol=4)

> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

> cells <- c(1, 26, 24, 68)
> rnames <- c("R1", "R2")
> cnames <- c("C1", "C2")
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
                     dimnames=list(rnames, cnames))

> mymatrix
   C1 C2
R1  1 26
R2 24 68
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
                     dimnames=list(rnames, cnames))
> mymatrix
   C1 C2
R1  1 24
R2 26 68

First, we create a 5x4 matrix. Then we create a 2x2 matrix with row and column labels and fill it by rows. Finally, we create a 2x2 matrix and fill it by columns.
Explain the concept of 'Array' in 'R' language

Arrays are similar to matrices but can have more than two dimensions. They're created with the array() function of the following form:

myarray <- array(vector, dimensions, dimnames)

- vector contains the data for the array
- dimensions is a numeric vector giving the maximal index for each dimension
- dimnames is an optional list of dimension labels.

The following is an example of creating a three-dimensional (2x3x4) array of numbers.

> dim1 <- c("A1", "A2")

> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2", "C3", "C4")
> z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
> z
, , C1
   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2
   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3
   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4
   B1 B2 B3
A1 19 21 23
A2 20 22 24
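Like matrices, arrays are indexed by position, with one index per dimension. Using the same 2x3x4 array of 1:24:

```r
z <- array(1:24, c(2, 3, 4))
z[2, 3, 4]   # row 2, column 3, layer 4: 24
z[1, 2, 3]   # row 1, column 2, layer 3: 15
```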

Explain the 'Attach' and 'Detach' functions of 'R' language

The attach() function adds a data frame to the R search path. When a variable name is encountered, data frames in the search path are checked, in order, to locate the variable.
The detach() function removes the data frame from the search path; detach() does nothing to the data frame itself.



For example, attaching the built-in mtcars data frame lets its columns be referenced directly by name:

attach(mtcars)
  plot(mpg, disp)
  plot(mpg, wt)
detach(mtcars)

What is 'Web Scraping' in 'R' language?

In web scraping, the user extracts information embedded in a web page available over the internet and saves it into R structures for further analysis. One way to accomplish this is to download the web page using the readLines() function and manipulate it with functions such as grep() and gsub(). For complex web pages, the RCurl and XML packages can be used to extract the desired information.
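A minimal sketch of the readLines()/grep()/gsub() approach. To keep it self-contained, the "page" here is an in-memory HTML fragment; in a real case it would come from readLines("http://..."):

```r
# stand-in for readLines("http://...") so the example runs offline
page <- c("<html><head>",
          "<title>Sample Page</title>",
          "</head><body>...</body></html>")
title_line <- grep("<title>", page, value = TRUE)            # keep matching lines
title <- gsub(".*<title>(.*)</title>.*", "\\1", title_line)  # strip the tags
title
# "Sample Page"
```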
How can we import data from SPSS into 'R' platform?

SPSS datasets can be imported into R via the read.spss() function in the foreign package. Alternatively, we can use the spss.get() function in the Hmisc package; spss.get() is a wrapper that automatically sets many of read.spss()'s parameters for you.

First, download and install the Hmisc package .


Then use the following code to import the data:


mydataframe <- spss.get("mydata.sav", use.value.labels=TRUE)

In this code, mydata.sav is the SPSS data file to be imported, use.value.labels=TRUE tells the function to convert variables with value labels into R factors
with those same levels, and mydataframe is the resulting R data frame.
How to import data from netCDF into 'R' platform?

Unidata’s netCDF (network Common Data Form) open source software contains machine-independent data formats for the creation and distribution of array-oriented scientific
data. netCDF is commonly used to store geophysical data. The ncdf and ncdf4 packages provide high-level R interfaces to netCDF data files.

The ncdf package provides support for data files created with Unidata’s netCDF library and is available for Windows, Mac OS X, and Linux platforms.

Consider this code:


library(ncdf4)
nc <- nc_open("mynetCDFfile")
myarray <- ncvar_get(nc, "myvar")
nc_close(nc)

In this example, all the data from the variable myvar , contained in the netCDF file mynetCDFfile , is read and saved into an R array called myarray.
What is the use of 'Title' function in 'R' language?

Use the title() function to add title and axis labels to a plot.

title(main="main title", sub="sub-title", xlab="x-axis label", ylab="y-axis label")

Graphical parameters (such as text size, font, rotation, and color) can also be specified
in the title() function. For example, the following produces a red title and a blue
subtitle, and creates green x and y labels that are 25 percent smaller than the default
text size:

title(main="My Title", col.main="red",
      sub="My Sub-title", col.sub="blue",
      xlab="My X label", ylab="My Y label",
      col.lab="green", cex.lab=0.75)

What is Cyber Forensics?

Cyber forensics is the technique by which we investigate a computing device and preserve the evidence it contains. It involves documenting the investigation process so as to prove that the device's data did not change during the course of the investigation. The investigation is carried out only on a digital copy of the device's data, never on the original.
What is cloud broker?

A cloud broker is a third-party individual or business that acts as an intermediary between the purchaser of a cloud computing service and the sellers of that service. The term is also used for a software application that facilitates the distribution of work between different cloud service providers. The broker's role is to save the purchaser time by researching services from different vendors and providing the customer with information about how to use cloud computing to support business goals.
What is text mining?

Text mining, also known as text data mining or text analytics, is the process of discovering high-quality information from textual data sources. The application of text mining techniques to solve specific business problems is called business text analytics, or simply text analytics. Text mining techniques can help organizations derive valuable business insight from the wealth of textual information they possess.

Text mining transforms textual data into a structured format through several techniques: identification and collection of the textual data sources; NLP techniques such as part-of-speech tagging and syntactic parsing; entity/concept extraction, which identifies named features such as people, places, and organizations; disambiguation; establishing relationships between different entities/concepts; pattern and trend analysis; and visualization techniques.
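As a minimal illustration of turning unstructured text into a structured form, the base-R sketch below builds a term-frequency table from two tiny, made-up documents:

```r
docs <- c("Text mining finds insight in text",
          "Mining text data reveals patterns")
words <- unlist(strsplit(tolower(docs), "[^a-z]+"))  # lowercase and tokenize
tf <- sort(table(words), decreasing = TRUE)          # term-frequency table
tf
# "text" appears 3 times, "mining" twice, the rest once
```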
What is Text Cleansing?

Text cleansing is the process of removing noise from textual sources. Noisy textual data can be found in SMS messages, email, online chat, news articles, blogs, and web pages. Such text may have spelling errors, abbreviations, non-standard terminology, missing punctuation, misleading case information, as well as false starts, repetitions, and special characters.
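A minimal base-R sketch of a few cleansing steps on a made-up, SMS-style string (the abbreviation expansions used here are illustrative assumptions, not a standard list):

```r
txt <- "gr8!!  c u l8r @ THE cafe :)"
clean <- tolower(txt)                        # normalize case
clean <- gsub("gr8", "great", clean)         # expand illustrative abbreviations
clean <- gsub("\\bc u\\b", "see you", clean)
clean <- gsub("l8r", "later", clean)
clean <- gsub("[^a-z ]", "", clean)          # drop special characters
clean <- gsub(" +", " ", trimws(clean))      # collapse whitespace
clean
# "great see you later the cafe"
```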
Why is it important to clean the noise from text?

Noise can be defined as any difference between the surface form of an electronic text and the original, intended, or actual text. Text used in the short message service (SMS) and in online forums such as Twitter, chat and discussion boards, and social networking sites is often distorted, mainly because recipients can understand the shortened forms of longer words and because those short forms reduce the sender's time and effort. Most text is created and stored so that humans can understand it, and it is not always easy for a computer to process that text.

With the increase in noisy text generated in various social communication media, cleansing such text has become necessary, also because off-the-shelf NLP techniques generally fail on it for several reasons, such as sparsity, out-of-vocabulary words, and irregular syntactic structures.

A few of the cleaning techniques are:

- Removing stop words (deleting very common words like "a", "the", "and", etc.).

- Stemming (ways of combining words that have the same linguistic root or stem).
What are Stop Words and how to remove them?

Stop words are words that are filtered out before or after processing of textual data. There is no single definitive list of stop words that all tools use, and some tools specifically avoid removing them in order to support phrase search. The most common stop words found in text are "the", "is", "at", "which", and "on". Stop words can sometimes cause problems when searching for phrases that include
them. Some search engines remove the most common words from the query in order to improve performance.
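Removing stop words in base R is a simple filtering operation; the stop-word list below is a small, illustrative one, not a standard list:

```r
stopwords <- c("the", "is", "at", "which", "on", "a")
tokens <- c("the", "cat", "is", "on", "the", "mat")
tokens[!tokens %in% stopwords]   # keep only non-stop-words
# "cat" "mat"
```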
What is Stemming?

Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word, "fish". On the
other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating the case where the stem is not itself a word or root) but "argument" and "arguments" reduce to the stem "argument".
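A toy suffix-stripping stemmer in base R, just to illustrate the idea (real stemmers, such as the Porter algorithm available via the SnowballC package, use far more careful rules):

```r
# naive stemmer: strip a few common suffixes (illustration only, not Porter)
stem <- function(words) sub("(ing|ed|es|s)$", "", words)
stem(c("fishing", "fished", "fishes", "fish"))
# all four map to "fish"
```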
What is tokenization in the context of Text Mining?

Tokenization is the process of breaking a piece of text into smaller pieces, called tokens, such as words, phrases, symbols, and other elements. Even a whole sentence can be considered a token. During tokenization, some characters, such as punctuation marks, may be removed. The tokens then become input for other processes in text mining, such as parsing.
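Word-level tokenization in base R, splitting on runs of non-letter characters (which also drops the punctuation):

```r
sentence <- "Tokenization breaks text into tokens, like words."
tokens <- unlist(strsplit(tolower(sentence), "[^a-z]+"))
tokens <- tokens[tokens != ""]   # guard against empty strings from leading separators
tokens
# "tokenization" "breaks" "text" "into" "tokens" "like" "words"
```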
What is Part-of-speech (POS) tagging?

Part-of-speech tagging, also known as grammatical tagging or word-category disambiguation, is the process of assigning to each word in a sentence a label corresponding to a particular part of speech, such as noun, verb, pronoun, preposition, adverb, or adjective, or some other lexical class marker.
The input to a tagging algorithm is the string of words of a natural-language sentence and a specified tagset (a finite list of part-of-speech tags). The output is a single best POS tag for each word. Tags play an important role in natural-language applications such as speech recognition, natural-language parsing, and information retrieval.
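A toy dictionary-lookup tagger in base R, only to make the input/output shape concrete; real taggers (for example, those in the openNLP package) use trained statistical models, and the tiny lexicon and tagset below are made up:

```r
lexicon <- c(the = "DET", cat = "NOUN", sat = "VERB", on = "PREP", mat = "NOUN")
tag <- function(words) {
  tags <- lexicon[words]        # look each word up in the lexicon
  tags[is.na(tags)] <- "UNK"    # unknown words get a fallback tag
  setNames(tags, words)         # output: one best tag per input word
}
tag(c("the", "cat", "sat", "on", "the", "mat"))
```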
What is Syntactical Parsing?

Syntactical parsing is the process of performing syntactical analysis on a string of words, a phrase, or a sentence according to certain rules of grammar. Syntactical parsing discovers structure in the text and is used to determine whether a text conforms to an expected format. It involves breaking the text into different elements and identifying the syntactical relationships between them.
The basic idea behind syntactical analysis is to create a syntactic structure, or parse tree, from a sentence in a given natural-language text, to determine how the sentence is broken down into phrases, how the phrases are broken down into sub-phrases, and so on down to the actual words used. To parse natural-language text, two basic grammar formalisms are used: constituency and dependency grammars.
What is Information Extraction?

Information extraction identifies the key phrases and relationships within textual data. This is done by a process called pattern matching, which looks for predefined sequences in the text.
Information extraction infers the relationships between the identified people, places, and times in order to extract meaningful information, and it can be very useful for handling huge volumes of textual data. The meaningful information is collected and stored in data repositories for knowledge discovery, mining, and analysis.
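Pattern matching can be sketched in base R with regular expressions; the sentence and the "predefined sequence" (a date pattern) below are made up for illustration:

```r
txt <- "Alice met Bob in Paris on 2021-05-04."
# look for a predefined sequence: an ISO-style date
dates <- regmatches(txt, gregexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", txt))[[1]]
dates
# "2021-05-04"
```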
