Data processing & analysis

See your data in a different light


Data processing

Once your data has been captured, it will need processing before it is usable. Data retrieved from the web can be messy and needs preparing before you let it near your system. Our tools clean, standardise and verify that data, turning messy input into reliable data that gives your business an edge.

Standardisation

Ensuring that all data follows a standard format. Data retrieved from different sources often represents the same information in different ways (e.g. "Mr John Smith" in one system may be "Mr. J. Smith" in another).
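As a rough illustration, the Python sketch below reduces both example names to one comparison key. The function name, the title list and the "title, initial, surname" key format are assumptions made for this example, not a description of our production tooling.

```python
# Hypothetical list of titles recognised at the start of a name.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}

def standardise_name(raw: str) -> str:
    """Reduce a personal name to a lower-case 'title initial surname' key."""
    # Treat full stops as separators, then split on whitespace.
    tokens = raw.lower().replace(".", " ").split()
    title = tokens[0] if tokens and tokens[0] in TITLES else ""
    names = tokens[1:] if title else tokens
    if not names:
        return title
    # Keep the first forename's initial (if there is a forename) and the surname.
    initial = names[0][0] if len(names) > 1 else ""
    surname = names[-1]
    return " ".join(part for part in (title, initial, surname) if part)


print(standardise_name("Mr John Smith"))   # mr j smith
print(standardise_name("Mr. J. Smith"))    # mr j smith
```

The key is deliberately lossy: it is meant for matching records from different systems, while the original values can be kept for display.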

Corrective analysis

Correcting invalid or incorrect data. Data is often entered incorrectly and contains unlikely or impossible values (e.g. a date of 32nd January 2019). In many cases the correct value can be derived from the surrounding data.
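The sketch below shows one way the 32nd January example could be caught and repaired. Python's `datetime.date` raises `ValueError` for impossible dates; the repair rule (records known to be exactly one day apart, so an invalid date becomes the day after the previous valid one) is a hypothetical assumption chosen to show what deriving a value from surrounding data can look like.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

def safe_date(year: int, month: int, day: int) -> Optional[date]:
    """Return a date, or None if the values are impossible (e.g. 32 January)."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

def repair_daily_series(rows: List[Tuple[int, int, int]]) -> List[Optional[date]]:
    """Hypothetical rule for records known to be exactly one day apart:
    an impossible date is replaced by the day after the previous valid date.
    Dates that cannot be derived this way are left as None for review."""
    repaired: List[Optional[date]] = []
    for year, month, day in rows:
        value = safe_date(year, month, day)
        if value is None and repaired and repaired[-1] is not None:
            value = repaired[-1] + timedelta(days=1)
        repaired.append(value)
    return repaired


print(repair_daily_series([(2019, 1, 31), (2019, 1, 32), (2019, 2, 2)]))
# [datetime.date(2019, 1, 31), datetime.date(2019, 2, 1), datetime.date(2019, 2, 2)]
```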

De-duplication

Data retrieved from the web or legacy systems often contains duplicates. These must be identified and removed to ensure the integrity of the data. Duplicate records often need to be merged rather than simply discarded, as each copy may contain valid data that the others lack.
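A minimal merge sketch, assuming records are dictionaries and that a lower-cased email address is a reliable duplicate key (both assumptions for illustration). The first record seen wins, and later duplicates only fill in fields that are still empty.

```python
def merge_duplicates(records, key):
    """Group records by key(record) and merge each group into a single record."""
    merged = {}
    for record in records:
        k = key(record)
        if k not in merged:
            merged[k] = dict(record)  # first occurrence is kept as the base record
            continue
        target = merged[k]
        for field, value in record.items():
            # A later duplicate only fills fields that are missing or empty so far.
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


customers = [
    {"email": "j.smith@example.com", "phone": "", "city": "Leeds"},
    {"email": "J.Smith@example.com", "phone": "0113 496 0000", "city": ""},
]
print(merge_duplicates(customers, key=lambda r: r["email"].lower()))
# [{'email': 'j.smith@example.com', 'phone': '0113 496 0000', 'city': 'Leeds'}]
```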

Verification

Data from a single source may contain errors. For systems that require high levels of accuracy, data can be cross-checked against a second or third source. Our tools compare each record across the sources to identify the correct values and eliminate the errors.
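One simple way to reconcile sources is a per-field majority vote, sketched below. This is a swapped-in illustrative technique, not necessarily how our tools weigh sources; the function names are hypothetical, and a field with no majority comes back as `None` so it can be flagged for review.

```python
from collections import Counter
from typing import Dict, List, Optional

def verify_field(values: List[Optional[str]]) -> Optional[str]:
    """Return the value a strict majority of sources agree on, else None."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(present) / 2 else None

def verify_record(sources: List[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Apply verify_field to every field that appears in any source."""
    fields = sorted(set().union(*sources))
    return {f: verify_field([s.get(f) for s in sources]) for f in fields}


sources = [
    {"name": "John Smith", "postcode": "LS1 4AP"},
    {"name": "John Smith", "postcode": "LS1 4AP"},
    {"name": "Jon Smith", "postcode": "LS1 4PA"},
]
print(verify_record(sources))
# {'name': 'John Smith', 'postcode': 'LS1 4AP'}
```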

Normalisation

Before data is analysed or stored, it should be normalised. This involves restructuring it so that it is easy to query and free of the redundancy and performance bottlenecks that poor schema design causes.
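To illustrate, the sketch below splits flat order rows, which repeat the customer's details on every line, into separate customers and orders tables linked by an ID. The field names are hypothetical.

```python
def normalise_orders(flat_rows):
    """Split flat rows into a customers table and an orders table.

    Each flat row repeats the customer's details; after normalisation they
    are stored once and referenced from each order by customer_id.
    """
    customers = {}  # email -> customer row
    orders = []
    for row in flat_rows:
        email = row["customer_email"]
        if email not in customers:
            customers[email] = {
                "customer_id": len(customers) + 1,
                "name": row["customer_name"],
                "email": email,
            }
        orders.append({
            "order_id": row["order_id"],
            "customer_id": customers[email]["customer_id"],
            "total": row["total"],
        })
    return list(customers.values()), orders


flat = [
    {"order_id": 1, "customer_name": "John Smith", "customer_email": "j.smith@example.com", "total": 25.0},
    {"order_id": 2, "customer_name": "John Smith", "customer_email": "j.smith@example.com", "total": 12.5},
    {"order_id": 3, "customer_name": "Ann Lee", "customer_email": "ann@example.com", "total": 40.0},
]
customers, orders = normalise_orders(flat)
print(len(customers), [o["customer_id"] for o in orders])  # 2 [1, 1, 2]
```

Because each customer's details now live in one row, correcting an email address touches a single record instead of every order.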

Custom processing

If required, we can implement custom processing steps to ensure that the data your system receives meets the quality and formats you require. This could be as simple as adding the correct country code to a phone number, or as involved as generating structured entity/relationship maps from the data.
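For the phone number case, here is a minimal sketch assuming UK-style national numbers with a single leading "0" trunk prefix and a default country code of 44 (both assumptions). Numbering plans vary by country, so a production pipeline would more likely rely on a dedicated library such as Google's libphonenumber.

```python
def add_country_code(number: str, country_code: str = "44") -> str:
    """Return the number as '+' followed by digits only, adding a country code if missing.

    Assumes a number without a leading '+' is in national format with a single
    leading '0' trunk prefix (true for UK numbers, not for every country).
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    if number.strip().startswith("+"):
        return "+" + digits          # already international: just strip formatting
    if digits.startswith("0"):
        digits = digits[1:]          # drop the national trunk prefix
    return "+" + country_code + digits


print(add_country_code("0113 496 0000"))     # +441134960000
print(add_country_code("+44 113 496 0000"))  # +441134960000
```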

Data analysis/enrichment

Sentiment analysis

Sentiment analysis examines text to determine the sentiment it expresses. For a given piece of text, we identify the sentiment and can give you a probability indicating how confident we are that it is correct. This data can be used to gauge the general feeling towards particular topics or events.

Positive

"I really enjoyed my meal at restaurant x."

Negative

"The meal I had at restaurant x was awful."

Neutral

"I went for a meal in restaurant x yesterday."
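To show the shape of the output (a label plus a confidence value), here is a toy lexicon-based scorer applied to the three examples above. The word lists and the logistic squashing are illustrative assumptions; the resulting number is a heuristic, not a calibrated probability, and the whole sketch is a stand-in for a trained model rather than a description of how our service works.

```python
import math
import re

# Toy word lists, standing in for a trained model (assumption for illustration).
POSITIVE = {"enjoyed", "great", "excellent", "good", "best", "delicious"}
NEGATIVE = {"awful", "dreadful", "terrible", "bad", "worst", "undercooked"}

def sentiment(text: str):
    """Return (label, confidence) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    # Heuristic confidence: a logistic squash of |score| into [0.5, 1).
    # 0.5 means no sentiment evidence either way; this is NOT a calibrated probability.
    confidence = 1 / (1 + math.exp(-abs(score)))
    return label, round(confidence, 2)


for text in ["I really enjoyed my meal at restaurant x.",
             "The meal I had at restaurant x was awful.",
             "I went for a meal in restaurant x yesterday."]:
    print(sentiment(text))
# ('positive', 0.73)
# ('negative', 0.73)
# ('neutral', 0.5)
```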

Subjectivity analysis

Subjectivity analysis tells you whether a piece of text is subjective or objective. As with sentiment analysis, we also give you a probability expressing how confident we are in the prediction. Combined with the sentiment analysis results above, this helps to categorise text and uncover its underlying meaning.

Subjective

The text expresses an opinion:
"The service in restaurant x was the best I've ever experienced."

Objective

The text states a fact:
"Restaurant x has an extensive wine list and menu."
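A toy cue-word classifier, sketched below, shows the basic idea on the two examples above. The cue list is a hypothetical assumption; a real subjectivity classifier would be trained on labelled text and would also produce a confidence value.

```python
import re

# Hypothetical evaluative cue words; a real classifier would be trained on labelled text.
OPINION_CUES = {"best", "worst", "love", "hate", "amazing", "awful",
                "enjoyed", "dreadful", "terrible", "excellent", "favourite"}

def subjectivity(text: str):
    """Return ('subjective' or 'objective', list of cue words found)."""
    words = re.findall(r"[a-z']+", text.lower())
    cues = [w for w in words if w in OPINION_CUES]
    return ("subjective" if cues else "objective"), cues


print(subjectivity("The service in restaurant x was the best I've ever experienced."))
# ('subjective', ['best'])
print(subjectivity("Restaurant x has an extensive wine list and menu."))
# ('objective', [])
```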

Aspect-based sentiment analysis

Aspect-based sentiment analysis extracts aspects from text and indicates whether the feeling towards each one is positive, negative or neutral. For example, a hotel review such as "The room was very spacious and comfortable. The food in the restaurant was dreadful; it was undercooked. The cleaners were very efficient and polite." could be broken down to show:

Accommodation: positive
Food & drink: negative
Staff: positive

We currently support several aspect sets (such as hotels, as above, and restaurants) and are adding more all the time. Custom aspect sets for your specific business domain are also fully supported.
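The sketch below applies a hypothetical hotel aspect set to the review above: each sentence is assigned to any aspect whose trigger words it contains, and scored with a small sentiment word list. The aspect triggers, word lists and function name are assumptions for illustration; in this sketch a custom aspect set is simply a different dictionary passed as `aspects`.

```python
import re

# Hypothetical hotel aspect set: aspect name -> trigger words.
HOTEL_ASPECTS = {
    "Accommodation": {"room", "bed", "bathroom"},
    "Food & drink": {"food", "restaurant", "breakfast", "bar"},
    "Staff": {"staff", "cleaners", "receptionist", "waiter"},
}
# Toy sentiment word lists (assumption for illustration).
POSITIVE = {"spacious", "comfortable", "efficient", "polite", "friendly"}
NEGATIVE = {"dreadful", "undercooked", "dirty", "rude", "noisy"}

def aspect_sentiment(text, aspects=HOTEL_ASPECTS):
    """Return {aspect: 'positive' | 'negative' | 'neutral'} for aspects mentioned in text."""
    totals = {}
    # Treat each sentence as one unit of opinion.
    for sentence in re.split(r"[.!?]", text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for aspect, triggers in aspects.items():
            if words & triggers:
                totals[aspect] = totals.get(aspect, 0) + score
    return {aspect: "positive" if s > 0 else "negative" if s < 0 else "neutral"
            for aspect, s in totals.items()}


review = ("The room was very spacious and comfortable. The food in the restaurant "
          "was dreadful; it was undercooked. The cleaners were very efficient and polite.")
print(aspect_sentiment(review))
# {'Accommodation': 'positive', 'Food & drink': 'negative', 'Staff': 'positive'}
```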

Part-of-speech analysis

Part-of-speech analysis tags each word in a text with its part of speech using the Penn Treebank II tag set, categorising it as a noun, verb, determiner etc. The output is normally used as input to further analysis.
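A tiny lookup tagger, sketched below, shows what Penn Treebank II output looks like. The lexicon is a hypothetical toy and unknown words fall back to `NN`; real taggers (for example NLTK's `pos_tag`, which outputs Penn Treebank tags by default) use a trained statistical model.

```python
import re

# Toy lexicon mapping words to Penn Treebank II tags (hypothetical, for illustration).
LEXICON = {
    "the": "DT", "a": "DT", "an": "DT",
    "room": "NN", "meal": "NN", "food": "NN",
    "was": "VBD", "were": "VBD", "is": "VBZ",
    "very": "RB", "spacious": "JJ", "comfortable": "JJ",
    "and": "CC", "in": "IN",
    ".": ".", ",": ",",
}

def pos_tag(sentence: str):
    """Tokenise a sentence and tag each token with a Penn Treebank II tag."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    # Unknown words fall back to NN (singular noun), a common baseline choice.
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]


print(pos_tag("The room was very spacious."))
# [('The', 'DT'), ('room', 'NN'), ('was', 'VBD'), ('very', 'RB'), ('spacious', 'JJ'), ('.', '.')]
```

Downstream steps can then filter on tags, for example keeping only `JJ` adjectives as candidate opinion words.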

Object detection

Object detection analyses images to identify the common objects within them. Our services can currently identify 90 common objects, such as people, cars, buses and sandwiches. A rectangle (bounding box) is drawn on the image around the area where each object is detected, and each object is annotated with its type and the probability of the prediction.
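The annotation step can be sketched as follows, assuming each detection arrives as a label, a probability score and a pixel bounding box. The `Detection` structure and the 0.5 threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str                       # object type, e.g. "bus"
    score: float                     # predicted probability, 0 to 1
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels

def annotate(detections: List[Detection], min_score: float = 0.5):
    """Drop low-confidence detections and pair each remaining box with its caption."""
    kept = sorted((d for d in detections if d.score >= min_score),
                  key=lambda d: d.score, reverse=True)
    return [(d.box, f"{d.label} {d.score:.0%}") for d in kept]


found = [Detection("bus", 0.92, (34, 50, 410, 300)),
         Detection("person", 0.81, (420, 120, 470, 290)),
         Detection("sandwich", 0.31, (5, 5, 40, 30))]
print(annotate(found))
# [((34, 50, 410, 300), 'bus 92%'), ((420, 120, 470, 290), 'person 81%')]
```

Drawing the rectangles themselves would be left to an imaging library; the sketch stops at the data each rectangle needs.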

Logo detection

Logo detection is similar to object detection, but rather than identifying common objects, it identifies logos within images. We can train our detection service to identify any logo, so if we don't already detect a logo you need, it's no problem! Below is an example of our KFC logo detection.

All of our analysis tools are available as API services for integration into your existing products.

Get in touch

If you're interested in using our data processing & analysis services, get in contact!