Text Data Mining Coursework

Now, such an outcome could be called actionable knowledge, because a consumer can take the knowledge, make a decision, and act on it.

So, in this case, text mining supplies knowledge for optimal decision making.

But again, the two are not so clearly distinguished, so we don't necessarily have to make a distinction.

Text retrieval is also needed for knowledge provenance, and this roughly corresponds to the interpretation of text mining as turning text data into actionable knowledge.

Once we find patterns in text data, or actionable knowledge, we generally have to verify that knowledge by looking at the original text data.

So users need some text retrieval support to go back to the original text data to interpret a pattern, to better understand an analogy, or to verify whether a pattern is really reliable.
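As a rough illustration of that verification step, the sketch below goes from a mined pattern back to the documents that support it. Everything in it is hypothetical and chosen only for illustration: the toy collection, the `find_supporting_documents` helper, and the simple rule that a document supports a pattern when it contains every term of the pattern. A real retrieval system would rank results rather than apply an all-or-nothing match.

```python
import re

# Hypothetical toy collection; the IDs and texts are made up for illustration.
DOCUMENTS = {
    "review-1": "The battery life is great but the screen scratches easily.",
    "review-2": "Great screen, poor battery life after a few months.",
    "review-3": "Shipping was fast and the packaging was fine.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_supporting_documents(pattern_terms, documents):
    """Return the sorted IDs of documents that contain every pattern term."""
    wanted = {term.lower() for term in pattern_terms}
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if wanted <= set(tokenize(text))
    )


# Suppose mining suggested that "battery" and "life" are discussed together.
supporting = find_supporting_documents(["battery", "life"], DOCUMENTS)
for doc_id in supporting:
    # Show the original text so a person can judge whether the pattern holds.
    print(doc_id, "->", DOCUMENTS[doc_id])
```

Reading the returned texts is the point of the exercise: the two matching reviews disagree about battery life, which is exactly the kind of detail a pattern alone would hide.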

So this is a high-level introduction to the concept of text mining and the relationship between text mining and retrieval.

Similarly, a geo sensor would sense the location and then report the location specification, for example, in the form of a longitude value and a latitude value.

A network sensor would monitor network traffic or activities in the network and report them in some digital format of data.

Similarly, we can think of humans as subjective sensors that observe the real world from some perspective.

Humans then express what they have observed in the form of text data.

So, in this sense, a human is actually a subjective sensor that also senses what's happening in the world and then expresses what's observed in the form of data, in this case, text data.

Now, looking at text data in this way has the advantage of being able to integrate all types of data together, and that's indeed needed in most data mining problems.

So, by treating text data as data observed from human sensors, we can treat all this data together in the same framework.
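One way to picture "the same framework" is a single record type that holds an observation from any kind of sensor, human or not. The sketch below is only an assumption about how that could look: the `Observation` class, its field names, and the sample values are all hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Observation:
    """One report from a sensor; the payload form depends on the sensor."""
    sensor_type: str  # e.g. "geo", "network", or "human"
    source_id: str    # which device or person produced the report
    payload: Any      # numbers for physical sensors, text for human sensors


# Hypothetical sample observations, made up for illustration.
observations = [
    Observation("geo", "gps-01", {"latitude": 40.11, "longitude": -88.24}),
    Observation("network", "router-7", {"bytes_sent": 1024}),
    Observation("human", "user-42", "Traffic is terrible downtown this morning."),
]


def group_by_sensor_type(items):
    """Group observations so each kind can be sent to a suitable miner."""
    groups = {}
    for obs in items:
        groups.setdefault(obs.sensor_type, []).append(obs)
    return groups


grouped = group_by_sensor_type(observations)
for sensor_type, items in grouped.items():
    print(sensor_type, len(items))
```

Text from the human sensor sits in the same collection as the geo and network readings, so one pipeline can hold all of it while still applying a different algorithm to each kind.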

So the data mining problem is basically to turn all the data into actionable knowledge so that we can take advantage of it to change the real world, of course, for the better.

So this means the data mining problem is

basically taking a lot of data as input and giving actionable knowledge as output.

Inside the data mining module, you can also see that we have a number of different kinds of mining algorithms.

And this is because, for different kinds of data,

we generally need different algorithms for mining the data.

For example, video data might require computer vision to understand video content, and that would facilitate more effective mining.

We also have a lot of general algorithms that are applicable to all kinds of data, and those algorithms, of course, are very useful.

For a particular kind of data, though, we generally also want to develop a special algorithm.

So this course will cover specialized algorithms that

are particularly useful for mining text data.


Text and data mining are the computer-based processes of extracting relevant information and/or patterns from machine-readable text or data. The work usually involves examining large data sets. Text and data mining is used across disciplines, from history to biology, computer science to political science.

Depending on your discipline and the kind of data you want to mine, the process may be called something other than text mining or data mining. Related terms include: text data mining, text analytics, content mining, audio mining, software mining, image mining, metadata mining, and video mining.

Try it out

The Proceedings of the Old Bailey, 1674-1913, is a fully searchable collection of 197,745 criminal trials held at London's central criminal court. The Old Bailey Demonstrator facilitates the dynamic exploration of trial results and the export of trial texts and collections of trial URLs to the suite of linguistic analysis tools from Voyant Tools.

Google’s Ngram Viewer and Bookworm let you graph occurrences of words in Google Books and a variety of open digital collections, respectively.
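Graphs like these rest on a simple idea: count how often a word or phrase appears in the texts from each year, relative to how much text that year contains. The sketch below illustrates that idea on a toy corpus. It is not a description of how either tool is implemented; the `CORPUS` data and the `relative_frequency_by_year` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical (year, text) pairs, made up for illustration.
CORPUS = [
    (1800, "the trial was held at the court"),
    (1800, "the court heard the case"),
    (1850, "the jury left the court"),
]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(corpus, phrase):
    """Map each year to the share of its n-grams that equal the phrase."""
    n = len(phrase.split())
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        grams = ngrams(re.findall(r"[a-z]+", text.lower()), n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    # Skip years that had no n-grams of this length to avoid dividing by zero.
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


freqs = relative_frequency_by_year(CORPUS, "the court")
for year, share in freqs.items():
    print(year, round(share, 3))
```

Dividing by the yearly total matters because the amount of surviving text differs from year to year; raw counts alone would mostly reflect the size of the collection.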
