> How do I extract individual lines from text files based on certain criteria?

Posted at: 2014-12-18 
Export your IRC logs as delimited files (tab- or comma-separated, for example).

Import the delimited file into a database program.

Now you should be able to sort and make queries to extract specific data from your new database.
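If your IRC client cannot export delimited files itself, a short script can produce them. The following is only a sketch in Python: it assumes every message line looks like "MM/DD/YY HH:MM:SS am <name> message", and the names LINE_RE, log_to_rows, write_delimited and the speaker "alice" are made up for illustration.

```python
import csv
import io
import re

# Assumed layout: "MM/DD/YY HH:MM:SS am <name> message".
# Adjust the pattern if your client writes timestamps differently.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2} [ap]m) <([^>]+)> (.*)$"
)


def log_to_rows(lines):
    """Yield (date, time, name, message) for each line matching the layout."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            yield match.groups()


def write_delimited(lines, out_file):
    """Write matching lines as tab-delimited rows; return how many were written."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    count = 0
    for row in log_to_rows(lines):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # "alice" is a hypothetical speaker name.
    sample = ["08/16/14 12:08:41 am <alice> it's fine haha\n"]
    buffer = io.StringIO()
    write_delimited(sample, buffer)
    print(buffer.getvalue(), end="")
```

Lines that do not match the pattern (joins, quits, topic changes) are simply skipped, so the resulting file has four clean columns to import.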

Used this method successfully many times for various logs, etc.

If you're willing to figure out a programming language to do this in, I suggest you do so.

This can be done using string manipulation.

Say you want to filter this:

08/16/14 12:08:41 am it's fine haha

You can filter and save this in many different ways: by date, by name (the part in angle brackets), or by text using a string search or count.

Most programming languages have readline functions. Every line ends with a line break, i.e. the character produced by the Return key.
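As a rough illustration of reading a log line by line and keeping only certain speakers, here is a sketch in Python. It assumes the speaker's name sits in angle brackets right after the timestamp; NAME_RE, lines_by_speaker, copy_speaker_lines and the names "alice" and "bob" are hypothetical.

```python
import re

# Assumed layout: "MM/DD/YY HH:MM:SS am <name> message".
NAME_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} [ap]m <([^>]+)>")


def lines_by_speaker(lines, names):
    """Yield the lines spoken by any of the given names (case-insensitive)."""
    wanted = {name.lower() for name in names}
    for line in lines:
        match = NAME_RE.match(line)
        if match and match.group(1).lower() in wanted:
            yield line


def copy_speaker_lines(src_path, dst_path, names):
    """Append the matching lines of one log file to dst_path."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "a", encoding="utf-8") as dst:
        # Iterating over a file object reads it one line at a time.
        for line in lines_by_speaker(src, names):
            dst.write(line)


if __name__ == "__main__":
    sample = [
        "08/16/14 12:08:41 am <alice> it's fine haha\n",
        "08/16/14 12:08:45 am <bob> ok\n",
    ]
    for line in lines_by_speaker(sample, ["alice"]):
        print(line, end="")
```

Because names is a list, several nicknames can be collected in one pass, and because the output file is opened in append mode, copy_speaker_lines can be called once per log file to build up a single corpus.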

If you wanted to filter by date (by year) and save the result to a file, it would go like this:

Substring("08/16/14 12:08:41 am it's fine haha", 6, 2). This type of function starts at offset 6 of the string, counting from 0 (so the seventh character), and returns 2 characters, which gives "14". You can then save the line to a file for the year 2014. Some languages count from 1 instead (VBScript's Mid, for example), in which case the start position would be 7.
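In Python the same idea is written with slicing, which counts from 0. This is a small sketch; the function names are made up, and turning "14" into 2014 assumes none of the logs predate the year 2000.

```python
def two_digit_year(line):
    """Return the two-digit year from a line that starts with MM/DD/YY."""
    # Characters at offsets 6 and 7 hold the year in "08/16/14 ...".
    return line[6:8]


def four_digit_year(line):
    """Return the full year, assuming every log is from 2000 or later."""
    return 2000 + int(two_digit_year(line))


sample = "08/16/14 12:08:41 am it's fine haha"
print(two_digit_year(sample))   # 14
print(four_digit_year(sample))  # 2014
```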

Probably the easiest way to do this without a compiler is with VBScript, or with a server-side scripting language such as PHP. You could try JavaScript as well.

Look into the substring functions in VBScript, PHP or JavaScript (Mid in VBScript, substr in PHP, substring in JavaScript). Then look into opening, reading and saving files.

Combine your code and you should be able to do what you want to do, and then some.
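Put together, reading a file, picking out the year and saving the lines might look like the sketch below. It is written in Python rather than the languages mentioned above, the function names are hypothetical, and it assumes each message line starts with MM/DD/YY and that all logs are from 2000 or later.

```python
from collections import defaultdict
from pathlib import Path


def group_by_year(lines):
    """Group lines that start with MM/DD/YY under a four-digit year key."""
    groups = defaultdict(list)
    for line in lines:
        year = line[6:8]
        # Only accept lines that really look like "MM/DD/YY ...".
        if line[2:3] == "/" and line[5:6] == "/" and year.isdigit():
            groups["20" + year].append(line)  # assumes years 2000-2099
    return dict(groups)


def split_log_by_year(src_path, out_dir):
    """Append each line of src_path to out_dir/<year>.txt; return the years seen."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(src_path, encoding="utf-8", errors="replace") as src:
        groups = group_by_year(src)
    for year, lines in groups.items():
        with open(out_dir / f"{year}.txt", "a", encoding="utf-8") as out:
            out.writelines(lines)
    return sorted(groups)
```

This version holds one whole log file in memory at a time, which should be fine for text logs. The output files are opened in append mode, so running it over many log files accumulates one file per year.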

This can also be done with SQL, although it requires some manual work such as copying and pasting.

The work would probably be similar.

Import your data into a SQL database. SQL databases also have substring functions (the name and syntax differ between SQL dialects). Query by text:

SELECT * FROM data WHERE SUBSTRING(line, 7, 2) = '14'

or something similar (not sure if that's the exact code for it). Here "line" stands for whichever column holds the log text, and SQL substring functions count from 1, so the year starts at position 7. This will return all of the rows whose year is "14".

Select all the rows and copy and paste to a txt file.
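The copy-and-paste step can be avoided by running the query from a script. Below is a sketch using Python's built-in sqlite3 module with a throwaway in-memory database; the table name log, the column name line and the function name are made up for illustration. SQLite spells the function substr, and it counts from 1.

```python
import sqlite3


def lines_from_year(lines, two_digit_year):
    """Return the lines whose MM/DD/YY timestamp has the given two-digit year."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved
    try:
        conn.execute("CREATE TABLE log (line TEXT)")
        conn.executemany(
            "INSERT INTO log (line) VALUES (?)", [(line,) for line in lines]
        )
        # substr counts from 1, so the year occupies positions 7 and 8.
        rows = conn.execute(
            "SELECT line FROM log WHERE substr(line, 7, 2) = ? ORDER BY rowid",
            (two_digit_year,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample = [
        "08/16/14 12:08:41 am it's fine haha",
        "01/02/13 01:00:00 pm an older line",
    ]
    for line in lines_from_year(sample, "14"):
        print(line)
```

The returned list can then be written straight to a text file instead of being pasted by hand.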

Good luck.

I have ~11 years of IRC logs, and I'd like to take everything I've said in that time and put it into a single (or several divided by year or whatever) text file, with the aim of making a markov chatbot. I literally have no idea where to start with this. Most of the documentation I've found online is about the actual coding of a bot, and not about compiling your corpus.

The logs are text files, and each line begins with a timestamp, followed by the speaker's name in pointy brackets (somehow in my time doing HTML I didn't learn what < and > are called!), like this:

08/16/14 12:08:41 am it's fine haha

Is there any way to automate/speed up searching through a file, finding lines that start with [timestamp] <[name]>, and copying those to another text file? Extra helpful if I can get it to find multiple names, but if I have to run through the process 2-3 times to get everything I'm looking for that's fine too.

If I need to use a programming language of some kind I'm willing to figure it out, I just need a point in the right direction. Otherwise I'm going to be condemned to my incredibly inefficient process of merging files using the command prompt, editing in tabs in something like Word, then pasting the whole thing into Excel and sorting alphabetically to get all my lines in one place.

Thank you for reading this question and for any help you can give me!