Read in Integers Separated by New Line From Text File
Reading and Writing Text Files
Overview
Teaching: 60 min
Exercises: 30 min
Questions
How can I read in data that is stored in a file or write data out to a file?
Objectives
Be able to open a file and read in the data stored in that file
Understand the difference between the file name, the opened file object, and the data read in from the file
Be able to write output to a text file with simple formatting
Why do we want to read and write files?
Being able to open and read in files allows us to work with larger data sets, where it wouldn't be possible to type in each and every value and store them one at a time as variables. Writing files allows us to process our data and then save the output to a file so we can look at it later.
Right now, we will be working with a comma-delimited text file (.csv) that contains several columns of data. However, what you learn in this lesson can be applied to any general text file. In the next lesson, you will learn another way to read and process .csv data.
Paths to files
In order to open a file, we need to tell Python exactly where the file is located, relative to where Python is currently working (the working directory). In Spyder, we can do this by setting our current working directory to the folder where the file is located. Or, when we provide the file name, we can give a complete path to the file.
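As a minimal sketch of these two options, the standard os module can report the working directory and build a complete path. The file name is the one used in this lesson; nothing is opened here, so the sketch runs even if the file has not been copied yet.

```python
import os

# Ask Python which directory it is currently working in
working_directory = os.getcwd()
print(working_directory)

# Option 1: give just the file name (Python looks in the working directory)
filename = 'Plates_output_simple.csv'

# Option 2: give a complete path by joining the directory and the file name
full_path = os.path.join(working_directory, filename)
print(full_path)

# Check whether the file is really where Python will look for it
print(os.path.exists(full_path))
```

If the last line prints False, the file is not in the working directory, and opening it by name alone would fail.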
Lesson Setup
We will work with the practice file Plates_output_simple.csv.
- Locate the file Plates_output_simple.csv in the directory home/Desktop/workshops/bash-git-python.
- Copy the file to your working directory, home/Desktop/workshops/YourName.
- Make sure that your working directory is also set to the folder home/Desktop/workshops/YourName.
- As you are working, make sure that you save your file opening script(s) to this directory.
The File Setup
Let's open and examine the structure of the file Plates_output_simple.csv. If you open the file in a text editor, you will see that the file contains several lines of text.
However, this is fairly difficult to read. If you open the file in a spreadsheet program such as LibreOffice Calc or Excel, you can see that the file is organized into columns, with each column separated by the commas visible in the text editor view (hence the file extension .csv, which stands for comma-separated values).
The file contains one header row, followed by eight rows of data. Each row represents a single plate image. If we look at the column headings, we can see what data we have collected for each plate:
- The name of the image from which the information was collected
- The plate number (there were four plates, with each plate imaged at two different time points)
- The growth condition (either control or experimental)
- The observation timepoint (either 24 or 48 hours)
- Colony count for the plate
- The average colony size for the plate
- The percentage of the plate covered by bacterial colonies
We will read in this data file and then work to analyze the data.
Opening and reading files is a three-step procedure
We will open and read the file in three steps.
- We will create a variable to hold the name of the file that we want to open.
- We will call a function to open the file.
- We will call a function to actually read the data in the file and store it in a variable so that we can process it.
And then, there's one more step to do!
- When we are done, we should remember to close the file!
You can think of these three steps as being similar to checking out a book from the library. First, you have to go to the catalog or database to find out which book you need (the filename). Then, you have to go and get it off the shelf and open the book up (the open function). Finally, to gain any information from the book, you have to read the words (the read function)!
Here is an example of opening, reading, and closing a file.
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'  #This is simply a string of text

    #Open the file
    infile = open(filename, 'r')  # 'r' says we are opening the file to read; infile is the opened file object that we will read from

    #Store the data from the file in a variable
    data = infile.read()

    #Print the data in the file
    print(data)

    #Close the file
    infile.close()

Once we have read the data in the file into our variable data, we can treat it like any other variable in our code.
Use consistent names to make your code clearer
It is a good idea to develop some consistent habits about the way you open and read files. Using the same (or similar!) variable names each time will make it easier for you to keep track of which variable is the name of the file, which variable is the opened file object, and which variable contains the read-in data.
In these examples, we will use filename for the text string containing the file name, infile for the open file object from which we can read in data, and data for the variable holding the contents of the file.
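To make the difference between these three variables concrete, here is a small sketch that prints the type of each one. It assumes a tiny stand-in file with made-up contents (example.csv in a temporary folder) so that it runs anywhere; writing files is explained later in this lesson.

```python
import os
import tempfile

# Assumption: a tiny stand-in file with made-up contents, created in a temporary folder
filename = os.path.join(tempfile.mkdtemp(), 'example.csv')
outfile = open(filename, 'w')
outfile.write('name,count\nplateA,12\n')
outfile.close()

infile = open(filename, 'r')  # infile is the opened file object
data = infile.read()          # data is the text that was read from the file
infile.close()

print(type(filename))  # the file name is a string
print(type(infile))    # the file object is its own kind of object, not a string
print(type(data))      # the contents of the file are a string
```

Only data holds the actual contents of the file; filename just says where to look, and infile is the connection that lets us read.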
Commands for reading in files
There are a variety of commands that allow us to read in data from files.
infile.read() will read in the entire file as a single string of text.
infile.readline() will read in one line at a time (each time you call this command, it reads in the next line).
infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list.
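The sketch below compares what each of the three commands returns. It assumes a small made-up three-line file (three_lines.txt in a temporary folder) and reopens the file before each command so that every read starts from the beginning.

```python
import os
import tempfile

# Assumption: a made-up three-line file, created in a temporary folder
filename = os.path.join(tempfile.mkdtemp(), 'three_lines.txt')
outfile = open(filename, 'w')
outfile.write('header\nline one\nline two\n')
outfile.close()

infile = open(filename, 'r')
whole_text = infile.read()       # one string holding the entire file
infile.close()

infile = open(filename, 'r')
first_line = infile.readline()   # the first line only
second_line = infile.readline()  # calling it again gives the next line
infile.close()

infile = open(filename, 'r')
all_lines = infile.readlines()   # a list with one string per line
infile.close()

print(repr(whole_text))
print(repr(first_line), repr(second_line))
print(all_lines)
```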
Mixing these commands can have some unexpected results.
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')

    #Print the first two lines of the file
    print(infile.readline())
    print(infile.readline())

    #Call infile.read()
    print(infile.read())

    #Close the file
    infile.close()

Notice that the infile.read() command started at the third line of the file, where the first two infile.readline() commands left off.
Think of it like this: when the file is opened, a pointer is placed at the top left corner of the file at the beginning of the first line. Any time a read function is called, the cursor or pointer advances from where it already is. The first infile.readline() started at the beginning of the file and advanced to the end of the first line. Now, the pointer is positioned at the beginning of the second line. The second infile.readline() advanced to the end of the second line of the file, and left the pointer positioned at the beginning of the third line. infile.read() began from this position, and advanced through to the end of the file.
In general, if you want to switch between the different kinds of read commands, you should close the file and then open it again to start over.
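A quick way to see the pointer at work is to read the same file twice without reopening it. The sketch below assumes a small made-up file (pointer_demo.txt in a temporary folder); the second read finds nothing left, and closing and reopening the file starts over from the top.

```python
import os
import tempfile

# Assumption: a made-up two-line file, created in a temporary folder
filename = os.path.join(tempfile.mkdtemp(), 'pointer_demo.txt')
outfile = open(filename, 'w')
outfile.write('first line\nsecond line\n')
outfile.close()

infile = open(filename, 'r')
first_read = infile.read()   # the pointer moves from the start to the end of the file
second_read = infile.read()  # the pointer is already at the end, so this is an empty string
infile.close()

infile = open(filename, 'r')  # reopening puts the pointer back at the beginning
lines = infile.readlines()
infile.close()

print(repr(first_read))
print(repr(second_read))
print(lines)
```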
Reading all of the lines of a file into a list
infile.readlines() will read all of the lines into a list, where each line of the file is an item in the list. This is extremely useful, because once we have read the file in this way, we can loop through each line of the file and process it. This approach works well on data files where the data is organized into columns similar to a spreadsheet, because it is likely that we will want to handle each line in the same way.
The example below demonstrates this approach:
    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines:  #lines is a list with each item representing a line of the file
        if 'control' in line:
            print(line)  #print lines for control condition

    infile.close()  #close the file when you're done!

Using .split() to separate "columns"
Since our data is in a .csv file, we can use the split command to separate each line of the file into a list. This can be useful if we want to access specific columns of the file.
    #Create a variable for the file name
    filename = "Plates_output_simple.csv"

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines:
        sline = line.split(',')  # separates line into a list of items. ',' tells it to split the lines at the commas
        print(sline)  #each line is now a list

    infile.close()  #Always close the file!

Consistent names, again
At first glance, the variable name sline in the example above may not make much sense. In fact, we chose it to be an abbreviation for "split line", which exactly describes the contents of the variable.
You don't have to use this naming convention if you don't want to, but you should work to use consistent variable names across your code for common operations like this. It will make it much easier to open an old script and quickly understand exactly what it is doing.
Converting text to numbers
When we called the readlines() command in the previous code, Python read in the contents of the file as strings. If we want our code to recognize something in the file as a number, we need to tell it this!
For example, float('5.0') will tell Python to treat the text string '5.0' as the number 5.0. int(sline[4]) will tell our code to treat the text string stored in the fifth position of the list sline as an integer (non-decimal) number.
For each line in the file, the ColonyCount is stored in the fifth column (index 4 with our 0-based counting).
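Here is a short sketch of why the conversion matters, using a single data line typed in directly rather than read from the file. It assumes the column order described earlier in this lesson, so index 4 is the colony count and index 5 is the average colony size.

```python
# One data line in the same layout as the lesson file, typed in directly
line = 'colonies02.tif,2,exp,24,84,3.2,22\n'
sline = line.split(',')

# Before conversion, the values are text, so + joins them together
print(sline[4] + sline[4])  # prints 8484

# After conversion, the values are numbers, so + adds them
colony_count = int(sline[4])        # the string '84' becomes the integer 84
avg_colony_size = float(sline[5])   # the string '3.2' becomes the decimal number 3.2
print(colony_count + colony_count)  # prints 168
print(avg_colony_size)
```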
Modify the .split() example above to print the line only if the ColonyCount is greater than 30.
Solution
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  # separates line into a list of items. ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #Close the file
    infile.close()
Writing data out to a file
Often, we will want to write data to a new file. This is especially useful if we have done a lot of computations or data processing and we want to be able to save the results and come back to them later.
Writing a file is the same multi-stride process
Just like reading a file, we will open and write the file in multiple steps.
- Create a variable to hold the name of the file that we want to open. Often, this will be a new file that doesn't yet exist.
- Call a function to open the file. This time, we will specify that we are opening the file to write into it!
- Write the data into the file. This requires some careful attention to formatting.
- When we are done, we should remember to close the file!
The code below gives an example of writing to a file:
    filename = "output.txt"

    #'w' tells Python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file")
    outfile.write("This is the second line of the file")

    outfile.close()  #Close the file when we're done!

Where did my file end up?
Any time you open a new file and write to it, the file will be saved in your current working directory, unless you specified a different path in the variable filename.
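If you are not sure where that is, one option is to ask Python to expand the file name into a complete path. This is a minimal sketch using os.path.abspath from the standard library; it does not create the file, it only shows where a file with this name would be written.

```python
import os

filename = 'output.txt'

# Expand the bare file name into a complete path, based on the current working directory
full_path = os.path.abspath(filename)
print(full_path)
```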
Newline characters
When you examine the file you just wrote, you will see that all of the text is on the same line! This is because we must tell Python when to start on a new line by using the special string character '\n'. This newline character will tell Python exactly where to start each new line.
The example below demonstrates how to use newline characters:
    filename = 'output_newlines.txt'

    #'w' tells Python we are opening the file to write into it
    outfile = open(filename, 'w')

    outfile.write("This is the first line of the file\n")
    outfile.write("This is the second line of the file\n")

    outfile.close()  #Close the file when we're done!

Go open the file you just wrote and check that the lines are spaced correctly.
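You can also do this check from Python by reading the file back in. The sketch below repeats the write step and then counts the lines; it assumes a temporary folder for the output so it can run anywhere, and uses repr() to make the newline characters visible.

```python
import os
import tempfile

# Assumption: the file is written to a temporary folder instead of the working directory
filename = os.path.join(tempfile.mkdtemp(), 'output_newlines.txt')

outfile = open(filename, 'w')
outfile.write("This is the first line of the file\n")
outfile.write("This is the second line of the file\n")
outfile.close()

# Read the file back in to check that there are two separate lines
infile = open(filename, 'r')
lines = infile.readlines()
infile.close()

print(len(lines))
for line in lines:
    print(repr(line))  # repr() shows the '\n' at the end of each line
```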
Dealing with newline characters when you read a file
You may have noticed in the last file reading example that the printed output included newline characters at the end of each line of the file:
['colonies02.tif', '2', 'exp', '24', '84', '3.2', '22\n']
['colonies03.tif', '3', 'exp', '24', '792', '3', '78\n']
['colonies06.tif', '2', 'exp', '48', '85', '5.2', '46\n']
We can get rid of these newlines by using the .strip() function, which removes whitespace, including newline characters, from both ends of a string:
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.strip()  #get rid of trailing newline characters at the end of the line
        sline = sline.split(',')  # separates line into a list of items. ',' tells it to split the lines at the commas
        colonyCount = int(sline[4])  #store the colony count for the line as an integer
        if colonyCount > 30:
            print(sline)

    #Close the file
    infile.close()
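The sketch below looks at .strip() on its own, using strings typed in directly. The padded string is a made-up example; it shows that .strip() removes spaces as well as newlines, while .rstrip('\n') is an alternative that removes only newlines at the end.

```python
# The last item from one of the printed lists above
last_item = '22\n'
clean_item = last_item.strip()
print(repr(clean_item))  # '22'

# Made-up example with extra spaces around the text
padded = '  control \n'
stripped = padded.strip()         # removes whitespace from both ends
right_only = padded.rstrip('\n')  # removes only newline characters at the end
print(repr(stripped))    # 'control'
print(repr(right_only))  # '  control '
```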
Writing numbers to files
Just like Python automatically reads files in as strings, the write() function expects to only write strings. If we want to write numbers to a file, we will need to "cast" them as strings using the function str().
The code below shows an example of this:
    numbers = range(0, 10)
    filename = "output_numbers.txt"

    #'w' tells Python we are opening the file to write into it
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number))

    outfile.close()  #Close the file when we're done!

Writing new lines and numbers
Go open and examine the file you just wrote. You will see that all of the numbers are written on the same line.
Change the code to write each number on its own line.
Solution
    numbers = range(0, 10)  #Create the range of numbers
    filename = "output_numbers.txt"  #provide the file name

    #open the file in 'write' mode
    outfile = open(filename, 'w')

    for number in numbers:
        outfile.write(str(number) + '\n')

    outfile.close()  #Close the file when we're done!

The file you just wrote should be saved in your working directory. Open the file and check that the output is correctly formatted with one number on each line.
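To see why str() is needed, the sketch below tries to write a bare number and catches the resulting error. The try/except construct is not covered in this lesson and is used here only so the script keeps running; the output goes to a temporary folder, which is an assumption made so the sketch can run anywhere.

```python
import os
import tempfile

# Assumption: the file is written to a temporary folder instead of the working directory
filename = os.path.join(tempfile.mkdtemp(), 'output_numbers.txt')
outfile = open(filename, 'w')

number = 5
error_raised = False
try:
    outfile.write(number)  # a bare number is not a string, so write() refuses it
except TypeError as error:
    error_raised = True
    print('write() refused the number:', error)

outfile.write(str(number) + '\n')  # casting to a string (and adding a newline) works
outfile.close()

infile = open(filename, 'r')
contents = infile.read()
infile.close()
print(repr(contents))
```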
Opening files in different 'modes'
When we have opened files to read or write data, we have used the function parameter 'r' or 'w' to specify which "mode" to open the file in.
'r' indicates we are opening the file to read data from it.
'w' indicates we are opening the file to write data into it.
Be very, very careful when opening an existing file in 'w' mode. 'w' will overwrite any data that is already in the file! The overwritten data will be lost!
If you want to add on to what is already in the file (instead of erasing and overwriting it), you can open the file in append mode by using the 'a' parameter instead.
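The difference between 'w' and 'a' is easiest to see by opening the same file several times. This sketch assumes a made-up file (mode_demo.txt in a temporary folder): the second 'w' erases the first version, and the 'a' adds to what is there.

```python
import os
import tempfile

# Assumption: a made-up file in a temporary folder
filename = os.path.join(tempfile.mkdtemp(), 'mode_demo.txt')

outfile = open(filename, 'w')  # creates the file
outfile.write('first version\n')
outfile.close()

outfile = open(filename, 'w')  # 'w' again: the first version is erased
outfile.write('second version\n')
outfile.close()

outfile = open(filename, 'a')  # 'a': new text is added after the existing text
outfile.write('appended line\n')
outfile.close()

infile = open(filename, 'r')
contents = infile.read()
infile.close()
print(contents)
```

The text 'first version' never appears in the output, because the second open in 'w' mode replaced it.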
Pulling it all together
Read in the data from the file Plates_output_simple.csv that we have been working with. Write a new csv-formatted file that contains only the rows for control plates.
You will need to do the following steps:
- Open the file.
- Use .readlines() to create a list of lines in the file. Then close the file!
- Open a file to write your output into.
- Write the header line of the output file.
- Use a for loop to permit you to loop through each line in the list of lines from the input file.
- For each line, check if the growth condition was experimental or control.
- For the control lines, write the line of information to the output file.
- Close the output file when you're done!
Solution
Here's one way to do it:
    #Create a variable for the file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()  #We will process the lines of the file later

    #Close the input file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData.txt'
    outfile = open(filename, 'w')

    outfile.write(lines[0])  #This will write the header line of the file

    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  # separates line into a list of items. ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            outfile.write(line)  #The variable line is already formatted correctly!

    outfile.close()  #Close the file when we're done!
Challenge Problem
Open and read in the data from Plates_output_simple.csv. Write a new csv-formatted file that contains only the rows for the control condition and includes only the columns for Time, colonyCount, avgColonySize, and percentColonyArea. Hint: you can use the .join() function to join a list of items into a string.
    names = ['Erin', 'Mark', 'Tessa']
    nameString = ', '.join(names)  #the ', ' tells Python to join the list with each item separated by a comma + space
    print(nameString)

Erin, Mark, Tessa
Solution
    #Create a variable for the input file name
    filename = 'Plates_output_simple.csv'

    #Open the file
    infile = open(filename, 'r')
    lines = infile.readlines()  #We will process the lines of the file later

    #Close the file
    infile.close()

    #Create the file we will write to
    filename = 'ControlPlatesData_Reduced.txt'
    outfile = open(filename, 'w')

    #Write the header line
    headerList = lines[0].split(',')[3:]  #This will return the list of column headers from 'time' on
    headerString = ','.join(headerList)  #join the items in the list with commas
    outfile.write(headerString)  #There is already a newline at the end, so no need to add one

    #Write the remaining lines
    for line in lines[1:]:  #skip the first line, which is the header
        sline = line.split(',')  # separates line into a list of items. ',' tells it to split the lines at the commas
        condition = sline[2]  #store the condition for the line as a string
        if condition == "control":
            dataList = sline[3:]
            dataString = ','.join(dataList)
            outfile.write(dataString)  #The last item still ends with a newline, so no need to add one

    outfile.close()  #Close the file when we're done!
Key Points
Opening and reading a file is a multistep process: Defining the filename, opening the file, and reading the data
Data stored in files can be read in using a variety of commands
Writing data to a file requires attention to data types and formatting that isn't necessary with a print() statement
Source: https://eldoyle.github.io/PythonIntro/08-ReadingandWritingTextFiles/