Ticket #459 (closed defect: fixed)

Opened 8 months ago

Last modified 7 months ago

java.lang.NullPointerException (Too many files opened)

Reported by: Maverick Owned by: Maverick
Priority: highest Component: Core
Version: Latest version from repository Severity: blocking
Keywords: Cc:

Description (last modified by Maverick) (diff)

The bots in Trench Wars go panick after a while as soon as a bot can't open the twcore-exceptions.log file. A NullPointerException is thrown with the vague statement (Too many files opened).

java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at twcore.core.command.CommandInterpreter.handleEvent(Unknown Source)
	at twcore.core.HubBot.handleEvent(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.handleChatMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateNormalPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.net.ReliablePacketHandler.handleReliableMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateSpecialPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.Session.run(Unknown Source)
Caused by: java.lang.NullPointerException
	at twcore.core.util.Tools.getRecursiveFileInDirectory(Unknown Source)
	at twcore.core.CoreData.getBotConfig(Unknown Source)
	at twcore.core.BotQueue.spawnBot(Unknown Source)
	at twcore.core.BotQueue.spawnBot(Unknown Source)
	at twcore.core.HubBot.handleSpawnMessage(Unknown Source)
	... 13 more
java.io.FileNotFoundException: /home/bots/twcore/twcore-exceptions.log (Too many open files)
	at java.io.FileOutputStream.openAppend(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:207)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
	at java.io.FileWriter.<init>(FileWriter.java:78)
	at twcore.core.util.Tools.printStackTrace(Unknown Source)
	at twcore.core.command.CommandInterpreter.handleEvent(Unknown Source)
	at twcore.core.HubBot.handleEvent(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.handleChatMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateNormalPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.net.ReliablePacketHandler.handleReliableMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateSpecialPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.Session.run(Unknown Source)
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at twcore.core.command.CommandInterpreter.handleEvent(Unknown Source)
	at twcore.core.HubBot.handleEvent(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.handleChatMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateNormalPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.net.ReliablePacketHandler.handleReliableMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateSpecialPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.Session.run(Unknown Source)
Caused by: java.lang.NullPointerException
	at twcore.core.util.Tools.getRecursiveFileInDirectory(Unknown Source)
	at twcore.core.CoreData.getBotConfig(Unknown Source)
	at twcore.core.BotQueue.spawnBot(Unknown Source)
	at twcore.core.BotQueue.spawnBot(Unknown Source)
	at twcore.core.HubBot.handleSpawnMessage(Unknown Source)
	... 13 more
java.io.FileNotFoundException: /home/bots/twcore/twcore-exceptions.log (Too many open files)
	at java.io.FileOutputStream.openAppend(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:207)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
	at java.io.FileWriter.<init>(FileWriter.java:78)
	at twcore.core.util.Tools.printStackTrace(Unknown Source)
	at twcore.core.command.CommandInterpreter.handleEvent(Unknown Source)
	at twcore.core.HubBot.handleEvent(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.handleChatMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateNormalPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.net.ReliablePacketHandler.handleReliableMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateSpecialPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.Session.run(Unknown Source)
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at twcore.core.command.CommandInterpreter.handleEvent(Unknown Source)
	at twcore.core.HubBot.handleEvent(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.handleChatMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateNormalPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.net.ReliablePacketHandler.handleReliableMessage(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateSpecialPacket(Unknown Source)
	at twcore.core.net.GamePacketInterpreter.translateGamePacket(Unknown Source)
	at twcore.core.Session.run(Unknown Source)
Caused by: java.lang.NullPointerException
	at twcore.core.util.Tools.getRecursiveFileInDirectory(Unknown Source)
	at twcore.core.CoreData.getBotConfig(Unknown Source)
	at twcore.core.BotQueue.spawnBot(Unknown Source)
	at twcore.core.BotQueue.spawnBot(Unknown Source)
	at twcore.core.HubBot.handleSpawnMessage(Unknown Source)
	... 13 more

Attachments

fd.log (35.3 KB) - added by Maverick 8 months ago.
List of file descriptors being open.

Change History

Changed 8 months ago by Maverick

  • status changed from new to assigned

I googled this statement and found out that you can either increase the amount of file descriptors on the Operating System or check your code to make sure it closes all file handlers when you don't need it anymore.

TWCore should run without any necessary changes to the amount of file descriptors so I checked TWCore's code to make sure all file handlers are properly closed. A number of changes were made to make sure these file handlers are properly closed. See changeset [3510].

Changed 8 months ago by Maverick

Lets hope these changes solve the problems and make the bots more stable.

Changed 8 months ago by Maverick

Tip to check number of file handles per process:

  1. Get the Process ID of TWCore:
    ps ax | grep something-that-makes-your-process-unique
  1. Check the number of file handles using the process id:
    ls -l /proc/processpid/fd

Changed 8 months ago by Maverick

  • description modified (diff)

Changed 8 months ago by Maverick

List of file descriptors being open.

Changed 8 months ago by Maverick

Yesterday I removed the -server flag on starting the TWCore which has probably helped the problem. The stacktraces aren't shown on the past restarts anymore.

The output of ls -l is worrying though (it's attached). It shows alot of sockets being open while there are only 41 bots running.

I'm going to update the bots and re-add the -server flag then check on the bots tomorrow.

Changed 8 months ago by Maverick

fd.log shows 462 sockets being open :/

Changed 8 months ago by Maverick

Small update:
-server flag removed, still problems with "(Too many opened files)".

Conclusion:
it's not the -server flag, the fixes I did didn't fix the problem.

Changed 8 months ago by Maverick

I'm going to spread out the bots over multiple cores. This will hopefully solve some problems as the load will be spread amongst multiple processes.

Draft setup:

  • TWCore
    HubBot hosting only utility bots
  • TWCore-League
    HubBot hosting only MatchBots, TWLBots, TournyBot, DuelBot, TWRC
  • TWCore-Events
    HubBot hosting only event-related bots

Changed 7 months ago by Maverick

Final setup

TWCore - hosts all utility bots and bots not on other cores

  • Banner Boy [bannerboy]
  • GammaBot4 [bouncerbot]
  • TWDevChat1 / TWDevChat2 [chatrelaybot]
  • EventBot [eventbot] (disabled)
  • ForumBot [forumbot]
  • MessageBot [messagebot]
  • PubBot [pubbot]
  • PubHub [pubhub]
  • RadioBot [radiobot]
  • RoboHelp [robohelp]
  • StaffBot [staffbot]
  • TWDOpStats [twdopstats]
  • ZonerBot [zonerbot]

TWCore-Events - hosts all event-related bots

  • BaseBot / JavDuelBot / WBDuelBot / SpidDuelBot [bwjsbot]
  • TWBot1 / TWBot2 [elim]
  • JavElimBot [javelim]
  • RoboBot* [multibot]
  • GammaBot2 [octabot]
  • RoboGirl / RoboBoy [purepubbot]

TWCore-League - hosts all bots related to TWD / TWL and other leagues

  • DuelBot [duelbot]
  • MatchBot* [matchbot]
  • RaceBot [racebot]
  • Tournybot [tournybot]
  • TWDBot [twdbot]
  • TWLBot* [twl]
  • TWRC-Bot [twrcbot]

Changed 7 months ago by Maverick

First results are in:

  • TWCore
    Very stable core now. Uptime is 100% since the last reboot. Some slow database queries here and there but it's not resulting in bots going down.
  • TWCore-Events
    The core is fairly stable (apart from the !remove and !removetype commands which might crash the core). Unfortunately the Elimination bots keep "crashing" when ending an elimination game with high number of participants. This is most likely caused by the saving of statistics to the database at the end of the game. Fortunately, this is fixable and I'll investigate into this as soon as possible.
  • TWCore-League
    The core is far from stable, it's crashing lots of times. Causes include slow database queries and faulty file locking
    ("BotSettings configuration file (TWDD.txt) not found: /home/bots/twcore-league/bin/data/Rules/TWDD.txt (Too many open files)).
    Matchbot needs to be checked for bad file locking (reading files without releasing the file when done reading it) and other failures of releasing resources.

Changed 7 months ago by Maverick

  • status changed from assigned to closed
  • resolution set to fixed

The error message "Too many files opened" was most likely caused by the connection leaking problem as connections are also part of Linux's file descriptors.
This connection leaking has been fixed in changesets [3541] (and [3543]).

Hopefully most instability problems are a thing of the past now.

Add/Change #459 (java.lang.NullPointerException (Too many files opened))

Author


E-mail address and user name can be saved in the Preferences.


Action
as closed
Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.