Server Busy - Time-outs and simply overwhelmed ...

Discussion in 'Website news & discussions' started by xcel, Jun 4, 2008.

  1. dwschoon

    dwschoon Active Member

    Do you have to pay the hosting company for the hardware? Any Intel processor based on the "Core" technology will mop the floor with any of those P4-class CPUs. If you can build the system yourself, you can get much greater value for your money. If you could get a system with 1 current dual-core Xeon, as opposed to 2 P4-class Xeons, you would be better off, and you would have the option of adding a second processor later if needed. I have been building quad-core systems here at work for $500 or less, minus monitor. I realize that a server needs higher-end components, but you shouldn't be relegated to 5-year-old technology. I would also suggest that you get SATA or SCSI and not IDE. Also, if you can get a RAID 5 setup, you can get redundancy with better performance than RAID 1. You just need to add another drive. I know our biggest bottleneck here at work seems to be hard drive speed.
     
  2. diamondlarry

    diamondlarry Super MPG Man/god :D

    It's one of those good problems to have. As I was telling Sean last night, at least we're getting too much traffic and not too little. Now that the new job seems to be working out, I may consider an ongoing monthly donation based on a percentage of my savings.
     
  3. jab

    jab TreehuggingDirtWorshipper

    I wondered what happened this morning! Great that so many people are checking in. :woot:
     
  4. Ophbalance

    Ophbalance Administrator Staff Member

    Is there any way to separate the DB from the front end? You may be able to get away with a smaller dedicated DB box this way and you won't be pulling double duty on the server. Most enterprises go with this type of setup.
     
  5. hobbit

    hobbit He who posts articles

    I would like to see evidence that someone's done some actual analysis
    on where the bottleneck is, before blindly throwing RAM and gigahertz
    [=money] at the problem. Is it the PHP frontend? Is it the backend
    database? How do the two talk to each other? Has someone looked
    into some of the throttling functionality that Apache can do? What
    else is the server box itself doing along with serving CleanMPG? Has
    someone gone through and ripped out all the junk server processes
    that don't need to be running? What is the load average on the
    machine at the times of "too busy", and the top processes? How
    much effort has gone into sysctl tuning on the network stack?
    .
    How about cutting down on the amount of fluff that gets sent with
    every requested page? Asking for /forums/ hands me 120K of stuff,
    almost the entire first half of which is unnecessary. For example,
    you don't have to squirt out ALL the paypal forms and private keys
    with every page, just have a simple "donations" link that takes
    people over to a different page that presents all that. Cut down
    on the amount of javascript muck that gets shoved down everybody's
    throat. Trim up the CSS if possible. Take out all the little
    language option .GIFs, each of which causes another HTTP hit.
    Lose the google-analytics stuff. If there's a way to skip generating
    a session ID for not-logged-in people, that might help too.
    .
    Do you guys have shell and root access to the box? If not, and
    if the people at 800hosting can't answer a good chunk of this,
    clearly some work needs to be done instead of buying faster boxes.
    I think a few hard questions could save some dough here. There,
    I just contributed about twenty bucks worth of free consulting.
    .
    _H*
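
    A minimal sketch of the load-average question above, assuming a Unix-like box with Python on it. os.getloadavg() is a real standard-library call; the describe_load helper and its per-core thresholds are illustrative assumptions, not anything measured on the CleanMPG server.

```python
import os


def describe_load(load_1min: float, cores: int) -> str:
    """Classify a 1-minute load average relative to the number of cores.

    The 1.0 and 4.0 per-core cut-offs are arbitrary rules of thumb (assumption).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    per_core = load_1min / cores
    if per_core < 1.0:
        return "ok"
    if per_core < 4.0:
        return "busy"
    return "overloaded"


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"load {one:.2f} / {five:.2f} / {fifteen:.2f} on {cores} core(s):",
          describe_load(one, cores))
```

    Run from cron every minute and appended to a log, something like this would at least show what the load looked like in the run-up to a "too busy" episode. It says nothing about which processes are responsible; that still takes top or similar.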
     
  6. xcel

    xcel PZEV, there's nothing like it :) Staff Member

    Hi Hobbit:

    ___CleanMPG's simple P4 was hit with requests for 10 to 20 times the capacity of the box as it is configured. We cannot solve that kind of hit with a simple upgrade, but we are coming up against the stops on a day-to-day basis as we put out data now. Sean was turning off a lot of features yesterday to make sure we survived, but by the afternoon it was too late because of the traffic: we could not even get into root with our own login and password until after ~ 07:00 PM!

    ___I don't know much about the rest unfortunately :(

    ___Good Luck

    ___Wayne
     
  7. lyekka

    lyekka Well-Known Member

    I am looking into monthly donations as X amount of money per % over combined EPA for my vehicle.
     
  8. Ophbalance

    Ophbalance Administrator Staff Member

    You know, those prices quoted really aren't too shabby. I'd still recommend a separate DB server. Have you looked into co-location at all? It can be cheaper, but offers less 24/7 support. But there's enough geeks here that this may not be an issue ;).
     
  9. xcel

    xcel PZEV, there's nothing like it :) Staff Member

    Hi Lyekka:

    ___I think that might be a bit much because in time, it would make CleanMPG as well off as Microsoft, Walmart or Ikea :D :D :D

    ___Ophbalance, I have received some really good advice the past two days and what you said was one of them. It is not so much the HW but the bandwidth that makes this a pretty good deal overall. I am just not used to giving somebody upwards of $250.00 a month for a PC no matter how super whamodyne it is and from the specs, this is not that super whamodyne of a Server either. I suspect the SuperMicro was probably built a few years ago, the occupant moved on to something bigger and better and we are the second tenants taking over the ok looking place for an ok price but in a really nice neighborhood :rolleyes:

    ___Good Luck

    ___Wayne
     
    Last edited: Jun 5, 2008
  10. Dan

    Dan KiloTanked in post 153451

    Hats off to Wayne and Sean (and the tech with the stick of RAM) for weathering the storm. I know the feeling when you're on the receiving end of unresponsive tech. I think anyone who was turned away will be right back as soon as they buy their next tank of gas.

    As an attempt to hypermile my browser, I've set my default style to VB3-Lite, and turned off images from loading from CleanMPG. I also spend a good deal of time on the Archive reading new posts since it's much lighter on the database: http://www.cleanmpg.com/forums/archive/index.php/

    Even when the site goes "down", the archive is usually loadable.

    Best of luck and let me know if you want me to try to pull some strings for some server HW ;).

    11011011
     
  11. Dan

    Dan KiloTanked in post 153451

    Sorry guys, missed the posts past the first page so let me jump in here again.

    DASD (disk)
    -SAS is better than SATA, SATA is better than IDE. better=faster.

    -Raid 6 is best if the storage card can do it. With Raid 6 you need at least 4 drives. You could lose any two drives (separately) without losing data. Raid 6 is Raid 5 with a hot spare. With Raid 6 two drives are dedicated to overhead (redundancy), in Raid 5 only one drive is dedicated to overhead (redundancy).

    -Raid 1 is ok, but half of your drives are dedicated to redundancy and you don't get a real performance boost like you do from raid 5.

    -Raid 0 is suicide. One bad sector and the database could be hosed to the point that you have to retrieve from tape.

    Memory:
    -ECC is a must. As kooky as it sounds, alpha particles flip bits in memory on a fairly regular basis. ECC handles this by catching and repairing the errors in the memory itself. The only risk is a double bit error (two alpha particles hit the same cache line). Not very common, but you'll likely get an NMI and kernel panic if it does (server bounces). Depending on the linux distro, your database may be shielded from corrupting application space (database) on detected multibits.

    -NonECC is suicide. Any singlebit error is undetectable. If that single bit happens on an execution path and your JMP 0x0A turns into JMP 0x1A, your best case is a kernel panic; worst case is bad data makes it to a commit on the DB. Hopefully DB checksums would catch it, but the list is endless.
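
    To make the single-bit story concrete, here is a toy sketch of error correction using a Hamming(7,4) code. This is an illustration of the idea only: real ECC DIMMs use wider codes over whole memory words in hardware, and the function names below are made up for this example.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Bit i of the result holds codeword position i + 1, laid out as
    1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(codeword: int) -> Tuple[int, int]:
    """Return (corrected nibble, position of the flipped bit, or 0 if none)."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword and
    # equals the position of the bad bit after a single flip.
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1  # repair the single-bit error
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    # 0x0A and 0x1A differ in exactly one bit: the low bit of the high nibble.
    print("bits that differ:", bin(0x0A ^ 0x1A).count("1"))
    stored = hamming74_encode(0x0)       # high nibble of 0x0A
    corrupted = stored ^ (1 << 2)        # flip d0, which lives at position 3
    print("decoded:", hamming74_decode(corrupted))
```

    With the code in place the flipped nibble decodes back to 0 and the decoder reports which position was hit. Without it, the same flip simply reads back as 1 and the jump target becomes 0x1A with nothing to flag it. Two flips in the same codeword would fool this toy version, which is the double-bit case mentioned above.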

    Most databases are memory hungry. Typical database design is to gobble up as much memory as you can, then set up your own caching scheme to keep off disk at all costs. If you're under 3 gigs, execution width doesn't really matter. Above 3 gigs, you might want to look at a 64-bit DB distro. It simplifies much of the DB's memory management since it doesn't have to window as much and can code against a flat model.

    CPU
    -Core/Thread count - Going with multi-core, multi-thread packages is helpful with smart HTTP daemons since they can cut traffic nicely amongst cores. Be warned that some of this is mitigated by the NIC driver. Some high-end NICs have fallen down on interrupt distribution across cores, so if you can watch that kind of trace data, check that CPU usage bounces evenly under heavy network load as opposed to bunching up on the BSP. Hobbit is correct that core count is rarely the bottleneck though.

    Database
    If you can run any reports on dropped queries, that may be informative. My pet theory is that the DB fell over since each request to www.cleanmpg.com/ triggers a flurry of DB queries (to paint the "reply count" on all the home page articles). One interesting point is that even when the site was totally hammered the archive worked just fine. My pet theory on that is that the archive works off a small (relatively) data cache. If the DB drops a query, it just falls back to its cache. It doesn't "require" live data to function.

    As far as the "whole enchilada" goes, I'd say focus less on reducing web traffic (although that's a good thing) and more on reducing DB traffic. Point all the spiders to the archive instead of the live pages (many vBulletin sites seem to do this). Possibly simplify the homepage so it can be loaded with minimal DB traffic (no reply count or unread count in articles).

    I'm afraid I'm pretty useless in application space, and technically I'm not supposed to speak linux, so you're on your own, but if you can spin up a WS2008 box, I can give you a pretty good stress analysis.

    11011011
     
    Last edited: Jun 5, 2008
  12. pdk

    pdk Beacon of Sanity

    <puts on sysadmin hat>

    I'm going to have to disagree with you on a few points here, Dan. RAIDs 5 and 6 may be better for reads and reliability, but they are terrible for writes, especially small writes (due to parity calculations). And that's exactly the workload that appears for databases.

    I'm on some database mailing lists, and almost every time someone brings up RAID 5 the phrase "mortgaging your future" comes up. It gives a good amount of space and decent reliability, but the performance hits just aren't worth it.

    Generally, for database servers, I'd recommend RAID 10 with a hotspare. It provides near RAID 0 read performance, not much worse write performance, and extremely good reliability. The only issue is that it costs more per GB than RAID 5 since you need almost twice as many disks for the same usable space.
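
    The cost-per-GB point is just arithmetic, so here is a rough sketch of it. It assumes equal-sized disks and leaves hot spares out of the count; usable_disks is a name invented for this example, not a real tool.

```python
def usable_disks(level: str, disks: int) -> int:
    """Return how many disks' worth of space is usable, ignoring hot spares."""
    rules = {
        "0": (2, lambda n: n),        # striping only, no redundancy
        "1": (2, lambda n: 1),        # every disk holds the same data
        "5": (3, lambda n: n - 1),    # one disk's worth of parity
        "6": (4, lambda n: n - 2),    # two disks' worth of parity
        "10": (4, lambda n: n // 2),  # striped mirrored pairs
    }
    if level not in rules:
        raise ValueError(f"unknown RAID level: {level}")
    minimum, usable = rules[level]
    if disks < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    if level == "10" and disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return usable(disks)


if __name__ == "__main__":
    # Three ways to end up with four disks' worth of usable space.
    for level, disks in [("5", 5), ("6", 6), ("10", 8)]:
        print(f"RAID {level} with {disks} disks -> "
              f"{usable_disks(level, disks)} usable")
```

    For four disks' worth of usable space that works out to 5 disks in RAID 5, 6 in RAID 6 and 8 in RAID 10. This only covers capacity; it says nothing about the write penalty, which is the real argument here.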

    In any case, if you want any sort of performance, hardware RAID is a must, and RAID 0 is indeed suicide.

    Remember most of all that RAID is not backups. It's redundancy, it's failover, but it doesn't help against the case where you do an accidental "rm -rf /" as root (note: don't do this, it erases everything on your machine). You will also need a data backup somehow, somewhere.

    Memory is a big plus, and the more the better, to a point of course. Note that you don't necessarily need a 64-bit OS to use >4 GB of memory, as several Linux distros offer a PAE or hugemem kernel to allow you to use >4 GB of RAM in your system. I'm not sure about commercial OSs, though, but I'd be surprised if they didn't.
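
    The arithmetic behind the 4 GB line, as a small sketch. It assumes the classic x86 case where PAE widens physical addresses from 32 to 36 bits; addressable_gib is a made-up helper name.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Return how many GiB can be addressed with the given number of bits."""
    if address_bits < 30:
        raise ValueError("fewer than 30 bits addresses less than 1 GiB")
    return 2 ** address_bits // GIB


if __name__ == "__main__":
    print("32-bit physical addresses:", addressable_gib(32), "GiB")
    print("36-bit physical addresses (PAE):", addressable_gib(36), "GiB")
```

    Note that this is about how much RAM the machine can use in total. Under PAE each process still works with 32-bit virtual addresses, so a single database process does not get a bigger flat address space from it.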

    I agree wholeheartedly that ECC is always a good idea for important databases for all the reasons you pointed out.

    It's been my experience that database and web servers are rarely CPU-bound. They're much more likely to be I/O-bound, so you'd want to invest in memory, fast disks, and fast network (Gigabit network is probably a must, but I really couldn't tell without seeing some hard data). More than likely, your processor was undergoing a lot of I/O wait during the timeout period yesterday.

    You'd probably need at least a dual-core processor, maybe quad-core if you need it depending on resource usage stats, but that might be serious overkill.

    I agree. My third rule of designing a database webapp is to minimize the number of queries, and to never have a linear number of queries on a page load (the first and second are "always check your inputs"). There's a lot of extra overhead with issuing queries, and the fewer you do, the better.

    I'd hope that vBulletin would be smart enough to grab all the data for a forum page in just one or two queries (including user info and post count) instead of a few queries per post, but I could be wrong.
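
    Here is a small sketch of what "a linear number of queries" looks like next to the one-query version. It uses Python's built-in sqlite3 with an invented two-table schema; it is not vBulletin's schema or code, and I have no idea which pattern vBulletin actually uses.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory forum (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, thread_id INTEGER,
                            user_id INTEGER, body TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "dan"), (2, "pdk")])
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)",
                     [(1, 7, 1, "RAID 6?"), (2, 7, 2, "RAID 10."),
                      (3, 7, 1, "Agreed.")])
    return conn


def page_linear(conn, thread_id):
    """One query for the posts, then one more per post: 1 + N queries."""
    rows = conn.execute(
        "SELECT user_id, body FROM posts WHERE thread_id = ? ORDER BY id",
        (thread_id,)).fetchall()
    queries = 1
    page = []
    for user_id, body in rows:
        name = conn.execute("SELECT name FROM users WHERE id = ?",
                            (user_id,)).fetchone()[0]
        queries += 1
        page.append((name, body))
    return page, queries


def page_joined(conn, thread_id):
    """Fetch posts and author names together in a single query."""
    page = conn.execute(
        "SELECT u.name, p.body FROM posts p "
        "JOIN users u ON u.id = p.user_id "
        "WHERE p.thread_id = ? ORDER BY p.id",
        (thread_id,)).fetchall()
    return page, 1


if __name__ == "__main__":
    conn = make_db()
    print(page_linear(conn, 7))
    print(page_joined(conn, 7))
```

    Both functions build the same page, but the first one's query count grows with the number of posts shown while the second stays at one. The ? placeholders are also the cheap way to follow rules one and two: the driver handles quoting, so user input never gets pasted into the SQL.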

    I, however, am not forbidden to speak Linux, and will do so freely, and I have experience in web applications and database management (PostgreSQL, though, not MySQL). I can offer some services if you'd like, just don't expect this to be anything more than a few hours a week sort of thing.

    And Dan, why aren't you supposed to speak Linux? It's a wonderful thing to speak. :D

    </sysadmin hat>
     
  13. Dan

    Dan KiloTanked in post 153451

    Too lazy to quote it all.... so PDK...

    Raid 0+1 w/hot-spare... I agree, best of both worlds. The Raid 6 configs I usually kick around are battery backed so write through are non-blocking to a degree. Just wanted to make sure that Raid 0 option wasn't used.

    Linux PAE... Frankly I've never really trusted PAE to provide the same performance as x64 native addressing. Outside my comfort zone, but WS originally split the x32 bit model into two halves. Apps could only gobble 2 GiB. PAE and some creative boot.ini switches fix it, but the solutions always seemed kludgey to me. I much prefer memory and page management happening in the kernel than app space.

    Memory... Yeap, fill them banks! Just look at some of the eccentric stuff too, fully buffered / bus speed / interleaving / NUMA. It all adds up.

    Linux... I keep thinking of that scene with Darth Vader and Luke on Endor... "It's too late for me..." I turned from the Linux side of the force long ago. I may know your mind a bit on this regard so I hazard to bring that debate to the forums (kinda more religion than mpg) ;). PM me and I can fill you in on why I turned from the Penguin long ago.

    11011011
     
  14. Right Lane Cruiser

    Right Lane Cruiser Penguin of Notagascar

    Interesting dialog, guys.

    All I can tell you is that I watched the load climb from about 4 to 253. Seriously, watching top I couldn't get it to recognize keystrokes after it hit 120 or so. Talk about watching helplessly!! :eek:
     
  15. laurieaw

    laurieaw Sorceress of the North

    i tried to read those last two posts and my eyes are glazing over. i am glad somebody understands that stuff.
     
  16. PaleMelanesian

    PaleMelanesian Beat the System Staff Member

    To clarify, it's the Core2 series that are the good ones. The Core processors were a stop-gap, based on the old *mumble-mumble* technology that was, quite simply, inferior. The new Core2 series are the real deal, though.

    How that compares to Xeon, I'm not sure.

    Any thought about optimizing the queries? I don't know how vB is built, but if it's doing a count(*) from ALL the posts (100,000+) to get a post count for this thread, it'd be better to store that in a separate table and update it as needed. Disk space is cheap, and as we saw, performance is not. Selecting one line is MUCH faster than a full-blown count.
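
    A rough sketch of the stored-counter idea, again using Python's built-in sqlite3 and an invented schema. The table and function names are assumptions for illustration, not how vB is actually built.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a posts table plus a separate per-thread counter table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY,
                            thread_id INTEGER NOT NULL);
        CREATE TABLE thread_counts (thread_id INTEGER PRIMARY KEY,
                                    post_count INTEGER NOT NULL);
    """)
    return conn


def add_post(conn: sqlite3.Connection, thread_id: int) -> None:
    """Insert a post and bump the stored counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO posts (thread_id) VALUES (?)", (thread_id,))
        cur = conn.execute(
            "UPDATE thread_counts SET post_count = post_count + 1 "
            "WHERE thread_id = ?", (thread_id,))
        if cur.rowcount == 0:  # first post in this thread
            conn.execute("INSERT INTO thread_counts VALUES (?, 1)",
                         (thread_id,))


def count_by_scanning(conn: sqlite3.Connection, thread_id: int) -> int:
    """Count matching rows in posts every time the page is drawn."""
    return conn.execute("SELECT COUNT(*) FROM posts WHERE thread_id = ?",
                        (thread_id,)).fetchone()[0]


def count_from_counter(conn: sqlite3.Connection, thread_id: int) -> int:
    """Read the single stored row instead of counting posts."""
    row = conn.execute(
        "SELECT post_count FROM thread_counts WHERE thread_id = ?",
        (thread_id,)).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = make_db()
    for thread in (7, 7, 7, 9):
        add_post(conn, thread)
    print(count_by_scanning(conn, 7), count_from_counter(conn, 7))
```

    The trade is a little extra work on every new post in exchange for a one-row lookup on every page view, which suits a forum where reads far outnumber writes. Deleting or moving posts would need the same bookkeeping, which this sketch leaves out.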
     
  17. MikeN

    MikeN Well-Known Member

    Many good points... I had suggested to Wayne on the phone yesterday to get pricing on a SAS setup instead of sticking w/ SATA. Obviously, RAID 10 is great, but don't think he can justify the price right now (all relative right?). ECC is a given, although I didn't specifically talk about that with him (assumed in my mind).
     
  18. xcel

    xcel PZEV, there's nothing like it :) Staff Member

    Hi All:

    ___If CleanMPG ever really does become a hit on a day-to-day basis vs. the one-hit wonder that we are currently, the super servers, larger pipes and more bandwidth allowance will be a reality, but today I am thankful for the contributions that have come in to help pay for the upgrade. As of now, we have enough to cover the upgrade for a touch over 3 months, which saves CleanMPG’s neck for another day.

    ___Thank you all for the help educating our fellow citizens who have not yet figured out what they can do to minimize the impact of high fuel costs and the problems arising from our addiction to the almighty BBl of oil.

    ___We will hopefully have the machine and the plan complete today, although nothing is quite finalized until the work of transferring begins. Wish Sean luck with the help of Brian Morris (BailOut) on the back end.

    ___Good Luck and Thank You to those that have contributed to date!

    ___Wayne
     
  19. xcel

    xcel PZEV, there's nothing like it :) Staff Member

    Hi All:

    ___Here we go :rolleyes: The new server is ordered, and should be up and running w/ the base OS install later this afternoon. Afterwards, the real work begins with the install and configuration of apps and utilities. Sometime later this week will come the “cross your fingers and hope not to die” multiple large DBase and data transfers. Functional testing and trouble shooting after that and hopefully we will have it online possibly by the weekend? My hands are shaking, my fingers are crossed and I have my hopes up that between Sean, Brian and Dan, CleanMPG will experience a minimum of downtime if any (possibly an early morning - hour long window on Saturday or Sunday morning in the 02:00 AM time frame) and all is well by 03:00 AM. Of course better laid plans have been known to go awry and we stay our present course until this thing is working …

    ___Anyone that remembers the show The Six Million Dollar Man probably remembers the opening lead in ...

    “We have the technology … Better than he was before. Better, stronger, faster.” :D

    ___Wish the site luck as it is going to need it.

    ___Good Luck

    ___Wayne
     
    Last edited: Jun 5, 2008
  20. EdGe7

    EdGe7 Member

    Wayne,

    just a suggestion, but you should add a permanent news headline to the homepage in regards to this news. It will help inform the new visitors so they stay informed and so we know what to expect come this weekend.

    Thanks
     
